
Plotting a map of the NYC subway system

I recently started computing statistical properties of the NYC subway system. To create meaningful visualizations of the data, it helps to be able to plot a map of the subway lines and stations. With GeoViews, GeoPandas, and the geographic data on subway lines and subway stations published by the City of New York, this seemingly daunting task becomes fairly simple. GeoViews is an extension of HoloViews that adds several geographic plot types.

First, download the geographic data as GeoJSON files and store them in a convenient folder. Then load them into GeoPandas data frames:

import geopandas as gpd
from cartopy import crs

# GeoJSON coordinates are longitude/latitude (WGS 84), so no CRS has to be
# passed when reading; the map projection is chosen later, when plotting.
lines = gpd.read_file('lines.geojson')
stations = gpd.read_file('stations.geojson')
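GeoJSON stores positions as [longitude, latitude] pairs in WGS 84 (RFC 7946), so the files can be read without assigning a projection. The minimal sketch below uses only the standard library to show that structure. The one-feature document is hypothetical and modeled on the Astor Pl row of the stations data. `point_features` is an illustrative helper and is not part of GeoPandas.

```python
import json

# Hypothetical one-feature GeoJSON document modeled on the Astor Pl station;
# the real file holds many features with more properties.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Astor Pl"},
      "geometry": {"type": "Point",
                   "coordinates": [-73.99106999861966, 40.73005400028978]}
    }
  ]
}
"""

def point_features(geojson_text):
    """Return (name, longitude, latitude) for every Point feature."""
    collection = json.loads(geojson_text)
    result = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # skip LineStrings and other geometry types
        # RFC 7946: longitude comes first, then latitude (WGS 84).
        lon, lat = geometry["coordinates"][:2]
        result.append((feature["properties"]["name"], lon, lat))
    return result

print(point_features(SAMPLE))
```

The longitude comes first, so it has the negative value in New York. This ordering is easy to swap by mistake when reading the raw files by hand.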

Each row of the “lines” object now refers to a small portion of a subway track:

lines.head()
   name  url                                rt_symbol  objectid  id       shape_len      geometry
0  G     http://web.mta.info/nyct/service/  G          753       2000393  2438.20024902  LINESTRING (-73.99487524803018 40.680203546062…
1  G     http://web.mta.info/nyct/service/  G          754       2000394  3872.83441063  LINESTRING (-73.97957543205142 40.659930695530…
2  Q     http://web.mta.info/nyct/service/  N          755       2000469  1843.36633108  LINESTRING (-73.97585637503069 40.575974505394…
3  M     http://web.mta.info/nyct/service/  B          756       2000294  1919.5592029   LINESTRING (-73.92414355434533 40.752290926571…
4  M     http://web.mta.info/nyct/service/  B          757       2000296  2385.69853589  LINESTRING (-73.91344685471373 40.756171576368…

The geographic location of each track segment is stored in the “geometry” column as a Shapely LineString. The other columns carry additional information. For example, the “name” column identifies the subway line that usually runs along each segment.
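Each row covers only one segment of track, so any per-route quantity has to be aggregated over rows. The minimal pandas sketch below uses just the five rows printed above. Its sums are therefore not real route lengths, and the unit of `shape_len` is not stated in the data shown here. It groups by the `name` label and counts and sums the segment lengths. On the full data set, the same `groupby` call could be applied to the `lines` GeoDataFrame directly.

```python
import pandas as pd

# The five segment rows printed by lines.head(), reduced to the columns used here.
segments = pd.DataFrame({
    "name": ["G", "G", "Q", "M", "M"],
    "shape_len": [2438.20024902, 3872.83441063, 1843.36633108,
                  1919.5592029, 2385.69853589],
})

# In case the column was read as text, coerce it to numbers first.
segments["shape_len"] = pd.to_numeric(segments["shape_len"])

# One row per route label: number of segments and total segment length.
per_route = segments.groupby("name")["shape_len"].agg(["count", "sum"])
print(per_route)
```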

The stations object contains information on the subway stations in New York City:

stations.head()
   name              url                                line           objectid  notes                                            geometry
0  Astor Pl          http://web.mta.info/nyct/service/  4-6-6 Express  1         4 nights, 6-all times, 6 Express-weekdays AM s…  POINT (-73.99106999861966 40.73005400028978)
1  Canal St          http://web.mta.info/nyct/service/  4-6-6 Express  2         4 nights, 6-all times, 6 Express-weekdays AM s…  POINT (-74.00019299927328 40.71880300107709)
2  50th St           http://web.mta.info/nyct/service/  1-2            3         1-all times, 2-nights                            POINT (-73.98384899986625 40.76172799961419)
3  Bergen St         http://web.mta.info/nyct/service/  2-3-4          4         4-nights, 3-all other times, 2-all times         POINT (-73.97499915116808 40.68086213682956)
4  Pennsylvania Ave  http://web.mta.info/nyct/service/  3-4            5         4-nights, 3-all other times                      POINT (-73.89488591154061 40.66471445143568)

Let’s add new columns to the data frames to control the color of the plotted lines and points. We will set all stations to be drawn in blue and all lines in grey. The entries in these columns can later be manipulated to communicate additional information — for example, if a train is delayed at a station, then that station’s color code could be set to “red”.

stations['color'] = 'blue'
lines['color'] = 'grey'
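The snippet below is a sketch of the manipulation described above. It marks a hypothetical set of delayed stations in red, using a small pandas stand-in for the stations GeoDataFrame built from the rows shown earlier. The delay report (`delayed_ids`) is invented for illustration. The match uses `objectid` rather than `name` because several distinct stations share a name; there is more than one Canal St, for example.

```python
import pandas as pd

# Small stand-in for the stations GeoDataFrame (rows from stations.head()).
stations_demo = pd.DataFrame({
    "objectid": [1, 2, 3, 4, 5],
    "name": ["Astor Pl", "Canal St", "50th St", "Bergen St", "Pennsylvania Ave"],
})
stations_demo["color"] = "blue"

# Hypothetical delay report, keyed by objectid because station names repeat.
delayed_ids = {2, 4}

# Recolor only the delayed rows; all other stations keep the default color.
stations_demo.loc[stations_demo["objectid"].isin(delayed_ids), "color"] = "red"
print(stations_demo)
```

The same `.loc` assignment works unchanged on a GeoDataFrame, because GeoDataFrame is a subclass of the pandas DataFrame.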

We can then plot the lines with GeoViews. The color is set by declaring the “color” column as a value dimension and passing its name to the color option of “opts”.

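import geoviews as gv
gv.extension('bokeh')

# The data are longitude/latitude; `projection` only sets the projection
# used to draw the map, not the coordinate system of the data.
lines = gv.Path(lines, vdims=['color']).opts(
    projection=crs.LambertConformal(), height=500, width=500, color='color')
lines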
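The color column does not have to hold a single value. This is a minimal sketch assuming a hypothetical palette keyed on the `rt_symbol` column; the named colors are placeholders, not official MTA colors. Each segment's symbol is looked up in a dictionary, and symbols without an entry fall back to grey. GeoViews reads the colors from the column, so the Path element above would then draw each group in its own color.

```python
import pandas as pd

# Hypothetical palette keyed on rt_symbol; placeholder colors, not official MTA colors.
ROUTE_COLORS = {"G": "green", "N": "gold", "B": "orange"}

# rt_symbol values from lines.head(), plus one hypothetical symbol ("L")
# that is missing from the palette.
lines_demo = pd.DataFrame({"rt_symbol": ["G", "G", "N", "B", "B", "L"]})

# Look up each symbol's color; symbols without an entry fall back to grey.
lines_demo["color"] = lines_demo["rt_symbol"].map(ROUTE_COLORS).fillna("grey")
print(lines_demo)
```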

We can also overlay the subway stations over this map:

# Use a new name so the `stations` GeoDataFrame stays available for re-plotting.
station_points = gv.Points(stations, vdims=['color']).opts(color='color')
lines * station_points

Finally, it would be nice to identify individual stations directly from the plot. We can do this by constructing a custom Bokeh hover tool and passing it to the plot through the tools option. Hovering the mouse over a subway station in the resulting plot then shows a tooltip with the station’s name.

from bokeh.models import HoverTool
hover = HoverTool(tooltips=[("station", "@name")])

station_points = gv.Points(stations, vdims=['color', 'name']).opts(tools=[hover], color='color')
lines * station_points

To enable the tooltip, we had to declare an additional value dimension (‘name’) and instruct the hover tool to look up that column’s value for the hovered station (‘@name’).
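If more fields should appear in the tooltip, every column referenced as `@column` must also be declared as a value dimension. Below is a minimal sketch of one way to keep the two lists in sync. `hover_spec` is a hypothetical helper that builds both the `vdims` list and the Bokeh `tooltips` list from the same (label, column) pairs. The `line` column comes from the stations data shown earlier. The helper is plain Python and does not depend on GeoViews.

```python
def hover_spec(fields):
    """Build matching vdims and Bokeh tooltips from (label, column) pairs.

    Hypothetical helper: each column shown as "@column" in a tooltip is also
    added to the value dimensions, after the 'color' column used for styling.
    """
    vdims = ["color"] + [column for _, column in fields]
    tooltips = [(label, "@" + column) for label, column in fields]
    return vdims, tooltips

vdims, tooltips = hover_spec([("station", "name"), ("lines", "line")])
print(vdims)     # ['color', 'name', 'line']
print(tooltips)  # [('station', '@name'), ('lines', '@line')]

# Intended use with the objects from this post (not executed here):
# hover = HoverTool(tooltips=tooltips)
# points = gv.Points(stations, vdims=vdims).opts(tools=[hover], color='color')
```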