Analysis of one month of subway data – part 2: correlation between the total number of trains and the probability of delays

When I first decided to track trains in the NYC subway system, I was sure I would find a correlation between the number of trains in the system and the percentage of trains running with delays. I was also sure the relationship would be positive: the more trains there are in the system, the more traffic jams, and the more delays one should find. Simple, right? Surprisingly, as we will see in the…

One month of subway tracking – part 1: an overview of the recorded data

What is the best time to catch a subway to beat rush-hour delays? Which predictors have the largest impact on whether your train will get to its destination on time? To answer these questions, I have been tracking the positions of all trains in the NYC subway system for several months at 20 s intervals. From these data, trains that run with delays can be identified, and, after some data wrangling (not shown here), the following…
