Re: [BikeHackNYC] Re: Anomaly in June 2015 Data?

62 views
Skip to first unread message

Frank Hebbert

unread,
Oct 12, 2015, 1:46:50 PM10/12/15
to citibike...@googlegroups.com
Dwight, can you share the query you're using to calculate the daily averages? 

On Sun, Oct 11, 2015 at 2:35 AM, Dwight Penny <dweeb...@gmail.com> wrote:
Sorry the image didn't work out. I'll post it again later.
Here are the monthly daily averages by month, based on startDate. Oddly, comparing it the stats they give in their monthly reports, only June 2015 seems to be spot on. In all the other cases, my figures are roughly half that in the report. I'll check again to see if my import is missing some records.

Month AvgDailyTrips
2013-07 13603
2013-08 16161
2013-09 17239
2013-10 16737
2013-11 11263
2013-12 7161
2014-01 4845
2014-02 4013
2014-03 7083
2014-04 11180
2014-05 13970
2014-06 15615
2014-07 15626
2014-08 15540
2014-09 15898
2014-10 13366
2014-11 8820
2014-12 6437
2015-01 9211
2015-02 7033
2015-03 11027
2015-04 10873
2015-05 15516
2015-06 31374
2015-07 17511
2015-08 19017
2015-09 21495

--
You received this message because you are subscribed to the Google Groups "BikeNYC and CitiBikeNYC Hackers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to citibike-hacke...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/citibike-hackers/74ec5de5-3c77-47d5-9782-a9b3d2d52f23%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dwight Penny

unread,
Oct 17, 2015, 10:08:12 PM10/17/15
to BikeNYC and CitiBikeNYC Hackers


Thanks for replying. I figured out the problem, and I actually thought I had deleted this post.

It turns out that I had messed up in importing the data into MySQL, using the wrong row delimiter, which somehow caused it to import every other row, except for June 2015. I reran the import today, and was more careful to make sure I imported every row. There are some inconsistencies across months in the posted data, with dates being formatted YYYY-MM-DD in the beginning, and m/d/YYYY beginning Sep. 2014, so that required some tweaking, too.

Here's the query:
select date_format(startTime,'%Y-%m') Month ,
    format(count(bikeId)/max(dayofmonth(startTime)),0) AvgDailyTrips
from vw_TripData
group by date_format(startTime,'%Y-%m')
order by date_format(startTime,'%Y-%m');

And here's the new data, which looks like I'd expect it to look:

Month    AvgDailyTrips
2013-07    27,207
2013-08    32,321
2013-09    34,479
2013-10    33,475
2013-11    22,526
2013-12    14,321
2014-01    9,690
2014-02    16,053
2014-03    14,165
2014-04    22,359
2014-05    27,939
2014-06    31,229
2014-07    31,253
2014-08    31,080
2014-09    31,796
2014-10    26,733
2014-11    17,640
2014-12    12,873
2015-01    9,211
2015-02    7,033
2015-03    11,027
2015-04    21,746
2015-05    31,032
2015-06    31,374
2015-07    35,022
2015-08    38,034
2015-09    42,990
Reply all
Reply to author
Forward
0 new messages