Another Data Problem: Is January data repeated twice in the csv files?

Rob

Aug 20, 2008, 11:06:01 AM
to ASA Data Expo 2009
I've done a summary count of the number of flights per day, and
plotted it, and the January counts look (suspiciously) about twice the
counts of the other months (I've checked this in the 2008, 2007, and
2001 data so far).

There are about 4,000 flights per day in January, but only about 2,000
flights per day for all the other months.
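A per-day count like the one described above can be sketched in Python. This is a minimal sketch, not Rob's actual code. It assumes that the first three CSV columns are year, month and day of month (as the grep pattern "^2008,1,1," suggests) and skips any line, such as a header, whose first field is not a number. Averaging the daily counts within each month makes a doubled month stand out at roughly twice the level of the others. The function names are hypothetical.

```python
import csv
from collections import Counter, defaultdict


def flights_per_day(rows):
    """Count records per (year, month, day).

    Assumes the first three fields are year, month and day of month, and
    skips rows (such as a header) whose first field is not numeric.
    """
    counts = Counter()
    for row in rows:
        if not row or not row[0].strip().isdigit():
            continue
        year, month, day = (int(field) for field in row[:3])
        counts[(year, month, day)] += 1
    return counts


def mean_per_month(daily_counts):
    """Average the per-day counts within each (year, month)."""
    by_month = defaultdict(list)
    for (year, month, _day), n in daily_counts.items():
        by_month[(year, month)].append(n)
    return {ym: sum(ns) / len(ns) for ym, ns in by_month.items()}


def daily_counts_from_file(path):
    """Read one yearly CSV file (e.g. a local copy of 2008.csv)."""
    with open(path, newline="") as f:
        return flights_per_day(csv.reader(f))
```

For example, `mean_per_month(daily_counts_from_file("2008.csv"))` returns one average per month; if January's value is about double the others, its rows are probably repeated.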


Here is a quick Unix "grep" that selects all the flights of the plane
with tailnum=N240WN on Jan 1, 2008 ... it looks to me like each of
its 5 flights appears twice in the data ...


$ grep "^2008,1,1," 2008.csv | grep N240WN | sort -n

2008,1,1,2,829,830,950,955,WN,762,N240WN,81,85,65,-5,-1,OAK,LAS,407,9,7,,0,NA,NA,NA,NA,NA
2008,1,1,2,829,830,950,955,WN,762,N240WN,81,85,65,-5,-1,OAK,LAS,407,9,7,,0,NA,NA,NA,NA,NA
2008,1,1,2,2207,1955,144,2340,WN,1558,N240WN,157,165,140,124,132,MDW,RSW,1105,4,13,,0,0,24,0,0,100
2008,1,1,2,2207,1955,144,2340,WN,1558,N240WN,157,165,140,124,132,MDW,RSW,1105,4,13,,0,0,24,0,0,100
2008,1,1,2,1029,1030,1534,1545,WN,762,N240WN,185,195,169,-11,-1,LAS,MSY,1501,5,11,,0,NA,NA,NA,NA,NA
2008,1,1,2,1029,1030,1534,1545,WN,762,N240WN,185,195,169,-11,-1,LAS,MSY,1501,5,11,,0,NA,NA,NA,NA,NA
2008,1,1,2,1617,1615,1735,1740,WN,762,N240WN,78,85,65,-5,2,MSY,BNA,471,6,7,,0,NA,NA,NA,NA,NA
2008,1,1,2,1617,1615,1735,1740,WN,762,N240WN,78,85,65,-5,2,MSY,BNA,471,6,7,,0,NA,NA,NA,NA,NA
2008,1,1,2,1941,1805,2117,1930,WN,762,N240WN,96,85,67,107,96,BNA,MDW,395,20,9,,0,96,0,11,0,0
2008,1,1,2,1941,1805,2117,1930,WN,762,N240WN,96,85,67,107,96,BNA,MDW,395,20,9,,0,96,0,11,0,0
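To check a whole file rather than one aircraft, you can count exact duplicate lines. This is a minimal Python sketch. It assumes that a "duplicate" means a byte-for-byte identical CSV line, as in the output above, and that two genuinely different flights never produce identical rows; the thread does not establish the latter, so treat it as an assumption. `duplicated_lines` and `dedupe` are hypothetical helper names.

```python
from collections import Counter


def duplicated_lines(lines):
    """Map each line that occurs more than once to its number of copies.

    Lines are compared as exact strings (ignoring the line ending),
    similar in spirit to piping a file through `sort | uniq -d`.
    """
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}


def dedupe(lines):
    """Yield each distinct line once, keeping the order of first appearance."""
    seen = set()
    for line in lines:
        key = line.rstrip("\r\n")
        if key not in seen:
            seen.add(key)
            yield line
```

For example, `with open("2008.csv") as f: dups = duplicated_lines(f)` gives an empty dict when no line is repeated. Both helpers keep every distinct line in memory, which can be large for a full year of flights.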

Robert Allison

Aug 20, 2008, 11:16:42 AM
to data-e...@googlegroups.com
Slight correction ... I meant to say 40,000 & 20,000 flights per day, rather than 4,000 and 2,000.

hadley wickham

Aug 20, 2008, 2:38:33 PM
to data-e...@googlegroups.com
Hi Rob,

Looks like there's an error in my data import script and those months
are getting doubled up. I'll look into it in more depth tonight to
hopefully rerun the process and reupload tomorrow.

Regards,

Hadley

--
http://had.co.nz/

hadley wickham

Aug 21, 2008, 11:53:12 PM
to data-e...@googlegroups.com
> Looks like there's an error in my data import script and those months
> are getting doubled up. I'll look into it in more depth tonight to
> hopefully rerun the process and reupload tomorrow.

Ok - I've found the rather dumb error in my code and I'm rerunning the
export now. Will hopefully be done in time to upload in the morning.

Thanks for finding all these problems!

Hadley


--
http://had.co.nz/

Marc A. Garrett

Aug 22, 2008, 8:52:09 PM
to data-e...@googlegroups.com
Have the new data files been posted?

Thanks,
--
Marc A. Garrett
ma...@since1968.com
AIM: since1968

hadley wickham

Aug 22, 2008, 8:54:35 PM
to data-e...@googlegroups.com
I've been uploading since this morning; everything up to and including
2001 is done. I'll send an email when they're all up.

Hadley

--
http://had.co.nz/
