ASA Data - Any progress on airplane "tailnum"s?


Robert Allison

Sep 30, 2008, 2:55:37 PM
to data-e...@googlegroups.com
Lately I've been trying to look for trends in flight delays based on the airplane manufacturer, the age of the aircraft, etc. To do that, I have to be able to reliably take the "tailnum" from the flight data and look up the airplane data in the plane-data...

Any progress on getting rid of those funky characters in some of the tailnums, in some of the flight data?
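
For reference, a minimal Python sketch of how the affected values could be listed. It assumes a well-formed tailnum contains only ASCII letters and digits, which is an assumption and not a documented rule of the data. The function name and the damaged sample value are made up for illustration.

```python
import re

# Assumption: a clean tailnum is made of ASCII letters and digits only.
CLEAN_TAILNUM = re.compile(r"[A-Za-z0-9]+")

def suspicious_tailnums(tailnums):
    """Return the distinct values that do not match the assumed clean pattern."""
    return sorted({t for t in tailnums if not CLEAN_TAILNUM.fullmatch(t)})

# "N401AA" appears in the data; "N401A\ufffd" is a made-up damaged value.
sample = ["N401AA", "N401A\ufffd", "N401AA"]
print(suspicious_tailnums(sample))
```

Running something like this over the tailnum column of each yearly file would give a concrete list of the values in question.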

I'm also a bit suspicious of some of the other tailnum data (not sure if it's wrong in the flight data or in the supplemental data?)...

I went to the supplemental data page: http://stat-computing.org/dataexpo/2009/supplemental-data.html
And downloaded the plane-data.csv: http://stat-computing.org/dataexpo/2009/plane-data.csv

Notice that tailnum 'N401AA' is a "balloon" (I assume a blimp, or something similar?)

$ grep N401AA plane-data.csv
N401AA,Corporation,RAVEN,09/19/1977,S-50A,Valid,Balloon,None,None

I also verified this via a Google search. This page says it's a balloon:
http://www.airport-data.com/aircraft/N401AA.html


I looked up some recent (2008) flights of this tailnum between DFW (Dallas/Fort Worth, TX) and BUR (Burbank, CA). The distance is 1,231 miles, and the ActualElapsedTime (the field right after the tailnum) is usually around 200 minutes ...

that's 1231mile/200min = 1231mile/3.33hr = 369.3 miles/hr ... suspiciously fast for a "balloon", eh?
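
The same arithmetic as a small Python sketch, applied to two of the matching rows from 2008.csv. The field positions are read off those rows (ActualElapsedTime right after the tailnum, the distance of 1231 two fields after BUR/DFW); the helper names are only illustrative.

```python
import csv
import io

# Two of the matching rows from 2008.csv, copied verbatim.
ROWS = """\
2008,1,1,2,853,855,1019,1015,AA,817,N401AA,206,200,183,4,-2,DFW,BUR,1231,4,19,0,,0,NA,NA,NA,NA,NA
2008,1,10,4,651,655,1145,1200,AA,828,N401AA,174,185,157,-15,-4,BUR,DFW,1231,7,10,0,,0,NA,NA,NA,NA,NA
"""

# 0-based field positions, read off the rows above.
TAILNUM, ELAPSED_MIN, DISTANCE_MI = 10, 11, 18

def mph(distance_miles, minutes):
    """Average speed in miles per hour over the whole elapsed time."""
    return distance_miles / (minutes / 60.0)

def speeds(text):
    """Return (tailnum, mph) for each CSV row in text."""
    out = []
    for row in csv.reader(io.StringIO(text)):
        out.append((row[TAILNUM],
                    mph(float(row[DISTANCE_MI]), float(row[ELAPSED_MIN]))))
    return out

for tailnum, speed in speeds(ROWS):
    print(f"{tailnum}: {speed:.1f} mph")
```

If ActualElapsedTime also covers time spent on the ground, as the name suggests, the in-air speed would be somewhat higher still.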

$ grep N401AA 2008.csv|grep DFW|grep BUR
2008,1,1,2,853,855,1019,1015,AA,817,N401AA,206,200,183,4,-2,DFW,BUR,1231,4,19,0,,0,NA,NA,NA,NA,NA
2008,1,10,4,651,655,1145,1200,AA,828,N401AA,174,185,157,-15,-4,BUR,DFW,1231,7,10,0,,0,NA,NA,NA,NA,NA
2008,1,9,3,1938,1935,2057,2055,AA,1865,N401AA,199,200,182,2,3,DFW,BUR,1231,3,14,0,,0,NA,NA,NA,NA,NA
2008,1,1,2,1059,1100,1554,1600,AA,1874,N401AA,175,180,152,-6,-1,BUR,DFW,1231,10,13,0,,0,NA,NA,NA,NA,NA
2008,3,8,6,651,655,1212,1155,AA,828,N401AA,201,180,148,17,-4,BUR,DFW,1231,43,10,0,,0,0,0,17,0,0
2008,3,7,5,2020,1935,2132,2100,AA,1865,N401AA,192,205,178,32,45,DFW,BUR,1231,3,11,0,,0,23,0,0,0,9
2008,4,12,6,851,855,1014,1010,AA,817,N401AA,203,195,184,4,-4,DFW,BUR,1231,3,16,0,,0,NA,NA,NA,NA,NA
2008,4,12,6,1051,1050,1552,1550,AA,1978,N401AA,181,180,159,2,1,BUR,DFW,1231,4,18,0,,0,NA,NA,NA,NA,NA
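
To check whether other tailnums have the same problem, the two files could be cross-referenced along these lines. This is a rough Python sketch under stated assumptions, not something run against the full files: the column names for plane-data.csv are assumed (only the row layout is visible above), the flight-row positions are read off the grep output, and the 100 mph cutoff is an arbitrary placeholder.

```python
import csv
import io

# Assumed column names for plane-data.csv; only the data row comes from the file.
PLANE_COLUMNS = ["tailnum", "type", "manufacturer", "issue_date", "model",
                 "status", "aircraft_type", "engine_type", "year"]
PLANE_DATA = "N401AA,Corporation,RAVEN,09/19/1977,S-50A,Valid,Balloon,None,None\n"
FLIGHTS = ("2008,1,1,2,853,855,1019,1015,AA,817,N401AA,206,200,183,4,-2,"
           "DFW,BUR,1231,4,19,0,,0,NA,NA,NA,NA,NA\n")

# 0-based field positions, read off the flight rows.
TAILNUM, ELAPSED_MIN, DISTANCE_MI = 10, 11, 18
MAX_BALLOON_MPH = 100.0  # arbitrary placeholder cutoff, not a real limit

def load_planes(text):
    """Index the plane records by tailnum."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=PLANE_COLUMNS)
    return {row["tailnum"]: row for row in reader}

def implausible_flights(flights_text, planes):
    """Return (tailnum, aircraft_type, mph) for balloon flights above the cutoff."""
    flagged = []
    for row in csv.reader(io.StringIO(flights_text)):
        plane = planes.get(row[TAILNUM])
        if plane is None or plane["aircraft_type"] != "Balloon":
            continue
        if row[ELAPSED_MIN] == "NA":  # missing values show up as NA
            continue
        speed = float(row[DISTANCE_MI]) / (float(row[ELAPSED_MIN]) / 60.0)
        if speed > MAX_BALLOON_MPH:
            flagged.append((row[TAILNUM], plane["aircraft_type"], round(speed, 1)))
    return flagged

print(implausible_flights(FLIGHTS, load_planes(PLANE_DATA)))
```

The same lookup could be widened to any aircraft type that seems unlikely for a scheduled airline flight.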

Can we trust this data?

-------------------------------------
Robert Allison, PhD
Robert....@sas.com

hadley wickham

Sep 30, 2008, 3:18:49 PM
to data-e...@googlegroups.com
> Any progress on getting rid of those funky characters in some of the tailnums, in some of the flight data?

Could you be a bit more specific?

> I'm also a bit suspicious of some of the other tailnum data (not sure if it's wrong in the flight data or in the supplemental data?)...
>
> I went to the supplemental data page: http://stat-computing.org/dataexpo/2009/supplemental-data.html
> And downloaded the plane-data.csv: http://stat-computing.org/dataexpo/2009/plane-data.csv
>
> Notice that tailnum 'N401AA' is a "balloon" (I assume a blimp, or something similar?)
>
> $ grep N401AA plane-data.csv
> N401AA,Corporation,RAVEN,09/19/1977,S-50A,Valid,Balloon,None,None
>
> I also verified this via a Google search. This page says it's a balloon:
> http://www.airport-data.com/aircraft/N401AA.html
>
>
> I looked up some recent (2008) flights of this tailnum between DFW (Dallas/Fort Worth, TX) and BUR (Burbank, CA). The distance is 1,231 miles, and the ActualElapsedTime (the field right after the tailnum) is usually around 200 minutes ...
>
> that's 1231mile/200min = 1231mile/3.33hr = 369.3 miles/hr ... suspiciously fast for a "balloon", eh?
>
> $ grep N401AA 2008.csv|grep DFW|grep BUR
> 2008,1,1,2,853,855,1019,1015,AA,817,N401AA,206,200,183,4,-2,DFW,BUR,1231,4,19,0,,0,NA,NA,NA,NA,NA
> 2008,1,10,4,651,655,1145,1200,AA,828,N401AA,174,185,157,-15,-4,BUR,DFW,1231,7,10,0,,0,NA,NA,NA,NA,NA
> 2008,1,9,3,1938,1935,2057,2055,AA,1865,N401AA,199,200,182,2,3,DFW,BUR,1231,3,14,0,,0,NA,NA,NA,NA,NA
> 2008,1,1,2,1059,1100,1554,1600,AA,1874,N401AA,175,180,152,-6,-1,BUR,DFW,1231,10,13,0,,0,NA,NA,NA,NA,NA
> 2008,3,8,6,651,655,1212,1155,AA,828,N401AA,201,180,148,17,-4,BUR,DFW,1231,43,10,0,,0,0,0,17,0,0
> 2008,3,7,5,2020,1935,2132,2100,AA,1865,N401AA,192,205,178,32,45,DFW,BUR,1231,3,11,0,,0,23,0,0,0,9
> 2008,4,12,6,851,855,1014,1010,AA,817,N401AA,203,195,184,4,-4,DFW,BUR,1231,3,16,0,,0,NA,NA,NA,NA,NA
> 2008,4,12,6,1051,1050,1552,1550,AA,1978,N401AA,181,180,159,2,1,BUR,DFW,1231,4,18,0,,0,NA,NA,NA,NA,NA
>
> Can we trust this data?

I talked to my DOT contact about this and he was a bit surprised too,
but didn't have any idea why that might be the case.

Hadley


--
http://had.co.nz/
