No "Cancellation Code" for 1987-2002?


Rob

Aug 21, 2008, 2:23:26 PM
to ASA Data Expo 2009
I want to do some calculations on only the flights that were not
cancelled.

In the data description page:

http://stat-computing.org/dataexpo/2009/the-data.html

It says:

"CancellationCode = reason for cancellation (blank if not
cancelled)"

This seems to be true of the 2003-2008 data, but in the 1987-2002 data
the value for *every* flight seems to be
CancellationCode='N' (whether cancelled or not). Is this
intentional/expected?

And, if we can't use the CancellationCode variable to determine
whether a flight was cancelled, can we use some other variable(s) to
try to intuit this? (Hopefully some single variable that would be
easy to check.) For example, maybe if a flight has an "Actual Arrival
Time" then we can assume the flight was not cancelled? (but then
again, some flights in 2007 appear to have the Actual Arrival Time
field blank, even though the flight does not list a cancellation code,
for example).

Here's a simple query I ran, and the resulting list, showing the
cancellation codes present in each year's data:

-----

select unique year, CancellationCode from asadata.alldata;

-----


      Cancellation
Year  Code

1987 N
1988 N
1989 N
1990 N
1991 N
1992 N
1993 N
1994 N
1995 N
1996 N
1997 N
1998 N
1999 N
2000 N
2001 N
2002 N
2003
2003 A
2003 B
2003 C
2003 D
2003 N
2004
2004 A
2004 B
2004 C
2004 D
2005
2005 A
2005 B
2005 C
2005 D
2006
2006 A
2006 B
2006 C
2006 D
2007
2007 A
2007 B
2007 C
2007 D
2008
2008 A
2008 B
2008 C
2008 D


Robert Allison

Aug 21, 2008, 2:42:41 PM
to data-e...@googlegroups.com
Ahh - I was reading the 'NA' as just an 'N' in the 1987-2002 data - Doh! ...

But my question still stands -- Is there a reliable way we can intuit if a flight was cancelled, by looking at other variables (such as arrival time)? Or, is it not safe to make any assumptions like that?

hadley wickham

Aug 21, 2008, 11:45:54 PM
to data-e...@googlegroups.com
> Ahh - I was reading the 'NA' as just an 'N' in the 1987-2002 data - Doh! ...

Which is another problem I need to look into - those NAs should
really be converted into NULLs for the database.
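
A minimal sketch of that kind of conversion, assuming the data is loaded from CSV into SQLite with Python. The `flights` table, the three-column sample, and the `na_to_none` helper are made up for illustration; this is not the actual export script.

```python
import csv
import io
import sqlite3

# Hypothetical three-row extract; the real files have many more columns.
SAMPLE_CSV = """Year,ArrTime,CancellationCode
1987,741,NA
2003,NA,B
2003,1502,
"""

def na_to_none(value):
    """Map the literal string 'NA' to None, which sqlite3 stores as NULL."""
    return None if value == "NA" else value

def load_rows(conn, text):
    """Insert CSV rows into a `flights` table, storing 'NA' as NULL."""
    conn.execute(
        "CREATE TABLE flights (Year INTEGER, ArrTime INTEGER, CancellationCode TEXT)"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (
            na_to_none(r["Year"]),
            na_to_none(r["ArrTime"]),
            na_to_none(r["CancellationCode"]),
        )
        for r in reader
    ]
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load_rows(conn, SAMPLE_CSV)

# Rows whose CancellationCode was 'NA' in the file are now real NULLs.
null_codes = conn.execute(
    "SELECT COUNT(*) FROM flights WHERE CancellationCode IS NULL"
).fetchone()[0]
print(null_codes)
```

Blank codes are deliberately left as empty strings here, so "not recorded" (NULL) stays distinguishable from "not cancelled" (blank).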

> But my question still stands -- Is there a reliable way we can intuit if a flight was cancelled, by looking at other variables (such as arrival time)? Or, is it not safe to make any assumptions like that?

In the original data, there was an additional boolean field called
Cancelled - I dropped this out of the subset because it seemed
redundant with CancellationCode, but obviously I didn't look far
enough in the past.

I'll add it back in when I rerun the export after figuring out the
problem with the doubled Januarys.
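
Once the Cancelled field is back, selecting non-cancelled flights could look roughly like the sketch below. It assumes Cancelled is stored as 0/1, and the table layout and sample rows are invented for illustration (again using Python's sqlite3 rather than the real database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (Year INTEGER, ArrTime INTEGER, Cancelled INTEGER)"
)
# Made-up rows. The last one has a blank arrival time but is not cancelled,
# the situation Rob reported seeing in the 2007 data.
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [
        (1987, 741, 0),
        (1987, None, 1),
        (2007, 1502, 0),
        (2007, None, 0),
    ],
)

# Filter on the flag itself instead of inferring cancellation from ArrTime.
not_cancelled = conn.execute(
    "SELECT Year, COUNT(*) FROM flights "
    "WHERE Cancelled = 0 GROUP BY Year ORDER BY Year"
).fetchall()
print(not_cancelled)
```

Filtering on `ArrTime IS NOT NULL` instead would drop the last sample row even though it is not a cancellation, which is why the explicit flag is the safer test.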

Hadley


--
http://had.co.nz/
