I want to run some calculations on only the flights that were not
cancelled.
The data description page at
http://stat-computing.org/dataexpo/2009/the-data.html
says:
"CancellationCode = reason for cancellation (blank if not
cancelled)"
This seems to be true of the 2003-2008 data, but in the 1987-2002 data
the value for *every* flight seems to be
CancellationCode='N' (whether cancelled or not). Is this intentional
or expected?
And if we can't use the CancellationCode variable to determine
whether a flight was cancelled, can we use some other variable(s) to
infer it? (Hopefully a single variable that is easy to check.) For
example, if a flight has an "Actual Arrival Time", can we assume it
was not cancelled? Then again, some flights in 2007 appear to have
the Actual Arrival Time field blank even though they list no
cancellation code.
Here's a simple query I ran, and the resulting list, showing the
cancellation codes present in each year's data:
-----
select unique year, CancellationCode from asadata.alldata;
-----
Year  CancellationCode
1987  N
1988  N
1989  N
1990  N
1991  N
1992  N
1993  N
1994  N
1995  N
1996  N
1997  N
1998  N
1999  N
2000  N
2001  N
2002  N
2003  (blank)
2003  A
2003  B
2003  C
2003  D
2003  N
2004  (blank)
2004  A
2004  B
2004  C
2004  D
2005  (blank)
2005  A
2005  B
2005  C
2005  D
2006  (blank)
2006  A
2006  B
2006  C
2006  D
2007  (blank)
2007  A
2007  B
2007  C
2007  D
2008  (blank)
2008  A
2008  B
2008  C
2008  D
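
To check the arrival-time idea mentioned earlier, something like the
following might help. It counts, for each year and cancellation code,
how many flights have no actual arrival time. This is only a sketch,
not something I have run against all years: it assumes the actual
arrival time column is named ArrTime and that missing times were
loaded as nulls (if they came in as the text 'NA', the condition
would need to change).

```sql
-- Sketch only: ArrTime as the column name and NULL as the missing
-- value are assumptions about how the data was loaded.
select year,
       CancellationCode,
       count(*) as flights,
       -- flights with no recorded actual arrival time
       sum(case when ArrTime is null then 1 else 0 end) as no_arrival_time
from asadata.alldata
group by year, CancellationCode
order by year, CancellationCode;
```

If no_arrival_time is close to zero for the blank code and close to
flights for codes A-D in 2003-2008, a missing arrival time might be a
usable stand-in for the earlier years. It would still count flights
that have no arrival time for some other reason.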