Task 1

John Bashunov

Jul 25, 2013, 10:25:48 PM
to final-assessment-...@googlegroups.com
Seems like this is a simple one to approach using ASOP #23 on Data Quality.

My personal recommendation is not to use the data due to the inconsistencies in redeemed miles, expiring miles, and earned miles.

John Bashunov

Jul 26, 2013, 2:26:56 AM
to final-assessment-...@googlegroups.com
There seem to be a lot of issues with this data; I wouldn't recommend transferring it to the new system.

Other peculiarities not yet mentioned are:
--The redeemed miles are all rounded to the nearest 1k, which seems silly, and records of redeemed miles make up 25%+ of the sample
--The record with the very low, precise earned mileage: who would fly less than 100 miles?
--The records with the middle seat as a preference; I suppose it's possible
--Some records have transaction dates a year before the membership date; that's okay if BBA allows you to redeem past flights like SW used to do

Christie Ritten

Jul 26, 2013, 10:47:42 AM
to final-assessment-...@googlegroups.com
I agree, data quality is awful. I've found issues with over a quarter of the records. Below are some of the checks I did:
-- Miles amounts unreasonable (for earned or expired; redeemed is hard to judge because those amounts are on a different order of magnitude, even though they are supposed to be in miles according to the Scenario Background, 3rd paragraph)
-- Dates in bad format or unreasonable (but in some cases you can make a good guess at the right date)
-- Inconsistency in dates, e.g. transaction date before membership date, or expiration transaction date before 36 months of membership
-- Among multiple records w/ same ID, inconsistent info or obvious duplicate records
-- Transaction code "E" not useful b/c could either be expired or earned
-- Blank member IDs: Not ideal, but could consider them all unique from each other and from other records
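
For anyone who wants to run a few of these checks mechanically rather than by eye, here is a minimal Python sketch covering blank IDs, transactions dated before membership, the ambiguous "E" code, and exact duplicates. The field names, the sample rows, and the helper names `flag_record` and `summarize` are all made up for illustration; the real extract layout will likely differ.

```python
from collections import Counter
from datetime import date

# Hypothetical record layout and made-up rows; not the actual BBA sample.
SAMPLE = [
    {"member_id": "A100", "member_since": date(2001, 5, 1),
     "txn_date": date(2000, 4, 1), "code": "R", "miles": 25000},
    {"member_id": "", "member_since": date(1999, 1, 15),
     "txn_date": date(2003, 6, 30), "code": "E", "miles": 1200},
    {"member_id": "B200", "member_since": date(2002, 3, 1),
     "txn_date": date(2003, 3, 1), "code": "X", "miles": 500},
    {"member_id": "B200", "member_since": date(2002, 3, 1),
     "txn_date": date(2003, 3, 1), "code": "X", "miles": 500},
]


def flag_record(rec):
    """Return the list of issue labels that apply to a single record."""
    issues = []
    if not rec["member_id"].strip():
        issues.append("blank_member_id")
    if rec["txn_date"] < rec["member_since"]:
        issues.append("txn_before_membership")
    if rec["code"] == "E":
        # "E" could mean either earned or expired, so flag it for review.
        issues.append("ambiguous_code_E")
    return issues


def summarize(records):
    """Count issues across records and how many records have any issue."""
    counts = Counter()
    seen = set()
    flagged = 0
    for rec in records:
        issues = flag_record(rec)
        key = tuple(sorted(rec.items()))  # identical rows share one key
        if key in seen:
            issues.append("exact_duplicate")
        seen.add(key)
        counts.update(issues)
        if issues:
            flagged += 1
    return counts, flagged


counts, flagged = summarize(SAMPLE)
print(f"{flagged} of {len(SAMPLE)} records flagged")
for issue, n in sorted(counts.items()):
    print(f"  {issue}: {n}")
```

A summary like this supports a general list of flaws in the memo without going record by record.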

Besides data quality, big thing to consider is appropriateness. My initial thoughts:
-- Old. Yes, we might have 10 years of data to add, but we already have "several" years' worth on the new system. (We don't know when the assignment is set, but we could say 2013, in which case we have a good number of years under our belt with the new system.)
-- Earned miles records (about 45% of sample) are probably not useful anyway, because earning miles would be very different under the CDL program.

Other considerations:
-- Will take time and effort to review entire data set and make exclusions/modifications as necessary. Probably not worth it, even if transferring the data is not particularly expensive.
-- Under ASOP 23, will need to document/disclose flaws/modifications/limitations of data.

daki...@gmail.com

Jul 26, 2013, 12:50:50 PM
to final-assessment-...@googlegroups.com

I have looked through the data and found the same types of data issues that John and Christie have mentioned. I also think that this data should not be used. However, the directions at the end of Task 1 state the following:

"Your memo should note any flaws in the data and recommend how these limitations should be addressed, if the data is to be used. This should be done even if your recommendation is that the data not be used."

I was thinking of listing each data issue found and then a recommendation on how to address each limitation. For example, I would have a bullet for the 5 member IDs that are missing, and my recommendation would be to exclude these entries since we cannot match them to a member.

Just wondering if it is overkill to do this for every single data issue, but based on the directions I am thinking we probably should, to play it safe.

Thoughts??

daki...@gmail.com

Jul 26, 2013, 1:40:02 PM
to final-assessment-...@googlegroups.com
In the redeemed records provided, all the mile amounts are at least 10,000. My thought is that maybe you cannot redeem miles unless you have at least 10K miles? Not sure if we need to figure out why the redeemed miles are all at least 10,000. Many round-trip flights will not have a mileage total of 10,000 miles. Is anyone planning on addressing the fact that the redeemed miles are all at least 10,000?

John Bashunov

Jul 26, 2013, 1:41:26 PM
to final-assessment-...@googlegroups.com
This might be overkill in my opinion. I think it's best to start from the assumption that the data is good and chip away until you find the straw that breaks the camel's back.

Basically do a checklist of ASOP 23 requirements and rate each one Meets or Does Not Meet Standard.

Then weigh the good and the bad, and out comes the "No Way Jose."

Christie Ritten

Jul 26, 2013, 1:48:32 PM
to final-assessment-...@googlegroups.com
I agree, a 10,000 mile redemption makes no sense. In my memo I put that as a problem with the data, and mentioned that if we were to use the data, we would have to research what's going on here. Are we missing a decimal point? How have these numbers been rounded?
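
A quick way to quantify the rounding question is to profile the redeemed amounts: the minimum, and the share that are exact multiples of 1,000. This is only a sketch; the `rounding_profile` helper and the `redeemed` values are made up for illustration and are not taken from the actual sample.

```python
def rounding_profile(amounts, unit=1000):
    """Summarize how many amounts are exact multiples of `unit`."""
    if not amounts:
        raise ValueError("amounts must not be empty")
    multiples = sum(1 for a in amounts if a % unit == 0)
    return {
        "count": len(amounts),
        "min": min(amounts),
        "share_multiple_of_unit": multiples / len(amounts),
    }


# Made-up redeemed-miles values standing in for the sample's redeemed records.
redeemed = [10000, 25000, 40000, 15000]
profile = rounding_profile(redeemed)
print(profile)
```

A share of 1.0 with a minimum of 10,000 would be consistent with rounding or with fixed award tiers, so the profile identifies the pattern but does not explain it; that still needs research.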

Salman Shah

Jul 27, 2013, 2:31:54 PM
to final-assessment-...@googlegroups.com
The biggest problem is that the Redeemed Units are all in thousands. I don't know why this is so. I would still prefer to use the data, subject to this flaw.
The rest of the issues can be corrected or the affected entries discarded.

Salman Shah

Jul 27, 2013, 2:33:38 PM
to final-assessment-...@googlegroups.com
Yes this is the biggest flaw in the data.  The rest is manageable.

John Bashunov

Jul 27, 2013, 2:35:43 PM
to final-assessment-...@googlegroups.com
Before you put that answer down, note that this is a RANDOM SAMPLE OF 200, so removing records won't do you any good since the rest of the data on the mainframe is bound to be flawed.
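
To put a rough number on that, here is a sketch that projects the sample's flaw rate onto the full data set using a normal-approximation interval for a proportion. The count of 50 flawed records is an assumed stand-in for "over a quarter" of 200, and `proportion_interval` is a hypothetical helper; the interval is only meaningful if the sample really is random.

```python
import math


def proportion_interval(flawed, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion; z=1.96 is ~95%."""
    if n <= 0 or not 0 <= flawed <= n:
        raise ValueError("need 0 <= flawed <= n and n > 0")
    p = flawed / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Assumed: 50 flawed records out of the 200 sampled.
p, low, high = proportion_interval(50, 200)
print(f"estimated flaw rate {p:.0%}, interval {low:.0%} to {high:.0%}")
```

Under those assumptions, roughly 19% to 31% of the records on the mainframe would have similar problems, which is why cleaning only the 200 sampled records does not settle anything.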

Salman Shah

Jul 27, 2013, 2:40:31 PM
to final-assessment-...@googlegroups.com
Yes, the whole data set is to be treated like the sample. Whatever checks or corrections we apply to the sample are to be applied to the whole data set.
It should be used for experience purposes. Better to use modified data for experience than not to use any data. It is also mentioned in the Task that the project lead would like to gather as much data as possible.
In my opinion, it is to be used, but the concern is the odd rounding of redeemed units to the nearest thousand.

Christie Ritten

Jul 27, 2013, 4:21:09 PM
to final-assessment-...@googlegroups.com
For me, so much time/effort would be needed to pore over the data (make corrections, figure out which records to exclude), that it wouldn't be worth it. You have data from the new system that's been up and running for "several years," it's not like it's the old data or nothing.

Salman Shah

Jul 27, 2013, 4:23:26 PM
to final-assessment-...@googlegroups.com
I agree with this now. Actually the biggest problem is the redeemed miles being in thousands. If that is wrong then the whole exercise is worthless.

Salman Shah

Jul 27, 2013, 4:32:19 PM
to final-assessment-...@googlegroups.com

Just one more thing on it. Have you guys addressed each and every transaction for flaws, or just the general flaws in the data?
I'm not going through each and every transaction; instead I'm discussing the general limitations and a recommendation to address each, and then finally the conclusion.

Christie Ritten

Jul 27, 2013, 4:41:12 PM
to final-assessment-...@googlegroups.com
I went with a general list (similar to what I had in my first post in this thread), definitely not a record-by-record list of flaws. I can't imagine the project lead would want to read that.