The MongoDB schema is a single collection with 40 million records, each
of which looks like:
{ "batchId" : 0, "listingKey" : "cea42fca6b7cb082", "lastUpdateYear" :
2005, "lastUpdateMonth" : 9, "lastUpdateDay" : 20,
"earliestReportedYear" : 2003, "earliestReportedMonth" : 3,
"earliestReportedDay" : 12, "privacyIndicatorEnum" : 3,
"listingTypeEnum" : 3, "serviceProvider" : "23", "dataProvider" :
"WC", "phoneNPA" : "503", "phoneNXX" : "848", "phoneLINE" : "9476",
"firstName" : "Chris", "lastName" : "Kessel", "deliveryPointBarCode" :
"36", "checkDigit" : "0", "congressionalDistrict" : "1",
"carrierRouteSortZone" : "D", "state" : "OR", "addressType" : "S",
"latitude" : "45.463616", "longitude" : "-122.89142",
"preDirectional" : "SW", "zip4" : "7542", "MSA" : "6440", "CMSA" :
"79", "FIPSCode" : "41067", "carrierRoute" : "R018", "zip5" : "97007",
"houseNumber" : "20836", "suffix" : "Ln", "streetName" : "Vicki",
"city" : "Beaverton", "fullAddress" : "20836 SW Vicki Ln",
"matchKey" : [ "2392798418644328169" , "2469488646933425020" ,
"2559616267708426042" , "2634775134312296806" ,
"2817462582529963327" , "2715342769193125472" ,
"2746945466986027237" , "2305843014252183428"] }
The SQL schema has multiple tables. The matchKeys are in one table,
the address in another, and the rest (name, lat, long, various dates,
etc.) are in a third table. The SQL query is a bit pickier about the
return values, but only a little (it wouldn't return the matchKeys).
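
To make the difference in shape concrete, here is a rough sketch of how one MongoDB document would break into the three SQL tables described above. The assignment of fields to the address table (ADDRESS_FIELDS) and the use of listingKey as the join key are assumptions for illustration only; the actual SQL column lists and keys are not shown here.

```python
# Hypothetical grouping: which fields live in the address table is a guess.
ADDRESS_FIELDS = {
    "houseNumber", "preDirectional", "streetName", "suffix",
    "city", "state", "zip5", "zip4", "fullAddress",
}


def split_listing(doc):
    """Split one MongoDB-style listing into three table-shaped pieces.

    Returns (match_rows, address_row, listing_row). Using listingKey as
    the join key is an assumption, not something taken from the schema.
    """
    key = doc["listingKey"]

    # One row per matchKey, as in a separate matchKeys table.
    match_rows = [{"listingKey": key, "matchKey": mk}
                  for mk in doc.get("matchKey", [])]

    address_row = {"listingKey": key}
    listing_row = {"listingKey": key}  # name, lat, long, dates, etc.
    for field, value in doc.items():
        if field in ("listingKey", "matchKey"):
            continue
        target = address_row if field in ADDRESS_FIELDS else listing_row
        target[field] = value
    return match_rows, address_row, listing_row


# Abbreviated version of the sample record above.
doc = {
    "listingKey": "cea42fca6b7cb082",
    "firstName": "Chris", "lastName": "Kessel",
    "latitude": "45.463616", "longitude": "-122.89142",
    "city": "Beaverton", "state": "OR", "zip5": "97007",
    "fullAddress": "20836 SW Vicki Ln",
    "matchKey": ["2392798418644328169", "2469488646933425020"],
}
match_rows, address_row, listing_row = split_listing(doc)
```

In this sketch, a SQL query that skips the matchKeys corresponds to using only address_row and listing_row, whereas the MongoDB document carries the matchKey array along unless it is excluded.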