Rajagopal Iyengar
Nov 4, 2013, 8:02:33 PM
We have a slowly changing dimension table with over 4 million rows. Instead of caching it during our ETL process, we want to write the valid records (around 3M) to the file system twice a day and use that extract to look up the internal identifier by cusip/sedol/isin/exchange+ticker/ric. Each record has a begin and end date; we want to pick up the internal identifier at a point in time based on the asof_dt in the file.
The correct record is always the current one, i.e. the one whose begin and end dates bracket asof_dt.
There could be multiple traversals, in priority order:
1. North American stocks and cusip is not null.
2. European stocks and sedol is not null.
3. Asian stocks and isin is not null.
4. exchange+ticker.
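To make the intended lookup concrete, here is a minimal Python sketch of the logic described above: an in-memory index keyed per identifier type, with a point-in-time match on begin/end dates and the four keys tried in priority order. The names (`add_record`, `resolve`, the field names in `record`) are hypothetical, just to illustrate the idea; the real implementation would run inside the ETL framework.

```python
from datetime import date

# Hypothetical in-memory index: for each identifier type, map key ->
# list of (begin_dt, end_dt, internal_id) validity intervals.
lookup = {"cusip": {}, "sedol": {}, "isin": {}, "exchange_ticker": {}}

def add_record(id_type, key, begin_dt, end_dt, internal_id):
    """Load one dimension record into the index (intervals assumed non-overlapping)."""
    lookup[id_type].setdefault(key, []).append((begin_dt, end_dt, internal_id))

def resolve(record, asof_dt):
    """Return the internal identifier valid at asof_dt, trying each key
    in the priority order above; None if nothing matches."""
    candidates = [
        ("cusip", record.get("cusip")),                      # 1. North American + cusip
        ("sedol", record.get("sedol")),                      # 2. European + sedol
        ("isin", record.get("isin")),                        # 3. Asian + isin
        ("exchange_ticker",
         (record.get("exchange"), record.get("ticker"))),    # 4. exchange+ticker
    ]
    for id_type, key in candidates:
        # Skip keys that are null (or tuples with any null component).
        if not key or (isinstance(key, tuple) and not all(key)):
            continue
        for begin_dt, end_dt, internal_id in lookup[id_type].get(key, []):
            # Point-in-time check: asof_dt between begin and end date.
            if begin_dt <= asof_dt <= end_dt:
                return internal_id
    return None
```

For example, after `add_record("cusip", "037833100", date(2013, 1, 1), date(2013, 12, 31), "ID42")`, calling `resolve({"cusip": "037833100"}, date(2013, 6, 1))` would return `"ID42"`, while an asof_dt outside the validity window would fall through to the next key or return `None`.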
What would be the best approach to achieve this? Raw files can be up to 100K rows each, and there could be over 500 files a day.
Thanks a lot.
Cheers.