
searching a large file


Rajagopal Iyengar

Nov 4, 2013, 8:02:33 PM
We have a slowly changing dimension table with over 4 million rows. Instead of caching it during our ETL process, we want to write the valid records (around 3M+) to the file system twice a day and look up the internal identifier there by cusip/sedol/isin/exchange+ticker/ric. Each record has a begin and end date; we want to pick up the internal identifier as of a point in time, based on the asof_dt in the file.

The correct record is always the one whose begin and end dates bracket the asof_dt.
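To make this concrete, here is a rough sketch in Python of the index build and point-in-time lookup we have in mind. The CSV layout and the column names (internal_id, begin_dt, end_dt, plus one column per identifier) are only assumptions for illustration, not our actual schema:

import csv
from collections import defaultdict
from datetime import date

def load_index(path, key_field):
    """Map each identifier value to its list of (begin, end, internal_id) versions."""
    index = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = row[key_field]
            if not key:  # skip rows where this identifier is null
                continue
            index[key].append((date.fromisoformat(row["begin_dt"]),
                               date.fromisoformat(row["end_dt"]),
                               row["internal_id"]))
    return index

def lookup(index, key, asof_dt):
    """Return the internal_id whose begin/end window contains asof_dt, else None."""
    for begin, end, internal_id in index.get(key, ()):
        if begin <= asof_dt <= end:
            return internal_id
    return None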

There could be multiple lookup paths, tried in this order (a sketch follows the list):

1. North American stocks, where cusip is not null.
2. European stocks, where sedol is not null.
3. Asian stocks, where isin is not null.
4. Fallback: exchange+ticker.
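The fallback order above might then look something like this, reusing lookup() from the earlier sketch. The region codes ("NA", "EU", "ASIA") and the composite "exchange|ticker" key are assumptions for illustration only:

def resolve(record, indexes, asof_dt):
    """Try each identifier in priority order until one resolves."""
    region = record.get("region")
    if region == "NA" and record.get("cusip"):
        hit = lookup(indexes["cusip"], record["cusip"], asof_dt)
        if hit:
            return hit
    if region == "EU" and record.get("sedol"):
        hit = lookup(indexes["sedol"], record["sedol"], asof_dt)
        if hit:
            return hit
    if region == "ASIA" and record.get("isin"):
        hit = lookup(indexes["isin"], record["isin"], asof_dt)
        if hit:
            return hit
    # Last resort: composite exchange+ticker (index built with "exchange|ticker" keys).
    key = f'{record.get("exchange", "")}|{record.get("ticker", "")}'
    return lookup(indexes["exchange_ticker"], key, asof_dt)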

What would be the best approach to achieve this? Raw files can be up to 100K rows, and there could be over 500 files a day.

Thanks a lot.
Cheers.