Hi,
I searched for a similar question but couldn't find a suitable answer, so apologies in advance if I'm re-asking something that has already been covered :)
I would like to use Rhino ETL for my file processing. I have about 200K zipped CSV files containing daily stock prices, and for each file I have to:
1 - unzip
2 - process
3 - validate (?)
4 - save to the database
In total there must be more than 400 million rows of data. Rhino ETL seems well suited for this, since it lets me apply the pipeline pattern "easily", but right now I don't understand how to separate my steps: in every example I've seen, the only type an operation can return is an IEnumerable&lt;Row&gt;.
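For reference, this is roughly how I imagine wiring the steps together (a sketch only; the class and operation names are placeholders I made up, and the validation rule is just an example):

```csharp
using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Operations;

// Sketch: one EtlProcess registering the four steps in order.
public class DailyPricesProcess : EtlProcess
{
    protected override void Initialize()
    {
        Register(new UnzipAndReadCsvOperation());   // steps 1 + 2: unzip and parse
        Register(new ValidateRowsOperation());      // step 3: validate
        Register(new BulkInsertPricesOperation());  // step 4: save to the database
    }
}

// Every step is chained through IEnumerable<Row>, e.g.:
public class ValidateRowsOperation : AbstractOperation
{
    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            if (row["Price"] != null)   // placeholder validation rule
                yield return row;
        }
    }
}
```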
I would also like to bulk insert the data. Right now I am using SqlBulkCopy, but some files contain duplicates, which is a problem. I would like to remove them before inserting, but whenever I add a duplicate-existence check, it slows the process down far too much.
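To make the duplicate problem concrete, this is the kind of in-pipeline check I mean, using an in-memory HashSet instead of a per-row database lookup (a sketch only; the (Symbol, Date) key is an assumption about my schema, and it presumes the key set fits in memory):

```csharp
using System;
using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Operations;

// Sketch: drops rows whose (Symbol, Date) key has already been seen,
// so only the first occurrence reaches the bulk-insert step.
public class DistinctBySymbolAndDateOperation : AbstractOperation
{
    private readonly HashSet<Tuple<string, DateTime>> seen =
        new HashSet<Tuple<string, DateTime>>();

    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            var key = Tuple.Create((string)row["Symbol"], (DateTime)row["Date"]);
            if (seen.Add(key))      // HashSet.Add returns false for duplicates
                yield return row;
        }
    }
}
```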
Right now my pipeline is composed of the steps I mentioned above.
Is it possible for an operation to return a stream, or something other than Row?
Again, sorry if this is a duplicate topic; please don't hesitate to redirect me.
Thanks!