On Fri, 11/11/2016 at 02:11 -0800, Enrico Bergamini wrote:
Hi Enrico,
yes, in your case chunksize=n risks producing two or more result rows for
the same id, because the rows for one id can straddle a chunk boundary.
What you could do, if the ids in the original file are ordered, is:
1) read a chunk
2) read the id in the last row of it
3) process all of its rows except those with that id (which will be
the last ones)
4) read another chunk and concatenate it to the rows left over from the
previous one
5) if the original file is not finished, go back to step 2); otherwise
process what's left (the last id). A quick sketch of this loop is below.
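
Something like this, as a minimal sketch: it assumes pandas' read_csv
with chunksize, a hypothetical file name "data.csv", an id column called
"id", and a placeholder process() function standing in for whatever you
actually do with each complete group of ids.

import pandas as pd

def process(df):
    # placeholder: whatever per-id work you need, e.g. a groupby/aggregation
    return df.groupby("id").sum()

results = []
leftover = pd.DataFrame()

# 1) read a chunk at a time
for chunk in pd.read_csv("data.csv", chunksize=100000):
    # 4) prepend the rows held back from the previous chunk
    chunk = pd.concat([leftover, chunk], ignore_index=True)
    # 2) id in the last row of the (combined) chunk
    last_id = chunk["id"].iloc[-1]
    # 3) process everything except the rows with that id,
    #    since they may continue in the next chunk
    results.append(process(chunk[chunk["id"] != last_id]))
    leftover = chunk[chunk["id"] == last_id]

# 5) the file is over: process what's left (the last id)
if not leftover.empty:
    results.append(process(leftover))
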
Pietro