CSVdist

The Beez

Feb 3, 2023, 5:19:14 PM

to 4tH-compiler

Hi 4tH-ers!

As you might know, I handle and analyse a lot of data for work. That's why you'll find so much conversion stuff in 4tH, especially for CSV files.

For a job I needed a program that gave me insight into the distribution of data within a column. If the values differ wildly, that tells me something about the consistency of the data. I didn't have one in my toolbox, so I decided to make one.

Now, if you collect all the data of a spreadsheet in memory, you need a LOT of space, and constantly looking up data takes a lot of time.

So I decided to take on every column individually, rewinding the file for each one. After the report, the data for that column could be discarded. To look up data quickly, I decided to use the BS-table library, which does a fast binary search.
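
To illustrate the "one column per pass" idea, here is a rough Python sketch. It is not the 4tH code: `SortedTable` is a hypothetical stand-in for a BS-table (a sorted array searched with `bisect`), and `column_counts` simply rewinds the file with `seek(0)` before scanning each column, assuming every row has as many fields as the first one.

```python
import bisect
import csv


class SortedTable:
    """Hypothetical stand-in for a binary-searched key/count table.

    Keys stay in ascending order, so finding a key is O(log n);
    inserting a new key shifts the later entries, as in a flat array.
    """

    def __init__(self):
        self.keys = []
        self.counts = []

    def increment(self, key):
        # Binary search for key; bump its count or insert it at its sorted slot
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.counts[i] += 1
        else:
            self.keys.insert(i, key)
            self.counts.insert(i, 1)


def column_counts(path):
    """Yield (column index, SortedTable) for each column, one pass per column."""
    with open(path, newline="") as f:
        width = len(next(csv.reader(f), []))  # column count taken from the first row
        for col in range(width):
            f.seek(0)  # rewind the file for every column
            table = SortedTable()
            for row in csv.reader(f):
                if col < len(row):
                    table.increment(row[col])
            yield col, table  # caller reports on it; a fresh table is built next time
```

Only one column's table is alive at a time, which trades extra passes over the file for a much smaller memory footprint.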

To turn a string into a key, I used my faithful FNV1a hash. I needed two BS-tables, one for the counts and one for the names. For storing the names I used the dynamic string library.
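
For reference, this is the standard 32-bit FNV-1a hash (offset basis 2166136261, prime 16777619) in Python. Whether 4tH's FNV1a uses the 32- or 64-bit variant isn't stated here, so treat the width as an assumption. The two dicts in `tally` are hypothetical stand-ins for the two BS-tables, both keyed by the hash.

```python
FNV32_OFFSET = 0x811C9DC5  # 2166136261
FNV32_PRIME = 0x01000193   # 16777619


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the prime (mod 2**32)."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def tally(values):
    """Count strings via their hash; one table for counts, one for names."""
    counts = {}  # hash -> number of occurrences
    names = {}   # hash -> the string it came from (stored once)
    for value in values:
        key = fnv1a_32(value.encode("utf-8"))
        counts[key] = counts.get(key, 0) + 1
        names.setdefault(key, value)
    return counts, names
```

One consequence of keying on a hash is that two different strings with the same hash would be counted together; with a 32-bit hash that is rare but possible.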

Once a value had been entered, I only needed to update the "count" BS-table. After all rows were done, I sorted that table using the "in place" Combsort routine, then looked up each name in the other BS-table. And yes, I needed the counts in descending order.
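
A minimal sketch of that last step, assuming the count table is a list of (hash, count) pairs and the name table a mapping from hash to string (both hypothetical stand-ins for the BS-tables). This is a generic comb sort with the customary shrink factor of 1.3, not the 4tH Combsort routine itself.

```python
def comb_sort_desc(pairs):
    """In-place comb sort of (key, count) pairs, highest count first."""
    gap = len(pairs)
    done = False
    while not done:
        gap = int(gap / 1.3)  # shrink the gap by the customary factor 1.3
        if gap <= 1:
            gap = 1
            done = True  # finish once a gap-1 pass makes no swaps
        for i in range(len(pairs) - gap):
            if pairs[i][1] < pairs[i + gap][1]:
                pairs[i], pairs[i + gap] = pairs[i + gap], pairs[i]
                done = False


def report(counts, names):
    """counts: list of (hash, count); names: dict of hash -> original string."""
    comb_sort_desc(counts)
    return [(names[h], c) for h, c in counts]
```

For example, `report([(11, 3), (22, 5), (33, 1)], {11: "a", 22: "b", 33: "c"})` returns `[("b", 5), ("a", 3), ("c", 1)]`. Because the sorted count table is no longer ordered by hash, the names have to come from the second, still-sorted table.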

Finally, as with CSVscan, the results can be dumped in FODS format. And yes, we've got a lib for that.
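
The 4tH FODS library's interface isn't shown here. As a rough idea of what a Flat ODF spreadsheet (a single, uncompressed XML file) contains, here is a hand-rolled Python sketch that writes one column's distribution as value/count rows. The function name `to_fods` and the two-column layout are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def to_fods(sheet_name, rows):
    """Return a flat ODS document; rows is a list of (value, count) pairs."""
    name = escape(sheet_name, {'"': "&quot;"})  # attribute-safe sheet name
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<office:document xmlns:office="{OFFICE}" xmlns:table="{TABLE}" '
        f'xmlns:text="{TEXT}" office:version="1.2" '
        'office:mimetype="application/vnd.oasis.opendocument.spreadsheet">',
        "<office:body><office:spreadsheet>",
        f'<table:table table:name="{name}">',
    ]
    for value, count in rows:
        # One row per distinct value: a text cell and a numeric cell
        out.append(
            "<table:table-row>"
            '<table:table-cell office:value-type="string">'
            f"<text:p>{escape(str(value))}</text:p></table:table-cell>"
            f'<table:table-cell office:value-type="float" office:value="{count}">'
            f"<text:p>{count}</text:p></table:table-cell>"
            "</table:table-row>"
        )
    out.append("</table:table></office:spreadsheet></office:body></office:document>")
    return "\n".join(out)
```

Saving the returned string to a file with a `.fods` extension should let a spreadsheet such as LibreOffice Calc open it directly.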

I'm very happy with my new program. I analyzed the entire database in about 15 seconds, which is great. It would have taken me hours to do that manually.

That's it for now.

Hans Bezemer
