Hi 4tH-ers!
You might know that for work I handle and analyse a lot of data. That's why you'll find so much conversion stuff in 4tH - especially for CSV files.
For a job I needed a program that gave me insight into the distribution of data within a column. If the values differ wildly, it tells me something about consistency. I didn't have one in my toolbox, so I decided to make one.
Now, when collecting all data within a spreadsheet you need a LOT of space - and constantly looking up data takes a lot of time.
So first, I decided to rewind the file and take on every column individually. After the report, the data for that column could be discarded. In order to quickly look up data I decided to use the BS-table library. That one does a quick binary search.
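The post doesn't show the 4tH code, but the idea of a binary-search table for per-column frequency counting can be sketched in Python (the function name and data layout here are mine, not the BS-table library's):

```python
import bisect

def count_column(rows, col):
    """Count how often each value occurs in one column, keeping the
    table sorted by key so every lookup is a binary search."""
    keys, counts = [], []                      # two parallel, key-sorted arrays
    for row in rows:
        value = row[col]
        i = bisect.bisect_left(keys, value)    # binary search for the slot
        if i < len(keys) and keys[i] == value:
            counts[i] += 1                     # value already in the table
        else:
            keys.insert(i, value)              # new value: insert, keeping order
            counts.insert(i, 1)
    return dict(zip(keys, counts))

# e.g. count_column([["a","x"], ["b","x"], ["a","y"]], 0) -> {"a": 2, "b": 1}
```

Discard the table after each column's report and you never hold more than one column's worth of data at a time.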
In order to fit a string in a key, I used my faithful FNV1a hash. I needed two tables: one for the count and one for the name. For the name I used the dynamic string library.
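FNV-1a is a tiny, well-known hash that turns a string into a fixed-size integer key. A 32-bit Python sketch of the standard algorithm (not the 4tH library code itself):

```python
def fnv1a(s):
    """32-bit FNV-1a hash of a string."""
    h = 0x811C9DC5                            # FNV offset basis
    for byte in s.encode("utf-8"):
        h ^= byte                             # xor in the byte first...
        h = (h * 0x01000193) & 0xFFFFFFFF     # ...then multiply by the FNV prime
    return h

# Known test vector: fnv1a("a") == 0xE40C292C
```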
Once a value was entered, I only needed to update the "count" BS table. After all rows were done I sorted that table using the "in place" Combsort routine - then looked up the name in the other BS-table. Yes, I needed the count in descending order.
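An in-place comb sort over two parallel arrays, in descending order of count, could look like this Python sketch (again my own rendering, not the 4tH Combsort routine):

```python
def combsort_desc(counts, keys):
    """Comb sort two parallel arrays in place, by count, descending."""
    n = len(counts)
    gap, swapped = n, True
    while gap > 1 or swapped:
        gap = max(1, int(gap / 1.3))          # shrink the gap each pass
        swapped = False
        for i in range(n - gap):
            if counts[i] < counts[i + gap]:   # '<' here gives descending order
                counts[i], counts[i + gap] = counts[i + gap], counts[i]
                keys[i], keys[i + gap] = keys[i + gap], keys[i]   # keep pairs together
                swapped = True
```

Once the counts are sorted, each key can be looked up in the second table to recover the name.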
Finally, like CSVscan, I allowed for the results to be dumped in FODS format. And yes, we got a lib for that.
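FODS is the "flat" (single-file XML) variant of the OpenDocument spreadsheet format. A minimal Python sketch of such a dump, assuming string cells only (this is not the 4tH FODS library, just an illustration of the format):

```python
def dump_fods(header, rows):
    """Render (value, count) rows as a minimal flat ODS (.fods) document."""
    def cells(row):
        return "".join('<table:table-cell office:value-type="string">'
                       "<text:p>%s</text:p></table:table-cell>" % c for c in row)
    body = "".join("<table:table-row>%s</table:table-row>" % cells(r)
                   for r in [header] + rows)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<office:document'
            ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
            ' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"'
            ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
            ' office:mimetype="application/vnd.oasis.opendocument.spreadsheet"'
            ' office:version="1.2">'
            '<office:body><office:spreadsheet>'
            '<table:table table:name="report">%s</table:table>'
            '</office:spreadsheet></office:body></office:document>' % body)
```

Save the result with a .fods extension and any ODF-aware spreadsheet should open it directly.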
I'm very happy with my new program. I analyzed the entire database in about 15 seconds. Which is great. It would have taken me hours to do that manually.
That's it for now.
Hans Bezemer