A system for importing the data dump


Steve Canham

Jan 30, 2025, 12:29:21 PM
to ROR Technical Forum

Hi

I hope this forum is the most appropriate place for this post - it is largely technical!

I recently developed a small program that automatically imports the ROR data dump (schema v2) into a Postgres database, which I hope will allow the data to be integrated more easily with other systems. As part of the import process, the system also does some basic analysis of the data and automatically generates a report (and/or matching CSV files) that summarises some key statistics. The report for the latest version is attached (‘v1.59 summary at 01-29 232614.txt’) to illustrate the data produced.

The system stores the key characteristics of the data in different versions, allowing versions to be compared and changes tracked over time. While most changes in the nine months since v2 was introduced have been relatively modest, I have also attached a short Word document (‘ROR changes  1.45 – 1.59.docx’) with a few graphs showing some trends. The graphs were produced manually, using data exported from Postgres as CSV and then imported into Excel for graphing.
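As a rough illustration of the kind of version comparison described above, here is a minimal Rust sketch. It is not taken from the ror1 source code: the `VersionSummary` struct, the function names, the statistic names and all of the numbers are hypothetical, and it uses only the standard library rather than a Postgres connection.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-version summary: a few counts keyed by statistic name.
#[derive(Debug, Clone)]
struct VersionSummary {
    version: String,
    counts: BTreeMap<String, i64>,
}

/// Convenience constructor used only for this illustration.
fn summary(version: &str, pairs: &[(&str, i64)]) -> VersionSummary {
    VersionSummary {
        version: version.to_string(),
        counts: pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect(),
    }
}

/// Change in each statistic between two versions (newer minus older).
/// A statistic missing from one side is treated as zero.
fn diff_summaries(older: &VersionSummary, newer: &VersionSummary) -> BTreeMap<String, i64> {
    let mut deltas = BTreeMap::new();
    for key in older.counts.keys().chain(newer.counts.keys()) {
        let before = older.counts.get(key).copied().unwrap_or(0);
        let after = newer.counts.get(key).copied().unwrap_or(0);
        deltas.insert(key.clone(), after - before);
    }
    deltas
}

/// Render the deltas as CSV text, one statistic per row, sorted by name.
fn deltas_to_csv(deltas: &BTreeMap<String, i64>) -> String {
    let mut out = String::from("statistic,change\n");
    for (name, change) in deltas {
        out.push_str(&format!("{},{}\n", name, change));
    }
    out
}

fn main() {
    // Made-up numbers, purely for illustration.
    let older = summary("older", &[("organisations", 100), ("names", 250)]);
    let newer = summary("newer", &[("organisations", 110), ("names", 240), ("links", 5)]);

    let deltas = diff_summaries(&older, &newer);
    println!("changes from {} to {}:", older.version, newer.version);
    print!("{}", deltas_to_csv(&deltas));
}
```

The CSV text produced this way could be opened in a spreadsheet for graphing, in the same spirit as the manual export step mentioned above.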

The system (‘ror1’) is still a work in progress. Most of the key functionality is present, but unfortunately it is currently available only as source code rather than as a simple downloadable .exe or library file. I hope to create either or both of those soon. The source code is available on my GitHub at https://github.com/steve-canham/ror1. It is written in Rust (partly because I wanted to start learning Rust), which does help to make it quick: the full import / processing / summarising / reporting cycle for a data dump takes about 45 seconds. The ReadMe file on the GitHub page provides detailed information on installing and running the program, as well as further details on what it does. The system is entirely free (it has an MIT licence).

I would be interested in any comments, feedback or suggestions. If anyone is doing something similar (I'm sure people are), it would be useful to compare notes. If anyone would be willing to help develop or test the system, that would also be great. It is developed on a Windows machine, but eventually I would like to see whether an executable can be run on Linux or Mac machines as well; the problem is that I don’t have easy access to either of those operating systems.

Doing this exercise has indicated that there are a few technical issues with the data that might need addressing; I will try to describe them in another email. It has also raised some more basic questions, for instance about the use cases of language codes, which I am sure have already been discussed at length within ROR. It would be good, however, to clarify what the current thinking is on these issues.

Best wishes

Steve

ROR changes 1-45 – 1-59.docx
v1.59 summary at 01-29 232614.txt