Hi Rainer,
Do you mean the complete SQL schema for the data? You can grab that here: https://github.com/ourresearch/openalex-documentation-scripts/blob/main/openalex-pg-schema.sql
You can find this schema and more details in the docs here: https://docs.openalex.org/download-all-data/upload-to-your-database/load-to-a-relational-database
I'm not sure exactly what you're using as a backend. Most JSON stores are document stores without strict schema validation, so this shouldn't matter much. If you're using a relational database of some sort, the schema above should be enough! Otherwise, the docs on loading the data into a data warehouse might fit your use case better: https://docs.openalex.org/download-all-data/upload-to-your-database/load-to-a-data-warehouse
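For the relational route, one common way to get JSON records into a fixed schema is to flatten each record into CSV rows and bulk-load those (for example with PostgreSQL's COPY). Below is a minimal stdlib-only sketch of the flattening step. It assumes the input is gzip-compressed JSON Lines (one record per line); the column list, file names, and the choice to store nested values as JSON text are illustrative assumptions, and the real schema linked above handles nested fields more carefully than this.

```python
import csv
import gzip
import json


def iter_jsonl_gz(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def flatten_to_csv(records, columns, out):
    """Write the chosen top-level fields of each record as CSV rows to `out`.

    Missing fields become empty cells; nested values (lists/dicts) are
    serialised back to JSON text so each one fits into a single column.
    """
    writer = csv.writer(out)
    writer.writerow(columns)  # header row
    for rec in records:
        row = []
        for col in columns:
            value = rec.get(col)
            if isinstance(value, (dict, list)):
                value = json.dumps(value)
            row.append(value)
        writer.writerow(row)
```

Usage, with hypothetical file names: open "works.csv" with newline="" (as the csv module recommends) and call flatten_to_csv(iter_jsonl_gz("part_000.gz"), ["id", "doi", "title"], out). Because iter_jsonl_gz is a generator, large files are streamed rather than loaded into memory at once.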
Cheers,
Samuel
--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-commun...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openalex-community/C6F63269-251A-448E-ACAF-D30EBCB7195B%40krugs.de.
Hi Rainer,
I think DuckDB would be a great fit here; you can use its R package (https://duckdb.org/docs/api/r.html). It can also read and write Parquet files, so it should fit right into your workflow with minimal changes.
DuckDB infers the schema for you when it reads JSON or Parquet, and you can set it up to combine schemas across files or to add or change columns easily while ingesting data.
You could also import the SQL schema from OpenAlex directly into DuckDB to get the full set of columns, but that might be overkill for your use case, and it will definitely slow ingestion down because of the constraints the schema imposes.
DuckDB is free, open source, very performant and efficient, and it's extremely portable: just like SQLite, the entire database is stored in a single file. It has client APIs for every popular programming language, plus APIs for interacting with various web services and other databases such as PostgreSQL. Check out the docs, they're pretty good!
Cheers,