Hi
I am using DuckDB to read in the raw JSON files and write them to a Parquet file, which works nicely - but I hit a wall with the work
https://openalex.org/W2741809807, as its abstract_inverted_index contains the term `open` as well as `Open`.
I think it is a DuckDB problem: when parsing for import, DuckDB treats the two terms as identical because its parsing is case insensitive.
I understand why the term appears twice (`Open` is at the beginning of the abstract), and I understand that JSON is case sensitive, but this makes the handling in e.g. DuckDB (and I assume other databases as well?) difficult.
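
To make the collision concrete, here is a minimal Python sketch (plain standard library, no DuckDB involved). The sample record is made up and only mirrors the `open` / `Open` situation described above, and the helper name `case_insensitive_collisions` is hypothetical. It simply lists the keys of an abstract_inverted_index that become identical once letter case is ignored:

```python
import json
from collections import defaultdict


def case_insensitive_collisions(inverted_index):
    """Return groups of keys that differ only by letter case."""
    groups = defaultdict(list)
    for term in inverted_index:
        # casefold() gives a case-insensitive form of the key
        groups[term.casefold()].append(term)
    # keep only the groups with more than one original spelling
    return {folded: terms for folded, terms in groups.items() if len(terms) > 1}


# Made-up sample, NOT the real OpenAlex record
raw = '{"abstract_inverted_index": {"Open": [0], "access": [1, 5], "open": [4]}}'
work = json.loads(raw)

collisions = case_insensitive_collisions(work["abstract_inverted_index"])
print(collisions)  # {'open': ['Open', 'open']}
```

Running something like this over the raw files would at least show which works are affected before the import.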
Has anybody any solution to this problem, or is the conversion using DuckDB a dead end in this case?
Any other suggestions?
Thanks,
Rainer
Any opinions?
Thanks
Rainer
---
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Orcid ID: 0000-0002-7490-0066
Department of Geography
University of Zürich
Winterthurerstrasse 190
8057 Zürich
Switzerland