Using rdflib for CSV ETL


Boris Pelakh

Jan 13, 2020, 12:35:56 PM
to rdflib-dev
I have used TARQL (https://tarql.github.io/) fairly extensively to transform CSV data to RDF. However, it is built on top of Jena and comes with Java dependencies, which makes it less than optimal for lightweight pipelines deployed in AWS Lambda or Glue. I frequently drive the TARQL process from Python scripts, which requires both a Java and a Python environment to be present. That is not ideal.

I would like to build a Python-native version on top of rdflib that would provide similar functionality. Since the SPARQL engine is already there, hooking it up to a CSV data source instead of the internal graph should be possible. Has anyone done anything like this? And if not, is there anything in the rdflib query evaluator logic that would make it impossible (or very difficult)?

Thanks!