--
http://github.com/RDFLib
---
You received this message because you are subscribed to the Google Groups "rdflib-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rdflib-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh19yjpwB8EoHVqs5QzKug_rSq1X%2BfFHfnFtOJBdZ1RwYg%40mail.gmail.com.
Hi Nick,

TBH, it's pretty much a function that converts a Dict or a JSON file in a streaming fashion: https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py. I think it's a stand-alone thing; I don't plan anything extra on that specifically, with the possible exception of a command-line interface (hence the proposed refactoring of csv2rdf).

I do plan to develop more components that assist scalable ETL and data-to-RDF-like tasks. This includes a plugin for Apache Airflow (a "provider"), which would be a good fit as an RDFLib family repository.
Best,
Miel

On Wed, 28 Jul 2021 at 06:09, Nicholas Car <nichol...@surroundaustralia.com> wrote:

> Hi Miel,
>
> Yes, all offers of contribution are of interest! The CSV 2 RDF stuff is very old and many tools related to it, such as pyTARQL (https://github.com/RDFLib/pyTARQL), are missing. Are you planning on presenting JSON2RDF as a new plugin to RDFLib? That may be an option; however, remember that another option is simply to present your tool's repository within RDFLib's family of repositories (i.e. within https://github.com/RDFLib). The choice will depend on how stable the tool is and how you see its future development going.
>
> But perhaps you have other things in mind? Whatever the case, we'd love to hear your plans.
>
> Cheers,
> Nick
>
> On Tue, Jul 27, 2021 at 5:56 PM Miel Vander Sande <miel.van...@meemoo.be> wrote:
>
>> Hi all,
>>
>> A little late to the party, but what a great effort this is! Congrats on the release and thank you; this library is super essential to my work and it makes RDF usable in ways other libraries can't.
>>
>> Sidenote: I have a streaming direct JSON-to-RDF mapping implementation (a port of https://github.com/AtomGraph/JSON2RDF) that I'd like to contribute, possibly in combination with a refactoring of https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#rdflib.tools.csv2rdf.CSV2RDF. Would that be of interest?
To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/CAHeRLWtCCa-b7Q%3DycdsxaoTRd%2BHALfWK9rAeNS0JTZcO4pUX9w%40mail.gmail.com.
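For readers unfamiliar with what a "direct" JSON-to-RDF mapping looks like, here is a minimal sketch. It is not the code from the linked repository and not the AtomGraph algorithm: the vocabulary base `BASE`, the blank-node naming and the function names are all assumptions made for illustration. It lazily yields N-Triples lines from an already-parsed dict, so it only streams the output; truly streaming input would need an incremental parser.

```python
import json
from itertools import count
from typing import Any, Iterator
from urllib.parse import quote

# Hypothetical vocabulary base for JSON keys; not taken from the thread.
BASE = "https://example.org/vocab#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value: Any) -> str:
    """Serialise a JSON scalar as an N-Triples literal (approximate escaping)."""
    if isinstance(value, bool):  # bool must be tested before int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    return json.dumps(str(value), ensure_ascii=False)


def triples(node: dict, subject: str, ids: Iterator[int]) -> Iterator[str]:
    """Yield one N-Triples line per value, recursing into nested objects."""
    for key, value in node.items():
        predicate = f"<{BASE}{quote(str(key), safe='')}>"
        # Arrays become repeated predicates; arrays nested in arrays are not handled.
        for item in value if isinstance(value, list) else [value]:
            if item is None:
                continue  # JSON null produces no triple
            if isinstance(item, dict):
                child = f"_:b{next(ids)}"
                yield f"{subject} {predicate} {child} ."
                yield from triples(item, child, ids)
            else:
                yield f"{subject} {predicate} {literal(item)} ."


def dict_to_ntriples(doc: dict) -> Iterator[str]:
    """Map a parsed JSON object to N-Triples lines, rooted at blank node _:b0."""
    yield from triples(doc, "_:b0", count(1))


if __name__ == "__main__":
    sample = {"name": "Ada", "age": 36, "address": {"city": "London"}}
    for line in dict_to_ntriples(sample):
        print(line)
```

Because the function is a generator, the caller can write each line to a file or a store as it arrives instead of building the whole graph in memory first.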
On Wed, Jul 28, 2021 at 4:49 AM Miel Vander Sande <miel.van...@meemoo.be> wrote:

> Hi Nick,
>
> TBH, it's pretty much a function that converts a Dict or a JSON file in a streaming fashion: https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py. I think it's a stand-alone thing; I don't plan anything extra on that specifically, with maybe the exception of a cmd interface (hence the proposed refactoring of csv2rdf)

Profiling / [comparative] benchmarks with e.g. Scalene [1][2] and/or perfplot [3] (%timeit) [4][5] could be worthwhile.

Other methods for CSV + transforms => RDF?
- #rdflib csv2rdf
- COW
- #CSVW https://github.com/cldf/csvw/blob/master/README.md#see-also
- @kidehen sponger / rdfm_yq_parse_csv()?
- #tarql
- #csv2rdf GH topic: https://github.com/topics/csv2rdf
Adding columnar & dataset-level metadata *with URIs* is the value add here, IMHO #LR
* URIs for columns (RDF)
* Document Metadata
* CSV -> JSON (-> JSON-LD -> RDF)
* CSV -> RDF

Could there be a file naming convention for specifying the extra CSVW header to apply_to or transform zero or more CSV files with?

filename.csv
filename.csv.csvw
filename.csv.csvwheader.jsonld.json
filename.csv.csvw.jsonld.json
To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/CACfEFw9FEM42_E8QATaP%3DDPiQsDD4nbPzi-XzNDfurkZ0P9Efg%40mail.gmail.com.
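To make the "URIs for columns" point concrete, here is a small sketch of a CSV-to-RDF step driven by a column-to-IRI mapping. It is not CSVW, csv2rdf or tarql: the function name, the row IRI scheme and the mapping format are assumptions chosen for illustration, and every value is emitted as a plain string literal.

```python
import csv
import io
import json
from typing import Iterator, Mapping


def csv_to_ntriples(
    text: str, column_iris: Mapping[str, str], row_base: str
) -> Iterator[str]:
    """Yield one N-Triples line per non-empty cell in a mapped column.

    column_iris maps CSV header names to predicate IRIs (hypothetical format);
    each row gets the subject IRI row_base + row number.
    """
    reader = csv.DictReader(io.StringIO(text))
    for number, row in enumerate(reader, start=1):
        subject = f"<{row_base}{number}>"
        for column, value in row.items():
            iri = column_iris.get(column)
            if iri is None or not value:
                continue  # unmapped column or empty cell
            yield f"{subject} <{iri}> {json.dumps(value, ensure_ascii=False)} ."


if __name__ == "__main__":
    data = "name,born\nAda,1815-12-10\nAlan,\n"
    columns = {
        "name": "https://schema.org/name",
        "born": "https://schema.org/birthDate",
    }
    for line in csv_to_ntriples(data, columns, "https://example.org/row/"):
        print(line)
```

The mapping plays the role of the sidecar metadata discussed above: the CSV stays untouched and the column semantics live in a separate, reusable structure.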
On Wed, Jul 28, 2021 at 4:49 AM Miel Vander Sande <miel.van...@meemoo.be> wrote:

> Hi Nick,
>
> TBH, it's pretty much a function that converts a Dict or a JSON file in a streaming fashion: https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py. I think it's a stand-alone thing; I don't plan anything extra on that specifically, with maybe the exception of a cmd interface (hence the proposed refactoring of csv2rdf)

Profiling / [comparative] benchmarks with e.g. Scalene [1][2] and/or perfplot [3] (%timeit) [4][5] could be worthwhile.

ijson [6] looks like it has some interesting features: iterative, asyncio, push. How does the performance compare?

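As a rough illustration of the %timeit-style comparison suggested above, the sketch below uses only the standard-library `timeit` module (not Scalene or perfplot). The two candidates, `eager` and `lazy`, and the `convert` step are placeholders standing in for real converters; they are assumptions, not code from any project mentioned in this thread.

```python
import timeit
from typing import Callable, Dict, List

RECORDS = [{"id": i, "name": f"item-{i}"} for i in range(1000)]


def convert(record: dict) -> str:
    """Placeholder conversion step standing in for a real mapper."""
    return f'_:r{record["id"]} <https://example.org/name> "{record["name"]}" .'


def eager() -> int:
    """Build the full list of output lines, then count them."""
    lines: List[str] = [convert(r) for r in RECORDS]
    return len(lines)


def lazy() -> int:
    """Consume a generator line by line without keeping the lines."""
    return sum(1 for _ in (convert(r) for r in RECORDS))


def best_times(
    candidates: Dict[str, Callable[[], int]], number: int = 50, repeat: int = 5
) -> Dict[str, float]:
    """Return the best observed seconds per call for each candidate."""
    return {
        name: min(timeit.repeat(func, number=number, repeat=repeat)) / number
        for name, func in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in best_times({"eager": eager, "lazy": lazy}).items():
        print(f"{name}: {seconds * 1e6:.1f} microseconds per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; memory use, which matters most for streaming, needs a separate profiler.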
> I do plan to develop more components that assist scalable ETL, data-to-rdf like tasks. This includes a plugin for Apache Airflow ("provider"), which would be good as a RDFLib family repository.

- The datasette and dogsheep projects have a bunch of *-to-sqlite utils and an interface that a number of projects on PyPI have implemented:
- parse datetimes in CSVs
- xsd:dateTime (and schema.org/Date and schema.org/dateCreated and schema.org/dateModified) specify that times are given in ISO 8601 formats
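A minimal sketch of the datetime point above: detect ISO 8601 values in CSV cells and emit them as typed literals. The function name and the fallback to plain string literals are assumptions, and the set of formats accepted by `datetime.fromisoformat` depends on the Python version (3.11+ is more permissive and, for example, also reads compact forms such as `20210728`), so treat the detection as a heuristic.

```python
import json
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(cell: str) -> str:
    """Return an N-Triples literal, typed as xsd:date or xsd:dateTime when
    the cell parses as ISO 8601, otherwise as a plain string literal."""
    text = cell.strip()
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return json.dumps(text, ensure_ascii=False)
    if len(text) <= 10:
        # Date-only input such as 2021-07-28.
        return f'"{parsed.date().isoformat()}"^^<{XSD}date>'
    # isoformat() normalises a space separator to the "T" that xsd:dateTime needs.
    return f'"{parsed.isoformat()}"^^<{XSD}dateTime>'


if __name__ == "__main__":
    for cell in ["2021-07-28", "2021-07-28 06:09:00", "not a date"]:
        print(typed_literal(cell))
```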
What are the solutions for generating RDFS schema from CSVs and SQL tables?

- doesn't do anything with datatypes FWICS

def suggest_column_types:
> PyRDB2RDF provides RDFLib with an interface to relational databases as RDF stores. The underlying data is accessed via SQLAlchemy. It is mapped to RDF according to the specifications of RDB2RDF. The corresponding RDF graph is represented as an RDFLib graph.
>
> Translating from relational data to RDF via direct mapping is currently supported. Translating in the other direction and mapping with R2RML are planned but not yet implemented.
- Does this handle datetimes?
- Generate JSONschema from JSON and SHACL from JSON-Schema:
- https://pypi.org/project/genson/ has been recently updated

> JSON-LD Schema defines a simple 'semantics' JSON-Schema vocabulary (effectively a JSON-Schema meta-schema) that reuses the official JSON Schema for JSON-LD to provide definitions for @context and @type properties. These annotations can be used to provide JSON-LD context for a JSON-Schema document. Provided this JSON-LD context, constraints over named 'properties' in a JSON Schema document can be understood as constraints over CURIEs of JSON-LD documents following the context rules defined in the JSON-LD specification.
To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/CACfEFw_4aE2CHTCjnB_Yy5spB%2Bn3WQYOKoKUZ6oy74jUFYDiTw%40mail.gmail.com.