Importing RDF data from S3


Nicholas Car

Feb 17, 2022, 12:24:24 AM
to topbrai...@googlegroups.com
Dear TQ,

I need to be able to import RDF data from RDF files stored in S3 and elsewhere on the public internet into EDG (7.1).


1. Is it possible to parse RDF in ADS and write it to a graph or perhaps bulk load it?

Your video introducing reading data into a graph using Active Data Shapes ("Importing Spreadsheet Data into TopBraid EDG using Active Data Shapes": https://www.youtube.com/watch?v=Dn7O8siZpTc) and the written examples in your documentation (https://www.topquadrant.com/doc/7.1/scripting/services.html#example-creating-a-data-importer) show reading data from CSV files using your "asSpreadsheet" function. The video also mentions that data from other sources, such as XML, can be parsed, but no RDF file reading is shown.

For public RDF data on the internet, can some ADS method like this work:

let s = IO.http({"url": "http://some-web-address.com/file/rdf-file-1.ttl"}) // get the content of the RDF file
s.data // somehow write that content to graph


2. Can the AWS S3 configuration that is used to restore from a backup in an S3 bucket be used so that ADS scripts can read from a bucket too?

Currently the per-system or per-asset AWS configuration is for writing backups and restoring from them only, not reading data from them.


3. Is watching of objects in AWS implemented so that Assets (Datagraphs) can be synchronised with RDF files in a configured S3 folder?

Thanks,

Nick

Holger Knublauch

Feb 17, 2022, 1:29:58 AM
to topbrai...@googlegroups.com

On 17 Feb 2022, at 9:35 am, Nicholas Car <nichol...@surroundaustralia.com> wrote:

Dear TQ,

I need to be able to import RDF data from RDF files stored in S3 and elsewhere on the public internet into EDG (7.1).


1. Is it possible to parse RDF in ADS and write it to a graph or perhaps bulk load it?

Your video introducing reading data into a graph using Active Data Shapes ("Importing Spreadsheet Data into TopBraid EDG using Active Data Shapes": https://www.youtube.com/watch?v=Dn7O8siZpTc) and the written examples in your documentation (https://www.topquadrant.com/doc/7.1/scripting/services.html#example-creating-a-data-importer) show reading data from CSV files using your "asSpreadsheet" function. The video also mentions that data from other sources, such as XML, can be parsed, but no RDF file reading is shown.

For public RDF data on the internet, can some ADS method like this work:

let s = IO.http({"url": "http://some-web-address.com/file/rdf-file-1.ttl"}) // get the content of the RDF file
s.data // somehow write that content to graph

I cannot think of a simple way to do that with 7.1, which is why for 7.2 we are introducing a function tbs.importRDFFile that can be used to parse a file (e.g. one downloaded with IO.http()) directly into a target asset collection or working copy. 7.2 includes a much larger API with the prefix tbs for all kinds of basic functionality.
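
A minimal sketch of how that might look in 7.2, assuming tbs.importRDFFile accepts the downloaded text plus a serialization hint (the exact signature may differ, so check the 7.2 tbs API documentation):

// Hypothetical 7.2 sketch only: the argument names and order below are assumptions,
// not the documented signature of tbs.importRDFFile.
let s = IO.http({"url": "http://some-web-address.com/file/rdf-file-1.ttl"}) // get the content of the RDF file
tbs.importRDFFile(s.data, "text/turtle") // parse the Turtle text into the target asset collection or working copy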

A more involved solution for 7.1 would be to write a helper SWP service and call it using graph.swp(). That SWP component could take the (Turtle) text as input, use sml:ConvertTextToRDF to parse it, and finally add the resulting triples.
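
For the ADS side of that workaround, a very rough sketch might look like the following (the service name and parameter structure passed to graph.swp() are assumptions for illustration only; the helper SWP service itself, built around sml:ConvertTextToRDF, is not shown):

// 7.1 workaround sketch only: "ex:ImportTurtleService" and the "text" parameter are
// hypothetical placeholders for a helper SWP service you would have to write yourself.
let s = IO.http({"url": "http://some-web-address.com/file/rdf-file-1.ttl"}) // get the content of the RDF file
graph.swp("ex:ImportTurtleService", { text: s.data }) // the SWP service parses the Turtle and adds the triples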

I cannot answer the other two questions below but hope my colleagues will respond when their day starts.

Holger




2. Can the AWS S3 configuration that is used to restore from a backup in an S3 bucket be used so that ADS scripts can read from a bucket too?

Currently the per-system or per-asset AWS configuration is for writing backups and restoring from them only, not reading data from them.


3. Is watching of objects in AWS implemented so that Assets (Datagraphs) can be synchronised with RDF files in a configured S3 folder?

Thanks,

Nick


Nicholas Car

Feb 17, 2022, 2:12:08 AM
to topbrai...@googlegroups.com
Thanks Holger,

An easy interim way of handling RDF data in ADS that I've just worked out is to use the HexTuples RDF exchange format, which is newline-delimited JSON, e.g. from your SHACL person example data:
["http://example.org/ns#Bob", "http://schema.org/givenName", "Robert", "http://www.w3.org/2001/XMLSchema#string", "", ""]
["http://example.org/ns#Bob", "http://schema.org/deathDate", "1968-09-10", "http://www.w3.org/2001/XMLSchema#date", "", ""]
["http://example.org/ns#Bob", "http://schema.org/birthDate", "1971-07-07", "http://www.w3.org/2001/XMLSchema#date", "", ""]
["http://example.org/ns#BobsAddress", "http://schema.org/streetAddress", "1600 Amphitheatre Pkway", "http://www.w3.org/2001/XMLSchema#string", "", ""]
["http://example.org/ns#BobsAddress", "http://schema.org/postalCode", "9404", "http://www.w3.org/2001/XMLSchema#integer", "", ""]
["http://example.org/ns#Bob", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://schema.org/Person", "globalId", "", ""]
["http://example.org/ns#Bob", "http://schema.org/address", "http://example.org/ns#BobsAddress", "globalId", "", ""]
["http://example.org/ns#Bob", "http://schema.org/familyName", "Junior", "http://www.w3.org/2001/XMLSchema#string", "", ""]
You can easily see that this is complete RDF and, being just simple JSON (much simpler than JSON-LD), it is extremely easy to import into an ADS graph.
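
For example, a rough import loop might look like this (graph.add(), graph.namedNode(), graph.literal() and graph.langString() are assumed helper names for illustration only; the actual ADS graph API calls may be named differently):

// Sketch only: fetch a HexTuples file and add each statement to the current graph.
// The graph.* helper names below are assumptions - check the ADS API documentation.
let s = IO.http({"url": "http://some-web-address.com/file/data.hext"}) // hypothetical HexTuples file
s.data.split("\n").forEach(line => {
    if (line.trim().length == 0) { return } // skip blank lines
    let [subj, pred, value, datatype, lang, g] = JSON.parse(line) // one HexTuples statement per line
    let object
    if (datatype == "globalId") { // object is an IRI
        object = graph.namedNode(value)
    } else if (lang != "") { // language-tagged string
        object = graph.langString(value, lang)
    } else { // typed literal
        object = graph.literal(value, datatype)
    }
    graph.add(graph.namedNode(subj), graph.namedNode(pred), object)
})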

I can export most external RDF data to HexTuples, since a HexTuples exporter (added by me!) is now part of the RDFLib Python library I use for other processing.


I am still very keen to find out about questions 2 and 3; they are actually more important, I suppose.

2. Can the AWS S3 configuration that is used to restore from a backup in an S3 bucket be used so that ADS scripts can read from a bucket too?

3. Is watching of objects in AWS implemented so that Assets (Datagraphs) can be synchronised with RDF files in a configured S3 folder?
Thanks,

Nick

Pat Doyle

Feb 18, 2022, 9:00:51 AM
to TopBraid Suite Users
On Thu, Feb 17, 2022 at 2:12 AM Nicholas Car <nichol...@surroundaustralia.com> wrote:

2. Can the AWS S3 configuration that is used to restore from a backup in an S3 bucket be used so that ADS scripts can read from a bucket too?

Not directly. But you can repurpose the S3 attachments feature to do this. If you tie an item in an S3 bucket to a resource in an asset collection, you can construct a request and call the s3export servlet to download the file for use in ADS, similar to how you're looking to pull arbitrary .ttl files from the web.

I did a quick test to pull a .ttl file that I had uploaded as an attachment to a resource, and the contents of the attachment became available to me in ADS.
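
A rough sketch of the kind of request involved (the servlet path and query parameters below are placeholders, not a documented URL pattern; inspect an attachment's download link in your own EDG instance for the real shape):

// Sketch only: pull the S3-backed attachment via the s3export servlet and use its text in ADS.
// The URL below is a hypothetical placeholder, not a documented endpoint.
let attachment = IO.http({"url": "https://your-edg-server/tbl/s3export?placeholder=..."})
let turtleText = attachment.data // the attached .ttl file's content, ready for further processing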


3. Is watching of objects in AWS implemented so that Assets (Datagraphs) can be synchronised with RDF files in a configured S3 folder?

I'd recommend looking at our git integration features for this. We don't have any auto-sync mechanisms built in, but we do have capabilities to import/export asset collections to/from git.
 

Nicholas Car

Feb 22, 2022, 1:51:25 AM
to topbrai...@googlegroups.com
OK, thanks for that, Pat. Marcus did think it would be possible to hack the Bucket attachment somehow, but we didn't know how to access its content in ADS. We still have an issue with reading Turtle and other RDF content, but if we use HexTuples content for the Bucket files, then reading that in ADS is straightforward.

I'll reply back when we have this working.

Cheers,

Nick
