RDFLIB with external DB Such as Neptune

Cory Casanave

Jan 7, 2022, 5:50:32 PM
to rdflib-dev
We are interested in experience using AWS Neptune from Python with or without RDFLIB. 

With RDFLIB, how transparent is the access, e.g. can all of the functions be used or only query? Are you doing updates? Using transactions? How is the performance? Is there a performance or feature "hit" for going through RDFLIB?

Without RDFLIB, what are you using to access Neptune? Same questions as above.

Please feel free to add any pointers or context.
Thanks in advance for any thoughts!

Jim Amsden

Jan 7, 2022, 10:49:51 PM
to rdfli...@googlegroups.com
Cory,
I’ve made extensive use of the Python rdflib and have developed OSLC interfaces to do data science and analytics on jazz.net data. I found it full featured and easy to use. I have done many updates with it. 

Nicholas Car

Jan 8, 2022, 2:53:08 AM
to rdfli...@googlegroups.com
> With RDFLIB, how transparent is the access,

There are several ways to use Neptune with RDFlib. The most obvious is probably as a back-end Store, but you could also use SPARQLWrapper [1] to send SPARQL queries to Neptune and use RDFlib only to parse the RDF responses.

> Are you doing updates?

Both Store back-ends and SPARQLWrapper should allow updates, in each case by sending SPARQL Update requests.
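
For reference, this is roughly what such an update looks like on the wire, sketched with the standard library only. Under the SPARQL 1.1 Protocol an update is a POST with an `update=` form parameter (queries use `query=`). The endpoint host, the helper names and the example triple are hypothetical, and IAM/SigV4 signing is again left out.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace with your cluster's host name.
NEPTUNE_SPARQL = "https://my-neptune.example.com:8182/sparql"

# Made-up example data.
INSERT_ALICE = (
    "PREFIX ex: <http://example.org/> "
    "INSERT DATA { ex:alice ex:knows ex:bob }"
)


def update_request(endpoint: str, update: str) -> Request:
    """Build a SPARQL 1.1 Protocol update request (form-encoded POST)."""
    return Request(
        endpoint,
        data=urlencode({"update": update}).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_update(endpoint: str, update: str) -> int:
    """Send the update and return the HTTP status (needs network access)."""
    with urlopen(update_request(endpoint, update)) as resp:
        return resp.status


# Building the request does not contact the server.
req = update_request(NEPTUNE_SPARQL, INSERT_ALICE)
print(req.get_method(), req.full_url)
```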

> Using transactions?

RDFLib's SPARQLStore back-end supports transactions.

> How is the performance? Is there a performance or feature "hit" for going through RDFLIB?

I've used RDFlib with Neptune, but only to test things out and not enough to make a sensible comment on this. I'd also need to know: performance compared with what? Perhaps with sending SPARQL queries directly to Neptune's native endpoint?


Cheers,

Nick

Wes Turner

Jan 8, 2022, 11:58:04 AM
to rdfli...@googlegroups.com
Are there e.g. Jupyter notebooks with executable snippets demonstrating use of at least every store?
(Jupyter-book wraps Sphinx to generate docs from  {.rst, .md, and .ipynb (with output cells)})

Which RDFlib tests/ directory should be the most comprehensive reference of what does and doesn't work with a {given RDFlib store,  SPARQL endpoint, }?

Nicholas Car

Jan 8, 2022, 10:04:54 PM
to rdfli...@googlegroups.com
> Are there e.g. Jupyter notebooks with executable snippets demonstrating use of at least every store?

Not that I'm aware of, but some Jupyter demos would be useful!

I've used Jupyter to demo aspects of RDFlib previously for university students (https://github.com/nicholascar/comp7230-training) and I know others have done similar things.

We might consider a stand-alone documentation repo in the RDFlib GitHub organization with Jupyter examples?

> Which RDFlib tests/ directory should be the most comprehensive reference of what does and doesn't work with a {given RDFlib store,  SPARQL endpoint, }?

We've slowly started to improve the Store documentation with, for example, a better listing of multiple stores at https://rdflib.readthedocs.io/en/stable/plugin_stores.html, but I agree that the documentation is disjointed. For example, there is no link to that Store listing page from the Persistence page (https://rdflib.readthedocs.io/en/stable/persistence.html).

I think we should:
  • try and improve the documentation of Stores
  • perhaps prepend all store examples with store_, so we should have:
    • store_berkeleydb.py (not berkeleydb_example.py)
    • store_sparqlstore.py (not sparqlstore_example.py)
  • grow the number of store examples
    • HDT
    • SQLAlchemy
    • LevelDB
    • SPARQL ReadWrite


Wes Turner

Jan 9, 2022, 12:30:39 AM
to rdfli...@googlegroups.com
On Sat, Jan 8, 2022, 10:04 PM Nicholas Car <nichol...@surroundaustralia.com> wrote:
> > Are there e.g. Jupyter notebooks with executable snippets demonstrating use of at least every store?

> Not that I'm aware of, but some Jupyter demos would be useful!

> I've used Jupyter to demo aspects of RDFlib previously for university students (https://github.com/nicholascar/comp7230-training) and I know others have done similar things.

> We might consider a stand-alone documentation repo in the RDFlib GitHub organization with Jupyter examples?

- [ ] choose a project url and name

  - rdflib/notebooks
    - pypi:notebooks is no good
  - rdflib/rdflib_notebooks
  - rdflib/rdflibnotebooks
  - rdflib/TBD  # ~ RDF bnodes

- [ ] generate an rdflib/TBD org project from the/a GitHub project template?
  - [ ] create an rdflib/ project template

(To be clear, I'm often only good for *suggesting* tasks.)

- [ ] DOC,BLD,ENH: Create a Jupyter Book folder and configuration for RDFlib demo/testing/store_benchmark notebooks
  `jupyter-book create TBD/`

  - [ ] DOC,ENH: can (and shouldn't) the sphinx-apidocs for rdflib.* -- maybe even gh:rdflib/* -- be included in a Jupyter-Book (Sphinx) JAMstack static HTML website, just like they are in the rdflib docs (Sphinx)?

- [ ] BLD: Configure the repo for CI
  - [ ] GitHub Actions / Drone
    - Comprehensive RDFlib Store benchmarks (in notebooks) could cause a prohibitively slow build; which may be justified given the expected contribution frequency for this project


> > Which RDFlib tests/ directory should be the most comprehensive reference of what does and doesn't work with a {given RDFlib store,  SPARQL endpoint, }?

> We've slowly started to improve the Store documentation with, for example, a better listing of multiple stores at https://rdflib.readthedocs.io/en/stable/plugin_stores.html, but I agree that the documentation is disjointed. For example, there is no link to that Store listing page from the Persistence page (https://rdflib.readthedocs.io/en/stable/persistence.html).

- [ ] DOC: Distill store test guidelines for QA and performance purposes
  - `ls store_*/tests?/* | xargs -n1 basename | sort -u`
  - [ ] TST,ENH,PERF: rdflib/rdflib:  ITestRDFlibStore, ITestSPARQLendpoint; 

   - [ ] DOC,ANN,REQ: @RDFlib/store_maintainers/*: add an e.g.  tests/test_store_performance.py for e.g. the rdflib/rdflib_notebooks Jupyter Book benchmarks and better

  - re: benchmarks and python [web] app performance:
    - In context to this existing (Python) fast open source webapp for hosting trained ML models, is RDFlib like an ML framework for *predictive* inferencing?


- [ ] DOC,ENH: One or more [nbgrader-able]  notebooks; for +training @online +selfpaced

  - https://jupyterhub-deploy-teaching.readthedocs.io/en/latest/ describes how to host a JupyterHub for everyone, so they can run your notebooks *AND* whatever other code they like on resources you pay for.

- [ ] BLD,ENH: generate a Jupyter Lite (WASM) build that includes rdflib, store plugins, and a recent build of JupyterLab




> I think we should:
>   • try and improve the documentation of Stores
>   • perhaps prepend all store examples with store_, so we should have:
>     • store_berkeleydb.py (not berkeleydb_example.py)
>     • store_sparqlstore.py (not sparqlstore_example.py)
>   • grow the number of store examples
>     • HDT
>     • SQLAlchemy
>     • LevelDB
>     • SPARQL ReadWrite

Good plan.

What are some good DRY ways to include actual test cases from tests/ as rdflib API usage demos in Jupyter notebooks?

- IPython / Jupyter magics:
  - %pdoc
  - %pfile
  - %psource


- fastai/nbdev is another way to be DRY (Don't Repeat Yourself) about tests and notebook demo examples with output and docs. 

  > Automatically generate docs from Jupyter notebooks. These docs are searchable and automatically hyperlinked to appropriate documentation pages by introspecting keywords you surround in backticks


- Copy/paste and adapt from ~ demo examples in tests/ 
  - Copying and pasting is forking; and who will continue to merge RDFlib api changes back to the docs notebooks when the CI tests of the demo notebooks (`jupyter-book build`) fail due to Exceptions caused by justified breaking refactorings?
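
As a small illustration of the first option: the `%psource` magic displays an object's current source, and the standard library's `inspect.getsource` does the same thing programmatically, so a notebook cell can show a test's code without holding a copy of it. The `show_source` helper is hypothetical, and `json.dumps` is only a stand-in target; in a docs notebook it would be a test function imported from the RDFlib test suite (no import path is assumed here).

```python
import inspect
import json


def show_source(obj) -> str:
    """Return the current source of a function or class, much as %psource shows it."""
    return inspect.getsource(obj)


# Stand-in target; swap in a test function from the suite being documented.
source = show_source(json.dumps)
print(source.splitlines()[0])
```

Because the source is read at execution time, a refactored test shows up in the rebuilt notebook automatically instead of drifting from a pasted copy.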
