Finding and splitting by base URI

d.c.so...@gmail.com

Dec 28, 2019, 9:27:22 AM

to rdflib-dev

Hello,

I'd like to search a large data set for all terms of a base URI. Can this be done with rdflib?

Something like

for s,p,o in g.triples( (None, DEFAULT['None'], None) ):
    print("{} {} {}".format(s, p, o))

?

And then to split the term, so that instead of

<http://some.default.base/#InterestingTerm>

just

InterestingTerm

Nicholas Car

Jan 7, 2020, 12:13:26 AM

to rdfli...@googlegroups.com

Hi D.C,

Can I just reflect the question back to you to check that I've understood it? Are you asking for something like this:

import rdflib

# 1
base_uri = 'http://www.w3.org/ns/dx/prof/'

# 2
g = rdflib.Graph().parse('prof.ttl', format='turtle')

# 3
for s, p, o in g.triples((None, None, None)):

    # 4
    if str(s).startswith(base_uri):
        print('s {} - {}'.format(str(s).split('/')[-1], s))
    elif str(p).startswith(base_uri):
        print('p {} - {}'.format(str(p).split('/')[-1], p))
    elif str(o).startswith(base_uri):
        print('o {} - {}'.format(str(o).split('/')[-1], o))

So here, following numbered comments in the code:

1. Identifying a base URI

- which is the base URI of the W3C’s Profiles Vocabulary, see https://www.w3.org/TR/dx-prof/

2. Reading an RDF source

- the file prof.ttl which contains the Profiles Vocabulary

3. Looping through all triples in the source

- which is the Profiles Vocabulary in a file, but could be a huge dataset

4. Testing to see if each part (Subject s, Predicate p or Object o) starts with base_uri

- splits and prints it if it does
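
If the base URI ends in '#' rather than '/', as in http://some.default.base/#InterestingTerm from the original question, splitting on '/' would leave the '#' attached to the name. Below is a minimal sketch of one alternative that slices off the base URI instead. It assumes plain strings stand in for rdflib terms (with a real graph you would pass the triples from g and rely on str() of each term), and the helper names local_name and terms_in_namespace are hypothetical, not part of rdflib.

```python
def local_name(term, base_uri):
    """Return the part of term after base_uri, or None if term is outside it."""
    text = str(term)
    if text.startswith(base_uri) and len(text) > len(base_uri):
        return text[len(base_uri):]
    return None


def terms_in_namespace(triples, base_uri):
    """Collect the local names of every subject, predicate or object under base_uri."""
    found = set()
    for triple in triples:
        for term in triple:
            name = local_name(term, base_uri)
            if name is not None:
                found.add(name)
    return sorted(found)


# Hypothetical sample data: plain string tuples standing in for rdflib triples.
sample = [
    ("http://some.default.base/#InterestingTerm",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://some.default.base/#Thing"),
    ("http://example.org/other",
     "http://some.default.base/#relatedTo",
     "a plain literal"),
]

print(terms_in_namespace(sample, "http://some.default.base/#"))
```

Unlike an if/elif chain, this checks all three positions of every triple, so a triple whose subject and object both sit under the base URI contributes both names.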

Are there other things you want to be doing or is this it?

Cheers,

Nick

Boris Pelakh

Jan 13, 2020, 3:02:39 PM

to rdflib-dev

If you are just looking to strip the leading namespace from any URI string, you could do the following:

import re

from rdflib import URIRef

def simplify(term):
    # Strip everything up to the last '#' or '/' from URIs; leave other terms unchanged.
    return re.sub(r'^.*[#/]', '', str(term)) if isinstance(term, URIRef) else str(term)
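
To see what the regular expression does on its own, here is a small sketch that uses plain strings only, so it runs without rdflib. The name strip_namespace is hypothetical; it applies the same substitution as simplify but without the URIRef check.

```python
import re


def strip_namespace(uri):
    """Drop everything up to and including the last '#' or '/' in uri."""
    return re.sub(r'^.*[#/]', '', uri)


print(strip_namespace('http://some.default.base/#InterestingTerm'))  # InterestingTerm
print(strip_namespace('http://www.w3.org/ns/dx/prof/Profile'))       # Profile
print(repr(strip_namespace('http://www.w3.org/ns/dx/prof/')))        # ''
```

Because the pattern is greedy, it cuts at the last separator, so a URI that itself ends in '/' or '#' comes back as an empty string. The isinstance check in simplify matters for a related reason: without it, a literal containing a slash, such as a date written as 28/12/2019, would be trimmed as well.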