Finding and splitting by base URI


d.c.so...@gmail.com

Dec 28, 2019, 9:27:22 AM
to rdflib-dev
Hello,

I'd like to search a large dataset for all terms that share a base URI. Can this be done with rdflib?
Something like:
for s,p,o in g.triples( (None, DEFAULT['None'], None) ):
    print("{} {} {}".format(s, p, o))
?

And then to split the term, so that instead of the full URI I get just

InterestingTerm

Nicholas Car

Jan 7, 2020, 12:13:26 AM
to rdfli...@googlegroups.com
Hi D.C,

Can I reflect the question back to you to check that I've understood it? Are you asking for something like this:


import rdflib

# 1
base_uri = 'http://www.w3.org/ns/dx/prof/'

# 2
g = rdflib.Graph().parse('prof.ttl', format='turtle')

# 3
for s, p, o in g.triples((None, None, None)):
    # 4
    # each position is tested on its own, so one triple can print up to three lines
    if str(s).startswith(base_uri):
        print('s {} - {}'.format(str(s).split('/')[-1], s))
    if str(p).startswith(base_uri):
        print('p {} - {}'.format(str(p).split('/')[-1], p))
    if str(o).startswith(base_uri):
        print('o {} - {}'.format(str(o).split('/')[-1], o))



So here, following the numbered comments in the code:

1. Identifying a base URI
  - this is the base URI of the W3C’s Profiles Vocabulary, see https://www.w3.org/TR/dx-prof/
2. Reading an RDF source
  - the file prof.ttl, which contains the Profiles Vocabulary
3. Looping through all triples in the source
  - here the Profiles Vocabulary in a file, but it could be a huge dataset
4. Testing to see if each part (here a Subject s) starts with the base_uri
  - splits it and prints it if it does
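
If what you are after is the distinct set of terms rather than one printed line per triple, the same steps could be wrapped up roughly like this. This is only a sketch: terms_in_namespace and the sample triples are made up for illustration, and it uses plain strings so it runs without any RDF file. Since rdflib's URIRef is a subclass of str and iterating over a Graph yields (s, p, o) tuples, passing a parsed Graph in place of the sample list should work as well.

```python
def terms_in_namespace(triples, base_uri):
    """Return {full_uri: local_name} for every term that starts with base_uri.

    `triples` is any iterable of 3-tuples whose items can be turned into
    strings (hypothetical helper, not part of rdflib).
    """
    found = {}
    for triple in triples:
        for term in triple:
            text = str(term)
            if text.startswith(base_uri):
                # Slice off the base URI instead of splitting on '/',
                # so nested paths under the base are kept intact.
                found[text] = text[len(base_uri):]
    return found


# Illustrative triples only; they are not taken from prof.ttl.
sample = [
    ("http://www.w3.org/ns/dx/prof/Profile",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/2002/07/owl#Class"),
    ("http://example.org/thing",
     "http://www.w3.org/ns/dx/prof/hasRole",
     "http://www.w3.org/ns/dx/prof/role/guidance"),
]

for uri, local in sorted(terms_in_namespace(sample, "http://www.w3.org/ns/dx/prof/").items()):
    print("{} - {}".format(local, uri))
```

Note that slicing by the length of the base URI gives role/guidance for the last term, whereas split('/')[-1] would give guidance. Which of the two you want depends on whether sub-paths count as part of the term name. Also, any literal whose text happens to start with the base URI would match too, so you may want an extra type check when running this over a real graph.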


Are there other things you want to be doing or is this it?

Cheers,

Nick






Boris Pelakh

Jan 13, 2020, 3:02:39 PM
to rdflib-dev
If you are just looking to strip the leading namespace from any URI string, you could do the following:

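import re
from rdflib import URIRef

def simplify(term):
    # For URIs, strip everything up to and including the last '#' or '/';
    # literals and blank nodes are returned as plain strings.
    if isinstance(term, URIRef):
        return re.sub(r'^.*[#/]', '', str(term))
    return str(term)

# g is an already-parsed rdflib.Graph; the triple pattern is copied from the
# original question, where DEFAULT['None'] stands in for the predicate to match
for s, p, o in g.triples((None, DEFAULT['None'], None)):
    print("{} {} {}".format(simplify(s), simplify(p), simplify(o)))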
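
To see what that regular expression does on its own, here is a small check that needs nothing beyond the standard library. The helper name local_name and the three URIs are just picked for illustration. Because .* is greedy, the pattern removes everything up to the last '#' or '/', so it copes with both hash and slash namespaces, but a URI that ends in a separator comes back as an empty string.

```python
import re


def local_name(uri):
    """Strip everything up to and including the last '#' or '/'."""
    return re.sub(r'^.*[#/]', '', uri)


# Illustrative inputs: a slash namespace, a hash namespace, and a bare namespace URI.
examples = [
    "http://www.w3.org/ns/dx/prof/Profile",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/ns/dx/prof/",
]

for uri in examples:
    print(repr(local_name(uri)))
```

If empty names are a problem for your data, one option would be to fall back to the full URI whenever the result is empty.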