Dealing with fetching/parsing the preferred/alternative labels list


sanjeev devireddy

Feb 19, 2018, 8:59:06 AM
to TopBraid Suite Users
Hi,
       We have the following two challenges when fetching and parsing the list of preferred/alternative labels returned in the SPARQL Endpoint service response.

1) Getting the preferred/alternative label(s) without language tags. Is there a way to get the labels without the language tags?

2) Handling commas inside the preferred/alternative labels (please see the example below).

     Assume there is a single concept named "Latin America, North America". As shown below, it has two labels, one in English and one in French, and each label contains a comma. In this case, using the comma as a separator fails to recover the English and French labels of the single concept.
    "prefLabel_0": { "type": "literal" , "value": "Latin America, North America(en), Amérique latine, Amérique du Nord(fr)" }

   What would be the best approach in this kind of scenario?



Thanks,
Sanjeev

Holger Knublauch

Feb 19, 2018, 6:41:53 PM
to topbrai...@googlegroups.com


On 19/02/2018 23:59, sanjeev devireddy wrote:
> Hi,
> We have the following two challenges when fetching and parsing the list of preferred/alternative labels returned in the SPARQL Endpoint service response.
>
> 1) Getting the preferred/alternative label(s) without language tags. Is there a way to get the labels without the language tags?

You can convert a language-tagged string literal to a simple xsd:string literal using the built-in str function, e.g.

SELECT *
WHERE {
    ?concept skos:prefLabel|skos:altLabel ?label .
    BIND (str(?label) AS ?stringLabel) .

}

   


> 2) Handling commas inside the preferred/alternative labels (please see the example below).
>
> Assume there is a single concept named "Latin America, North America". As shown below, it has two labels, one in English and one in French, and each label contains a comma. In this case, using the comma as a separator fails to recover the English and French labels of the single concept.
> "prefLabel_0": { "type": "literal" , "value": "Latin America, North America(en), Amérique latine, Amérique du Nord(fr)" }
>
> What would be the best approach in this kind of scenario?

So you want the query to return all sub-strings separated by commas? The following example iterates over all sub-strings and trims extra spaces in case the string is irregular.

SELECT *
WHERE {
    BIND ("Latin America, North America(en), Amérique latine, Amérique du Nord(fr)" AS ?str)
    ?sub spif:split (?str ",") .
    BIND (spif:trim(?sub) AS ?trimmed) .
}

To get rid of the language tags, you'd need to do further string processing using functions like spif:indexOf and SUBSTR.
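
As one possible sketch of that further processing, the standard SPARQL 1.1 REPLACE function can strip a trailing "(xx)" marker with a regular expression, as an alternative to spif:indexOf and SUBSTR. The sample values and the pattern are assumptions made for illustration: the pattern expects the marker to be a language code in parentheses at the very end of the string, optionally preceded by spaces.

```sparql
# Sketch only: strip a trailing "(xx)" language marker from each piece.
# The VALUES rows stand in for the ?trimmed results of the split above.
SELECT ?piece ?plain
WHERE {
    VALUES ?piece { "North America(en)" "Amérique du Nord(fr)" "Asia (en)" }
    # \\s* removes optional spaces before the marker; $ anchors it to the end.
    BIND (REPLACE(?piece, "\\s*\\([A-Za-z]+(-[A-Za-z0-9]+)*\\)$", "") AS ?plain)
}
```

Pieces that carry no marker, such as "Latin America", pass through unchanged because the pattern does not match them.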

HTH
Holger

sanjeev devireddy

Feb 20, 2018, 5:42:05 AM
to TopBraid Suite Users
Hi Holger,
      Your suggestion of using the built-in str function works only when the labels of all languages are queried. Our requirement is to get the English labels only, so we wrote the SPARQL query below. In this case the built-in str function fails to convert the language-tagged string literal to a simple xsd:string literal (please see the screenshot below). Could you check whether the query can be changed so that the conversion works?



 SPARQL:
SELECT DISTINCT ?preferredlabel ?stringPrefLabel ?result
WHERE {
    GRAPH <urn:x-evn-master:geo> {
        {
            ?result a <http://topquadrant.com/ns/examples/geography#Continent> .
        } .
        BIND (search:nestedObjectsList(?result, skos:prefLabel, "result", ?none, "en") AS ?preferredlabel) .
        BIND (str(?preferredlabel) AS ?stringPrefLabel) .
    }
}
ORDER BY (LCASE(?label))


Coming to the other question, about labels that contain a comma: my earlier post used a bad example, so here is another one.
Our taxonomy has a concept whose preferred labels in English and French are Government, Central/Federal (en) and Gouvernement, Central/Fédéral (fr). Note that each label contains a comma. When a SPARQL query fetches these preferred labels through the SPARQL Endpoint service, the response contains the two labels separated by a comma, as shown below. Because the labels themselves contain commas and the separator between the English and French labels is also a comma, splitting the JSON value below on commas gives 4 values: i) Government, ii) Central/Federal (en), iii) Gouvernement, iv) Central/Fédéral (fr). The requirement, however, is to get the actual English and French labels.
 
e.g.: "prefLabel_0":{"value":"Government, Central/Federal (en), Gouvernement, Central/Fédéral (fr)","type":"literal"}

What would be the best way to get the labels correctly in this kind of scenario? Is there a way to specify a different separator (such as a pipe, |) for the preferred/alternate labels list in the above JSON, through the SPARQL Endpoint service or otherwise?


Thanks,
Sanjeev
[Screenshot: Inline Image 1]

Holger Knublauch

Feb 20, 2018, 8:48:45 PM
to topbrai...@googlegroups.com
Hi Sanjeev,



On 20/02/2018 20:42, sanjeev devireddy wrote:
> Hi Holger,
> Your suggestion of using the built-in str function works only when the labels of all languages are queried. Our requirement is to get the English labels only, so we wrote the SPARQL query below. In this case the built-in str function fails to convert the language-tagged string literal to a simple xsd:string literal (please see the screenshot below). Could you check whether the query can be changed so that the conversion works?
>
> SPARQL:
> SELECT DISTINCT ?preferredlabel ?stringPrefLabel ?result
> WHERE {
>     GRAPH <urn:x-evn-master:geo> {
>         {
>             ?result a <http://topquadrant.com/ns/examples/geography#Continent> .
>         } .
>         BIND (search:nestedObjectsList(?result, skos:prefLabel, "result", ?none, "en") AS ?preferredlabel) .
>         BIND (str(?preferredlabel) AS ?stringPrefLabel) .
>     }
> }
> ORDER BY (LCASE(?label))

In your example data, the nestedObjectsList function produces strings such as "Asia (en)", which encode the language tag in a non-standard way, inside the string itself. In contrast, the official form looks (in Turtle) like "Asia"@en: the string literal itself does not include the language; the tag is a special attachment to the literal. In the former case, the str function is already operating on an xsd:string without a language tag, and it knows nothing about the "... (en)" naming convention. If you want to get rid of the " (en)" sub-strings, you could for example use REPLACE, e.g.

    BIND (REPLACE(?preferredlabel, " \\(en\\)", "") AS ?s)
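
To see the difference between a real language tag and the "(en)" text convention, a small self-contained query can help. This is only a sketch; the two sample literals are made up for illustration. For "Asia"@en, LANG returns "en" and STR returns "Asia"; for the plain string "Asia (en)", LANG returns an empty string and STR returns the string unchanged.

```sparql
# Sketch only: compare a language-tagged literal with a plain string
# that merely contains "(en)" as part of its text.
SELECT ?literal (LANG(?literal) AS ?tag) (STR(?literal) AS ?text)
WHERE {
    VALUES ?literal { "Asia"@en "Asia (en)" }
}
```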





> Coming to the other question, about labels that contain a comma: my earlier post used a bad example, so here is another one.
> Our taxonomy has a concept whose preferred labels in English and French are Government, Central/Federal (en) and Gouvernement, Central/Fédéral (fr). Note that each label contains a comma. When a SPARQL query fetches these preferred labels through the SPARQL Endpoint service, the response contains the two labels separated by a comma, as shown below. Because the labels themselves contain commas and the separator between the English and French labels is also a comma, splitting the JSON value below on commas gives 4 values: i) Government, ii) Central/Federal (en), iii) Gouvernement, iv) Central/Fédéral (fr). The requirement, however, is to get the actual English and French labels.
>
> e.g.: "prefLabel_0":{"value":"Government, Central/Federal (en), Gouvernement, Central/Fédéral (fr)","type":"literal"}
>
> What would be the best way to get the labels correctly in this kind of scenario? Is there a way to specify a different separator (such as a pipe, |) for the preferred/alternate labels list in the above JSON, through the SPARQL Endpoint service or otherwise?

Why do you need to go through nestedObjectsList at all? This would give you only the French labels:

SELECT *
WHERE {
    ?result a g:Continent .
    ?result skos:prefLabel ?label .
    FILTER (lang(?label) = "fr")
}
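
For the English-only case asked about earlier, the same pattern might be combined with str so that the results come back as plain strings. This is a sketch under assumptions: the GRAPH name and the expansion of the g: prefix are copied from the query posted earlier in the thread, and LANGMATCHES is used instead of an equality test so that regional tags such as "en-GB" would also match.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX g:    <http://topquadrant.com/ns/examples/geography#>

# Sketch only: English preferred labels, returned without language tags.
SELECT ?result (STR(?label) AS ?plainLabel)
WHERE {
    GRAPH <urn:x-evn-master:geo> {
        ?result a g:Continent ;
                skos:prefLabel ?label .
        # Keep only labels whose language tag is "en" or starts with "en-".
        FILTER (LANGMATCHES(LANG(?label), "en"))
    }
}
```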

Holger



sanjeev devireddy

Feb 21, 2018, 10:15:49 AM
to TopBraid Suite Users
Hi Holger,
   The only reason for using nestedObjectsList is to get all the preferred labels of a concept as a single list in a single JSON object, instead of a separate JSON object for each label. Getting all preferred labels as one delimited list reduces the dependent application's processing time. Say there are 50 preferred labels: after receiving the SPARQL Endpoint service response, splitting those 50 labels on a delimiter takes less time than calling the get() method 50 times for 50 JSON objects. And it is not only about preferred labels; our concepts also have alternate labels, ambiguous alternate labels, etc.

     However, nestedObjectsList uses a comma as the delimiter for the list of labels, so when a label itself contains a comma, splitting the list on commas fails to give the correct labels.

SPARQL response with nestedObjectsList in the query

The SPARQL Endpoint service response below contains a single JSON object for all the preferred labels of the concept Africa.

[Screenshot: Inline Image 1]

SPARQL response without nestedObjectsList in the query

The SPARQL Endpoint service response below contains a separate JSON object for each preferred label of the concept Africa.

[Screenshot: Inline Image 2]

Thanks,
Sanjeev

Holger Knublauch

Feb 21, 2018, 5:22:44 PM
to topbrai...@googlegroups.com
As an alternative to nestedObjectsList (which was really just built for display purposes, not further machine processing), look at the SPARQL built-in GROUP_CONCAT. See for example

    https://stackoverflow.com/questions/18212697/aggregating-results-from-sparql-query
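
A minimal sketch of how GROUP_CONCAT might be applied to the case in this thread follows. The pipe separator and the "label@lang" encoding are illustrative choices, assuming neither "|" nor "@" occurs inside a label; the GRAPH name and class are taken from the query posted earlier. The query returns one row per concept, with all preferred labels joined into a single string that the client can split on "|" even when labels contain commas.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX g:    <http://topquadrant.com/ns/examples/geography#>

# Sketch only: one row per concept, labels joined with "|" instead of ",".
SELECT ?result
       (GROUP_CONCAT(CONCAT(STR(?label), "@", LANG(?label)); SEPARATOR="|") AS ?prefLabels)
WHERE {
    GRAPH <urn:x-evn-master:geo> {
        ?result a g:Continent ;
                skos:prefLabel ?label .
    }
}
GROUP BY ?result
```

For the Government concept described above, a value of this shape would contain "Government, Central/Federal@en" and "Gouvernement, Central/Fédéral@fr" joined by "|"; the order of the joined labels is not guaranteed. Adding a FILTER on LANG(?label) inside the GRAPH block would restrict the list to one language.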

Holger

sanjeev devireddy

Feb 23, 2018, 4:21:40 AM
to TopBraid Suite Users
Thanks Holger. It works.

Thanks,
Sanjeev