Querying CSV metadata with a TARQL query


peter.l...@inova8.com

Feb 1, 2015, 4:42:31 PM
to ta...@googlegroups.com
Is it possible to extract the metadata of a CSV file, in particular its columns, with a query something like this:

SELECT ?column ?dataType
FROM <file:dataset.csv>
WHERE {}

My guess is that such pseudo-values do not exist, so do you think it would be feasible to append something like the following VALUES clause to the generated SPARQL?

SELECT ?column  ?dataType
FROM <file:dataset.csv>
WHERE {
...
VALUES(?column ?dataType){(<a> xsd:string)(<b> xsd:float)(<c> xsd:string) .. (<z> xsd:URI)}
}
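
For reference, a syntactically valid form of that VALUES clause in standard SPARQL would look something like the following; the column IRIs and their datatype pairings are purely illustrative placeholders (note also that XSD defines xsd:anyURI rather than xsd:URI):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?column ?dataType
WHERE {
  VALUES (?column ?dataType) {
    (<a> xsd:string)
    (<b> xsd:float)
    (<c> xsd:string)
  }
}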

Richard Cyganiak

Feb 1, 2015, 5:41:35 PM
to peter.l...@inova8.com, ta...@googlegroups.com
Hi Peter,

CSV doesn't have datatypes. Everything in CSV is of the same type: string. One can certainly try to auto-detect the type, but this can often go wrong. 1,000 may be the integer 1000 or the decimal 1.0, depending on the locale where the CSV was produced. Reliably detecting things like, say, dates or phone numbers is hard (and CSV doesn't guarantee that everything in a column has the same type).

For these reasons, Tarql doesn't do auto-detection of datatypes. If a Tarql user wants typed literals in the output, they have to specify the target datatype in the mapping.
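
For example, assuming a CSV with header columns id and price (made-up names for illustration), a mapping can cast explicitly using the standard SPARQL XSD constructor functions:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?item <http://example.com/price> ?typedPrice .
}
FROM <file:dataset.csv>
WHERE {
  # Tarql binds each column to a variable named after its header
  BIND(URI(CONCAT('http://example.com/item/', ?id)) AS ?item)
  # explicit target datatype, chosen by the user in the mapping
  BIND(xsd:decimal(?price) AS ?typedPrice)
}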

So, I'm afraid Tarql doesn't have the information you want.

Best,
Richard
--

peter.l...@inova8.com

Feb 1, 2015, 5:46:53 PM
to ta...@googlegroups.com, peter.l...@inova8.com
Understood, I was overly ambitious in suggesting the types. However, the column names alone would be useful.

Consider the use-case where Tarql is part of a CSV uploader: the first thing a user would want to see is the columns of the CSV. Hence my question.

Richard Cyganiak

Feb 2, 2015, 9:22:11 AM
to peter.l...@inova8.com, ta...@googlegroups.com
Peter,

Would a mode where Tarql simply returns the first few rows of the CSV file, without the need to provide a mapping file, address that use case?

It could be the same as running a query like this:

SELECT * { } LIMIT 5

Best,
Richard

Peter Lawrence

Feb 2, 2015, 10:10:04 AM
to Richard Cyganiak, ta...@googlegroups.com
Richard

It depends. If the first few rows were returned as a SPARQL select response then it would be useful because the application I am envisaging would want to see TARQL as a SPARQL endpoint. Would that be the case?

Thanks for your attention,

Peter

Peter J. Lawrence
inova8
Providing answers for users' information questions
Mobile: +1 330 631 3772 | Phone: +1 330 342 0582 | UK Cell: +44 795 563 2824 | Skype: PeterJLawrence

Richard Cyganiak

Feb 2, 2015, 11:57:23 AM
to Peter Lawrence, ta...@googlegroups.com


I’m trying to understand the model you have in mind. What would be the query that is sent to the SPARQL endpoint by the client? Would that query be the Tarql mapping? Or would the Tarql mapping “live” on the server, and the endpoint gives access to the result of applying the mapping to the input CSV? Or something else?

Thanks,
Richard

Peter Lawrence

Feb 2, 2015, 12:13:37 PM
to Richard Cyganiak, ta...@googlegroups.com
It might be better to describe the use-story:
  • A user wants to load another CSV into the triplestore, so they select the CSV.
  • The system responds with a display of the column names (header row) in the CSV, with perhaps one row of data.
  • For each column of the CSV:
    • the user selects the actual property, from an existing vocabulary, to which the column will be mapped;
    • the user enters the formula for mapping the column value, such as URI(CONCAT('http://corporate//sample/id/', ?<col 3>)).
  • Once complete, the user can upload the CSV, at which time the following happens:
    • The application generates the corresponding TARQL query/mapping (CONSTRUCT).
    • The TARQL is executed.
    • The response is then loaded into the triplestore.
Does that help?

Thanks

Peter


Peter Lawrence

Feb 2, 2015, 12:16:23 PM
to Richard Cyganiak, ta...@googlegroups.com
BTW the generated mapping would always be of this form:

CONSTRUCT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
FROM <file:dataset.csv>
WHERE {
  BIND(<dataset> AS ?Dataset)
  BIND(<dataset/observation/rownum> AS ?Observation)
  # repeat for each column
  BIND(<property_n> AS ?Component_n_property)
  BIND(<mappingformula>(<value_n>) AS ?Value_n)
}



Richard Cyganiak

Feb 2, 2015, 4:41:19 PM
to Peter Lawrence, ta...@googlegroups.com
Peter,

Thanks for explaining this. I understand the story and it makes sense to me. You said earlier: “The application I am envisaging would want to see TARQL as a SPARQL endpoint.” Can you describe where and how that happens in the story? Where is Tarql seen as a SPARQL endpoint?

I keep asking these questions to understand what should be added to Tarql to find a good compromise between supporting the use case and staying simple and flexible. I appreciate your patience!

Thanks,
Richard



Peter Lawrence

Feb 2, 2015, 5:05:16 PM
to Richard Cyganiak, ta...@googlegroups.com
Richard

  1. A user wants to load another CSV into the triplestore, so they select the CSV.
  2. The system responds with a display of the column names (header row) in the CSV, with perhaps one row of data.
  3. For each column of the CSV:
    1. the user selects the actual property, from an existing vocabulary, to which the column will be mapped;
    2. the user enters the formula for mapping the column value, such as URI(CONCAT('http://corporate//sample/id/', ?<col 3>)).
  4. Once complete, the user can upload the CSV, at which time the following happens:
    1. The application generates the corresponding TARQL query/mapping (CONSTRUCT).
    2. The TARQL is executed.
    3. The response is then loaded into the triplestore.
If TARQL were an endpoint, then steps 2 to 4 could be executed via that endpoint:

2:  The client executes a query something like this against the TARQL endpoint:
SELECT ?column
FROM <file:dataset.csv>
WHERE {}
This would provide enough information to initialize the grid.

3: Not really interacting with the TARQL endpoint, but another that is publishing the vocabulary.

4: After the client generates the appropriate SPARQL/TARQL it would execute it against the TARQL endpoint:

CONSTRUCT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
FROM <file:dataset.csv>
WHERE {
  BIND(<dataset> AS ?Dataset)
  BIND(<dataset/observation/rownum> AS ?Observation)
  # repeat for each column
  BIND(<property_n> AS ?Component_n_property)
  BIND(<mappingformula>(<value_n>) AS ?Value_n)
}


Since the intention would be to load this directly into another triplestore, one could speculate on a combined insert and service call to the TARQL endpoint (not syntactically correct):

INSERT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
WHERE {
  SERVICE <TARQL Endpoint> {
    FROM <file:dataset.csv>
    BIND(<dataset> AS ?Dataset)
    BIND(<dataset/observation/rownum> AS ?Observation)
    # repeat for each column
    BIND(<property_n> AS ?Component_n_property)
    BIND(<mappingformula>(<value_n>) AS ?Value_n)
  }
}

Does this help?

Peter


Richard Cyganiak

Feb 2, 2015, 5:17:54 PM
to Peter Lawrence, ta...@googlegroups.com
Hi Peter,

I think step 2 would already be possible by sending this query to the endpoint:

SELECT *
FROM <file:dataset.csv>
WHERE {}
LIMIT 1

This would return one row of data including all the column headings, and seems sufficient to support the user interaction you describe for steps 3 and 4.
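
For instance, for a hypothetical dataset.csv whose header row is id,name,price, Tarql's convention of binding each column to a variable named after its header would give a one-row result along these lines (all values plain strings, per the point above about CSV datatypes):

  ?id   ?name      ?price
  "1"   "Widget"   "9.99"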

Does this make sense?

The idea of combining a SPARQL Update query with a SERVICE call to the Tarql endpoint is quite interesting! I don’t think this will work with vanilla SPARQL 1.1 unfortunately, because SERVICE only accepts a basic graph pattern, so there isn’t anywhere to put the FROM <file.csv> clause. Maybe it works with a SELECT query as subquery.

I certainly like the idea of combining Tarql with a lightweight server component that provides a SPARQL endpoint!

Best,
Richard




Peter Lawrence

Feb 2, 2015, 5:43:58 PM
to Richard Cyganiak, ta...@googlegroups.com
Richard

<<Does this make sense?>>

Yes it does, especially as a sample row of data is inevitably required.

<<Maybe it works with a SELECT query as subquery.>>

The FROM <file.csv> would create a problem. However, maybe one could do something with the service endpoint URL, as follows, which I think is legal SPARQL (at least in ARQ):

INSERT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
WHERE {
  SERVICE <tarql#dataset.csv> {
    SELECT ?Dataset ?Observation ?Component_n_property ?Value_n
    WHERE {
      BIND(<dataset> AS ?Dataset)
      BIND(<dataset/observation/rownum> AS ?Observation)
      # repeat for each column
      BIND(<property_n> AS ?Component_n_property)
      BIND(<mappingformula>(<value_n>) AS ?Value_n)
    } LIMIT 10
  }
}


Thanks

Peter


Richard Cyganiak

Feb 2, 2015, 9:12:32 PM
to Peter Lawrence, ta...@googlegroups.com
Thanks Peter.

I’ve created two issues that speak to the use case you describe:

#39 Option to only show column headers and sample row(s)
https://github.com/tarql/tarql/issues/39

#40 Tarql Server
https://github.com/tarql/tarql/issues/40

Let’s split off any discussion that’s specific to these issues and continue it on GitHub. And pull requests would of course be most welcome :-)

All the best,
Richard