Querying CSV metadata with a TARQL query


peter.l...@inova8.com

Feb 1, 2015, 4:42:31 PM
to ta...@googlegroups.com
Is it possible to extract the metadata of a CSV file, in particular its columns, with a query something like this:

SELECT ?column ?dataType
FROM <file:dataset.csv>
WHERE {}

My guess is that such pseudo-values do not exist, so do you think it would be feasible to append something like the following VALUES clause to the generated SPARQL?

SELECT ?column  ?dataType
FROM <file:dataset.csv>
WHERE {
...
VALUES(?column ?dataType){(<a> xsd:string)(<b> xsd:float)(<c> xsd:string) .. (<z> xsd:URI)}
}
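
For reference, a syntactically valid form of that VALUES clause in standard SPARQL would look something like the following; the column IRIs and their datatype pairings are purely illustrative placeholders (note also that XSD defines xsd:anyURI rather than xsd:URI):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?column ?dataType
WHERE {
  VALUES (?column ?dataType) {
    (<a> xsd:string)
    (<b> xsd:float)
    (<c> xsd:string)
  }
}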

Richard Cyganiak

Feb 1, 2015, 5:41:35 PM
to peter.l...@inova8.com, ta...@googlegroups.com
Hi Peter,

CSV doesn't have datatypes. Everything in CSV is of the same type: string. One can certainly try to auto-detect the type, but this can often go wrong. 1,000 may be the integer 1000 or the decimal 1.0, depending on the locale where the CSV was produced. Reliably detecting things like, say, dates or phone numbers is hard (and CSV doesn't guarantee that everything in a column has the same type).

For these reasons, Tarql doesn't do auto-detection of datatypes. If a Tarql user wants typed literals in the output, they have to specify the target datatype in the mapping.
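
For example, assuming a CSV with header columns id and price (made-up names for illustration), a mapping can cast explicitly using the standard SPARQL XSD constructor functions:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?item <http://example.com/price> ?typedPrice .
}
FROM <file:dataset.csv>
WHERE {
  # Tarql binds each column to a variable named after its header
  BIND(URI(CONCAT('http://example.com/item/', ?id)) AS ?item)
  # explicit target datatype, chosen by the user in the mapping
  BIND(xsd:decimal(?price) AS ?typedPrice)
}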

So, I'm afraid Tarql doesn't have the information you want.

Best,
Richard
--

peter.l...@inova8.com

Feb 1, 2015, 5:46:53 PM
to ta...@googlegroups.com, peter.l...@inova8.com
Understood, I was overly ambitious in suggesting the types. However, the column names alone would be useful.

Consider the use-case where Tarql is part of a CSV uploader: the first thing a user would want to see is the columns of the CSV. Hence my question.

Richard Cyganiak

Feb 2, 2015, 9:22:11 AM
to peter.l...@inova8.com, ta...@googlegroups.com
Peter,

Would a mode where Tarql simply returns the first few rows of the CSV file, without the need to provide a mapping file, address that use case?

It could be the same as running a query like this:

SELECT * { } LIMIT 5

Best,
Richard

Peter Lawrence

Feb 2, 2015, 10:10:04 AM
to Richard Cyganiak, ta...@googlegroups.com
Richard

It depends. If the first few rows were returned as a SPARQL select response then it would be useful because the application I am envisaging would want to see TARQL as a SPARQL endpoint. Would that be the case?

Thanks for your attention,

Peter

Peter J. Lawrence
inova8
Providing answers for users' information questions
Mobile: +1 330 631 3772 | Phone: +1 330 342 0582 | UK Cell: +44 795 563 2824 | Skype: PeterJLawrence

Richard Cyganiak

Feb 2, 2015, 11:57:23 AM
to Peter Lawrence, ta...@googlegroups.com


I’m trying to understand the model you have in mind. What would be the query that is sent to the SPARQL endpoint by the client? Would that query be the Tarql mapping? Or would the Tarql mapping “live” on the server, and the endpoint gives access to the result of applying the mapping to the input CSV? Or something else?

Thanks,
Richard

Peter Lawrence

Feb 2, 2015, 12:13:37 PM
to Richard Cyganiak, ta...@googlegroups.com
It might be better to describe the use-story:
  • A user wants to load another CSV into the triplestore, so they select the CSV.
  • The system responds with a display of the column names (header row) in the CSV, with perhaps one row of data.
  • For each column of the CSV:
    • the user selects the actual property, from an existing vocabulary, to which the column will be mapped;
    • the user enters the formula for mapping the column value, such as URI(CONCAT('http://corporate//sample/id/', ?<col 3>)).
  • Once complete, the user can upload the CSV, at which time the following happens:
    • The application generates the corresponding TARQL query/mapping (CONSTRUCT).
    • The TARQL is executed.
    • The response is then loaded into the triplestore.
Does that help?

Thanks

Peter


Peter Lawrence

Feb 2, 2015, 12:16:23 PM
to Richard Cyganiak, ta...@googlegroups.com
BTW the generated mapping would always be of this form:

CONSTRUCT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
FROM <file:dataset.csv>
WHERE {
  BIND(<dataset> AS ?Dataset)
  BIND(<dataset/observation/rownum> AS ?Observation)
  # repeat for each column
  BIND(<property_n> AS ?Component_n_property)
  BIND(<mappingformula>(<value_n>) AS ?Value_n)
}



Richard Cyganiak

Feb 2, 2015, 4:41:19 PM
to Peter Lawrence, ta...@googlegroups.com
Peter,

Thanks for explaining this. I understand the story and it makes sense to me. You said earlier: “The application I am envisaging would want to see TARQL as a SPARQL endpoint.” Can you describe where and how that happens in the story? Where is Tarql seen as a SPARQL endpoint?

I keep asking these questions to understand what should be added to Tarql to find a good compromise between supporting the use case and staying simple and flexible. I appreciate your patience!

Thanks,
Richard



Peter Lawrence

Feb 2, 2015, 5:05:16 PM
to Richard Cyganiak, ta...@googlegroups.com
Richard

  1. A user wants to load another CSV into the triplestore, so they select the CSV.
  2. The system responds with a display of the column names (header row) in the CSV, with perhaps one row of data.
  3. For each column of the CSV:
    1. the user selects the actual property, from an existing vocabulary, to which the column will be mapped;
    2. the user enters the formula for mapping the column value, such as URI(CONCAT('http://corporate//sample/id/', ?<col 3>)).
  4. Once complete, the user can upload the CSV, at which time the following happens:
    1. The application generates the corresponding TARQL query/mapping (CONSTRUCT).
    2. The TARQL is executed.
    3. The response is then loaded into the triplestore.
If TARQL were an endpoint, then steps 2 to 4 could be executed via that endpoint:

2:  The client executes a query something like this against the TARQL endpoint:
SELECT ?column
FROM <file:dataset.csv>
WHERE {}
This would provide enough information to initialize the grid.

3: Not really interacting with the TARQL endpoint, but another that is publishing the vocabulary.

4: After the client generates the appropriate SPARQL/TARQL it would execute it against the TARQL endpoint:

CONSTRUCT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
FROM <file:dataset.csv>
WHERE {
  BIND(<dataset> AS ?Dataset)
  BIND(<dataset/observation/rownum> AS ?Observation)
  # repeat for each column
  BIND(<property_n> AS ?Component_n_property)
  BIND(<mappingformula>(<value_n>) AS ?Value_n)
}


Since the intention would be to load this directly into another triplestore, one could speculate on a combined insert and service call to the TARQL endpoint (not syntactically correct):

INSERT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
WHERE {
  SERVICE <TARQL Endpoint> {
    FROM <file:dataset.csv>
    BIND(<dataset> AS ?Dataset)
    BIND(<dataset/observation/rownum> AS ?Observation)
    # repeat for each column
    BIND(<property_n> AS ?Component_n_property)
    BIND(<mappingformula>(<value_n>) AS ?Value_n)
  }
}

Does this help?

Peter


Richard Cyganiak

Feb 2, 2015, 5:17:54 PM
to Peter Lawrence, ta...@googlegroups.com
Hi Peter,

I think step 2 would already be possible by sending this query to the endpoint:

SELECT *
FROM <file:dataset.csv>
WHERE {}
LIMIT 1

This would return one row of data including all the column headings, and seems sufficient to support the user interaction you describe for steps 3 and 4.
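
For instance, for a hypothetical dataset.csv whose header row is id,name,price, Tarql's convention of binding each column to a variable named after its header would give a one-row result along these lines (all values plain strings, per the point above about CSV datatypes):

  ?id   ?name      ?price
  "1"   "Widget"   "9.99"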

Does this make sense?

The idea of combining a SPARQL Update query with a SERVICE call to the Tarql endpoint is quite interesting! I don’t think this will work with vanilla SPARQL 1.1 unfortunately, because SERVICE only accepts a basic graph pattern, so there isn’t anywhere to put the FROM <file.csv> clause. Maybe it works with a SELECT query as subquery.

I certainly like the idea of combining Tarql with a lightweight server component that provides a SPARQL endpoint!

Best,
Richard




Peter Lawrence

Feb 2, 2015, 5:43:58 PM
to Richard Cyganiak, ta...@googlegroups.com
Richard

<<Does this make sense?>>

Yes it does, especially as a sample row of data is inevitably required.

<<Maybe it works with a SELECT query as subquery.>>

The FROM <file.csv> would create a problem. However, maybe one could do something with the service endpoint URL, as follows, which I think is legal SPARQL (at least in ARQ):

INSERT {
  ?Observation a qb:Observation ;
    qb:dataSet ?Dataset ;
    # repeat for each column
    ?Component_n_property ?Value_n .
}
WHERE {
  SERVICE <tarql#dataset.csv> {
    SELECT ?Dataset ?Observation ?Component_n_property ?Value_n
    WHERE {
      BIND(<dataset> AS ?Dataset)
      BIND(<dataset/observation/rownum> AS ?Observation)
      # repeat for each column
      BIND(<property_n> AS ?Component_n_property)
      BIND(<mappingformula>(<value_n>) AS ?Value_n)
    } LIMIT 10
  }
}


Thanks

Peter


Richard Cyganiak

Feb 2, 2015, 9:12:32 PM
to Peter Lawrence, ta...@googlegroups.com
Thanks Peter.

I’ve created two issues that speak to the use case you describe:

#39 Option to only show column headers and sample row(s)
https://github.com/tarql/tarql/issues/39

#40 Tarql Server
https://github.com/tarql/tarql/issues/40

Let’s split off any discussion that’s specific to these issues and continue it on GitHub. And pull requests would of course be most welcome :-)

All the best,
Richard