Better ways to store and search tabular data?


Chris Stubben

Sep 3, 2014, 1:33:03 PM
to plos-api-...@googlegroups.com
I often get tables back when searching the PLOS API. For example, a search for "YPO0986" returns this confusing snippet:

http://api.plos.org/search?q=body:YPO0986&fl=id,publication_date,title&rows=50&hl=true&hl.fl=body&hl.snippets=100&hl.fragsize=300&api_key=KEY

proteins containing putative Tat motifs.      Motif *  Protein  Product      SRRSFLQ  TauA  taurine transporter substrate binding subunit **    SRRSFLQ  SufI  repressor protein for FtsI **    TRRKFLM  <em>YPO0986</em>  hypothetical protein **    SRRLALL  YPO2150  LysR family transcriptional regulator    SRREFIQ


I was wondering whether you have ever considered other ways to store tables, perhaps in a separate multivalued field with tab or other delimiters, so users could read matching tables into data.frames in R or other applications? Most of the time I just want the row containing the matching value, so maybe another multivalued field with rows containing column name and cell value pairs, like:

Table 5 row 3 of 9; Motif* = TRRKFLM; Protein = <em>YPO0986</em>; Product = hypothetical protein**
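A row record in this shape would be easy to consume on the client side. The following is only a sketch of how such a record could be parsed, assuming the semicolon-and-equals layout shown above; the field does not exist in the API, and `parse_row_record` is a hypothetical helper name.

```python
import re

def parse_row_record(record):
    """Split a 'label; name = value; ...' record into a label and a dict of cells."""
    parts = [p.strip() for p in record.split(";")]
    label, cells = parts[0], {}
    for part in parts[1:]:
        # Split on the first "=" only, so a value may itself contain "="
        name, _, value = part.partition("=")
        # Drop the <em> highlighting tags so the value is plain text
        cells[name.strip()] = re.sub(r"</?em>", "", value).strip()
    return label, cells

record = ("Table 5 row 3 of 9; Motif* = TRRKFLM; "
          "Protein = <em>YPO0986</em>; Product = hypothetical protein**")
label, cells = parse_row_record(record)
print(label)
print(cells)
```

A cell value containing a semicolon would break this simple split, so a real field would need an escaping rule or a different delimiter.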

There seem to be alternatives to the unstructured text blob that, if incorporated into the PLOS schema, could be very useful to developers and search applications.

Thanks,
Chris

Martin Fenner

Sep 3, 2014, 1:39:54 PM
to plos-api-...@googlegroups.com
Chris, you can specify the format of the response with the "&wt=" parameter. You can, for example, ask for JSON or CSV, e.g.
http://api.plos.org/search?q=body:YPO0986&fl=id,publication_date,title&rows=50&hl=true&hl.fl=body&hl.snippets=100&hl.fragsize=300&api_key=KEY&wt=csv

Making the tables within papers machine-readable is a different story, and will not be possible in the near term. Your best bet is to use the XML of the article and parse the table.
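As a rough illustration of the XML route, the sketch below pulls the matching row out of a table with Python's standard library. The fragment is a made-up sample in the JATS style (`table-wrap`, `thead`, `tbody`), reusing values from the snippet earlier in the thread; the markup of a real article may differ, and `matching_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up fragment; element names are an assumption about the article XML.
SAMPLE = """<article><body>
<table-wrap id="tab5"><label>Table 5</label>
<table>
<thead><tr><th>Motif</th><th>Protein</th><th>Product</th></tr></thead>
<tbody>
<tr><td>SRRSFLQ</td><td>TauA</td><td>taurine transporter substrate binding subunit</td></tr>
<tr><td>TRRKFLM</td><td>YPO0986</td><td>hypothetical protein</td></tr>
</tbody>
</table>
</table-wrap>
</body></article>"""

def cell_text(cell):
    """Return the cell's text, including nested markup, with whitespace collapsed."""
    return " ".join("".join(cell.itertext()).split())

def matching_rows(xml_text, term):
    """Return (table label, {column name: cell value}) for rows with a cell equal to term."""
    root = ET.fromstring(xml_text)
    hits = []
    for wrap in root.iter("table-wrap"):
        label = wrap.findtext("label", default="")
        header = [cell_text(th) for th in wrap.iter("th")]
        for tr in wrap.iter("tr"):
            values = [cell_text(td) for td in tr.findall("td")]
            if term in values:
                hits.append((label, dict(zip(header, values))))
    return hits

rows = matching_rows(SAMPLE, "YPO0986")
print(rows)
```

This assumes a single header row and no merged cells; tables with `rowspan` or `colspan` would need extra handling.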

Best, Martin

Chris Stubben

Sep 3, 2014, 5:59:40 PM
to plos-api-...@googlegroups.com
Martin,
Thanks. I think highlighting only works with wt=xml. I was hoping to use the PLOS API directly to return text snippets for some microbial genes, but I don't think that will work if text and table blobs are mixed in the same field. Here's another example using tauA, which returns 8 papers and 10 snippets, 5 of which are mostly unreadable.

http://api.plos.org/search?q=body:tauA&fl=id,publication_date,title&rows=50&hl=true&hl.fl=body&hl.snippets=100&hl.fragsize=400&api_key=KEY
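Until tables are stored separately, one client-side workaround is to sort the highlighted snippets into prose and likely table fragments. The sketch below assumes the usual Solr XML layout for the highlighting section and uses a crude heuristic: flattened table cells seem to be separated by runs of two or more spaces, as in the YPO0986 snippet. The document id and the second snippet are placeholders, and `looks_like_table` and `split_snippets` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder response; only the first snippet's text comes from this thread.
SAMPLE = """<response>
<lst name="highlighting">
  <lst name="10.1371/journal.example">
    <arr name="body">
      <str>TRRKFLM  &lt;em&gt;YPO0986&lt;/em&gt;  hypothetical protein **    SRRLALL  YPO2150  LysR family transcriptional regulator</str>
      <str>A placeholder prose sentence mentioning &lt;em&gt;YPO0986&lt;/em&gt; once.</str>
    </arr>
  </lst>
</lst>
</response>"""

def looks_like_table(snippet, min_gaps=3):
    """Heuristic: several runs of 2+ spaces suggest flattened table cells."""
    return len(re.findall(r" {2,}", snippet)) >= min_gaps

def split_snippets(xml_text):
    """Return (prose, tables), each a list of (document id, snippet) pairs."""
    root = ET.fromstring(xml_text)
    prose, tables = [], []
    for doc in root.findall("./lst[@name='highlighting']/lst"):
        for node in doc.findall("./arr/str"):
            text = node.text or ""
            target = tables if looks_like_table(text) else prose
            target.append((doc.get("name"), text))
    return prose, tables

prose, tables = split_snippets(SAMPLE)
print(len(prose), "prose snippet(s),", len(tables), "table-like snippet(s)")
```

The threshold of three gaps is a guess and would need tuning against real responses.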

Chris