Reading RDF file in OR

Parthasarathi Mukhopadhyay

May 3, 2022, 3:07:21 PM
to openr...@googlegroups.com
Dear all

I'm trying to open the AGRIS RDF file (https://agris.fao.org/agris_ods/) in OpenRefine (OR) 3.5.2 after downloading it.

An example of the record structure:

<bibo:Article rdf:about="http://agris.fao.org/aos/records/AG9400130">
<dct:identifier>AG9400130</dct:identifier>
<dct:dateSubmitted>1994</dct:dateSubmitted>
<dct:source rdf:resource="http://aims.fao.org/node/122475"/>
<dct:title xml:lang="eng"><![CDATA[Central Marketing Corporation (CMC): a strategic plan for the 1990's]]></dct:title>
<dct:creator>
<foaf:Person>
<foaf:name><![CDATA[Weston, L.O.]]></foaf:name>
</foaf:Person>
</dct:creator>
<dct:creator>
<foaf:Organization>
<foaf:name><![CDATA["Ministry of Agriculture, St. John's (Antigua and Barbuda)"]]></foaf:name>
</foaf:Organization>
</dct:creator>
<dct:publisher>
<foaf:Organization>
<foaf:name><![CDATA[MOA]]></foaf:name>
</foaf:Organization>
</dct:publisher>
<dct:language><![CDATA[eng]]></dct:language>
<bibo:authorList><rdf:Seq>
<rdf:li><![CDATA[Weston, L.O.]]></rdf:li>
</rdf:Seq></bibo:authorList>
<dct:type><![CDATA[Summary]]></dct:type>
<dc:subject><![CDATA[marketing boards]]></dc:subject>
<dc:subject><![CDATA[cooperative marketing]]></dc:subject>
<dc:subject><![CDATA[marketing techniques]]></dc:subject>
<dc:subject><![CDATA[antigua and barbuda]]></dc:subject>
<dc:subject><![CDATA[office de commercialisation]]></dc:subject>
<dc:subject><![CDATA[vente en cooperation]]></dc:subject>
<dc:subject><![CDATA[technique de vente]]></dc:subject>
<dc:subject><![CDATA[antigua et barbuda]]></dc:subject>
<dc:subject><![CDATA[juntas de comercializacion]]></dc:subject>
<dc:subject><![CDATA[comercializacion cooperativa]]></dc:subject>
<dc:subject><![CDATA[tecnicas de mercadeo]]></dc:subject>
<dc:subject><![CDATA[antigua y barbuda]]></dc:subject>
<dct:extent><![CDATA[39 p.]]></dct:extent>
<dct:description><![CDATA[Summary (En)]]></dct:description>
<dct:description><![CDATA[Appendices p.38-39]]></dct:description>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_505"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_7848"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_4625"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_1860"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_4621"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_4620"/>
</bibo:Article rdf:about="http://agris.fao.org/aos/records/AG9400130">
<bibo:Article rdf:about="http://agris.fao.org/aos/records/AG9400134">
<dct:identifier>AG9400134</dct:identifier>
<dct:dateSubmitted>1994</dct:dateSubmitted>
<dct:source rdf:resource="http://aims.fao.org/node/122475"/>
<dct:title xml:lang="eng"><![CDATA[Standard package of inputs/outputs for selected crops]]></dct:title>
<dct:creator>
<foaf:Person>
<foaf:name><![CDATA[Ameen, I.]]></foaf:name>
</foaf:Person>
</dct:creator>
<dct:creator>
<foaf:Organization>
<foaf:name><![CDATA["Caribbean Agricultural Research and Development Inst., St. John's (Antigua and Barbuda)"]]></foaf:name>
</foaf:Organization>
</dct:creator>
<dct:publisher>
<foaf:Organization>
<foaf:name><![CDATA[CARDI]]></foaf:name>
</foaf:Organization>
</dct:publisher>
<dct:language><![CDATA[eng]]></dct:language>
<bibo:authorList><rdf:Seq>
<rdf:li><![CDATA[Ameen, I.]]></rdf:li>
</rdf:Seq></bibo:authorList>
<dct:type><![CDATA[Numerical data]]></dct:type>
<dc:subject><![CDATA[crop yield]]></dc:subject>
<dc:subject><![CDATA[income]]></dc:subject>
<dc:subject><![CDATA[expenditure]]></dc:subject>
<dc:subject><![CDATA[input output analysis]]></dc:subject>
<dc:subject><![CDATA[vegetables]]></dc:subject>
<dc:subject><![CDATA[cabbages]]></dc:subject>
<dc:subject><![CDATA[carrots]]></dc:subject>
<dc:subject><![CDATA[cucumbers]]></dc:subject>
<dc:subject><![CDATA[onions]]></dc:subject>
<dc:subject><![CDATA[okras]]></dc:subject>
<dc:subject><![CDATA[rendement des cultures]]></dc:subject>
<dc:subject><![CDATA[revenu]]></dc:subject>
<dc:subject><![CDATA[depense]]></dc:subject>
<dc:subject><![CDATA[analyse input output]]></dc:subject>
<dc:subject><![CDATA[legume]]></dc:subject>
<dc:subject><![CDATA[chou pomme]]></dc:subject>
<dc:subject><![CDATA[carotte]]></dc:subject>
<dc:subject><![CDATA[concombre]]></dc:subject>
<dc:subject><![CDATA[oignon]]></dc:subject>
<dc:subject><![CDATA[gombo]]></dc:subject>
<dc:subject><![CDATA[rendimiento de cultivos]]></dc:subject>
<dc:subject><![CDATA[renta]]></dc:subject>
<dc:subject><![CDATA[gastos]]></dc:subject>
<dc:subject><![CDATA[analisis de insumo-producto]]></dc:subject>
<dc:subject><![CDATA[hortalizas]]></dc:subject>
<dc:subject><![CDATA[repollo]]></dc:subject>
<dc:subject><![CDATA[zanahoria]]></dc:subject>
<dc:subject><![CDATA[pepino]]></dc:subject>
<dc:subject><![CDATA[cebolla]]></dc:subject>
<dc:subject><![CDATA[ocra]]></dc:subject>
<dct:extent><![CDATA[17 p.]]></dct:extent>
<dct:description><![CDATA[Summary (En)]]></dct:description>
<dct:description><![CDATA[3 ill; 12 tables]]></dct:description>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_10176"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_1173"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_12920"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_2757"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_8174"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_12934"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_26808"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_3820"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_9640"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_3884"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_10195"/>
<dct:subject rdf:resource="http://aims.fao.org/aos/agrovoc/c_2981"/>
</bibo:Article rdf:about="http://agris.fao.org/aos/records/AG9400134">


The issue is that OR reports an error like this during parsing:

The end-tag for element type "bibo:Article" must end with a '>' delimiter.

When I manually edit </bibo:Article rdf:about="http://agris.fao.org/aos/records/AG9400134"> to </bibo:Article> (stripping the rdf:about="http://agris.fao.org/aos/records/AG9400134" part) at the end of each record in a small set, OR reads the file happily and creates the project.

The entire set has more than 100K records, and a plain automatic replacement of </bibo:Article with </bibo:Article> is not going to work, as each closing tag carries a unique record ID, like AG9400134 in the example given above.

What is the way out? Please guide.

Regards


Parthasarathi Mukhopadhyay

Professor, Department of Library and Information Science,

University of Kalyani, Kalyani - 741 235 (WB), India

Owen Stephens

May 4, 2022, 9:52:18 AM
to OpenRefine
Unfortunately that's not valid XML, so the best outcome here would be to get the data set publisher to fix it.
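
To illustrate why a parser rejects these records, here is a minimal Python sketch that checks well-formedness with the standard library's `xml.etree.ElementTree`. The namespace URIs and the `urn:example:1` identifier are assumptions added only so the snippet parses on its own; they are not taken from the AGRIS file. In XML an end tag may contain only the element name, so any attribute inside it makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

# Namespace declarations assumed for this standalone example; the real
# AGRIS file presumably declares its own on the root element.
NS = ('xmlns:bibo="http://purl.org/ontology/bibo/" '
      'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"')


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A correct record: the end tag contains only the element name.
good = f'<bibo:Article {NS} rdf:about="urn:example:1"></bibo:Article>'

# The pattern seen in the download: the end tag repeats the attribute.
bad = (f'<bibo:Article {NS} rdf:about="urn:example:1">'
       '</bibo:Article rdf:about="urn:example:1">')
```

Calling `is_well_formed(good)` and `is_well_formed(bad)` shows the difference between the two forms.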

I'd suggest doing a search/replace using a tool that supports regular expressions to fix all instances of attributes in the closing tags, e.g. search for a regular expression like:
<\/bibo:Article rdf:about="http:\/\/agris.fao.org\/aos\/records\/[^"]+">
and replace with
</bibo:Article>

There are lots of tools you can do this with: either a good text editor or a command line tool like `sed`.
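
As a rough sketch of the same search/replace in Python, for anyone who prefers a script over an editor: the function names and the file names in the usage comment are hypothetical, and the code assumes the file is UTF-8 and that each bad closing tag sits on a single line, as in the sample records. It processes the file line by line, so a large download need not fit in memory.

```python
import re

# Matches a closing tag that wrongly carries an attribute, e.g.
# </bibo:Article rdf:about="http://agris.fao.org/aos/records/AG9400134">
# The leading "</" keeps the (valid) opening tags from matching.
BAD_END_TAG = re.compile(r'</bibo:Article\s+rdf:about="[^"]*">')


def fix_end_tags(text: str) -> str:
    """Replace every attribute-bearing end tag with a plain </bibo:Article>."""
    return BAD_END_TAG.sub("</bibo:Article>", text)


def fix_file(src_path: str, dst_path: str) -> None:
    """Write a corrected copy of src_path to dst_path, one line at a time."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(fix_end_tags(line))


# Example usage (file names are placeholders):
# fix_file("agris_download.rdf", "agris_fixed.rdf")
```

Writing to a new file rather than editing in place keeps the original download available in case the pattern needs adjusting.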

That's the only option I can think of

Owen

Parthasarathi Mukhopadhyay

May 4, 2022, 11:41:51 AM
to openr...@googlegroups.com
Thanks Owen, it has worked as advised in the gedit editor.

Thanks and regards
