rdfa/microdata branch

Gunnar Aastrand Grimnes

Nov 6, 2012, 10:08:26 AM
to rdfli...@googlegroups.com, Dan Brickley
Hi all,

I've just merged the rdfa branch into master! (The branch itself has been deleted.)

Ivan, to make it work I had to change pyrdfa to have one more relative import:

https://github.com/RDFLib/rdflib/commit/604ea435afbe98e4366c6afd798ec5a906eea2d3

I've added an explicit dependency on html5lib for Python 2.x, and a check for html5lib in structuredparsers.py:

https://github.com/RDFLib/rdflib/commit/8ef695df1e41e0a4d23df7f47a0de1f7796c4b23
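
For anyone who doesn't want to open the commit: a guard of this kind could look roughly like the sketch below. This is not the code from the commit, just an illustration of checking for an optional dependency before using it. The names `module_available` and `require_module` are hypothetical and do not exist in rdflib.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


def require_module(name, purpose):
    """Raise ImportError with a readable message if `name` is not installed.

    Hypothetical helper for illustration only; rdflib's actual check may differ.
    """
    if not module_available(name):
        raise ImportError("%s is required %s" % (name, purpose))


if __name__ == "__main__":
    # Report whether the HTML parsers would be usable on this machine.
    if module_available("html5lib"):
        print("html5lib found")
    else:
        print("html5lib missing")
```

Calling `require_module("html5lib", "to parse HTML for RDFa/microdata")` at the top of a parser module would then fail early with a clear message instead of somewhere deep inside a parse call.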


Currently only some of the old RDFa tests are run, and some of those fail.
I assume there is an RDFa 1.1 test-case collection. Does anyone feel inspired to add it to rdflib/pyrdfa? Ditto for microdata?

Enjoy your new rdflib with rdfa! :)

Cheers,

- Gunnar

--
http://gromgull.net

Ivan Herman

Nov 6, 2012, 2:21:31 PM
to Gunnar Aastrand Grimnes, Dan Brickley, rdfli...@googlegroups.com
Wow. That is a major step... I have refreshed the repository on my machine, so from this point on any changes I make will go directly into the main branch!

Thanks Gunnar

Ivan

----
Ivan Herman
4, rue Beauvallon, clos St Joseph
13090 Aix-en-Provence
France
http://www.ivan-herman.net

Graham Higgins

Jan 24, 2013, 11:28:52 PM
to rdfli...@googlegroups.com
Hi all,

On Tue, 2012-11-06 at 16:08 +0100, Gunnar Aastrand Grimnes wrote:
> Enjoy your new rdflib with rdfa! :)


Here's some sideband data that reinforces the utility of an
RDFa/microdata parser package in RDFLib by illustrating the explosive
growth of RDFa and microdata use according to the Web Data Commons.

FTR: "The Web Data Commons project extracts all Microformat, Microdata
and RDFa data from the Common Crawl web corpus, the largest and most
up-to-date web corpus that is currently available to the public"

Triples retrieved from microformat hcard (for comparison) vs rdfa vs
microdata for 2010 [1] and 2012 [2]:

total triples:
2010: 5,193,276,058
2012: 7,350,953,995 (+41.55%)

triples from hcard:
2010: 3,226,066,019
2012: 3,547,824,107 (+9.97%)

triples from rdfa:
2010: 293,542,991
2012: 1,079,175,202 (+267.64%)

triples from microdata:
2010: 1,197,115
2012: 1,488,063,426 (+124204.13%)
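
The growth figures in parentheses are plain relative changes, (2012 - 2010) / 2010. A minimal sketch for recomputing them from the counts quoted above (the dictionary layout and the name `growth_percent` are only for illustration):

```python
# Triple counts quoted above, as (2010, 2012) pairs.
counts = {
    "total": (5193276058, 7350953995),
    "hcard": (3226066019, 3547824107),
    "rdfa": (293542991, 1079175202),
    "microdata": (1197115, 1488063426),
}


def growth_percent(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / float(old) * 100.0


if __name__ == "__main__":
    for name, (y2010, y2012) in counts.items():
        # Two decimals, with an explicit sign, to match the figures in the post.
        print("%s: %+.2f%%" % (name, growth_percent(y2010, y2012)))
```

The microdata figure looks extreme only because the 2010 base is so small: the count grew by a factor of roughly 1,243.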


[1] http://webdatacommons.org/2010-09/stats/stats.html
[2] http://webdatacommons.org/2012-08/stats/stats.html

Cheers,

--
Graham Higgins

http://bel-epa.com/gjh/