real world examples of OSF

tinflute

Mar 6, 2012, 11:46:07 AM

to Open Semantic Framework

Hi, I just recently discovered OSF and I'm wondering if there are any
real-world examples of OSF implementations
out on the web that I could look at. I've clicked through the Citizen
Dan site which is interesting, but doesn't tell me a lot about how
others are trying to implement this rich set of tools.
If anyone is willing to share links to their projects, I'd be very
interested to see what's out there.
Cheers
Ted in Montreal

shep husted

Mar 6, 2012, 12:12:54 PM

to open-semant...@googlegroups.com

Have you built it? I have it running but have not imported any data yet; I am concentrating on Semantic MediaWiki.
Keep me posted. I think you just have to experiment, and it is pretty easy to install now anyway.
shep

--
Best Regards,

Shep Husted
opensourceservers.com
opensourcenetworks.com
engineeredcomputer.com
1-207-409-4038
809 congress st. #7
portland, maine
04102

Frederick Giasson

Mar 6, 2012, 12:51:21 PM

to open-semant...@googlegroups.com

Hi Ted!

> Hi, I just recently discovered OSF and I'm wondering if there are any
> real-world examples of OSF implementations
> out on the web that I could look at. I've clicked through the Citizen
> Dan site which is interesting, but doesn't tell me a lot about how
> others are trying to implement this rich set of tools.

Sure. There are some that we know of, but there are probably others that
we don't know about.

(1) Citizen DAN Demo [1]
(2) MyPEG (another Citizen DAN instance) [2]
(3) Mike's Sweet Tools List [3]
(4) UMBEL's web portal [4]
(5) Volkswagen UK's search engine [5][6][7]

So, as these demos show, OSF can be used for many different kinds of
websites: it can be used for open government initiatives by applying the
Citizen DAN principles; it can be used as an ontology web portal, as with
UMBEL; and it can be used as an RDF search engine, as in the Sweet Tools
and VW use cases.

Also note that you can arrange to get private (more elaborate) demos of
Citizen DAN upon request.

Basically, OSF is used to ingest, manage and expose unstructured,
semi-structured and fully structured data. All of its functionality is
accessible through web service endpoints, which makes it easy to use
from all kinds of frameworks (PHP, Ajax, Flash, mobile applications,
etc.).
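
Because everything is exposed over HTTP, a client in any language only has to build a request URL. The following is a minimal sketch of that idea, not the documented OSF API: the host `example.org`, the `/ws/<endpoint>/` path layout and the `query` and `items` parameter names are assumptions made for illustration, so check them against the endpoint documentation of your own instance.

```python
from urllib.parse import urlencode, urlunsplit


def build_endpoint_url(host, endpoint, params):
    """Return a GET URL for a web service endpoint.

    The "/ws/<endpoint>/" layout is an assumption for illustration only.
    """
    path = "/ws/" + endpoint.strip("/") + "/"
    return urlunsplit(("http", host, path, urlencode(params), ""))


# Hypothetical search request; host and parameter names are placeholders.
url = build_endpoint_url("example.org", "search", {"query": "open data", "items": 10})
print(url)
```

The same URL could then be fetched from PHP, JavaScript or a mobile application, which is the point of exposing the framework as web services.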

[1] http://demo.citizen-dan.org
[2] http://mypeg.ca
[3] http://www.mkbergman.com/sweet-tools/
[4] http://umbel.org
[5] http://fgiasson.com/blog/index.php/2011/10/11/volkswagens-rdf-data-management-workflow/
[6] http://fgiasson.com/blog/index.php/2011/12/21/volkswagen-uks-search-engine-powered-by-structwsf/
[7] http://www.w3.org/blog/SW/2011/10/10/new-sw-use-case-by-tribal-ddb-and-volkswagen-uk/

Hope it helps!

Thanks,

Take care,

Fred

shep husted

Mar 6, 2012, 2:01:59 PM

to open-semant...@googlegroups.com

The real trick is to import all the relevant datasets. Maybe clone a working version, start up some beta machines, and see how far you can get; eventually you will start to gain ground. Some easy how-tos on a wiki would be nice, something like HowtoForge. Most of the stuff is documented: if you google, you will end up on the right OSF page and be able to import the various formats they support.

MediaWiki is supporting, or I should say starting to support, Virtuoso, so that may make a difference. I think it is fair to say that rapid development is ongoing and getting even more active and productive; see the MediaWiki extension matrix.

Good luck and keep us posted.

Ted Strauss

Mar 7, 2012, 10:26:43 AM

to open-semant...@googlegroups.com

Thanks for the suggestions guys.

I will explore all these suggestions to figure out what combination of technologies is right for my project.

Cheers

ted

Frederick Giasson

Mar 7, 2012, 10:34:45 AM

to open-semant...@googlegroups.com

Hi Ted!

> Thanks for the suggestions guys.
> I will explore all these suggestions to figure out what combination of
> technologies is right for my project.

Ok good. Don't hesitate to send more questions on this mailing list so
that we can help you to figure out if this is the right technology stack
to use or not.

Thanks,

Take care,

Fred

shep husted

Mar 7, 2012, 7:37:25 PM

to open-semant...@googlegroups.com

I am processing a wiki dump here on MediaWiki 1.18 and SMW 1.7: rebuilding all tables and links and getting the red links to turn blue. It is about 7.5 million pages. I still need to work on getting tags to work and on templates and categories. The good news is that there is a lot to work with and it is a good starting point. Using virtualization really helps with pushing things forward, since resetting a VM to a previous snapshot is easy; it helps you get over the fear of failure and has made a big difference, IMO.
The only thing I have to complain about is how things slow down when you are importing lots of data. I have 4 CPUs and lots of RAM, and it is running at 26% and about 550 MB of RAM. Maybe an SSD, a faster CPU, or a PCIe SSD would help.
I did attempt to run runJobs.php with the switch for 2 CPUs and also by just using multiple shells. It would max out my CPUs for a while until one of the jobs dropped out. I may have to tweak some system settings or php.ini. I am running an SMP kernel; the machine is Ubuntu based. I think I just need to be patient, but I would appreciate some shop talk about running semantic apps on servers with big data. Is there some trick to unlock concurrency/multi-threading or speed up PHP? I did load php-apc and am running a proxy too for local, no reverse proxy yet. php-apc seems to work pretty well; it speeds things up a lot after the initial load. I can't seem to figure out where the bottleneck is.
thanks for the mailing list
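
One way to narrow down where a job queue slows down is to run it in bounded batches and time each batch. The sketch below assumes MediaWiki's `maintenance/runJobs.php` with its `--maxjobs` and `--procs` options; the script path, the batch sizes and the helper names `run_jobs_command` and `drain_in_batches` are hypothetical and would need adjusting to the actual installation.

```python
import subprocess
import time


def run_jobs_command(script="maintenance/runJobs.php", max_jobs=1000, procs=None):
    """Build the argument list for one runJobs.php invocation."""
    cmd = ["php", script, "--maxjobs", str(max_jobs)]
    if procs is not None:
        # --procs asks runJobs.php to fork that many worker processes.
        cmd += ["--procs", str(procs)]
    return cmd


def drain_in_batches(batches, runner=subprocess.call, **kwargs):
    """Run bounded invocations; return one (exit_code, seconds) pair per batch."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        code = runner(run_jobs_command(**kwargs))
        timings.append((code, time.perf_counter() - start))
        if code != 0:
            break  # stop at the first failing batch so the error stays visible
    return timings


if __name__ == "__main__":
    # Only prints the command; call drain_in_batches(...) on a real wiki host.
    print(" ".join(run_jobs_command(max_jobs=500, procs=2)))
```

If the seconds per batch grow steadily while CPU usage stays low, the limit is more likely disk or database I/O than PHP itself.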

Frederick Giasson

Mar 7, 2012, 9:50:37 PM

to open-semant...@googlegroups.com

Hi!

> I am processing wiki dump here on mediawiki 1.18 and smw 1.7 -
> rebuilding all tables and links and getting the red links to turn
> blue...it is about 7.5 million pages, I need to work on getting tags
> to work and working with templates and categories - the good news is
> that there is a lot to work with and it is a good starting point.
> using virtualization really helps with pushing things forward since
> resetting vm to a previous snapshot is easy - it helps you get over the
> fear of failure and has made a big difference imo.
> the only thing I have to complain about is how things slow down when
> you are importing lots of data I have 4 cpu and lots of ram and it is
> running at 26% and about 550 megs of ram - maybe ssd or faster cpu or
> pci-e ssd.
> I did attempt to run runJobs.php with switch for 2 cpu and also by
> just using multiple shells - it would max out my cpus for a while
> until one of the jobs would drop out - I may have to do some tweaks to
> system settings or php.ini. I am running smp kernel - machine is
> ubuntu based. I think i just need to be patient but would appreciate
> some shop talk about running semantic apps on servers with big data. Is
> there some trick to unlock concurrency/multi threading or speed up
> php? I did load php-apc and am running a proxy too for local, no
> reverse proxy yet. php-apc seems to work pretty good - it speeds things
> up lots after initial load. I can't seem to figure out where the
> bottleneck is.
> thanks for the mailing list

This is a nice use case! However, I will need more information about the
workflow you currently have in place. This is certainly a special
use case, and there are many ways to improve it. The thing that will
probably have the most impact is the workflow you use to ingest all that
data.

First, let me ask you a few questions:

(1) Are these 7.5 million pages described in RDF using a procedure of
your own? Is Scones involved in this?

(2) Which structWSF endpoints are currently involved (Crud: Create,
Scones, Dataset: Create, etc.)? In other words, what is the workflow you
currently have in place?

(3) Which endpoints are you planning to use or to expose to the
public?

(4) What is the general size of the records to index (in terms of the
number of triples, and whether you have plenty of textual data to index)?

(5) If you are using Scones, what is the size of the ontology you are
using for tagging the documents (in terms of the number of
classes/properties/named entities)?

That said, there are many different strategies that can be used
depending on how you answer these questions. Tweaking the available
memory, using multi-threading and the like is one side of the coin, but
the workflow is the other side. The first step is certainly to put the
proper workflow in place, and then to tweak accordingly.

By default, structWSF is set up to cope with all use cases; however,
this ability has a price: speed. So, once we know our use case, we can
use structWSF in different ways (using different parameters,
particularly for the Crud Create endpoint) to improve (dramatically?)
the processing time. This is why defining the workflow is the logical
and essential step. We have to keep in mind that structWSF is an API
layer that manages multiple underlying systems of all kinds (such as
Solr, Virtuoso, GATE, OWLAPI, etc.).
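
To illustrate what tuning the ingestion workflow can look like on the client side, here is a minimal sketch that splits records into batches and prepares one form-encoded request body per batch. It is only an illustration under stated assumptions: the parameter names `document`, `mime`, `dataset` and `mode`, the `"triplestore"` mode value, the MIME type and the dataset URI are placeholders, and the helpers `batches` and `crud_create_payload` are hypothetical. Check the Crud Create documentation of your structWSF version for the real parameters.

```python
from urllib.parse import urlencode


def batches(records, size):
    """Yield consecutive slices of `records` with at most `size` items each."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def crud_create_payload(rdf_chunks, dataset, mode="full"):
    """Build a form-encoded body for one batch (parameter names are assumed)."""
    return urlencode({
        "document": "\n".join(rdf_chunks),  # serialized RDF for this batch
        "mime": "application/rdf+n3",       # assumed serialization type
        "dataset": dataset,                 # placeholder dataset URI
        "mode": mode,                       # assumed switch for which back end to write
    })


# Placeholder records; a real workflow would read serialized RDF from disk.
records = ["<a> <b> <c> .", "<d> <e> <f> .", "<g> <h> <i> ."]
for chunk in batches(records, 2):
    body = crud_create_payload(chunk, "http://example.org/datasets/wiki/", mode="triplestore")
    print(len(chunk), "records,", len(body), "bytes of form data")
```

Sending fewer, larger requests and writing to only one underlying system per pass are the kinds of workflow choices that usually matter more than low-level server tuning.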

A small first step would be to upgrade your structWSF instance to
1.0a94. If that is not already done, and you have version 1.0a92 or
above, use the structWSF Upgrader [1].

Once we have a better understanding of your workflow, we will be able to
point you to different ways to define your workflow to improve the
overall process.

[1] https://github.com/structureddynamics/structWSF-Upgrader

Thanks,

Take care,

Fred
