> Hi, I just recently discovered OSF and I'm wondering if there are any
> real-world examples of OSF implementations
> out on the web that I could look at. I've clicked through the Citizen
> Dan site which is interesting, but doesn't tell me a lot about how
> others are trying to implement this rich set of tools.
Sure. Here are some that we know of, though there are probably others
that we are not aware of.
(1) Citizen DAN Demo [1]
(2) MyPEG (another Citizen DAN instance) [2]
(3) Mike's Sweet Tools List [3]
(4) UMBEL's web portal [4]
(5) Volkswagen UK's search engine [5][6][7]
As these examples show, OSF can be used for several different kinds of
websites: it can be used for open government initiatives by following
the Citizen DAN principles; it can be used as an ontology web portal, as
with UMBEL; and it can be used as an RDF search engine, as in the Sweet
Tools and VW use cases.
Also note that you can arrange to get private (more elaborate) demos of
Citizen DAN upon request.
Basically, OSF is used to ingest, manage and expose unstructured,
semi-structured and fully structured data. All of its functionality is
accessible through web service endpoints, which makes it quite easy to
use from all kinds of frameworks (PHP, Ajax, Flash, mobile applications,
etc.).
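
As a rough illustration of what "accessible through web service
endpoints" means in practice, here is a minimal Python sketch that
prepares an HTTP request for a search-style endpoint. The host name, the
/ws/search/ path, the parameter names (query, datasets, items) and the
";" separator are all assumptions made for the example, so check them
against the documentation of your own instance before relying on them.
The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, query, datasets, items=10):
    """Prepare (but do not send) a GET request for a search-style endpoint.

    The endpoint path and the parameter names are hypothetical placeholders.
    """
    params = {
        "query": query,
        "datasets": ";".join(datasets),  # assumed separator for dataset URIs
        "items": str(items),
    }
    url = base_url.rstrip("/") + "/ws/search/?" + urlencode(params)
    # Ask for JSON; which MIME types are supported depends on the instance.
    return Request(url, headers={"Accept": "application/json"}, method="GET")


request = build_search_request(
    "http://example.org",
    "electric car",
    ["http://example.org/datasets/demo/"],
)
print(request.full_url)
```

Since everything goes through plain HTTP, the same request could be
prepared just as easily from PHP, JavaScript or a mobile client.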
[1] http://demo.citizen-dan.org
[2] http://mypeg.ca
[3] http://www.mkbergman.com/sweet-tools/
[4] http://umbel.org
[5] http://fgiasson.com/blog/index.php/2011/10/11/volkswagens-rdf-data-management-workflow/
[6] http://fgiasson.com/blog/index.php/2011/12/21/volkswagen-uks-search-engine-powered-by-structwsf/
[7] http://www.w3.org/blog/SW/2011/10/10/new-sw-use-case-by-tribal-ddb-and-volkswagen-uk/
Hope it helps!
Thanks,
Take care,
Fred
> Thanks for the suggestions guys.
> I will explore all these suggestions to figure out what combination of
> technologies is right for my project.
OK, good. Don't hesitate to send more questions to this mailing list so
that we can help you figure out whether this is the right technology
stack for you or not.
Thanks,
Take care,
Fred
> I am processing a wiki dump here on MediaWiki 1.18 and SMW 1.7 -
> rebuilding all tables and links and getting the red links to turn
> blue... It is about 7.5 million pages. I need to work on getting tags
> to work and on working with templates and categories - the good news is
> that there is a lot to work with and it is a good starting point.
> Using virtualization really helps with pushing things forward since
> resetting a VM to a previous snapshot is easy - it helps you get over
> the fear of failure and has made a big difference IMO.
> The only thing I have to complain about is how things slow down when
> you are importing lots of data. I have 4 CPUs and lots of RAM and it is
> running at 26% and about 550 MB of RAM - maybe an SSD, a faster CPU or
> a PCI-e SSD would help.
> I did attempt to run runJobs.php with the switch for 2 CPUs and also by
> just using multiple shells - it would max out my CPUs for a while
> until one of the jobs would drop out - I may have to do some tweaks to
> system settings or php.ini. I am running an SMP kernel - the machine is
> Ubuntu based. I think I just need to be patient but would appreciate
> some shop talk about running semantic apps on servers with big data. Is
> there some trick to unlock concurrency/multi-threading or speed up
> PHP? I did load php-apc and am running a proxy too for local, no
> reverse proxy yet. php-apc seems to work pretty well - it speeds things
> up a lot after the initial load. I can't seem to figure out where the
> bottleneck is.
> thanks for the mailing list
This is a nice use case! However, I will need more information about the
workflow you currently have in place. This is certainly a special use
case, and there are many ways to improve it. The thing that will
probably have the most impact is the workflow you use to ingest all that
data.
First, let me ask you a few questions:
(1) Are these 7.5 million pages described in RDF using a procedure of
your own? Is Scones involved in this?
(2) Which structWSF endpoints are currently involved (Crud: Create,
Scones, Dataset: Create, etc.)? In other words, what is the workflow you
currently have in place?
(3) Which endpoints are you planning to use or to expose to the public?
(4) What is the general size of the records to index (in terms of
number of triples, and whether you have a lot of textual data to index)?
(5) If you are using Scones, what is the size of the ontology you are
using for tagging the documents (in terms of the number of
classes/properties/named entities)?
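
For question (4), if your records are available as N-Triples, a quick
way to get a feel for their size is to count the statements per subject.
The following Python sketch is deliberately naive and rests on two
assumptions: that the input really is N-Triples (one statement per
line), and that one subject corresponds to one record. The sample data
and URIs are made up for the example.

```python
from collections import Counter


def triples_per_subject(ntriples_text):
    """Count statements per subject in N-Triples text.

    Naive on purpose: the subject is taken to be the first
    whitespace-delimited token of each non-empty, non-comment line.
    """
    counts = Counter()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject = line.split(None, 1)[0]
        counts[subject] += 1
    return counts


# Made-up sample data: two records, three statements in total.
sample = """
<http://example.org/page/1> <http://purl.org/dc/terms/title> "First page" .
<http://example.org/page/1> <http://purl.org/dc/terms/subject> "wiki" .
<http://example.org/page/2> <http://purl.org/dc/terms/title> "Second page" .
"""

counts = triples_per_subject(sample)
print(len(counts), "records,", sum(counts.values()), "triples")
print("largest record:", counts.most_common(1)[0])
```

Run over a sample of the dump rather than the whole thing, this should
be enough to give the order of magnitude we are asking about.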
That said, there are many different strategies that can be used
depending on how you answer these questions. Tweaking the available
memory, using multi-threading and the like is one side of the coin, but
the workflow is the other side. The first step is certainly to put the
proper workflow in place, and then to tweak accordingly.
By default, structWSF is set up to cope with all use cases; however,
this ability has a price: speed. So, once we know our use case, we can
use structWSF in different ways (using different parameters,
particularly for the Crud Create endpoint) to improve the processing
time, possibly dramatically. This is why defining the workflow is the
logical and essential first step. We have to keep in mind that structWSF
is an API layer that manages many different kinds of underlying systems
(such as Solr, Virtuoso, GATE, OWLAPI, etc.).
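
To make the workflow side a bit more concrete, one generic pattern that
is often worth measuring is grouping records into batches, so that each
request carries many records instead of one. This is only a sketch of
that idea in Python, not a description of how structWSF has to be used:
send_batch is a hypothetical callable that would wrap the actual call to
the ingestion endpoint, and the batch size of 100 is an arbitrary
starting point to tune.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def ingest(records, send_batch, size=100):
    """Send records in batches and return how many records were sent.

    `send_batch` is a hypothetical callable standing in for the real
    call to the ingestion endpoint.
    """
    sent = 0
    for chunk in batches(records, size):
        send_batch(chunk)
        sent += len(chunk)
    return sent


# Stand-in sender that only records batch sizes instead of calling a server.
sizes = []
total = ingest(range(250), lambda chunk: sizes.append(len(chunk)), size=100)
print(total, sizes)  # 250 [100, 100, 50]
```

Because batches() works on any iterable, the records can be streamed
from the dump without loading all 7.5 million pages in memory.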
A small first thing to do would be to upgrade your structWSF instance to
1.0a94. If that is not already done, and if you have version 1.0a92 or
above, then use the structWSF Upgrader [1].
Once we have a better understanding of your workflow, we will be able to
point you to different ways to define your workflow to improve the
overall process.
[1] https://github.com/structureddynamics/structWSF-Upgrader
Thanks,
Take care,
Fred