Processing 10,000 files

Tim Smith

Feb 10, 2012, 4:03:26 PM
to topbrai...@googlegroups.com
Hi,

I have a situation where I have 10,000+ small text files in a rather deep file system tree.  Ultimately, I need to attach these files to instances in an ontology based on the presence/absence of various words in the files.

For example, if a file mentions the name of a particular database table, that file should be attached to the instance that is the database table.

I believe the embedded Lucene engine can do this.
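As a rough illustration of the presence/absence test itself, here is a minimal sketch in plain Java rather than TopBraid or Lucene. It walks the whole directory tree with `Files.walk` and records, for each keyword (for example a database table name), which files mention it as a whole word, ignoring case. The class `KeywordTagger` and its method are hypothetical, and a plain regex scan stands in for a Lucene index. That is likely adequate for 10,000 small files, but it does no stemming or ranking.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a TopBraid or Lucene API): a plain regex scan over a directory tree.
class KeywordTagger {

    // Returns, for each keyword, the files whose text contains it as a whole word (case-insensitive).
    static Map<String, List<Path>> tagFiles(Path root, List<String> keywords) throws IOException {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        Map<String, List<Path>> hits = new LinkedHashMap<>();
        for (String kw : keywords) {
            // \b assumes keywords start and end with word characters, as table names usually do.
            patterns.put(kw, Pattern.compile("\\b" + Pattern.quote(kw) + "\\b", Pattern.CASE_INSENSITIVE));
            hits.put(kw, new ArrayList<>());
        }

        // Files.walk visits every level of the tree; sorting gives a stable order.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        for (Path file : files) {
            // Decoding via new String(...) replaces malformed bytes instead of throwing.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                if (e.getValue().matcher(text).find()) {
                    hits.get(e.getKey()).add(file);
                }
            }
        }
        return hits;
    }
}
```

Each keyword's list of files is then what would be attached to the matching ontology instance.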

I was thinking that if I can sequentially process each file, I can make each one an instance of a File class where each instance has a name (the file name) and a property that contains the contents of the file.  Then if I can trigger Lucene to index the resulting text strings, I will be able to search for the keywords I'm interested in and CONSTRUCT the tagging relationships (using pf:textMatch).
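The File-instance idea can be sketched the same way, assuming the output is N-Triples that could then be loaded into a model: each file becomes one instance carrying its name and its full text as literal values. The namespace `http://example.org/files#` and the names `File`, `name`, and `content` are placeholders, not part of any TopBraid vocabulary, and the file IRI is simply the path's `file:` URI.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical vocabulary (ex:File, ex:name, ex:content); substitute your own ontology's IRIs.
class FileToTriples {
    static final String EX = "http://example.org/files#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Escapes the characters N-Triples forbids unescaped inside a double-quoted literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Produces three triples: the file's type, its name, and its full text content.
    static String toNTriples(Path file) throws IOException {
        String iri = "<" + file.toUri() + ">";
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return iri + " <" + RDF_TYPE + "> <" + EX + "File> .\n"
             + iri + " <" + EX + "name> \"" + escape(file.getFileName().toString()) + "\" .\n"
             + iri + " <" + EX + "content> \"" + escape(content) + "\" .\n";
    }
}
```

Concatenating `toNTriples` over the paths from a directory walk would give one N-Triples document for the whole tree; the keyword matching could then run over the `content` values.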

However, I'm struggling with how to process all the files in a given directory and how to trigger Lucene to index the text strings added to a model.

SPARQLMotion supports importing a single text file but does not appear to support multiple files/directory trees.

Any suggestions on how to process large numbers of files and trigger the indexing process?

Thanks in advance,

Tim


Scott Henninger

Feb 14, 2012, 3:46:24 PM
to TopBraid Suite Users
Tim; You could use tops:files for processing a set of files. See
Help > Reference > SPARQL Property Functions.

<<I was thinking that if I can sequentially process each file, I can
make each one an instance of a File class where each instance has a
name (the file name) and a property that contains the contents of the
file.>>

Yes, this is possible. You can use sml:ImportTextFile to read in the
file and insert it as the value for a property.

<<Then if I can trigger Lucene to index the resulting text strings>>

However, you are correct that there isn't currently a way to trigger
Lucene/LARQ other than through the SPARQL View (i.e. pre-processing a
file opened in Composer). We are evaluating options and use cases
around this issue. One thought would be to have a SPARQLMotion module
that creates LARQ indices.

We will have more to say soon. This is slated for enhancement in
3.6.1 or 3.6.2.

-- Scott

Tim Smith

Feb 14, 2012, 3:55:47 PM
to topbrai...@googlegroups.com
Ahhh... I had forgotten about the tops: functions.  I'll give those a try.

I found that I can trigger a Lucene index build via the TBL console.  At least, I think it is the same thing.

Tim



--
You received this message because you are subscribed to the Google
Group "TopBraid Suite Users", the topics of which include Enterprise Vocabulary Network (EVN), TopBraid Composer,
TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
To post to this group, send email to
topbrai...@googlegroups.com
To unsubscribe from this group, send email to
topbraid-user...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/topbraid-users?hl=en

Scott Henninger

Feb 14, 2012, 4:05:29 PM
to TopBraid Suite Users
Tim; You may be referring to the Dictionary for TopBraid Ensemble.
This does indeed use Lucene indexing and may or may not be useful for
you. Note, however, that it is different from the LARQ indexing in the
SPARQL View.

For more on how the dictionary code processes labels (subproperties of
rdfs:label) see Section 1.5 of the TopBraid Ensemble Application
Development Guide (More Information section on
http://www.topquadrant.com/products/TB_Ensemble.html)

-- Scott

umar farooq

Feb 14, 2012, 11:45:28 PM
to topbrai...@googlegroups.com
Hi guys,
I have some problems; if you can solve them, I will be thankful to you.

1) First of all, I need a query that returns the subclasses of owl:Thing.
I wrote this one. Is it correct?
SELECT ?cls
WHERE
{  ?cls rdfs:subClassOf* owl:Thing .
}

It is not working.
Also, what does the * in rdfs:subClassOf* mean?

2) I want the labels of the classes as well as the class URIs.
I wrote this query:
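String query =
    "PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> " +
    "PREFIX owl:<http://www.w3.org/2002/07/owl#> " +
    // pz: was used but never declared; this namespace (Protege pizza ontology) is an assumption
    "PREFIX pz:<http://www.co-ode.org/ontologies/pizza/pizza.owl#> " +
    "SELECT ?l1" +
    " WHERE { " +
    "?uri rdfs:subClassOf pz:Pizza ." +
    "?uri rdfs:label ?l1 ." +
    "}";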
But it gives the wrong results.
3) I want to get all the properties of a class when a user clicks on it, as Protégé does. What would the query be?

4) How do I pass an argument to the web service? For example, if I want to find all subclasses of Pizza, I should be able to pass "Pizza" as the argument.
--
Best Regards,
Umar

Irene Polikoff

Feb 15, 2012, 12:14:55 AM
to topbrai...@googlegroups.com

Umar,

Please don’t hijack existing threads by switching their topics in a reply. Instead, start a new post. I will create a separate e-mail to answer your questions.

Irene Polikoff
