Search the EDG on localhost (6.3.2): does it use Lucene on the local machine or require a TBL server? And how can Lucene be configured?


Simon Opper

May 25, 2020, 1:02:15 AM
to TopBraid Suite Users
Hi there

We need to optimise Lucene in a text search App and I want to run some tests and explore the configuration options for Lucene on a text corpus, datagraph and/or ontology.

Part 1 of my question is:

When using a local instance of TBCME and EDG on 6.3.2, the "Search the EDG" box on the main EDG home page is not responsive. Chrome dev tools show that nothing is triggered in the client when the search button is clicked.

Does Search the EDG require a server, or can it be run on localhost?

Part 2:

I've run through the documentation I can find for customizing Lucene, e.g. textindex.ui.ttl, but it doesn't give many clues about configuring all the functionality that exists in Lucene or in EDG.

E.g. on the EDG search configuration screen (see below) the selected classes, search facets and properties are listed. Where are all these configured?

How can the many other optimization aspects of Lucene be configured?


Current Search Configuration

Search is currently configured, by administrator, to find items in the asset collections:

Selected Collections

  • Data Graphs:  Wiring Rules Data Graph
     (urn:x-evn-master:rules_data_graph)
  • Ontologies:  wiring rules data
     (urn:x-evn-master:rules_data)

Selected Classes

Selected Search Properties

Selected Search Facets

Regards

Simon

      Irene Polikoff

      May 25, 2020, 2:35:42 AM
      to topbrai...@googlegroups.com
      Simon,

      Please see below

      On May 25, 2020, at 1:02 AM, Simon Opper <simon...@surroundaustralia.com> wrote:


      Does Search the EDG require a server, or can it be run on localhost?

      Yes, it runs on localhost.

      I can’t reproduce your issues.

      One possibility is that you have no data in the asset collections you set to be indexed. From Rob’s e-mails, I know that he uses files, and that his asset collections in EDG are simply “wrappers” for these files. If you are following the same pattern, you will get no search results. Search the EDG indexing will index only the content that you actually have in the asset collection, not content included by reference.


      Part 2:

      I've run through the documentation I can find for customizing Lucene, e.g. textindex.ui.ttl, but it doesn't give many clues about configuring all the functionality that exists in Lucene or in EDG.

      E.g. on the EDG search configuration screen (see below) the selected classes, search facets and properties are listed. Where are all these configured?

      How can the many other optimization aspects of Lucene be configured?


      Did you look at this https://doc.topquadrant.com/6.3/developer-guide/#Search_the_EDG_Customizations?

      These customizations are about selecting facets and configuring what the results page will look like.

      By default, EDG will automatically calculate the facets to be shown on the results page, using the “most populated” properties. But you can customize this.
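As a rough illustration of what "most populated" could mean, the sketch below ranks properties by the number of distinct subjects that have a value for them. This is an assumption made for illustration only: the thread does not say how EDG actually ranks properties, and the function name, the `ex:` identifiers and the sample triples are all hypothetical.

```python
from collections import Counter


def most_populated_properties(triples, top_n=3):
    """Rank properties by how many distinct subjects carry at least one value.

    Assumption: "most populated" means "used on the most resources".
    This is not EDG's actual algorithm.
    """
    subjects_per_property = {}
    for subject, prop, _value in triples:
        subjects_per_property.setdefault(prop, set()).add(subject)
    counts = Counter({p: len(s) for p, s in subjects_per_property.items()})
    # most_common returns (property, count) pairs, highest count first
    return [prop for prop, _count in counts.most_common(top_n)]


# Hypothetical sample data: (subject, property, value) triples.
triples = [
    ("ex:rule1", "rdfs:label", "Earthing rule"),
    ("ex:rule1", "ex:category", "Safety"),
    ("ex:rule2", "rdfs:label", "Cable sizing rule"),
    ("ex:rule2", "ex:category", "Design"),
    ("ex:rule3", "rdfs:label", "Switchboard rule"),
    ("ex:rule3", "ex:voltage", "230"),
]

facets = most_populated_properties(triples, top_n=2)
print(facets)  # ['rdfs:label', 'ex:category']
```

A customized configuration would replace this computed list with an explicit choice of facet properties.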





          --
          You received this message because you are subscribed to the Google Groups "TopBraid Suite Users" group.
          To unsubscribe from this group and stop receiving emails from it, send an email to topbraid-user...@googlegroups.com.
          To view this discussion on the web visit https://groups.google.com/d/msgid/topbraid-users/a993e039-8567-4d61-97ca-fb46df1f1284%40googlegroups.com.

          Simon Opper

          May 25, 2020, 5:39:01 PM
          to topbrai...@googlegroups.com
          Thanks very much for the reply Irene.

          Could you please fix the broken image links in the documentation you linked to? The guidance on customising the facets is not visible, and I believe this is the info I need.

          image.png

          Re: data wrapping, it does not seem to be the issue per se, as I was able to get things working with the data assets we have. But it appears that something else we are doing, with automated bulk loading and scanning for ontology manifest changes, is breaking either the indexing or the triggering of Search the EDG.

          Many thanks

          S





          Rob Atkinson

          May 25, 2020, 8:00:30 PM
          to TopBraid Suite Users

          Yes, it runs on local host.

          I can’t reproduce your issues.

          One possibility is that you have no data in the asset collections you set to be indexed. From Rob’s e-mails, I know that he uses files and asset collections in EDG are simply “wrappers” for these files. If you are following the same pattern, you will get no search results. Search the EDG indexing will index only the content that you actually have in the asset collection, not content included by reference.

          We definitely need to be able to configure search to handle customised graph closures. Creating wrappers with appropriate imports is one obvious way to do this, as well as being the best way to handle large static data streams generated by other processes.

          I have wondered whether, similar to teamwork:imports, there could be a property on a graph to indicate A-box content, and hence its inclusion in the default closure for search and display. It may be a big leap to allow editing only on local content in future, but perhaps we can start with the search problem.

          So what's the way forward: is there a piece of Java code we need to rewrite here?

          Part 2:

          I've run through documentation I can find for customizing Lucene. e.g. textindex.ui.ttl but it doesn't give many clues to configuration all the functionality that exists in Lucene or in EDG.

          E.g. on the EDG search configuration screen (see below) the selected classes, search facets and properties are listed. Where are all these configured  ? 

          How are/can the other many optimization aspects of Lucene configured ?

          Did you look at this https://doc.topquadrant.com/6.3/developer-guide/#Search_the_EDG_Customizations?

          These customizations are about selecting facets and configuring how the results page will look like.

          By default, EDG will auto calculate the facets to be shown on the results page - using the “most populated” properties. But you can customize this.



              Irene Polikoff

              May 25, 2020, 8:50:06 PM
              to topbrai...@googlegroups.com
              Please see below

              On May 25, 2020, at 8:00 PM, Rob Atkinson <robatki...@gmail.com> wrote:



              We definitely need to be able to configure search to handle customised graph closures. Creating wrappers with appropriate imports is one obvious way to do this, as well as being the best way to handle large static data streams generated by other processes.

              Only content in the asset collection is indexed for the Search the EDG index. In other words, data must be in EDG’s triple store. If it is external data, it will not be indexed. 

               
              I have wondered whether, similar to teamwork:imports, there could be a property on a graph to indicate A-box content, and hence its inclusion in the default closure for search and display.

              You can run queries over included content. Search in the asset collection will work, even over data that is in a file. This search uses GraphQL access to a graph with its full imports closure.

              Search the EDG uses a Lucene index that is created once and then updated as data changes. The indexing process only indexes the content held in each asset collection.
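The difference between the two search scopes can be modelled with a toy example. This is only a sketch of the behaviour described in this thread, not EDG's or Lucene's implementation: the dictionaries, the `file:rules.ttl` graph, the `ex:` identifiers and both function names are hypothetical, and a plain word-to-subject map stands in for a real Lucene index.

```python
from collections import defaultdict

# Hypothetical toy data: graph URI -> list of (subject, label) pairs.
GRAPHS = {
    # A "wrapper" asset collection that holds no triples of its own.
    "urn:x-evn-master:rules_data_graph": [],
    # A file-based graph that the wrapper imports.
    "file:rules.ttl": [
        ("ex:rule1", "Earthing rule"),
        ("ex:rule2", "Cable sizing rule"),
    ],
}
IMPORTS = {"urn:x-evn-master:rules_data_graph": ["file:rules.ttl"]}


def imports_closure(graph, imports):
    """Return the graph plus everything it imports, transitively."""
    seen, stack = [], [graph]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(imports.get(current, []))
    return seen


def build_index(graph_uris, graphs):
    """Map each lower-cased word to the subjects whose label contains it."""
    index = defaultdict(set)
    for uri in graph_uris:
        for subject, label in graphs.get(uri, []):
            for word in label.lower().split():
                index[word].add(subject)
    return index


collection = "urn:x-evn-master:rules_data_graph"

# Search-the-EDG style scope: only the collection's own content.
global_index = build_index([collection], GRAPHS)
# In-collection search scope: the collection plus its imports closure.
local_index = build_index(imports_closure(collection, IMPORTS), GRAPHS)

print(sorted(global_index.get("earthing", set())))  # []
print(sorted(local_index.get("earthing", set())))   # ['ex:rule1']
```

With a wrapper collection that contains nothing itself, the first lookup finds nothing while the second finds the imported resource, which matches the symptom reported at the start of the thread.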

              It may be a big leap to allow editing only on local content in future, but perhaps we can start with the search problem.

              So what's the way forward: is there a piece of Java code we need to rewrite here?

              I will leave this question for Holger. 

              Content outside of the EDG repository requires the import closure, loading and all operations to always be resolved in memory. Our strategic direction is to minimize such cases, with the objective of eventually having all data stored in EDG’s repository. This aligns better with a number of goals such as cloud enablement.

              Further, one needs to make assumptions about the lifecycle of this external data (when does it change, etc.). If you have specific use cases for your application and a specific system architecture in mind, you can make such assumptions, but we don’t. Typically, we expect customers to load data into the EDG repository as an asset collection, marking it “read only”. If the data is largely static, why is it a problem to load it into EDG? A system that creates it can create an EDG asset collection as opposed to making it available as a file. If the issue is size, I don’t believe that having it outside of EDG helps in dealing with size.


              Rob Atkinson

              May 25, 2020, 11:40:40 PM
              to TopBraid Suite Users

              Probably ought to wind up this topic if the answer is "Lucene works on local development server" and "Lucene needs data to be in the graph, not in imports". That said, I guess we'd still be interested in seeing if we can open that up easily.

              It's another thread to work out how much work it would take to make it feasible to copy all the data into EDG and sync it automatically with changes in a source. At the moment we are happier with version-controlled sources and an import via project deployment, and we can copy data at load time rather than import.
              It would be a pain to use the file import service and have to multipart form-encode the file contents to load it.

              Is there another option to look at a deployed file and efficiently copy it into a new asset collection? I still don't think this gives us the flexibility to keep the original data and potential additional data separate (segmented A-box union graphs) whilst retaining search capability.

              Holger Knublauch

              May 25, 2020, 11:46:00 PM
              to topbrai...@googlegroups.com

              Hi Rob,

              I think the scenario you describe is already supported by the free-text search if you are in individual asset collections. The Lucene text index there does include all imported files. However, the Search the EDG index has the goal of driving navigation into asset collections, so it doesn't include triples that come from imported files.

              Holger