Search Functionality in TopBraid EDG


AH1987

Jul 10, 2019, 2:35:39 PM
to TopBraid Suite Users
Hello All,

I have a conceptual question about TopBraid EDG search functionality. In most of the demos that I have watched, the Auto-classifier/Tagger is searching within a corpus.

Is there a way to search through documents that are not in a corpus (PDFs, Word documents, etc.)? If this is possible, where do these documents have to be stored? Can they be on a SharePoint site?

Thank you,
Anna

Irene Polikoff

Jul 10, 2019, 3:35:04 PM
to topbrai...@googlegroups.com
Search the EDG will index any asset collection that has “include in Search the EDG” turned on. All information in the collection will be used to populate the searchable index.

When you create a corpus, you get a choice of connectors. Additional connectors can be added.


In the simplest case, you could select “no connector” or “URL list”. Then, you could add individual documents either by uploading them (no connector) or by providing a list of their URLs (URL list).

Connectors will get all metadata available for each document and store it in EDG. 
They will also get the content of the document and store it as a value of the content field. The document remains where it was originally (e.g., web, CMS, file system), but its content is also captured in EDG to enable search and auto-classification.

If the number of documents you have is not very large (e.g., not in the tens or hundreds of millions), then the converted content can continue to be stored in EDG and, thus, if the corpus is indexed, you can search over it with Search the EDG.

This all works without any additional processing with the Auto-classifier.

When you have a very large number of documents, there is an option to store the content only temporarily in order to classify it and extract tags, and then flush it out of EDG.

In this case, what remains stored in EDG is only the metadata about each document, including discovered tags, and thus only this information will participate in Search the EDG. The document content itself will not. This scenario is used if you already have a solution that indexes your document content and EDG is only used to enrich the index you already have. This would mean some integration between your existing search and EDG, either through an export of what is produced by the Auto-classifier or through some real-time integration, etc.
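
As a rough illustration of the two storage modes described above (content kept in EDG versus content removed after tagging), here is a minimal Python sketch. It is not EDG code: `DocumentRecord`, `classify`, `ingest` and `searchable_text` are hypothetical names, and the tagger is a toy substring match standing in for the real Auto-classifier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for what a corpus holds per document."""
    uri: str
    metadata: dict
    content: Optional[str] = None  # extracted text; None once removed
    tags: list = field(default_factory=list)


def classify(text, vocabulary):
    """Toy tagger: a term becomes a tag if it occurs in the text."""
    lowered = text.lower()
    return sorted(term for term in vocabulary if term.lower() in lowered)


def ingest(record, vocabulary, keep_content):
    """Tag a document, then either keep its content or drop it."""
    record.tags = classify(record.content or "", vocabulary)
    if not keep_content:
        # Large-corpus mode: only metadata and discovered tags remain.
        record.content = None
    return record


def searchable_text(record):
    """Text that a search index could see for this record."""
    parts = [str(value) for value in record.metadata.values()] + record.tags
    if record.content is not None:
        parts.append(record.content)
    return " ".join(parts)
```

With `keep_content=True`, a search can match words from the document body. With `keep_content=False`, only metadata values and tags are visible to search, which is why the body text would have to be indexed by an existing search solution.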

--
You received this message because you are subscribed to the Google Groups "TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to topbraid-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/topbraid-users/dd3a1ac4-f645-433d-9e26-5852bea34773%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Richard Cyganiak

Jul 11, 2019, 5:38:12 AM
to topbraid-users list
Hi Anna,

To add to Irene's answer:

1. SharePoint supports CMIS. This may need to be enabled on the SharePoint side. Then, an EDG corpus with the CMIS connector can be used to connect to SharePoint and access the documents.

2. If the number of documents is small enough to upload one by one, then an EDG corpus without connector can be created, and the documents added via “Import single document”.

Richard


Anna Hicken

Jul 19, 2019, 7:13:41 PM
to TopBraid Suite Users
Thank you both for your responses. How does Search the EDG actually work? Is it powered by Elasticsearch?

Irene Polikoff

Jul 19, 2019, 7:33:25 PM
to topbrai...@googlegroups.com
It is powered by a built-in Lucene engine that is integrated with the graph store that serves as the EDG repository.

Users can select which graphs (asset collections) are to be indexed. 

When data in the repository changes, the text search index is updated automatically.
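
To make the idea concrete, here is a minimal Python sketch of a text index that covers only selected graphs and is updated on every change. It is a toy inverted index, not Lucene and not EDG's implementation; `ToyTextIndex`, `on_change` and the graph and subject names are assumptions made up for the example.

```python
import re
from collections import defaultdict


class ToyTextIndex:
    """Toy inverted index over text values in selected graphs."""

    def __init__(self, indexed_graphs):
        self.indexed_graphs = set(indexed_graphs)
        self.values = {}                  # (graph, subject) -> current text
        self.postings = defaultdict(set)  # token -> {(graph, subject)}

    @staticmethod
    def tokens(text):
        return set(re.findall(r"\w+", text.lower()))

    def on_change(self, graph, subject, text):
        """Handle an added, changed, or removed (text=None) value."""
        if graph not in self.indexed_graphs:
            return  # this graph was not selected for indexing
        key = (graph, subject)
        old = self.values.pop(key, None)
        if old is not None:
            # Remove postings for the previous text before re-indexing.
            for token in self.tokens(old):
                self.postings[token].discard(key)
        if text is not None:
            self.values[key] = text
            for token in self.tokens(text):
                self.postings[token].add(key)

    def search(self, query):
        """Return keys whose text contains every token of the query."""
        query_tokens = self.tokens(query)
        if not query_tokens:
            return set()
        return set.intersection(
            *(self.postings.get(token, set()) for token in query_tokens)
        )
```

In this sketch the repository would call `on_change` for each edit, so the index never needs a separate rebuild step, and edits in graphs that were not selected are simply ignored.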
