Search the EDG indexes any asset collection that has “include in Search the EDG” turned on. All information in the collection is used to populate the searchable index.
When you create a corpus, you get a choice of connectors - see below. Additional connectors can be added.
In the simplest case, you could select “no connector” or “URL list”. You could then add individual documents either by uploading them (no connector) or by providing a list of their URLs (URL list).
Connectors will get all metadata available for each document and store it in EDG.
They will also get the content of the document and store it as a value of the content field. The document remains where it was originally (e.g., web, CMS, file system), but its content is also captured in EDG to enable search and auto-classification.
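As a rough illustration of what a connector captures, the following Python sketch models a document as a record holding a pointer to the original, its metadata, and the captured text. The class name, field names, and the `ingest` helper are hypothetical and chosen only for this example; they are not the EDG schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DocumentRecord:
    """Hypothetical record for illustration; not the actual EDG data model."""
    source_uri: str                                          # where the original document still lives
    metadata: Dict[str, str] = field(default_factory=dict)   # metadata gathered by the connector
    content: Optional[str] = None                            # captured text, held in the content field


def ingest(source_uri: str, metadata: Dict[str, str], raw_text: str) -> DocumentRecord:
    """Copy metadata and text into a record; the original at source_uri is not touched."""
    return DocumentRecord(source_uri=source_uri, metadata=dict(metadata), content=raw_text)


record = ingest(
    "https://example.org/policy.pdf",
    {"title": "Data Policy"},
    "Text of the policy document.",
)
```

The record keeps `source_uri` so that the stored copy can always be traced back to the document in its original location.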
If the number of documents you have is not very large (e.g., not tens or hundreds of millions), the converted content can remain stored in EDG. If the corpus is indexed, you can then search over that content with Search the EDG.
All of this works without any additional processing by the Autoclassifier.
When you have a very large number of documents, there is an option to store the content only temporarily: long enough to classify it and extract tags, after which it is flushed out of EDG.
In this case, only the metadata about each document, including the discovered tags, remains stored in EDG, so only this information participates in Search the EDG. The document content itself does not. This scenario is used when you already have a solution that indexes your document content and EDG is used only to enrich that existing index. It requires some integration between your existing search and EDG, either through an export of what the Autoclassifier produces or through some form of real-time integration.
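The temporary-content scenario can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not EDG's implementation: the record layout, the `toy_classifier` keyword matcher, and the in-memory dictionary standing in for an existing search index are all hypothetical.

```python
from typing import Callable, Dict, List


def classify_and_flush(record: dict, classify: Callable[[str], List[str]]) -> dict:
    """Tag a record from its content, then return a copy without the content."""
    tags = classify(record["content"])
    slim = {key: value for key, value in record.items() if key != "content"}
    slim["tags"] = tags
    return slim


def toy_classifier(text: str) -> List[str]:
    """Hypothetical stand-in for a classifier: simple keyword matching."""
    vocabulary = ["governance", "privacy"]
    return [term for term in vocabulary if term in text.lower()]


def enrich_external_index(index: Dict[str, dict], records: List[dict]) -> None:
    """Export step: attach discovered tags to entries in an existing index."""
    for rec in records:
        index.setdefault(rec["uri"], {})["tags"] = rec["tags"]


record = {
    "uri": "doc-1",
    "title": "Privacy notice",
    "content": "Our privacy and data governance rules.",
}
slim = classify_and_flush(record, toy_classifier)

# The external index already holds the document body; only tags are added to it.
external_index = {"doc-1": {"body": "indexed elsewhere"}}
enrich_external_index(external_index, [slim])
```

After the flush, `slim` carries only metadata and tags, which matches what would remain searchable in this scenario, while the full text stays in the external index.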