Combining documents


Mohankumar Govindasamy

Feb 2, 2015, 9:07:05 AM
to Google-Search-...@googlegroups.com
Hi All,
I need to combine documents and return them as a single search result.
Example:
For a product xyz, I have one document from the sales team and another from the package team.
If a user searches for xyz, I need to return the combined content as a single result.

Note: Both documents are already in the index.

Is this possible in GSA?

Any clue how to address this requirement?
 
Thanks,
Mohan.
 

Mathias Bierl

Feb 2, 2015, 9:36:45 AM
to Google-Search-...@googlegroups.com
You should combine the documents before feeding them into the GSA index.
Otherwise you have to merge the results in the frontend, which can get really complicated if it happens often.
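
For what it's worth, the frontend option could look roughly like the sketch below. It assumes each hit has already been parsed into a dict and carries a `product` key (for example from a meta tag); that field name and the dict layout are made up for illustration and are not part of any GSA response format.

```python
from collections import defaultdict


def group_results_by_product(results):
    """Collapse per-document hits into one combined entry per product.

    `results` is a list of dicts with the hypothetical keys
    "product", "url" and "snippet".
    """
    grouped = defaultdict(list)
    for hit in results:
        grouped[hit["product"]].append(hit)

    combined = []
    for product, hits in grouped.items():
        combined.append({
            "product": product,
            # Keep every source link so the user can still open each document.
            "links": [h["url"] for h in hits],
            # Join the snippets in the order the hits were returned.
            "snippet": " ".join(h["snippet"] for h in hits),
        })
    return combined
```

The catch is that paging and result counts no longer match what the appliance returns, which is why this gets painful when many products have several documents.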

Mohankumar Govindasamy

Feb 2, 2015, 11:29:44 AM
to Google-Search-...@googlegroups.com
Yes Mathias, we could use feeds to combine them, but unfortunately the documents here are indexed through a web crawl.

Dave Watts

Feb 2, 2015, 11:33:05 AM
to Google-Search-...@googlegroups.com
> Yes Mathias,we can use feeds to combine but unfortunately here documents are
> indexed thru web crawl.

They won't be if you want to merge them. You'll need to replace your
crawl with a feed.

Even there, though, you're going to have to figure out what end result
you want. For example, if the user runs a search and you have two
related documents, say a PDF and an HTML page, what do you want the
user to see? Ultimately, they're going to have to click on a link to
view a document. Are you going to have landing pages for these
composite documents?
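
To make the feed idea concrete, here is a minimal sketch that builds one content-feed record whose body is the text of both source documents and whose URL is a landing page. The element names (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `content`) follow the GSA Feeds Protocol as I remember it, so check them against the feed DTD for your software version. The function name, the datasource name and the landing page URL are placeholders, and the XML prolog and DOCTYPE line are left out.

```python
import xml.etree.ElementTree as ET


def build_combined_feed(datasource, landing_url, parts):
    """Return feed XML with a single record that merges several texts.

    `landing_url` is the (hypothetical) landing page users are sent to;
    `parts` is a list of already-extracted text bodies to combine.
    """
    gsafeed = ET.Element("gsafeed")
    header = ET.SubElement(gsafeed, "header")
    ET.SubElement(header, "datasource").text = datasource
    # "incremental" adds or updates records without dropping the others.
    ET.SubElement(header, "feedtype").text = "incremental"

    group = ET.SubElement(gsafeed, "group")
    record = ET.SubElement(group, "record", url=landing_url, mimetype="text/plain")
    # One record, one content body: the documents are merged here.
    ET.SubElement(record, "content").text = "\n\n".join(parts)

    return ET.tostring(gsafeed, encoding="unicode")
```

Calling `build_combined_feed("products", "http://example.com/products/xyz", [sales_text, package_text])` gives you the XML to push to the appliance's feed port. Only plain text is handled here, so a PDF would need its text extracted first or its own record.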

Dave Watts, CTO, Fig Leaf Software
1-202-527-9569
http://www.figleaf.com/
http://training.figleaf.com/

Fig Leaf Software is a Service-Disabled Veteran-Owned Small Business
(SDVOSB) on GSA Schedule, and provides the highest caliber vendor-
authorized instruction at our training centers, online, or onsite.

Alessandro Lapadula

Feb 3, 2015, 3:36:28 AM
to Google-Search-...@googlegroups.com
Hi All,
You can try this workaround: create a zip archive containing the documents and index it.

Note that for licensing purposes, a zip archive counts as the number of files it contains.
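
If you want to script the bundling, a minimal sketch with Python's standard `zipfile` module could look like this. The function name and the file names in the usage note are only examples; where you publish the archive so the crawler can reach it is up to you.

```python
import io
import zipfile


def bundle_documents(documents):
    """Pack several documents into one zip archive and return its bytes.

    `documents` maps the file name inside the archive to the file content
    (bytes or str).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in documents.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

For example, `bundle_documents({"xyz-sales.html": sales_html, "xyz-package.html": package_html})` returns the bytes of one archive per product, which you can then write to a crawlable location.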

Regards,
Alessandro

Edwin Stauthamer

Feb 3, 2015, 4:32:24 AM
to Google-Search-...@googlegroups.com
Hi,

So the GSA extracts the text from all files in the ZIP-file?
In the results, will the ZIP file be presented or the individual files contained in the ZIP, or both?

--
Kind regards,

Edwin Stauthamer
ed...@stauthamer.net
(+31) (0) 6 45554994

Dave Watts

Feb 3, 2015, 10:14:56 AM
to Google-Search-...@googlegroups.com
> So the GSA extracts the text from all files in the ZIP-file?

Yes.

http://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/file_formats/file_formats.html#1081660

> In the results, will the ZIP file be presented or the individual files
> contained in the ZIP, or both?

If I recall correctly, the cached copy will be the individual file,
but the link will go to the zip file.

Mohankumar Govindasamy

Feb 3, 2015, 11:39:48 PM
to Google-Search-...@googlegroups.com
Thank you all for your valuable input.
 
Regards,
Mohan