Search attachments

Peter Chan

Nov 22, 2012, 12:14:51 AM
to ep...@googlegroups.com
Do we want to search:
1. attachment + body
2. attachment only
3. body only

Chaiyasit (Sit) Manovit

Nov 22, 2012, 6:50:15 PM
to Peter Chan, ep...@googlegroups.com

The current plan is to make option 1 the default behavior, and maybe we can add advanced options that let users choose 2 or 3.
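To make the three options concrete, here is a minimal sketch in Python. It is only an illustration, not project code: `SearchScope`, `Message`, and `matches` are hypothetical names, and the attachment text is assumed to have been extracted already.

```python
from dataclasses import dataclass, field
from enum import Enum


class SearchScope(Enum):
    BODY_AND_ATTACHMENTS = 1  # option 1, the proposed default
    ATTACHMENTS_ONLY = 2      # option 2
    BODY_ONLY = 3             # option 3


@dataclass
class Message:
    # Hypothetical model: body text plus already-extracted attachment text.
    body: str
    attachments: dict = field(default_factory=dict)  # filename -> extracted text


def matches(message, term, scope=SearchScope.BODY_AND_ATTACHMENTS):
    """Return True if `term` occurs in the parts of `message` selected by `scope`."""
    needle = term.lower()
    in_body = needle in message.body.lower()
    in_attachments = any(
        needle in text.lower() for text in message.attachments.values()
    )
    if scope is SearchScope.BODY_ONLY:
        return in_body
    if scope is SearchScope.ATTACHMENTS_ONLY:
        return in_attachments
    return in_body or in_attachments
```

With this shape, the default call `matches(msg, "forecast")` covers option 1, and the advanced options simply pass a different `scope`.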

Other related questions are:

Should we make metadata, such as filename/author/camera model (for photos)/etc., available for the search (and maybe also viewable to users)? I plan to include it.

Should we also identify for users which attachments have the search hits? Sudheendra suggests we do so. If so, in what manner? Probably by highlighting them in the message browser - see also the question below.
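As a rough sketch of what identifying the hit attachments could involve, the Python snippet below returns the filenames whose text contains the search term, so that a message browser could highlight them. The function name `hit_attachments` and the filename-to-text mapping are assumptions made for illustration only.

```python
def hit_attachments(attachments, term):
    """Return the filenames whose extracted text contains `term`.

    `attachments` is a hypothetical mapping of filename -> extracted text;
    matching is a plain case-insensitive substring test.
    """
    needle = term.lower()
    return [name for name, text in attachments.items() if needle in text.lower()]
```

The returned list keeps the order in which the attachments appear in the message, which is likely the order a browser would display them in.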

Currently, the search result viewer is message-oriented: we see a chart of messages over time. With attachment search in mind, might users want to see results in an attachment-oriented way, where the chart presents the hit attachments over time? If so, should clicking on such a chart directly bring up the attachment (which may then need an attachment browser), or should it bring up the message that contains the hit attachments, with those attachments highlighted?
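One possible shape for the data behind an attachment-oriented chart is sketched below in Python. It assumes each hit is a hypothetical `(message_id, filename, sent_date)` tuple and groups hits by month; keeping the message id in each bucket leaves both click behaviors open (open the attachment, or open its message).

```python
from collections import defaultdict
from datetime import date


def hits_by_month(hits):
    """Group (message_id, filename, sent_date) hits into 'YYYY-MM' buckets."""
    buckets = defaultdict(list)
    for message_id, filename, sent in hits:
        buckets[sent.strftime("%Y-%m")].append((message_id, filename))
    # Sort by month key so the chart can plot the buckets in time order.
    return dict(sorted(buckets.items()))
```

For example, `hits_by_month([("m1", "a.jpg", date(2012, 11, 22))])` yields a single `"2012-11"` bucket holding that attachment and its message id.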

As for supported document formats, we use Tika and here is the list:
http://tika.apache.org/1.2/formats.html

Thanks,

--Sit

Peter Chan

Nov 27, 2012, 2:22:07 PM
to ep...@googlegroups.com, Peter Chan, s...@ixoratech.com
Hi All,

Sit is working on searching attachments of emails. He needs feedback from us.

Since we will do entity search only in the Discovery module, I assume what you suggested refers to the other modules, where we will have full-text search.


"Should we make metadata, such as filename/author/camera model (for photos)/etc., available for the search (and maybe also viewable to users)? I plan to include it."
I think all metadata should be searchable. A window (which can be turned off) showing this metadata next to the file would be useful.

"Should we also identify for users which attachments have the search hits? Sudheendra suggests we do so. If so, in what manner? Probably by highlighting them in the message browser - see also the question below."
Since only a limited set of file formats is searchable using the Tika toolkit, we should indicate which attachments are included in the search. Highlighting is my choice as well (perhaps with an explanation at the bottom of what the highlighting means).

"Currently, the search result viewer is message-oriented: we see a chart of messages over time. With attachment search in mind, might users want to see results in an attachment-oriented way, where the chart presents the hit attachments over time? If so, should clicking on such a chart directly bring up the attachment (which may then need an attachment browser), or should it bring up the message that contains the hit attachments, with those attachments highlighted?"
Can we add attachment viewing to the existing email view, so that the email body is followed by its attachments?

Peter

Glynn Edwards

Nov 27, 2012, 4:59:00 PM
to Peter Chan, ep...@googlegroups.com, s...@ixoratech.com
In future it would be good to specify which Module we are discussing in addition to the issue at hand. 

Since much of this refers to searching full-text I assume it is primarily for Delivery Module (and Appraisal/Processing Modules) rather than the Discovery Module? However there could be an element of entity extraction from the text of the attachments that could be discovered/indexed in the Discovery Module.

I agree that the full metadata for image and other files should be accessible - filename/author/camera model/date, etc. - in the other three modules. Turning it off would allow patrons to simplify the screen if they are not interested.

And, yes we need to identify results to users - would like to see how it would look highlighted. Am open to other suggestions. Do we think we would display a "wall" of attachments as MUSE did? Would this be narrowed by format?

Attachments for photos would work well over time - I'm not sure about text-based docs. Is this flexible? Could one arrange by TIME or FORMAT or AUTHOR? Many attachments are most relevant in the context of their email conversation - I think MUSE offered a link, or at least the file name of an attachment, in the body?

Glynn

Peter Chan

Nov 27, 2012, 10:29:11 PM
to ep...@googlegroups.com, Peter Chan, s...@ixoratech.com
Hi,

Please see the attached screenshot of what I mean by showing attachments after the email body.

Peter
ePADD_001.PNG