Multiple FileTypes

Afzal

Mar 17, 2010, 2:17:27 PM
to Google Search Appliance/Google Mini
Can we search multiple file types in GSA without using the filter
option? I want to use the query parameter to search for files with
several extensions (.html, .doc, .pdf).

The search protocol reference document says:

You can specify multiple file types by adding filetype: terms to the
search query, combined with the Boolean OR.

I tried adding the following to my query:

&as_filetype=html&as_filetype=pdf

but it returns only PDF documents. Am I doing something wrong?

I'd appreciate any help or guidance.

Thanks,
Afzal

Marcos Farias

Mar 17, 2010, 3:29:30 PM
to google-search-...@googlegroups.com
Hi Afzal,


  Take a look at http://code.google.com/apis/searchappliance/documentation/62/xml_refe...

  You'll notice that as_filetype is different from filetype. The former is a separate URL parameter, while the latter (filetype) is a special query term, which means you should include it inside the value of your q param.

  For instance, to get all PDF or DOC files that contain "health", you can use "q=health filetype:pdf OR filetype:doc", as in the following example:

/search?q=health+filetype:pdf+OR+filetype:doc+&btnG=Google+Search&access=p&client=default_frontend&output=xml_no_dtd&proxystylesheet=default_frontend&sort=date:D:L:d1&entqr=3&oe=UTF-8&ie=UTF-8&ud=1&site=default_collection
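
For anyone assembling this URL in code, here is a minimal Python sketch of the same idea: the filetype: terms are joined with OR and placed inside the q value rather than sent as separate parameters. The helper names (build_filetype_query, build_search_url) are made up for illustration, and only a few of the parameters from the URL above are included.

```python
from urllib.parse import urlencode


def build_filetype_query(terms, filetypes):
    """Append filetype: special query terms, joined by OR, to the search terms."""
    filetype_clause = " OR ".join(f"filetype:{ext}" for ext in filetypes)
    return f"{terms} {filetype_clause}".strip()


def build_search_url(terms, filetypes,
                     site="default_collection", client="default_frontend"):
    """Build a relative /search URL; parameter values mirror the example above."""
    params = {
        "q": build_filetype_query(terms, filetypes),
        "site": site,
        "client": client,
        "output": "xml_no_dtd",
        "proxystylesheet": client,
    }
    # safe=":" keeps the colon in filetype:pdf readable instead of %3A
    return "/search?" + urlencode(params, safe=":")


print(build_search_url("health", ["pdf", "doc"]))
```

The printed URL begins with /search?q=health+filetype:pdf+OR+filetype:doc, which matches the query portion of the hand-written example.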

Regards and good luck,

--
You received this message because you are subscribed to the Google Groups "Google Search Appliance/Google Mini" group.
To post to this group, send email to Google-Search-...@googlegroups.com.
To unsubscribe from this group, send email to Google-Search-Applia...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/Google-Search-Appliance-Help?hl=en.


Afzal

Mar 18, 2010, 9:51:50 AM
to Google Search Appliance/Google Mini
Marcos,

Thanks for your help; your suggestion worked. I have another
question, though:

The search results page is displaying

Results 1 - 2 of about 1570 for training filetype:pdf OR filetype:doc
OR filetype:html OR filetype:htm.

Is there any way to hide the filetype terms so that it says only

Results 1 - 2 of about 1570 for training

Thank you
-Afzal


Marcos Farias

Mar 18, 2010, 10:39:22 AM
to google-search-...@googlegroups.com
Afzal,

  I believe the easiest (and perhaps the best) way to reach that goal is to use collections. You could create a collection, named docs_collection for instance, that includes only pdf, doc, html and htm files.

  Then, when searching, you would put just the string training in the q param and set the site param to docs_collection.

  This could also give you the extra benefit of faster results compared with using the filetype special query term.

  For more information on this feature, see Help Center > Crawl and Index > Collections in your GSA's Admin Console.
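
As a rough Python sketch of the collection approach: the q value carries only the user's words, and the restriction to certain file types moves into the site parameter. The function name collection_search_url and the frontend name default_frontend are assumptions for illustration; docs_collection is the example collection name used above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def collection_search_url(terms, collection, frontend="default_frontend"):
    """Build a relative /search URL restricted to one collection."""
    params = {
        "q": terms,           # only the user's words, no filetype: terms
        "site": collection,   # collection limited to the wanted file types
        "client": frontend,
        "output": "xml_no_dtd",
        "proxystylesheet": frontend,
    }
    return "/search?" + urlencode(params)


url = collection_search_url("training", "docs_collection")
print(url)
# Show the q value that the appliance receives
print(parse_qs(urlsplit(url).query)["q"])
```

Since q now holds just "training", the query string that reaches the appliance no longer contains any filetype: terms.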

Good luck,
Marcos Farias

Afzal

Mar 19, 2010, 12:12:12 PM
to Google Search Appliance/Google Mini
Marcos,

I tried adding the filetype filters to the collection as you
suggested, but I am still seeing files that are not supposed to be in
the result set. Here is the list of file types I added to the
collection:

.css$
.csv$
.doc$
.dot$
.exe$
.gif$
.htm$
.html$
.jpg$
.pdf$
.ppt$
.prn$
.txt$
.xls$

Am I doing something wrong?

Please help.

Thanks,
-Afzal

Dave Watts

Mar 19, 2010, 12:26:49 PM
to google-search-...@googlegroups.com
> I tried adding the filetype filters in the collections as you have
> suggested, but still seeing the files that are not supposed to be in
> the result set. Following is the list of filetypes I added in the
> collection:
>
> .css$
> .csv$
> .doc$
> .dot$
> .exe$
> .gif$
> .htm$
> .html$
> .jpg$
> .pdf$
> .ppt$
> .prn$
> .txt$
> .xls$
>
> Am I doing anything wrong in this.

Well, first, why do you have .exe and .gif in there? You don't want
those in your index, do you? They aren't searchable text, really.

Second, are you filtering by collection?

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
http://training.figleaf.com/

Fig Leaf Software is a Veteran-Owned Small Business (VOSB) on
GSA Schedule, and provides the highest caliber vendor-authorized
instruction at our training centers, online, or onsite.

Afzal

Mar 19, 2010, 1:12:37 PM
to Google Search Appliance/Google Mini
Yes, I am filtering by collection, as Marcos suggested. I also tried
removing the .exe and .gif extensions from the collection, but it
still does not return the desired results.

Thanks,


Marcos Farias

Mar 19, 2010, 1:31:51 PM
to google-search-...@googlegroups.com
Afzal,

1 - Are you sure you are not confusing the "Crawl and Index" menu with the "Collections" menu?
2 - Remember that GSA is case sensitive by default. If your files' extensions vary in case, you could try using regexpIgnoreCase.
3 - Could you send us the URL you are using to perform the search? That may give us a hint about what is happening.
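
To illustrate point 2, here is a small Python sketch of how a case-sensitive extension pattern can miss files that a case-insensitive one catches. It uses Python's re module as a stand-in and is not the GSA pattern engine; the URLs are hypothetical, and the exact regexpIgnoreCase syntax should be checked in the Admin Console help.

```python
import re

EXTENSIONS = ["pdf", "doc", "html", "htm"]

# Match a URL that ends in one of the extensions, e.g. ".pdf"
case_sensitive = re.compile(r"\.(?:%s)$" % "|".join(EXTENSIONS))
# Same pattern, but ignoring letter case (so ".PDF" also matches)
case_insensitive = re.compile(case_sensitive.pattern, re.IGNORECASE)

# Hypothetical URLs, for illustration only
urls = [
    "http://intranet.example.com/docs/guide.pdf",
    "http://intranet.example.com/docs/REPORT.PDF",
    "http://intranet.example.com/img/logo.gif",
]

for url in urls:
    strict = bool(case_sensitive.search(url))
    relaxed = bool(case_insensitive.search(url))
    print(f"{url}: case-sensitive={strict}, ignore-case={relaxed}")
```

In this sketch REPORT.PDF is matched only by the case-insensitive pattern, which is the kind of mismatch that mixed-case extensions could cause.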

Rgds

Afzal

Mar 19, 2010, 2:18:13 PM
to Google Search Appliance/Google Mini
Hi Marcos,

I have applied the same filters under Crawl and Index too, and added
them in both cases (lower and upper).

Here is the URL that I am currently using:

/search?q=training&site=hcs_hin_dev&client=hcs_frontend&output=xml_no_dtd&proxystylesheet=hcs_frontend&filter=1

I have defined the filters in the collection "hcs_hin_dev".

Regards,

