Dynamic navigation not working for metadata and url feed but does work for full feed

Grant

May 19, 2012, 1:26:07 PM
to Google Search Appliance/Google Mini - Google Search Appliance/Google Mini
When pushing up data through metadata and url and using the dynamic
navigation, the filters populate fine, but when I attempt to use one I
get the following types of messages:

Your search - test inmeta:corporate_source=Western%20Municipal%20Water%20District%2C%20Riverside%2C%20CA - did not match any documents.
No pages were found containing "test inmeta:corporate_source=Western%20Municipal%20Water%20District%2C%20Riverside%2C%20CA".

When pushing it up through a full feed it works fine.

This is a sample record using full feed:
<record url="http://testfeed.a_w_w_a.org/test/0075204.html" mimetype="text/plain">
  <metadata>
    <meta name="title" content="Direct Filtration: A Tale of Two Winters" />
    <meta name="waternet_accession_no" content="0075204" />
    <meta name="waternet_pub_abbv" content="OPF" />
    <meta name="author" content="James, Wendell" />
    <meta name="author" content="Craik, Stephen" />
    <meta name="author" content="Molla, Saif" />
    <meta name="author" content="Suarez, Alfredo" />
    <meta name="corporate_source" content="EPCOR Water Services, Edmonton, Alberta, Canada" />
    <meta name="publication_name" content="Opflow" />
    <meta name="publisher" content="AWWA" />
    <meta name="issn" content="0149-8029" />
    <meta name="coden" content="OPFLDG" />
    <meta name="pub_year" content="2012" />
    <meta name="pub_month" content="January" />
    <meta name="volume" content="38" />
    <meta name="issue_no" content="1" />
    <meta name="number_of_pages" content="4" />
    <meta name="content_type" content="Newsletter" />
    <meta name="media_type" content="Print" />
    <meta name="media_type" content="PDF" />
    <meta name="keyword" content="Direct Filtration" />
    <meta name="keyword" content="Water Treatment" />
    <meta name="keyword" content="Edmonton, Alberta, Canada" />
    <meta name="keyword" content="Coagulants" />
    <meta name="article_type" content="Peer Reviewed" />
  </metadata>
  <content>This article discusses how Edmonton, Alberta, Canada, substantially reduced coagulant requirements and residuals production by converting its water treatment plants to direct filtration mode in winter.</content>
</record>

This is the metadata-and-URL version; the only difference is that it has
no content element and no mimetype attribute.
<record url="http://testfeed.a_w_w_a.org/test/0075204.html">
  <metadata>
    <meta name="title" content="Direct Filtration: A Tale of Two Winters" />
    <meta name="waternet_accession_no" content="0075204" />
    <meta name="waternet_pub_abbv" content="OPF" />
    <meta name="author" content="James, Wendell" />
    <meta name="author" content="Craik, Stephen" />
    <meta name="author" content="Molla, Saif" />
    <meta name="author" content="Suarez, Alfredo" />
    <meta name="corporate_source" content="EPCOR Water Services, Edmonton, Alberta, Canada" />
    <meta name="publication_name" content="Opflow" />
    <meta name="publisher" content="AWWA" />
    <meta name="issn" content="0149-8029" />
    <meta name="coden" content="OPFLDG" />
    <meta name="pub_year" content="2012" />
    <meta name="pub_month" content="January" />
    <meta name="volume" content="38" />
    <meta name="issue_no" content="1" />
    <meta name="number_of_pages" content="4" />
    <meta name="content_type" content="Newsletter" />
    <meta name="media_type" content="Print" />
    <meta name="media_type" content="PDF" />
    <meta name="keyword" content="Direct Filtration" />
    <meta name="keyword" content="Water Treatment" />
    <meta name="keyword" content="Edmonton, Alberta, Canada" />
    <meta name="keyword" content="Coagulants" />
    <meta name="article_type" content="Peer Reviewed" />
  </metadata>
</record>

Dave Watts

May 19, 2012, 2:02:34 PM
to google-search-...@googlegroups.com
> When pushing up data through metadata and url and using the dynamic
> navigation, the filters populate fine, but when I attempt to use one I
> get the following types of messages:
>
> Your search - test inmeta:corporate_source=Western%20Municipal%20Water%20District%2C%20Riverside%2C%20CA - did not match any documents.
> No pages were found containing "test inmeta:corporate_source=Western%20Municipal%20Water%20District%2C%20Riverside%2C%20CA".
>
> When pushing it up through a full feed it works fine.

I can verify from personal experience that metadata-and-URL feeds work
with dynamic navigation, so there must be something else going on. Can
you post the HTTP request header and the XML document header from your
feed?

Also, you'll be better served by ignoring dynamic navigation right
now, and focusing on whether you're getting metadata in the first
place - you can directly verify that by using inmeta queries in an
unmodified front end.

> This is a sample record using full feed:
>        <record url="http://testfeed.a_w_w_a.org/test/0075204.html"
> mimetype="text/plain">

Just as a heads-up, you should be using the correct MIME type there (text/html).

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
http://training.figleaf.com/


Grant

May 19, 2012, 2:34:05 PM
to Google Search Appliance/Google Mini - Google Search Appliance/Google Mini
> Also, you'll be better served by ignoring dynamic navigation right
> now, and focusing on whether you're getting metadata in the first
> place
Dave - the appliance is getting the metadata from the feed. It
appears to populate everything correctly with the correct number of
hits.

> you can directly verify that by using inmeta queries in an
> unmodified front end.
Can I do queries from the search box? If so, how?

> Can
> you post the HTTP request header and the XML document header from your
> feed?
This is what I have now (is this all you want?):
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
<gsafeed>
<header>
<datasource>testawwa</datasource>
<feedtype>full</feedtype>
</header>

The only thing that changed was I originally had:
<feedtype>metadata-and-url</feedtype>

> Just as a heads-up, you should be using the correct MIME type there (text/html).
I have a question on this - the text in the content tag is actually
just plain text, an abstract, even though it's referencing an html
document. Is the MIME type referencing the document or the <content>
tag? In the documentation it looks like



Grant

May 19, 2012, 2:36:30 PM
to Google Search Appliance/Google Mini - Google Search Appliance/Google Mini
I hit send a little too early - in the documentation it gives an
example:

<record url="..." mimetype="text/plain">
<content>Hello world. Here is some page content.</content>
</record>

Dave Watts

May 19, 2012, 7:36:48 PM
to google-search-...@googlegroups.com
> Dave - the appliance is getting the metadata from the feed.  It
> appears to populate everything correctly with the correct number of
> hits.

I'm not sure what you mean by "populate everything correctly". Are you
running metadata searches successfully against the documents, using
inmeta, requiredfields or partialfields?

> Can I do queries from the search box?  If so, how?

Use the inmeta operator:

https://developers.google.com/search-appliance/documentation/614/xml_reference#inmeta_filter
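
As a rough illustration of the operator, the Python sketch below builds an inmeta term in the same shape as the one in the error message quoted at the top of the thread, where the metadata value is percent-encoded inside the query. The helper names `inmeta_term` and `inmeta_query` are made up for this example, and whether a hand-typed query in the search box needs the same encoding is something to confirm against the XML reference linked above.

```python
from urllib.parse import quote


def inmeta_term(name: str, value: str) -> str:
    """Build an exact-match inmeta term (hypothetical helper).

    The value is percent-encoded with no safe characters, so spaces
    become %20 and commas become %2C, as in the quoted error message.
    """
    return f"inmeta:{name}={quote(value, safe='')}"


def inmeta_query(keywords: str, name: str, value: str) -> str:
    """Combine free-text keywords with one inmeta restriction."""
    return f"{keywords} {inmeta_term(name, value)}"


# Reproduce the query string from the error message in the first post.
print(inmeta_query(
    "test",
    "corporate_source",
    "Western Municipal Water District, Riverside, CA",
))
```

Running a term like this against an unmodified front end shows whether the metadata is searchable at all, independent of dynamic navigation.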

> > Can you post the HTTP request header and the XML document header from your
> > feed?
>
> This is what I have now (is this all you want?):

No, I also want the HTTP request (actually, not just the header as I
mentioned before, but the request body as well). Within the HTTP
request, you also specify feedtype and datasource using two separate
form fields, and those values should match what's in your XML. I think
that the form fields will actually override what's in the XML, in
which case your metadata may well be ignored!
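
One way to catch this kind of mismatch before pushing a feed is to compare the form field values with the header inside the XML. The sketch below is only an illustration: the function names are invented, the sample values mirror the header posted earlier in the thread, and it does not settle which side the appliance actually honors. It only reports where the two disagree.

```python
import xml.etree.ElementTree as ET

# Sample feed; the header values mirror the ones posted in this thread.
FEED = b"""<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
<gsafeed>
  <header>
    <datasource>testawwa</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group/>
</gsafeed>"""


def header_values(feed_xml: bytes) -> dict:
    """Read datasource and feedtype from the feed's <header> element."""
    header = ET.fromstring(feed_xml).find("header")
    return {
        "datasource": header.findtext("datasource"),
        "feedtype": header.findtext("feedtype"),
    }


def find_mismatches(form_fields: dict, feed_xml: bytes) -> list:
    """Return (field, form_value, xml_value) for each field that differs."""
    in_xml = header_values(feed_xml)
    return [
        (field, form_fields.get(field), in_xml[field])
        for field in ("datasource", "feedtype")
        if form_fields.get(field) != in_xml[field]
    ]


# Hypothetical form entries: the feed type was entered as "full" by mistake.
form = {"datasource": "testawwa", "feedtype": "full"}
for field, form_value, xml_value in find_mismatches(form, FEED):
    print(f"{field}: form says {form_value!r}, XML says {xml_value!r}")
```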

> > Just as a heads-up, you should be using the correct MIME type there (text/html).
>
> I have a question on this - the text in the content tag is actually
> just plain text, an abstract, even though it's referencing an html
> document.  Is the MIME type referencing the document or the <content>
> tag?

When you build a content feed, the document itself should be included,
not just an abstract. That document will be indexed, and the appliance
WILL NOT crawl the actual URL at all.

I strongly suggest you do one of the following:

1. Attend a training session - we offer two two-day GSA courses that
are developed by Google. Of course, I'm a bit biased, and this option
wouldn't be free. But I think that every question you've asked is
covered in the courses, so they would probably be helpful to you. If
you're interested in that, go here:

http://training.figleaf.com/Courses/Google.cfm

2. Read the feeds documentation in its entirety. Then, read it again!
It's fairly complicated if you haven't done it before.

https://developers.google.com/search-appliance/documentation/614/feedsguide

Grant

May 20, 2012, 1:25:23 PM
to Google Search Appliance/Google Mini - Google Search Appliance/Google Mini
> > This is what I have now (is this all you want?):
>
> No, I also want the HTTP request (actually, not just the header as I
> mentioned before, but the request body as well). Within the HTTP
> request, you also specify feedtype and datasource using two separate
> form fields, and those values should match what's in your XML. I think
> that the form fields will actually override what's in the XML, in
> which case your metadata may well be ignored!

I was using the form feed, so I wasn't actually writing the HTTP
request, I was just entering the information. But now that you
mention it, it's possible that this was the problem because I think I
renamed the datasource at one point which may have caused the issue.

> When you build a content feed, the document itself should be included,
> not just an abstract. That document will be indexed, and the appliance
> WILL NOT crawl the actual URL at all.

Dave - the documents being referenced are actually all going to be
PDF articles from some of our Journals and newsletters. I will have
to investigate how to pull the text from the PDFs to be included in
the feed or whether that's feasible.

>
> I strongly suggest you do one of the following:
>
> 1. Attend a training session - we offer two two-day GSA courses that
> are developed by Google. Of course, I'm a bit biased, and this option
> wouldn't be free. But I think that every question you've asked is
> covered in the courses, so they would probably be helpful to you. If
> you're interested in that, go here:
>
> http://training.figleaf.com/Courses/Google.cfm

Believe it or not I did attend the two day training in Chicago last
year :). Unfortunately it was well before I'd really done anything
with the GSA. Now that I'm to the point of really trying to leverage
the capabilities of the appliance, I am running into things that might
not have resonated with me at the time. It probably would be worth
retaking the training though, you're right.

>
> 2. Read the feeds documentation in its entirety. Then, read it again!
> It's fairly complicated if you haven't done it before.
>
> https://developers.google.com/search-appliance/documentation/614/feed...
>

Thanks, I have referenced that documentation frequently. I think that
my original question of why the dynamic navigation didn't work when
the filters were actually being populated was not a result of me not
reading the documentation, but just a case of unintended consequences when
I renamed the feed. It's possible it was just a data entry error in
the form. I think what threw me is that the filters appeared but
did not work. What I probably should have done was delete the
data from the metadata-and-URL feed and retry it.

Thanks for your help!

Dave Watts

May 20, 2012, 2:06:09 PM
to google-search-...@googlegroups.com
> Dave - the documents being referenced are actually all going to be
> PDF articles from some of our Journals and newsletters.   I will have
> to investigate how to pull the text from the PDFs to be included in
> the feed or whether that's feasible.

In that case, to build a content feed, you'll want to embed the PDF in
its entirety, as a base64-encoded string, within the content child
element of the record element. This will require programming - you
won't be able to do it from an HTML form.
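
A minimal Python sketch of that step, under a few assumptions: the `encoding="base64binary"` attribute on the content element is taken from memory of the feeds guide and should be checked there, the `.pdf` URL and the placeholder bytes are made up for illustration, and `pdf_record` is a hypothetical helper name. In practice the bytes would be read from the real PDF file.

```python
import base64
import xml.etree.ElementTree as ET


def pdf_record(url: str, pdf_bytes: bytes, metadata: list) -> ET.Element:
    """Build a <record> whose <content> holds the whole PDF, base64-encoded."""
    record = ET.Element("record", {"url": url, "mimetype": "application/pdf"})
    meta_parent = ET.SubElement(record, "metadata")
    for name, content in metadata:
        ET.SubElement(meta_parent, "meta", {"name": name, "content": content})
    # Assumption: the feeds DTD uses encoding="base64binary" for base64 content.
    content_el = ET.SubElement(record, "content", {"encoding": "base64binary"})
    content_el.text = base64.b64encode(pdf_bytes).decode("ascii")
    return record


# Placeholder bytes stand in for open("article.pdf", "rb").read().
sample_pdf = b"%PDF-1.4 placeholder bytes"
record = pdf_record(
    "http://testfeed.a_w_w_a.org/test/0075204.pdf",  # hypothetical URL
    sample_pdf,
    [("publisher", "AWWA"), ("pub_year", "2012")],
)
print(ET.tostring(record, encoding="unicode"))
```

Because the full document travels inside the feed, the metadata and the indexed text stay together in one record, and the appliance does not need to fetch the URL.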

> Believe it or not I did attend the two day training in Chicago last
> year  :).  Unfortunately it was well before I'd really done anything
> with the GSA.  Now that I'm to the point of really trying to leverage
> the capabilities of the appliance, I am running into things that might
> not have resonated with me at the time.  It probably would be worth
> retaking the training though, you're right.

If you took it from us (which is pretty likely, I think) we do have an
audit policy. Contact our training department for more information - I
can give you that information offline if you need it.