Having problems with a simple URL filter

Paul Biggar

Jun 20, 2011, 12:35:48 PM
to HNSearch
Hi folks,

I'm having problems filtering on the url field. I suspect I'm doing
something silly, but I've tried every combination I can think of.
Details of what I've tried are below, but I guess the question boils
down to "what is the correct way to search for an item with the url
'http://hnsearch.com'"?

Thanks,
Paul

Details:

The search
http://api.thriftdb.com/api.hnsearch.com/items/_search?q=hnsearch&pretty_print=true&filter[fields][url]=http://hnsearch.com
returns no entries.

There's a bit of ambiguity in the docs; it looks like I should
be doing one of:

http://api.thriftdb.com/api.hnsearch.com/items/_search?q=hnsearch&pretty_print=true&filter[fields][url][]=http://hnsearch.com
http://api.thriftdb.com/api.hnsearch.com/items/_search?q=hnsearch&pretty_print=true&filter[fields][url][http://hnsearch.com]

But I tried those too, to no avail (as I recall, they filtered nothing).
I also tried URL encoding, but that didn't help.
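For reference, here is a minimal sketch of building these query strings with Python's standard URL encoder instead of by hand. It assumes the ThriftDB parameter names used in this thread (`q`, `filter[fields][url]`, `pretty_print`); `build_search_url` is a hypothetical helper, and since the endpoint may no longer be online, the sketch only builds the URL and sends nothing. Consistent encoding rules out hand-encoding mistakes, but it won't by itself fix a filter value that doesn't exactly match the stored url.

```python
from urllib.parse import quote, urlencode

# Search endpoint as it appears in this thread (it may no longer be online).
BASE = "http://api.thriftdb.com/api.hnsearch.com/items/_search"


def build_search_url(q, url_filter, extra=()):
    """Hypothetical helper: build a search URL that filters on the url field.

    `extra` is a sequence of (name, value) pairs for any other parameters.
    """
    params = [("q", q), ("filter[fields][url]", url_filter), *extra]
    # Keep the brackets in parameter names literal; percent-encode everything
    # else, so ':' and '/' in the filter value become %3A and %2F, spaces %20.
    return BASE + "?" + urlencode(params, safe="[]", quote_via=quote)


print(build_search_url("hnsearch", "http://hnsearch.com", [("pretty_print", "true")]))
# prints: http://api.thriftdb.com/api.hnsearch.com/items/_search?q=hnsearch&filter[fields][url]=http%3A%2F%2Fhnsearch.com&pretty_print=true
```

Extra parameters such as `sortby` or `filter[fields][type]` can be passed the same way through `extra`.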

Andres Morey

Jun 20, 2011, 1:54:29 PM
to hnse...@googlegroups.com

Paul Biggar

Jun 23, 2011, 2:22:15 PM
to hnse...@googlegroups.com
Hey,

Thanks for the quick reply, and sorry for the stupid mistake. I have
it working now, in many cases. Here's an example that doesn't work:

http://api.thriftdb.com/api.hnsearch.com/items/_search?q=stackoverflow&filter[fields][url]=http%3A%2F%2Fstackoverflow.com%2Fquestions%2F6441218%2Flocal-variables-memory-can-be-accessed-outside-its-scope%2F6445794%236445794&sortby=create_ts%20desc&limit=1&filter[fields][type]=submission&pretty_print=true

That should, as I understand it, find this
http://news.ycombinator.com/item?id=2686580. But I don't think I have
a good understanding of the query field. Would you be able to explain
why that query doesn't work?

Thanks,
Paul

--
Paul Biggar
paulbiggar.com
@paulbiggar

Andres Morey

Jun 23, 2011, 2:50:33 PM
to hnse...@googlegroups.com
Hi Paul,

No problem! Happy to help.

Regarding your new query, there are two reasons why it isn't returning the result you are expecting:

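1) the submission url is incorrect

you are searching for this url:
http://stackoverflow.com/questions/6441218/local-variables-memory-can-be-accessed-outside-its-scope/6445794#6445794

but the HN item's url is actually:
http://stackoverflow.com/questions/6441218/c-local-variable-can-be-accessed-outside-its-scope/6445794#6445794

2) the item does not match on a fulltext query for "stackoverflow"

When you use the "q" url argument, ThriftDB searches across all indexed fields for a match. In this case, "stackoverflow" doesn't match any field but "stackoverflow.com" does:
http://api.thriftdb.com/api.hnsearch.com/items/_search?q=stackoverflow.com&filter[fields][url]=http%3A%2F%2Fstackoverflow.com%2Fquestions%2F6441218%2Fc-local-variable-can-be-accessed-outside-its-scope%2F6445794%236445794&pretty_print=true&filter[fields][type]=submission&sortby=create_ts%20desc&limit=1

The string "stackoverflow" only appears in the item in the context of the "domain" attribute and the "url" attribute:
http://api.thriftdb.com/api.hnsearch.com/items/2686580-0f445?pretty_print=true

However, those attributes are StringTypes which means the search engine will only match on the full string, not words within the string:
http://api.thriftdb.com/api.hnsearch.com/items?pretty_print=true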

Andres
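To make the StringType rule concrete, here is a toy model of a "q" search across indexed fields. It is an illustration, not ThriftDB's actual implementation: StringType fields such as `domain` and `url` match only on the whole value, while a text field is split into words. The `domain` and `url` values come from this thread; the `title` value, the `STRING_FIELDS` set, and the simple tokenizer are hypothetical.

```python
import re

# Toy model of a "q" search over indexed fields (an illustration, not ThriftDB's code).
# StringType fields match only on the whole value; text fields match on individual words.
STRING_FIELDS = {"domain", "url"}

# domain and url as quoted in this thread; title is a made-up placeholder.
item = {
    "domain": "stackoverflow.com",
    "url": "http://stackoverflow.com/questions/6441218/c-local-variable-can-be-accessed-outside-its-scope/6445794#6445794",
    "title": "placeholder title text",
}


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(item, q):
    """Return True if q matches any indexed field under the toy rules above."""
    for field, value in item.items():
        if field in STRING_FIELDS:
            if q == value:  # whole-string match only
                return True
        elif q.lower() in tokenize(value):  # word match inside a text field
            return True
    return False


print(matches(item, "stackoverflow"))      # False: only a substring of domain/url
print(matches(item, "stackoverflow.com"))  # True: equals the whole domain value
```

If `domain` were treated as a text field in this model, the tokenizer would split it into `stackoverflow` and `com`, and the first query would match as well.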

Paul Biggar

Jun 23, 2011, 4:08:29 PM
to hnse...@googlegroups.com
On Thu, Jun 23, 2011 at 11:50, Andres Morey <and...@octopart.com> wrote:
> Hi Paul,
> No problem! Happy to help.
> Regarding your new query, there are two reasons why it isn't returning the
> result you are expecting:
> 1) the submission url is incorrect
> you are searching for this url:
> http://stackoverflow.com/questions/6441218/local-variables-memory-can-be-accessed-outside-its-scope/6445794#6445794
> but the HN item's url is actually:
> http://stackoverflow.com/questions/6441218/c-local-variable-can-be-accessed-outside-its-scope/6445794#6445794

Right. So it seems there are redirects, and the HN submission wasn't a
canonical URL. (Similarly, the #6445794 at the end is a problem).
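Because the url filter has to match the stored string exactly, a URL with and without its #fragment counts as two different values. One possible workaround, which is an assumption and not something the API is known to provide, is to try both forms. Here is a minimal sketch using Python's `urllib.parse.urldefrag`; `url_variants` is a hypothetical helper, and it cannot recover a different slug introduced by a redirect.

```python
from urllib.parse import urldefrag


def url_variants(url):
    """Hypothetical helper: the URL as given, plus the same URL without its #fragment.

    An exact-match filter treats these as different strings, so each may need trying.
    """
    base, fragment = urldefrag(url)
    return [url, base] if fragment else [url]


url = "http://stackoverflow.com/questions/6441218/c-local-variable-can-be-accessed-outside-its-scope/6445794#6445794"
for candidate in url_variants(url):
    print(candidate)
```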


> 2) the item does not match on a fulltext query for "stackoverflow"
> When you use the "q" url argument, ThriftDB searches across all indexed
> fields for a match. In this case, "stackoverflow" doesn't match any field
> but "stackoverflow.com" does:
> http://api.thriftdb.com/api.hnsearch.com/items/_search?q=stackoverflow.com&filter[fields][url]=http%3A%2F%2Fstackoverflow.com%2Fquestions%2F6441218%2Fc-local-variable-can-be-accessed-outside-its-scope%2F6445794%236445794&pretty_print=true&filter[fields][type]=submission&sortby=create_ts%20desc&limit=1
> The string "stackoverflow" only appears in the item in the context of the
> "domain" attribute and the "url" attribute:
> http://api.thriftdb.com/api.hnsearch.com/items/2686580-0f445?pretty_print=true
> However, those attributes are StringTypes which means the search engine will
> only match on the full string, not words within the string:
> http://api.thriftdb.com/api.hnsearch.com/items?pretty_print=true

Great, thanks for the explanation. I think I understand the model well
enough to fix my problems now.

Thanks!
Paul
