Review Request: make character search only search the character field

Ralf Haring

Oct 1, 2010, 11:47:13 AM
to GGD Tech Group, Ralf Haring

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.comics.org/r/573/
-----------------------------------------------------------

Review request for GGD Tech Group.


Summary
-------

Fix the character search so that it only searches the character field. The site currently duplicates the badly broken functionality of the old site where it searches both character and feature fields.


Diffs
-----

/pydjango/apps/gcd/views/search.py 810

Diff: http://reviews.comics.org/r/573/diff


Testing
-------

Searched for something that is only a feature in the character search and received no hits.
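
For readers without access to the diff, the difference between the two behaviors can be sketched in plain Python. This is not the code in search.py: the `Story` record, the `feature` and `characters` attribute names, and the `include_feature` flag are all assumptions made for illustration, and the second sample row is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    # Hypothetical record; the field names are assumptions, not the GCD schema.
    feature: str
    characters: str


def search_characters(stories, term, include_feature=False):
    """Case-insensitive substring search over the characters field.

    With include_feature=True the feature field is matched as well,
    which mimics the combined behavior described in the summary.
    """
    needle = term.lower()
    hits = []
    for story in stories:
        in_characters = needle in story.characters.lower()
        in_feature = include_feature and needle in story.feature.lower()
        if in_characters or in_feature:
            hits.append(story)
    return hits


# Sample rows for illustration only.
STORIES = [
    Story(feature="Buster Brown & Tige", characters="Buster Brown; Tige"),
    Story(feature="Max and Maurice", characters=""),  # feature only, no characters
]
```

With these rows, a search for "Maurice" finds the second story only when `include_feature=True`, which corresponds to the test described above: a term that exists only as a feature gets no hits from a characters-only search.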


Thanks,

Ralf

Henry Andrews

Oct 1, 2010, 11:58:11 AM
to GGD Tech Group, Ralf Haring, Henry Andrews

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.comics.org/r/573/#review1325
-----------------------------------------------------------


No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462

The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.

- Henry



Ralf Haring

Oct 1, 2010, 12:09:42 PM
to GGD Tech Group, Ralf Haring, Henry Andrews

> On 2010-10-01 15:58:11, Henry Andrews wrote:
> > No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462
> >
> > The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.

I closed it because I was annoyed with contributing at all to the gcd tech-wise at the time. This was always and still is a problem.


- Ralf



Ralf Haring

Oct 1, 2010, 12:09:47 PM
to GGD Tech Group, Ralf Haring, Henry Andrews


(Updated 2010-10-01 16:09:47.516788)


Review request for GGD Tech Group.


Summary
-------

Fix the character search so that it only searches the character field. The site currently duplicates the badly broken functionality of the old site where it searches both character and feature fields.


This addresses bug 462.
http://dev.comics.org/bugs/show_bug.cgi?id=462

Henry Andrews

Oct 1, 2010, 12:34:37 PM
to GGD Tech Group, Ralf Haring, Henry Andrews

> On 2010-10-01 15:58:11, Henry Andrews wrote:

> > No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462
> >
> > The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.
>

> Ralf Haring wrote:
> I closed it because I was annoyed with contributing at all to the gcd tech-wise at the time. This was always and still is a problem.

The correct course of action here is to first run a data cleanup project, and then once the character field is sufficiently well-filled-out that casual users of the simple search box will get the results they expect from just searching that field, then change the search behavior. In the meantime it is more important that casual users of the site find as many references to the character they type in the box as possible, than that the box match the underlying schema directly. The casual user doesn't care about our schema or data weirdness problems.


- Henry



Lionel English

Oct 1, 2010, 12:49:35 PM
to GGD Tech Group, Lionel English, Ralf Haring, Henry Andrews

> On 2010-10-01 15:58:11, Henry Andrews wrote:

> > No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462
> >
> > The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.
>

> Ralf Haring wrote:
> I closed it because I was annoyed with contributing at all to the gcd tech-wise at the time. This was always and still is a problem.
>

> Henry Andrews wrote:
> The correct course of action here is to first run a data cleanup project, and then once the character field is sufficiently well-filled-out that casual users of the simple search box will get the results they expect from just searching that field, then change the search behavior. In the meantime it is more important that casual users of the site find as many references to the character they type in the box as possible, than that the box match the underlying schema directly. The casual user doesn't care about our schema or data weirdness problems.

Changing the search behavior seems like a waste of time because when the Solr integration is complete it will just search all the fields anyway. Fixing the actual data seems like a more appropriate solution. Can we run a query of stories where the feature is filled in and the value in feature does not appear within the character field? That will obviously generate a lot of hits that aren't actually problems, but it will also provide a starting point for people who like to work on clean-up projects.
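
A rough sketch of the query described above, using Python's built-in sqlite3 module so it can be run anywhere. The `story` table and its `id`, `feature`, and `characters` columns are assumed names rather than the real schema, the sample rows are made up for the demonstration, and the production database would need the equivalent SQL in its own dialect.

```python
import sqlite3


def features_missing_from_characters(conn):
    """Return (id, feature, characters) rows where feature is filled in
    and does not occur, case-insensitively, inside characters."""
    sql = """
        SELECT id, feature, characters
        FROM story
        WHERE feature <> ''
          AND instr(lower(coalesce(characters, '')), lower(feature)) = 0
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


def demo_connection():
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE story (id INTEGER PRIMARY KEY, feature TEXT, characters TEXT)"
    )
    conn.executemany(
        "INSERT INTO story VALUES (?, ?, ?)",
        [
            (1, "Buster Brown", "Buster Brown; Tige"),         # feature appears: skipped
            (2, "Buster Brown & Tige", "Buster Brown; Tige"),  # no literal match: listed
            (3, "", "Tige"),                                   # empty feature: skipped
            (4, "Max and Maurice", ""),                        # empty characters: listed
        ],
    )
    return conn
```

Row 2 is the kind of hit that is not actually a problem, and row 4 shows why restricting the list to stories that have something in the characters field would shrink it considerably.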


- Lionel



Ralf Haring

Oct 1, 2010, 1:40:56 PM
to GGD Tech Group, Lionel English, Ralf Haring, Henry Andrews

> On 2010-10-01 15:58:11, Henry Andrews wrote:

> > No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462
> >
> > The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.
>

> Ralf Haring wrote:
> I closed it because I was annoyed with contributing at all to the gcd tech-wise at the time. This was always and still is a problem.
>
> Henry Andrews wrote:
> The correct course of action here is to first run a data cleanup project, and then once the character field is sufficiently well-filled-out that casual users of the simple search box will get the results they expect from just searching that field, then change the search behavior. In the meantime it is more important that casual users of the site find as many references to the character they type in the box as possible, than that the box match the underlying schema directly. The casual user doesn't care about our schema or data weirdness problems.
>

> Lionel English wrote:
> Changing the search behavior seems like a waste of time because when the Solr integration is complete it will just search all the fields anyway. Fixing the actual data seems like a more appropriate solution. Can we run a query of stories where the feature is filled in and the value in feature does not appear within the character field? That will obviously generate a lot of hits that aren't actually problems, but it will also provide a starting point for people who like to work on clean-up projects.

I don't think a cleanup project is achievable. What kind of query are you going to run to create a manageable list of features that should be looked at for possible movement to the character field? A list of stories where the feature string is not present in the character field will be on the order of 600K stories (840K total).


- Ralf



Henry Andrews

Oct 1, 2010, 2:09:26 PM
to GGD Tech Group, Lionel English, Ralf Haring, Henry Andrews

> On 2010-10-01 15:58:11, Henry Andrews wrote:

> > No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462
> >
> > The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.
>

> Ralf Haring wrote:
> I closed it because I was annoyed with contributing at all to the gcd tech-wise at the time. This was always and still is a problem.
>
> Henry Andrews wrote:
> The correct course of action here is to first run a data cleanup project, and then once the character field is sufficiently well-filled-out that casual users of the simple search box will get the results they expect from just searching that field, then change the search behavior. In the meantime it is more important that casual users of the site find as many references to the character they type in the box as possible, than that the box match the underlying schema directly. The casual user doesn't care about our schema or data weirdness problems.
>
> Lionel English wrote:
> Changing the search behavior seems like a waste of time because when the Solr integration is complete it will just search all the fields anyway. Fixing the actual data seems like a more appropriate solution. Can we run a query of stories where the feature is filled in and the value in feature does not appear within the character field? That will obviously generate a lot of hits that aren't actually problems, but it will also provide a starting point for people who like to work on clean-up projects.
>

> Ralf Haring wrote:
> I don't think a cleanup project is achievable. What kind of query are you going to run to create a manageable list of features that should be looked at for possible movement to the character field? A list of stories where the feature string is not present in the character field will be on the order of 600K stories (840K total).

For stories that have the character field filled out at all, it's about 130K stories. You can immediately start blocking out features that aren't characters. You can block out a bunch that are correct but don't match up right:

| 67 | Max and Maurice | Max; Maurice |
| 69 | Brown, Jones, and Robinson | Mr. Brown; Mr. Jones; Mr. Robinson |
| 141 | Buster Brown & Tige | Buster Brown; Tige |

It's a big job but it's far from impossible, and a little work on tools, like what Alexandros has done to track bad character replacements but a bit more complex, would go a long way. Cleaning up the type field wasn't a small amount of work either, but dedicated volunteers got the job done. That's what we should be doing here, not throwing up our hands.
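
One possible shape for such a tool, sketched in Python under stated assumptions: it splits a feature string on commas, ampersands, and the word "and", then checks whether every resulting part occurs somewhere in the characters text. The function names are hypothetical and the splitting rule is only a guess at a useful heuristic, not an existing GCD script.

```python
import re

# Separators assumed to join several character names in one feature string.
_SEPARATORS = re.compile(r"\s*(?:,|&|\band\b)\s*", re.IGNORECASE)


def feature_parts(feature):
    """Split a feature string into candidate character names."""
    return [p.strip() for p in _SEPARATORS.split(feature) if p.strip()]


def covered_by_characters(feature, characters):
    """True when every part of the feature occurs in the characters text."""
    haystack = characters.lower()
    parts = feature_parts(feature)
    return bool(parts) and all(p.lower() in haystack for p in parts)


# The three rows quoted above, as (feature, characters) pairs.
ROWS = [
    ("Max and Maurice", "Max; Maurice"),
    ("Brown, Jones, and Robinson", "Mr. Brown; Mr. Jones; Mr. Robinson"),
    ("Buster Brown & Tige", "Buster Brown; Tige"),
]

# Rows the heuristic would block out of the cleanup list.
BLOCKED = [row for row in ROWS if covered_by_characters(*row)]
```

All three quoted rows pass this check, so a filter like this could drop them from the list before volunteers look at it. Substring matching will also accept some false positives (a short part such as "Max" matches inside a longer name), so it is a first pass rather than a final answer.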


- Henry



Lionel English

Oct 1, 2010, 2:09:39 PM
to GGD Tech Group, Lionel English, Ralf Haring, Henry Andrews

> On 2010-10-01 15:58:11, Henry Andrews wrote:

> > No, we went through this before. You filed a bug, we argued about it, and you lost. This bug was resolved wontfix as the result of mailing list discussions: http://dev.comics.org/bugs/show_bug.cgi?id=462
> >
> > The intention of the simple search is to do a comprehensive search where characters are likely to be. If you want to search just the field, use the advanced search.
>

> Ralf Haring wrote:
> I closed it because I was annoyed with contributing at all to the gcd tech-wise at the time. This was always and still is a problem.
>
> Henry Andrews wrote:
> The correct course of action here is to first run a data cleanup project, and then once the character field is sufficiently well-filled-out that casual users of the simple search box will get the results they expect from just searching that field, then change the search behavior. In the meantime it is more important that casual users of the site find as many references to the character they type in the box as possible, than that the box match the underlying schema directly. The casual user doesn't care about our schema or data weirdness problems.
>
> Lionel English wrote:
> Changing the search behavior seems like a waste of time because when the Solr integration is complete it will just search all the fields anyway. Fixing the actual data seems like a more appropriate solution. Can we run a query of stories where the feature is filled in and the value in feature does not appear within the character field? That will obviously generate a lot of hits that aren't actually problems, but it will also provide a starting point for people who like to work on clean-up projects.
>
> Ralf Haring wrote:
> I don't think a cleanup project is achievable. What kind of query are you going to run to create a manageable list of features that should be looked at for possible movement to the character field? A list of stories where the feature string is not present in the character field will be on the order of 600K stories (840K total).
>

> Henry Andrews wrote:
> For stories that have the character field filled out at all, it's about 130K stories. You can immediately start blocking out features that aren't characters. You can block out a bunch that are correct but don't match up right:
>
> | 67 | Max and Maurice | Max; Maurice |
> | 69 | Brown, Jones, and Robinson | Mr. Brown; Mr. Jones; Mr. Robinson |
> | 141 | Buster Brown & Tige | Buster Brown; Tige |
>
> It's a big job but it's far from impossible, and a little work on tools, like what Alexandros has done to track bad character replacements but a bit more complex, would go a long way. Cleaning up the type field wasn't a small amount of work either, but dedicated volunteers got the job done. That's what we should be doing here, not throwing up our hands.

It won't be easy or quick, but it does need to be done anyway at some point.


- Lionel


