Sorting and range queries


bterkuile

Aug 30, 2011, 6:04:15 AM
to xapian_db
I have been working with XapianDb and came across some issues I would
like to discuss with you, since my understanding of Xapian is still
basic. For context: I'm using it for filtering and sorting in a Rails
app with a CouchDB backend. The first issue is range queries. I
started with date ranges, which took me some time because of an issue
I will explain (it might save you time as well :). My ugly trick to
support date range queries is to add a range processor to the query
parser for fields whose name matches /date/ (really ugly, I know, but
it works). To make this work I had to modify the to_yaml behaviour of
dates:
class Date
  def to_yaml
    strftime("%Y%m%d")
  end
end
I am not proud of this either, but it makes it work.
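
To illustrate why the "%Y%m%d" format helps here, a minimal sketch in
plain Ruby (no Xapian involved). It assumes values are compared as
plain strings; the helper names date_to_index_value and
in_date_range? are made up for this example and are not part of
XapianDb.

```ruby
require 'date'

# Hypothetical helper: encode a Date as a fixed-width "yyyymmdd" string.
def date_to_index_value(date)
  date.strftime("%Y%m%d")
end

# Hypothetical helper: inclusive range check done purely on the encoded strings.
def in_date_range?(value, from, to)
  value >= date_to_index_value(from) && value <= date_to_index_value(to)
end

dates   = [Date.new(2011, 8, 30), Date.new(2010, 12, 1), Date.new(2011, 1, 15)]
encoded = dates.map { |d| date_to_index_value(d) }

# Because every value has the same width and runs from most to least
# significant part, string order equals chronological order.
sorted = encoded.sort

january = in_date_range?("20110115", Date.new(2011, 1, 1), Date.new(2011, 1, 31))
```

With this encoding a string-based range check and a string-based sort
both behave chronologically, which is what a range processor on the
stored values would rely on.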

Another issue is sorting numerical fields. Currently they are sorted
using string comparison: 1 11 144 2 3 4 47 5 etc. The underlying issue
is the to_yaml conversion for fields. I tried to remove it, but that
only made things worse. Could you please explain the logic behind it?
That would help me fix other issues. I would also suggest replacing
the to_yaml conversion with something like:
class Object
  def to_xapian_value
    to_yaml
  end
end
(if needed), which allows overrides that apply only to the index and
do not influence the to_yaml behaviour. To finish, here is a summary
of my questions:
1. Do you know a good way to sort by numerical fields?
2. Do you know a proper way of adding range support (date and
numerical fields)?
3. Do you know how to make sorting case insensitive?
4. What is the idea behind the YAML conversion (if still relevant
after the previous questions)?
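
As a rough sketch of how the to_xapian_value idea could address the
numeric sorting problem: the override below zero-pads integers to a
fixed width so that string order matches numeric order. The method
name comes from the suggestion above; the Integer override, the
XAPIAN_WIDTH constant and the width of 10 are assumptions for
illustration only, and the sketch handles only non-negative integers.

```ruby
require 'yaml'

# Hypothetical default hook, as suggested in the post; not XapianDb API.
class Object
  def to_xapian_value
    to_yaml
  end
end

# Hypothetical index-only override: fixed-width, zero-padded digits.
class Integer
  XAPIAN_WIDTH = 10 # assumed maximum number of digits

  def to_xapian_value
    raise ArgumentError, "negative values are not handled in this sketch" if self < 0
    to_s.rjust(XAPIAN_WIDTH, "0")
  end
end

numbers = [1, 11, 144, 2, 3, 4, 47, 5]

# Plain string comparison reproduces the order described in the post.
naive = numbers.map(&:to_s).sort

# Sorting the padded strings yields the numeric order.
padded = numbers.map(&:to_xapian_value).sort.map(&:to_i)
```

Xapian itself also provides a sortable_serialise function for encoding
numbers so that they sort correctly as strings, which would likely be
a better fit than manual padding once negative or fractional values
are involved.
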
I would love to help out and improve the gem, since I think it is a
really nice one that eases the use of Xapian in Rails! Thank you for
that!

Regards,

Benjamin

Gernot

Aug 31, 2011, 1:48:30 AM
to xapian_db
Hi Benjamin

The idea behind YAML serialisation of the attributes is simplicity.
Anything that is serializable by YAML can be stored as a document
attribute, be it simple objects like strings and dates or complex
objects like an ActiveRecord object or even a collection of complex
objects. And I can restore the object using YAML::load (see
DocumentBlueprint#accessors_module). In the case of ActiveRecord or
DataMapper objects, I'm serializing the attributes hash only.
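
The round trip described above can be sketched with Ruby's standard
YAML library alone. The attributes hash below is made-up sample data,
and this is not the actual XapianDb code, just an illustration of the
dump/load idea.

```ruby
require 'yaml'

# Made-up attribute data standing in for a document attribute.
attributes = { "name" => "Benjamin", "tags" => ["rails", "couchdb"], "visits" => 3 }

# Serialize to a YAML string, which is what would be stored in the index.
stored = YAML.dump(attributes)

# Restore the original structure from the stored string.
restored = YAML.load(stored)
```

Since the stored form is always a string, any sorting done on it is a
string comparison, which is where the numeric and date ordering
problems come from.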

However, if you want to sort by a date or query a date range, a date
should be stored as a string in the format "yyyymmdd". My idea is
this:

- add an optional type parameter for blueprints, like
'blueprint.attribute :date_of_birth, :as => :date'
- store dates as strings in the format "yyyymmdd"
- support range processors
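
A rough sketch of what type-aware serialisation could look like. The
function name serialize_attribute and the way the :as option is
passed are assumptions for illustration; the real blueprint API may
end up looking different.

```ruby
require 'date'
require 'yaml'

# Hypothetical: pick the stored representation based on an optional type.
def serialize_attribute(value, as = nil)
  case as
  when :date
    value.strftime("%Y%m%d") # sortable, range-friendly string
  else
    value.to_yaml            # default: generic YAML serialisation
  end
end

date_value   = serialize_attribute(Date.new(1980, 5, 17), :date)
string_value = serialize_attribute("some text")
```

Attributes without a declared type would keep the existing YAML
behaviour, so only fields that need sorting or ranges change format.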

So, to your questions:

1: I will have to add type support for attributes (see above)
2: See above
3: I will look into this - didn't know it's case sensitive right
now ;-)
4: See above
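
For question 3, one possible approach (a sketch, assuming sort values
are compared byte by byte) is to store a downcased copy of the string
as the sort key. The helper name sort_key is hypothetical.

```ruby
# Hypothetical helper: normalize a string for case-insensitive ordering.
def sort_key(text)
  text.downcase
end

names = ["bob", "Alice", "carol", "Dave"]

# Bytewise comparison puts all uppercase initials before lowercase ones.
case_sensitive = names.sort

# Comparing the downcased keys gives the order most users expect.
case_insensitive = names.sort_by { |name| sort_key(name) }
```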

If you want to help me, I will happily accept pull requests. All I'm
asking for is clean and well tested code.

Regards,
Gernot