Extend RavenDb Faceted Search (GetFacet)


Maxime Beaudoin

Apr 23, 2012, 3:43:50 PM
to rav...@googlegroups.com
Hi, I need to add custom code around line 74 of the FacetedQueryRunner, see: https://github.com/ravendb/ravendb/blob/master/Raven.Database/Queries/FacetedQueryRunner.cs.

How do I get started? Can this be done through bundles or plugins?

Itamar Syn-Hershko

Apr 23, 2012, 4:13:56 PM
to rav...@googlegroups.com
Doing what?

We will accept a pull request if it makes sense.

Maxime Beaudoin

Apr 23, 2012, 4:27:01 PM
to rav...@googlegroups.com
Well, actually I might be mistaken about the exact line and file, but the idea is to extend RavenDb to support a hierarchical taxonomy, that is, hierarchical facets and values.

See this resource for a concrete example: http://wiki.apache.org/solr/HierarchicalFaceting
I'm calling this an advanced scenario but it's pretty much basic nowadays.

First, I'd like to be able to replicate that specific scenario. Basically, Raven should allow some sort of extensibility in handling terms and facets when faceting.

Maybe you can point me in the right direction?

Itamar Syn-Hershko

Apr 23, 2012, 4:58:12 PM
to rav...@googlegroups.com
It is absolutely doable, it's just a matter of figuring out the API and actually implementing this. Note the indexing time processing that is required - something needs to tell RavenDB to invoke this. And it needs to make sense in the client API.

Take a look at the Facets responder and FacetedQueryRunner on the server side, and the GetFacets method on the client side.

You need to think about how those hierarchies will be represented in your object, and go from there.

Maxime Beaudoin

Apr 23, 2012, 5:12:12 PM
to rav...@googlegroups.com
I familiarized myself a little with the server side GetFacet; I tried to re-implement the whole thing on the client side with an obvious issue: most assemblies are server side.

Now, I understand that it's doable with a "Facet responder". However, only triggers are documented on the official website, is that what you meant when you said "figuring out the API" :)?

As for how to represent the hierarchies, they are mostly part of the faceted values such as "fruit/banana" where "banana" is under "fruit". Also, a fictional "HierarchicalPathAnalyzer" could probably compute the paths for the client.

Here I go... Let me know if I'm wrong.
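
To make the "fruit/banana" idea concrete, here is a minimal sketch of the path expansion such an analyzer might perform before indexing. `HierarchicalPaths` and `Expand` are hypothetical names made up for illustration; they are not part of RavenDB or Lucene. The depth-prefixed output format is assumed from the Solr wiki page linked earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a RavenDB or Lucene API.
public static class HierarchicalPaths
{
    // Expands "NonFic/Sci/Phys" into one depth-prefixed term per ancestor:
    // "0/NonFic", "1/NonFic/Sci", "2/NonFic/Sci/Phys".
    public static List<string> Expand(string path)
    {
        var segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        return segments
            // depth is the zero-based position of the segment in the path
            .Select((_, depth) => depth + "/" + string.Join("/", segments.Take(depth + 1)))
            .ToList();
    }
}
```

Under these assumptions, `HierarchicalPaths.Expand("fruit/banana")` would yield the two terms "0/fruit" and "1/fruit/banana", which could then be stored in the document's category list.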

Maxime Beaudoin

Apr 23, 2012, 5:53:26 PM
to rav...@googlegroups.com
A lot has changed since the stable release, so I can't use the Facets responder as is. I'm not sure I can do this without working off the trunk, which means I would have to work on an unstable release. Argh.

Matt Warren

Apr 23, 2012, 6:01:04 PM
to rav...@googlegroups.com
RavenDB facets already allow you to specify a search parameter. I'm pretty sure you can just use this to mimic the example you posted?

If your index is something like this:
    from book in docs
    from category in book.Categories
    let levels = category.Split(new[] { " > " }, StringSplitOptions.None)
    from level in levels
    // We want to convert "NonFic > Sci > Phys" into "2/NonFic/Sci/Phys" (zero-based depth)
    select new {
              Hierachy = (levels.Length - 1) + "/" + category.Replace(" > ", "/"),
              LevelCount = levels.Length, // Might also be useful to store the count on its own
              RawHierachy = category.Replace(" > ", "/")
           }

You can then do a basic query like this:
         var facetResults = s.Query<T>("BookCatergories")                       
                        .ToFacets("facets/BookCatergories");

And a hierarchical one like this (find all books with a category below "NonFic/Sci"):
         var facetResults = s.Query<T>("BookCatergories")
                        // Hierachy starts with the depth digit, so match on the raw path instead
                        .Where(x => x.RawHierachy.StartsWith("NonFic/Sci"))
                        .ToFacets("facets/BookCatergories");

Maxime Beaudoin

Apr 23, 2012, 6:21:09 PM
to rav...@googlegroups.com
Hi Matt, thanks for replying.

It seems very flexible. Somehow I just missed the .StartsWith() part. Not sure if it's because I told myself not to use that because of some imagined performance issue.

Anyway, here's my test:

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Abstractions.Data;
using Raven.Client;
using Raven.Client.Document;
using Raven.Client.Indexes;
using Raven.Client.Linq;

namespace Prototype.Search.Tests
{
    [TestFixture]
    public class HierarchicalFaceting
    {
        //
        // Document definition
        //
        public class Doc
        {
            public Doc()
            {
                Categories = new List<string>();
            }

            public int Id { get; set; }
            public List<string> Categories { get; set; }
        }

        //
        // Data sample
        //
        public IEnumerable<Doc> GetDocs()
        {
            yield return new Doc { Id = 1, Categories = new List<string> { "0/NonFic", "1/NonFic/Law"} };
            yield return new Doc { Id = 2, Categories = new List<string> { "0/NonFic", "1/NonFic/Sci" } };
            yield return new Doc { Id = 3, Categories = new List<string> { "0/NonFic", "1/NonFic/Hist", "1/NonFic/Sci", "2/NonFic/Sci/Phys" } };
        }

        //
        // The index
        //
        public class DocByCategory : AbstractIndexCreationTask<Doc, DocByCategory.ReduceResult>
        {
            public class ReduceResult
            {
                public string Category { get; set; }
            }
            
            public DocByCategory()
            {
                Map = docs =>
                      from d in docs
                      from c in d.Categories
                      select new
                                 {
                                     Category = c
                                 };
            }
        }

        //
        // FacetSetup
        //
        public FacetSetup GetDocFacetSetup()
        {
            return new FacetSetup
                       {
                           Id = "facets/Doc",
                           Facets = new List<Facet>
                                        {
                                            new Facet
                                                {
                                                    Name = "Category"
                                                }
                                        }
                       };
        }

        [SetUp]
        public void SetupDb()
        {
            IDocumentStore store = new DocumentStore()
            {
                Url = "http://localhost:8080"
            };
            store.Initialize();
            IndexCreation.CreateIndexes(typeof(HierarchicalFaceting).Assembly, store);

            using (var session = store.OpenSession())
            {
                // Store the facet setup document that ToFacets("facets/Doc") refers to.
                session.Store(GetDocFacetSetup());
                session.SaveChanges();
            }

            store.Dispose();
        }

        [Test]
        [Ignore]
        public void DeleteAll()
        {
            IDocumentStore store = new DocumentStore()
            {
                Url = "http://localhost:8080"
            };
            store.Initialize();

            // AbstractIndexCreationTask names the index after the class, so no "Raven/" prefix here.
            store.DatabaseCommands.DeleteIndex("DocByCategory");
            store.DatabaseCommands.DeleteByIndex("Raven/DocumentsByEntityName", new IndexQuery());

            store.Dispose();
        }

        [Test]
        [Ignore]
        public void StoreDocs()
        {
            IDocumentStore store = new DocumentStore()
            {
                Url = "http://localhost:8080"
            };
            store.Initialize();

            var session = store.OpenSession();

            foreach (var doc in GetDocs())
            {
                session.Store(doc);
            }

            session.SaveChanges();
            session.Dispose();
            store.Dispose();
        }

        [Test]
        public void QueryDocsByCategory()
        {
            IDocumentStore store = new DocumentStore()
            {
                Url = "http://localhost:8080"
            };
            store.Initialize();

            var session = store.OpenSession();

            var q = session.Query<DocByCategory.ReduceResult, DocByCategory>()
                .Where(d => d.Category == "1/NonFic/Sci")
                .As<Doc>();

            var results = q.ToList();
            var facetResults = q.ToFacets("facets/Doc").ToList();

            session.Dispose();
            store.Dispose();
        }

        [Test]
        public void GetFacets()
        {
            IDocumentStore store = new DocumentStore()
            {
                Url = "http://localhost:8080"
            };
            store.Initialize();

            var session = store.OpenSession();

            var q = session.Query<DocByCategory.ReduceResult, DocByCategory>()
                .Where(d => d.Category.StartsWith("1/NonFic"))
                .As<Doc>();

            var results = q.ToList();
            var facetResults = q.ToFacets("facets/Doc").ToList();

            session.Dispose();
            store.Dispose();
        }
    }
}

Maxime Beaudoin

Apr 23, 2012, 6:22:34 PM
to rav...@googlegroups.com
I'm still unsure why they used a depth prefix. It seems to make no real difference.

Matt Warren

Apr 23, 2012, 6:31:32 PM
to rav...@googlegroups.com
Does that test give you the facet results you expect?

> I'm still unsure why they used a Depth.. It seems to make no real difference.
I think it's there if you want to limit the depth at which a match can be found. But you're right, I'm not sure in which actual scenarios it would be useful, because you're already specifying the prefix.
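
One case where the depth does seem to make a difference is drilling down one level at a time. The sketch below uses plain in-memory strings, not RavenDB, and `DepthPrefixDemo` and its method names are made up for illustration. It assumes the depth-prefixed term format from the Solr wiki page: with the prefix you can ask for the direct children of a node only, while a plain path prefix also returns every deeper descendant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only; these names are hypothetical and not part of RavenDB.
public static class DepthPrefixDemo
{
    // With the depth prefix: only terms exactly one level below the parent match.
    public static List<string> DirectChildren(IEnumerable<string> terms, string parent, int parentDepth)
    {
        var prefix = (parentDepth + 1) + "/" + parent + "/";
        return terms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    }

    // Without it: every descendant at any depth matches.
    public static List<string> AllDescendants(IEnumerable<string> rawPaths, string parent)
    {
        return rawPaths.Where(p => p.StartsWith(parent + "/", StringComparison.Ordinal)).ToList();
    }
}
```

Given the terms "0/NonFic", "1/NonFic/Hist", "1/NonFic/Law", "1/NonFic/Sci" and "2/NonFic/Sci/Phys", `DirectChildren(terms, "NonFic", 0)` keeps only the three "1/..." terms, whereas `AllDescendants` over the raw paths also returns "NonFic/Sci/Phys".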

Matt Warren

Apr 23, 2012, 6:34:12 PM
to rav...@googlegroups.com
> It seems very flexible.. Somehow I just missed the .StartWith() part. Not sure if it's because I told myself not to use that for some fictive performance issues.

Well, in Lucene StartsWith queries are okay; it's the EndsWith ones that have a perf issue.

So "matt*" is okay, but "*tthew" could be slow.
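
A rough way to see why: the terms of a field are kept in sorted order, so a prefix query can seek to the first candidate and read a contiguous run, while a leading wildcard has to look at every term. The sketch below is a simplified model over a sorted string array, not Lucene's actual implementation, and `TermScanDemo` is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of a sorted term dictionary; not Lucene code.
public static class TermScanDemo
{
    // "matt*": binary-search to the first term >= prefix, then read until the prefix stops matching.
    // Assumes sortedTerms is sorted with ordinal comparison and contains no duplicates.
    public static List<string> PrefixMatch(string[] sortedTerms, string prefix)
    {
        int start = Array.BinarySearch(sortedTerms, prefix, StringComparer.Ordinal);
        if (start < 0) start = ~start; // not found: complement gives the insertion point

        var hits = new List<string>();
        for (int i = start;
             i < sortedTerms.Length && sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            hits.Add(sortedTerms[i]);
        }
        return hits;
    }

    // "*tthew": sort order does not help, so every term has to be checked.
    public static List<string> SuffixMatch(string[] sortedTerms, string suffix)
    {
        return sortedTerms.Where(t => t.EndsWith(suffix, StringComparison.Ordinal)).ToList();
    }
}
```

With the terms "mark", "matt", "matthew" and "max", `PrefixMatch(terms, "matt")` touches only the run starting at "matt", while `SuffixMatch(terms, "tthew")` inspects all four terms to find "matthew".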

Maxime Beaudoin

Apr 23, 2012, 8:30:27 PM
to rav...@googlegroups.com
So far yes, thank you! :)