SolrNet and BlockJoin

Sergio García Maroto

Oct 24, 2013, 5:27:28 AM
to sol...@googlegroups.com
Hi Mauricio.

First of all, I would like to congratulate you on all the nice work you have done so far with SolrNet.
I am using SolrNet at the moment, and last week I discovered the new Solr 4.5 block join functionality.
I would like to know if you have any plans to update SolrNet to allow indexing nested documents.
I would be happy to help you add this functionality; if you have any thoughts or advice, let me know.

Regards,
Sergio

Mauricio Scheffer

Oct 27, 2013, 11:34:20 PM
to sol...@googlegroups.com
Hi Sergio,

The way to do this right now is to implement a custom ISolrDocumentResponseParser<T> and ISolrDocumentSerializer<T>.
Overriding the document parser/serializer is quite powerful but I realize it may be a rather coarse abstraction sometimes.
Ultimately, I want to model documents and fields as first-class values. Here's a very rough first draft: https://gist.github.com/mausch/7191042
This would give us a more composable document model, and an intermediate model to serialize, between the fully-typed document type and the XML, which would make (de)serialization simpler, at the cost of some memory.
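For illustration only, here is a minimal C# sketch of such an intermediate model. The types `SolrFieldValue`, `SolrDocumentModel` and `SolrXmlWriter` are hypothetical: they are neither SolrNet API nor the contents of the gist above. Fields and documents are plain values, child documents are just another list, and only the writer knows about XML:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical first-class field: just a name and a value.
public class SolrFieldValue
{
    public SolrFieldValue(string name, string value)
    {
        Name = name;
        Value = value;
    }

    public string Name { get; }
    public string Value { get; }
}

// Hypothetical document: a list of fields plus nested child documents.
public class SolrDocumentModel
{
    public IList<SolrFieldValue> Fields { get; } = new List<SolrFieldValue>();
    public IList<SolrDocumentModel> Children { get; } = new List<SolrDocumentModel>();

    // Fluent helpers so a document can be built up step by step.
    public SolrDocumentModel Add(string name, string value)
    {
        Fields.Add(new SolrFieldValue(name, value));
        return this;
    }

    public SolrDocumentModel AddChild(SolrDocumentModel child)
    {
        Children.Add(child);
        return this;
    }
}

// The only component in this sketch that knows about Solr's XML update format.
public static class SolrXmlWriter
{
    // Fields become <field name="...">value</field>; children become nested <doc> elements.
    public static XElement ToXml(SolrDocumentModel doc) =>
        new XElement("doc",
            doc.Fields.Select(f => new XElement("field", new XAttribute("name", f.Name), f.Value)),
            doc.Children.Select(c => ToXml(c)));
}
```

A typed serializer would then only have to map a class to `SolrDocumentModel`; nested documents need no special casing because a child is just another value in `Children`.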
What do you think?

Cheers



--
Mauricio


--
You received this message because you are subscribed to the Google Groups "SolrNet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to solrnet+u...@googlegroups.com.
To post to this group, send email to sol...@googlegroups.com.
Visit this group at http://groups.google.com/group/solrnet.
For more options, visit https://groups.google.com/groups/opt_out.

Sergio García Maroto

Oct 29, 2013, 12:15:02 PM
to sol...@googlegroups.com
I think it makes sense. I have another question: how should a Solr document be mapped to a class?
At the moment you can create a class which represents your schema, like this:
public class Person {
    [SolrUniqueKey("id")]
    public string Id { get; set; }

    [SolrField("name")]
    public string Name { get; set; }
}
What if we need to define nested documents within Person? For instance, one person can have many jobs, and each job has two properties, Name and Salary.
How would you represent this?

Ted Jardine

Oct 29, 2013, 1:44:37 PM
to sol...@googlegroups.com
Sergio,

With Solr, you have to remember to throw out your typical relational database thinking as the whole point is to flatten your data structure. As usual, Mauricio provides the best answer at http://stackoverflow.com/questions/5584857/solr-documents-with-child-elements. In other words, depending on your search needs, you'll need to de-normalize your data accordingly.

To map a repeating (multi-valued) field into your document class, you would use an ICollection (or whatever suits your purposes):

        private IList<string> _jobs = new List<string>();

        [SolrField("jobs")]
        public IList<string> Jobs
        {
            get { return _jobs; }
            set { _jobs = value; }
        }
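A minimal sketch of that denormalization in plain C#, assuming hypothetical `PersonWithJobs`, `Job`, `FlatPerson` and `Denormalizer` types: each person becomes one document, and every job property becomes its own multi-valued list, the same shape as the `Jobs` property above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain types: a person with several jobs.
public class Job
{
    public string Position { get; set; }
    public int StartYear { get; set; }
}

public class PersonWithJobs
{
    public string Id { get; set; }
    public string Name { get; set; }
    public List<Job> Jobs { get; set; } = new List<Job>();
}

// Hypothetical flattened shape: one Solr document per person,
// with one multi-valued property per job attribute.
public class FlatPerson
{
    public string Id { get; set; }
    public string Name { get; set; }
    public IList<string> JobPositions { get; set; } = new List<string>();
    public IList<int> JobStartYears { get; set; } = new List<int>();
}

public static class Denormalizer
{
    public static FlatPerson Flatten(PersonWithJobs p) => new FlatPerson
    {
        Id = p.Id,
        Name = p.Name,
        // Both lists are built in the same order, so index i refers to the same job.
        JobPositions = p.Jobs.Select(j => j.Position).ToList(),
        JobStartYears = p.Jobs.Select(j => j.StartYear).ToList(),
    };
}
```

The trade-off is that per-job correlation is lost at query time: a query requiring position "Manager" and start year 2000 can match a person whose Manager job started in a different year. That gap is what block join is meant to fill.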

Ted

Sergio García Maroto

Oct 30, 2013, 6:20:06 AM
to sol...@googlegroups.com
Hi Ted and Mauricio,

I will try to explain my problem a bit better. New Solr 4.5 functionality allows you to index complex, nested data structures using a single index and schema.
Each document can represent a parent document or a child. You can find a nice article here http://blog.griddynamics.com/2013/09/solr-block-join-support.html?m=1.

My schema looks like this:
<field name="PersonID"  type="string" indexed="true" stored="true"  /> 
<field name="Name"  type="string" indexed="true" stored="true"  /> 
<field name="type_s" type="string" indexed="true" stored="true" />
<field name="_root_" type="int" indexed="true" stored="true" multiValued="false" required="false"/> 
<!-- Job fields -->
<field name="JobStart"  type="date" indexed="true" stored="true" />
<field name="JobEnd"  type="date" indexed="true" stored="true" />   
<field name="JobPosition"  type="string" indexed="true" stored="true" />

I can index a document like the one below using the update handler UI in Solr 4.5:
<add>
  <doc>
    <field name="PersonID">1</field>
    <field name="type_level">parent</field>
    <field name="Name">John</field>
    <doc>
      <field name="PersonID">11</field>
      <field name="type_level">child</field>
      <field name="JobStart">2000-01-01T00:00:00Z</field>
      <field name="JobEnd">2005-01-01T00:00:00Z</field>
      <field name="JobPosition">Consultant</field>
    </doc>
    <doc>
      <field name="PersonID">12</field>
      <field name="type_level">child</field>
      <field name="JobStart">2000-01-01T00:00:00Z</field>
      <field name="JobEnd">2005-01-01T00:00:00Z</field>
      <field name="JobPosition">Manager</field>
    </doc>
  </doc>
</add>
<commit/>

This will create 3 records in Solr, and internally Solr uses the "_root_" field to know that all docs are related to the same parent.

What I want to do is update SolrNet so that it can index this kind of nested document.

So my questions are:
1) Should we create 3 documents in SolrNet and a new implementation of ISolrDocumentSerializer<T> which understands that the children are attached to the previous parent?
2) Should we create a new way of representing a parent/child relationship in SolrSchema?
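The nested layout a serializer would have to produce for question 1 can be sketched in C# with `System.Xml.Linq`. `PersonDoc`, `JobDoc` and `BlockJoinXml` are hypothetical names mirroring the field names in the XML above; this does not plug into SolrNet's `ISolrDocumentSerializer<T>`, it only shows the target shape:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical typed classes mirroring the schema above.
public class JobDoc
{
    public string PersonID { get; set; }
    public DateTime JobStart { get; set; }
    public DateTime JobEnd { get; set; }
    public string JobPosition { get; set; }
}

public class PersonDoc
{
    public string PersonID { get; set; }
    public string Name { get; set; }
    public List<JobDoc> Jobs { get; set; } = new List<JobDoc>();
}

public static class BlockJoinXml
{
    // Solr date format; assumes the DateTime value is already in UTC.
    static string SolrDate(DateTime d) =>
        d.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

    static XElement Field(string name, string value) =>
        new XElement("field", new XAttribute("name", name), value);

    // Builds an <add> command where each person's jobs are <doc> elements
    // nested inside the person's <doc>, so they are indexed together as one block.
    public static XElement ToAddCommand(IEnumerable<PersonDoc> people) =>
        new XElement("add",
            people.Select(p => new XElement("doc",
                Field("PersonID", p.PersonID),
                Field("type_level", "parent"),
                Field("Name", p.Name),
                p.Jobs.Select(j => new XElement("doc",
                    Field("PersonID", j.PersonID),
                    Field("type_level", "child"),
                    Field("JobStart", SolrDate(j.JobStart)),
                    Field("JobEnd", SolrDate(j.JobEnd)),
                    Field("JobPosition", j.JobPosition))))));
}
```

Keeping the children inside the parent object, rather than as separate top-level documents, means the serializer never has to infer which parent a child belongs to.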

Regards,
Sergio

Mauricio Scheffer

Oct 30, 2013, 9:11:42 AM
to sol...@googlegroups.com
I was writing this answer last night but my connection died... well, here it is anyway. It's as Sergio explains:

I agree with Ted that it's best to try to flatten the data structure, but Solr is quickly progressing towards a full-blown NoSQL document database. Things are changing. 
Especially since Elasticsearch, its main competitor, already has this and other features. 
And now Solr 4.5 implements child documents. I still haven't tried it, and there isn't a lot of documentation about it, but you can see some examples in http://blog.griddynamics.com/2013/09/solr-block-join-support.html .
SolrNet should support this. I think this is a good excuse to introduce the document model I mentioned earlier.

Sergio: yes, when you write your own serializer you can pretty much map any model you want.



--
Mauricio

Sergio García Maroto

Oct 30, 2013, 10:34:41 AM
to sol...@googlegroups.com
OK, I will try to develop this. Do you want to create a branch? Where should I implement these changes?

Regards,
Sergio

Mauricio Scheffer

Oct 30, 2013, 10:51:43 AM
to sol...@googlegroups.com
Fork the repository, put things anywhere you want, they can be easily moved later if needed.

Cheers



--
Mauricio

Mauricio Scheffer

Nov 4, 2013, 11:42:27 PM
to sol...@googlegroups.com
Hi Sergio,

I just took a look at your fork. While your solution will work, I want to avoid magic (i.e. ad-hoc) interfaces like ISolrParentDocument / ISolrNestedDocument. That's why I drafted that document model, so that we have a composable model of first-class fields and documents that we can use as a basis for serializers and attribute mapping. Actually I'd also like to get rid of attributes since they're magic as well, but I realize that's probably too big of a breaking change, so at least I want to make them less ad-hoc by mapping things to this document model.

Here's some more work around that initial draft: https://github.com/mausch/SolrNet/compare/document-model

The next step is to introduce a serializer that returns this model instead of an XElement. Writing such serializers for custom document types should be much easier than the current serializers. Then the current document/dictionary serializers should be rewritten to return this new SolrDocument. The only serializer returning XElement should be the one taking a SolrDocument as input.

Also, the model should be modified to include the atomic update options ( http://wiki.apache.org/solr/Atomic_Updates ).
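One way the field model could carry those options, sketched in C# with hypothetical `UpdateModifier` and `UpdatableField` types: a field serializes exactly as before unless a modifier is set, in which case it gets the `update` attribute (`set`, `add` or `inc`) that Solr's XML update format uses for atomic updates.

```csharp
using System.Xml.Linq;

// Hypothetical modifiers matching the atomic update operations set / add / inc.
public enum UpdateModifier { None, Set, Add, Inc }

// Hypothetical field value that optionally carries an atomic update modifier.
public class UpdatableField
{
    public UpdatableField(string name, string value, UpdateModifier modifier = UpdateModifier.None)
    {
        Name = name;
        Value = value;
        Modifier = modifier;
    }

    public string Name { get; }
    public string Value { get; }
    public UpdateModifier Modifier { get; }

    // Plain fields serialize as <field name="x">v</field>;
    // atomic ones also get update="set", "add" or "inc".
    public XElement ToXml()
    {
        var e = new XElement("field", new XAttribute("name", Name), Value);
        if (Modifier != UpdateModifier.None)
            e.SetAttributeValue("update", Modifier.ToString().ToLowerInvariant());
        return e;
    }
}
```

For example, `new UpdatableField("Name", "Johnny", UpdateModifier.Set).ToXml()` produces `<field name="Name" update="set">Johnny</field>`, while the same field without a modifier serializes as an ordinary field.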

What do you think?

Cheers,
Mauricio

PS: Please keep all discussions about SolrNet in the mailing list.


On Thu, Oct 31, 2013 at 2:41 PM, Sergio García Maroto <maro...@gmail.com> wrote:


On 31 October 2013 17:40, Sergio García Maroto <maro...@gmail.com> wrote:
Hi Mauricio.

I finished a first draft. I didn't write a new Serializer.
Basically I created two interfaces, and a class can implement either the parent or the child interface.
Previous functionality is still working. 

Have a look at the tests in SolrDocumentSerializerTests and let me know.


Regards,
Sergio

Randy

Aug 7, 2014, 8:00:09 PM
to sol...@googlegroups.com
Mauricio, what's the current status of indexing nested documents? I've used your wonderful library for years and am only now stumbling upon a need for nested docs & block joins. I only really need them for writing, because we're only going to pull IDs out, but I was hoping you'd worked your magic on this new feature already.

Thanks,
Randy

Mauricio Scheffer

Aug 18, 2014, 5:36:36 PM
to sol...@googlegroups.com
Hi Randy, 

Nobody has expressed any further interest in developing this and I don't have a use for this feature at the moment, so the status is still the same.

Cheers

Zeeshan Ali

Oct 27, 2015, 5:41:19 AM
to SolrNet
Hi Mauricio,

I don't see it being implemented soon, but I need it in one of my projects. So I guess for now I will have to stick to ISolrDocumentResponseParser<T> and ISolrDocumentSerializer<T>. Do you have any samples or documentation that you think could be helpful?

Thanks

Philippe Grassia

Nov 17, 2015, 10:44:33 AM
to SolrNet
I needed this for one of my projects and wanted to keep SolrNet managed via NuGet, so here is what worked for me (YMMV):
create a BlockjoinResponseParser class:

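    using System.Linq;
    using System.Xml.Linq;
    using SolrNet;
    using SolrNet.Impl;

    // Adds the documents found in the "expanded" section of the response to the query results.
    public class BlockjoinResponseParser<T> : ISolrAbstractResponseParser<T>
    {
        private readonly ISolrDocumentResponseParser<T> docParser;

        public BlockjoinResponseParser(ISolrDocumentResponseParser<T> docParser)
        {
            this.docParser = docParser;
        }

        public void Parse(XDocument xml, AbstractSolrQueryResults<T> results)
        {
            // Find <lst name="expanded"> directly under <response>.
            // X.AttrEq(name, value) returns a predicate matching elements whose attribute has that value.
            var expandedNode = xml.Element("response")
                .Elements("lst")
                .FirstOrDefault(X.AttrEq("name", "expanded"));

            if (expandedNode != null)
            {
                // Each child of the expanded section holds the docs for one group/parent.
                foreach (var parent in expandedNode.Elements())
                {
                    var lst = this.docParser.ParseResults(parent);
                    results.AddRange(lst);
                }
            }
        }
    }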

Basically, this adds the docs in the "expanded" section to the query results. This means:
- your document model needs to accommodate the possible different structures of the 2 types of documents. 
- your application logic must keep that in mind and process the doc accordingly (e.g. detect a content_type value)

At run time, unregister the default response parser and register a new one which is an aggregate of DefaultResponseParser and the new BlockjoinResponseParser:

            // unregister the *incomplete* default response parser
            SolrNet.Startup.Container.Remove<ISolrAbstractResponseParser<myDocumentType>>();
            
            // register my response parser, which is an aggregate of the default one and the extra BlockjoinResponseParser
            SolrNet.Startup.Container.Register<ISolrAbstractResponseParser<myDocumentType>>(c => new AggregateResponseParser<myDocumentType>(new ISolrAbstractResponseParser<myDocumentType>[] {
                new DefaultResponseParser<myDocumentType>(c.GetInstance<ISolrDocumentResponseParser<myDocumentType>>()),
                new BlockjoinResponseParser<myDocumentType>(c.GetInstance<ISolrDocumentResponseParser<myDocumentType>>())
            })); 


That's pretty much it. If you're willing to go back to the source, a good option would probably be to return the joined documents in a separate property of the results, modeled like the facets are.
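A rough sketch of that idea in C#, independent of SolrNet: a hypothetical `ExpandedSectionParser` that groups the `<doc>` elements of the expanded section by group value, the way facet results are keyed by value. It assumes the expand component writes one `<result name="groupValue">` per group under `<lst name="expanded">`; mapping the raw elements to typed documents is left to the document parser.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ExpandedSectionParser
{
    // Reads <lst name="expanded"><result name="groupValue"><doc/>...</result></lst>
    // into a map of groupValue -> raw <doc> elements.
    public static IDictionary<string, IList<XElement>> Parse(XDocument xml)
    {
        var groups = new Dictionary<string, IList<XElement>>();
        var expanded = xml.Element("response")?
            .Elements("lst")
            .FirstOrDefault(e => (string)e.Attribute("name") == "expanded");
        if (expanded == null)
            return groups; // no expand section in this response

        foreach (var group in expanded.Elements("result"))
            groups[(string)group.Attribute("name")] = group.Elements("doc").ToList();
        return groups;
    }
}
```

Keeping the expanded documents in their own keyed structure, instead of appending them to the main results, avoids mixing parent and child documents in one list.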


Hoping this helps 

Philippe

Alex Wainger

Jun 17, 2016, 10:34:49 AM
to SolrNet
Hey Philippe,

I'm trying to implement the BlockjoinResponseParser you posted above, but I'm getting an error on the line

var expandedNode = xml.Element("response")
    .Elements("lst")
    .FirstOrDefault(X.AttrEq("name", "expanded"));

It says X does not exist in the current context. Is there a using directive I need for that?

Best,
Alex
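The thread does not say where `X` comes from; it is presumably a helper from SolrNet's own source. If it isn't reachable from your project, a small local stand-in with the same call shape should work. The version below is hypothetical (place it in your own namespace to avoid clashing with any existing `X`); it returns a predicate that matches elements whose attribute has the given value:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical local replacement for X.AttrEq.
public static class X
{
    // True when the element has an attribute `name` whose value equals `value`;
    // elements without the attribute never match.
    public static Func<XElement, bool> AttrEq(string name, string value) =>
        e => (string)e.Attribute(name) == value;
}
```

Alternatively, drop the helper and pass the lambda directly: `.FirstOrDefault(e => (string)e.Attribute("name") == "expanded")`.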