SolrNet and BlockJoin

Sergio García Maroto

Oct 24, 2013, 5:27:28 AM

to sol...@googlegroups.com

Hi Mauricio.

First of all, I would like to congratulate you on all the nice work you have done so far with SolrNet.

I am using SolrNet at the moment, and last week I discovered the new block join functionality in Solr 4.5.

I would like to know if you have any plans to update SolrNet to support indexing nested documents.

I would be happy to help add this functionality. If you have any thoughts or advice, let me know.

Regards,

Sergio

Mauricio Scheffer

Oct 27, 2013, 11:34:20 PM

to sol...@googlegroups.com

Hi Sergio,

The way to do this right now is to implement a custom ISolrDocumentResponseParser<T> and ISolrDocumentSerializer<T>.

Overriding the document parser/serializer is quite powerful but I realize it may be a rather coarse abstraction sometimes.

Ultimately, I want to model documents and fields as first-class values. Here's a very rough first draft: https://gist.github.com/mausch/7191042

This would give us a more composable document model, and an intermediate model to serialize, between the fully-typed document type and the XML, which would make (de)serialization simpler, at the cost of some memory.

What do you think?

Cheers

--

Mauricio

--
You received this message because you are subscribed to the Google Groups "SolrNet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to solrnet+u...@googlegroups.com.
To post to this group, send email to sol...@googlegroups.com.
Visit this group at http://groups.google.com/group/solrnet.
For more options, visit https://groups.google.com/groups/opt_out.

Sergio García Maroto

Oct 29, 2013, 12:15:02 PM

to sol...@googlegroups.com

I think that makes sense. I have another question: how would you map a Solr document to a class?

At the moment you can create a class that represents your schema like this:

public class Person {
    [SolrUniqueKey("id")]
    public string Id { get; set; }

    [SolrField("name")]
    public string Name { get; set; }
}

Suppose we need to define nested documents for Person. For instance, one person can have many jobs, and each job has two properties, Name and Salary.

How would you represent this?

Ted Jardine

Oct 29, 2013, 1:44:37 PM

to sol...@googlegroups.com

Sergio,

With Solr, you have to remember to throw out your typical relational database thinking as the whole point is to flatten your data structure. As usual, Mauricio provides the best answer at http://stackoverflow.com/questions/5584857/solr-documents-with-child-elements. In other words, depending on your search needs, you'll need to de-normalize your data accordingly.

Specifically, to map a repeating field into your document class, you would use an ICollection (or whatever suits your purposes):

private IList<string> _jobs = new List<string>();

[SolrField("jobs")]
public IList<string> Jobs
{
    get { return _jobs; }
    set { _jobs = value; }
}
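
To make the de-normalization concrete, here is a minimal sketch of flattening a person's jobs into parallel multi-valued fields. The Job, FlatPerson and PersonFlattener types and the field names job_names and job_salaries are hypothetical, chosen only for illustration. The SolrNet attributes are left as comments so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical nested model: one person with many jobs.
public class Job
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

// Hypothetical flat document. In a real SolrNet document class the properties
// would carry attributes such as [SolrUniqueKey("id")] or [SolrField("job_names")].
public class FlatPerson
{
    public string Id { get; set; }
    public string Name { get; set; }
    public IList<string> JobNames { get; set; }      // would map to "job_names"
    public IList<decimal> JobSalaries { get; set; }  // would map to "job_salaries"
}

public static class PersonFlattener
{
    // Copies each job property into its own multi-valued field.
    // The two lists stay aligned by position, but the index itself does not
    // record that JobNames[i] and JobSalaries[i] belong to the same job.
    public static FlatPerson Flatten(string id, string name, IEnumerable<Job> jobs)
    {
        var jobList = jobs.ToList();
        return new FlatPerson
        {
            Id = id,
            Name = name,
            JobNames = jobList.Select(j => j.Name).ToList(),
            JobSalaries = jobList.Select(j => j.Salary).ToList()
        };
    }
}
```

The trade-off of this layout is that each multi-valued field is matched independently, so a query on a job name combined with a salary can match a person whose matching name and matching salary come from two different jobs.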

Ted

Sergio García Maroto

Oct 30, 2013, 6:20:06 AM

to sol...@googlegroups.com

Hi Ted and Mauricio,

I will try to explain my problem a bit better. New functionality in Solr 4.5 allows you to index complex data structures using a single index and schema.

Each document can represent a parent document or a child. You can find a nice article here http://blog.griddynamics.com/2013/09/solr-block-join-support.html?m=1.

My schema looks like below

I can index a document like the one below using the update handler UI in Solr 4.5:

<add>
  <doc>
    <field name="type_level">parent</field>
    <doc>
      <field name="type_level">child</field>
      <field name="JobPosition">Consultant</field>
    </doc>
    <doc>
      <field name="type_level">child</field>
      <field name="JobPosition">Manager</field>
    </doc>
  </doc>
</add>

This creates 3 records in Solr, and internally Solr uses the "_root_" field to know that all the docs are related to the same parent.

What I want to do is update SolrNet so that it can index this kind of structure.

So my questions are:

1) Should we create 3 documents in SolrNet and a new implementation of ISolrDocumentSerializer<T> that understands that the children are attached to the preceding parent?

2) Should we create a new way of representing a parent and child in SolrSchema?

Regards,

Sergio

Mauricio Scheffer

Oct 30, 2013, 9:11:42 AM

to sol...@googlegroups.com

I was writing this answer last night but my connection died... well, here it is anyway. It's as Sergio explains:

I agree with Ted that it's best to try to flatten the data structure, but Solr is quickly progressing towards a full-blown NoSQL document database. Things are changing.

Especially since Elasticsearch, its main competitor, already has this and other features.

And now Solr 4.5 implements child documents. I still haven't tried it, and there isn't a lot of documentation about it, but you can see some examples at http://blog.griddynamics.com/2013/09/solr-block-join-support.html .

SolrNet should support this. I think this is a good excuse to introduce the document model I mentioned earlier.

Sergio: yes, when you write your own serializer you can pretty much map any model you want.

--

Mauricio

Sergio García Maroto

Oct 30, 2013, 10:34:41 AM

to sol...@googlegroups.com

OK, I will try to develop this. Do you want to create a branch? Where should I implement these changes?

Regards,

Sergio

Mauricio Scheffer

Oct 30, 2013, 10:51:43 AM

to sol...@googlegroups.com

Fork the repository, put things anywhere you want, they can be easily moved later if needed.

Cheers

--

Mauricio

Mauricio Scheffer

Nov 4, 2013, 11:42:27 PM

to sol...@googlegroups.com

Hi Sergio,

I just took a look at your fork. While your solution will work, I want to avoid magic (i.e. ad-hoc) interfaces like ISolrParentDocument / ISolrNestedDocument. That's why I drafted that document model, so that we have a composable model of first-class fields and documents that we can use as a basis for serializers and attribute mapping. Actually I'd also like to get rid of attributes since it's magic as well, but I realize that's probably too big of a breaking change so at least I want to make it less ad-hoc by mapping things to this document model.

Here's some more work around that initial draft: https://github.com/mausch/SolrNet/compare/document-model

The next step is to introduce a serializer that returns this model instead of an XElement. Writing such serializers for custom document types should be much easier than the current serializers. Then the current document/dictionary serializers should be rewritten to return this new SolrDocument. The only serializer returning XElement should be the one taking a SolrDocument as input.
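
As a rough illustration of that pipeline (typed document, then intermediate model, then XML), here is a minimal sketch. The DocField, DocModel and DocModelXml types are hypothetical stand-ins, not the types from the gist or the document-model branch, and the child-document list is an assumption added to connect the idea to block join. Only System.Xml.Linq is used.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical first-class field: a name plus a single value.
public sealed class DocField
{
    public string Name { get; private set; }
    public string Value { get; private set; }

    public DocField(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

// Hypothetical first-class document: fields plus optional child documents.
public sealed class DocModel
{
    public IList<DocField> Fields { get; private set; }
    public IList<DocModel> Children { get; private set; }

    public DocModel(IEnumerable<DocField> fields, IEnumerable<DocModel> children = null)
    {
        Fields = fields.ToList();
        Children = (children ?? Enumerable.Empty<DocModel>()).ToList();
    }
}

public static class DocModelXml
{
    // The only place that knows about XML: turns the model into a <doc> element,
    // writing the fields first and then nesting each child <doc> inside its parent.
    public static XElement ToXml(DocModel doc)
    {
        return new XElement("doc",
            doc.Fields.Select(f =>
                new XElement("field", new XAttribute("name", f.Name), f.Value)),
            doc.Children.Select(c => ToXml(c)));
    }
}
```

With this split, a mapper for a custom document type only has to build a DocModel (for example a parent with one child per job) and never touches XML directly.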

Also, the model should be modified to include the atomic update options ( http://wiki.apache.org/solr/Atomic_Updates ).

What do you think?

Cheers,

Mauricio

PS: Please keep all discussions about SolrNet in the mailing list.

On Thu, Oct 31, 2013 at 2:41 PM, Sergio García Maroto <maro...@gmail.com> wrote:

Source code is here https://github.com/marotosg/SolrNet/trunk

On 31 October 2013 17:40, Sergio García Maroto <maro...@gmail.com> wrote:

Hi Mauricio.

I finished a first draft. I didn't write a new serializer.

Basically, I created two interfaces, and a class can implement the parent or the child interface.
Previous functionality is still working.

Have a look at the tests (SolrDocumentSerializerTests) and let me know.

Regards,
Sergio

Randy

Aug 7, 2014, 8:00:09 PM

to sol...@googlegroups.com

Mauricio, what's the current status of indexing nested documents? I've used your wonderful library for years and am only now stumbling upon a need for nested docs & block joins. I only really need them for writing, because we're only going to pull IDs out, but I was hoping you'd worked your magic on this new feature already.

Thanks,

Randy

Mauricio Scheffer

Aug 18, 2014, 5:36:36 PM

to sol...@googlegroups.com

Hi Randy,

Nobody has expressed any further interest in developing this and I don't have a use for this feature at the moment, so the status is still the same.

Cheers

Zeeshan Ali

Oct 27, 2015, 5:41:19 AM

to SolrNet

Hi Mauricio,

I don't see this being implemented any time soon, but I need to use it in one of my projects. So I guess for now I will have to stick with ISolrDocumentResponseParser<T> and ISolrDocumentSerializer<T>. Do you have any samples or documentation that you think could be helpful?

Thanks

Philippe Grassia

Nov 17, 2015, 10:44:33 AM

to SolrNet

I needed this for one of my projects and wanted to keep SolrNet managed via NuGet, so here is what worked for me (YMMV):

Create a BlockjoinResponseParser class:

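// Namespaces below are assumed from the SolrNet source layout; adjust for your version.
using System.Linq;
using System.Xml.Linq;
using SolrNet;
using SolrNet.Impl;
using SolrNet.Utils; // assumed home of the X.AttrEq helper

public class BlockjoinResponseParser<T> : ISolrAbstractResponseParser<T>
{
    private readonly ISolrDocumentResponseParser<T> docParser;

    public BlockjoinResponseParser(ISolrDocumentResponseParser<T> docParser)
    {
        this.docParser = docParser;
    }

    public void Parse(XDocument xml, AbstractSolrQueryResults<T> results)
    {
        // Locate the <lst name="expanded"> section of the response, if there is one.
        var expandedNode = xml.Element("response")
            .Elements("lst")
            .FirstOrDefault(X.AttrEq("name", "expanded"));

        if (expandedNode != null)
        {
            // Parse the docs under each element of the section and append them to the results.
            foreach (var parent in expandedNode.Elements())
            {
                var lst = this.docParser.ParseResults(parent);
                results.AddRange(lst);
            }
        }
    }
}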

Basically, this appends the docs in the "expanded" section to the query results. This means:

- your document model needs to accommodate the possibly different structures of the 2 types of documents.

- your application logic must keep that in mind and process each doc accordingly (e.g. detect a content_type value)

At run time, unregister the default response parser and register a new one, which is an aggregate of the DefaultResponseParser and the new BlockjoinResponseParser:

// unregister the *incomplete* default document parser
SolrNet.Startup.Container.Remove<ISolrAbstractResponseParser<myDocumentType>>();

// register my document parser which is an aggregate of the default one and the extra BlockJoinResponseParser
SolrNet.Startup.Container.Register<ISolrAbstractResponseParser<myDocumentType>>(c =>
    new AggregateResponseParser<myDocumentType>(new ISolrAbstractResponseParser<myDocumentType>[] {
        new DefaultResponseParser<myDocumentType>(c.GetInstance<ISolrDocumentResponseParser<myDocumentType>>()),
        new BlockjoinResponseParser<myDocumentType>(c.GetInstance<ISolrDocumentResponseParser<myDocumentType>>())
    }));

That's pretty much it. If you're willing to go back to the source, a better option would probably be to return the joined documents in a specific field of the results, modeled like the facets are.

Hoping this helps

Philippe

Alex Wainger

Jun 17, 2016, 10:34:49 AM

to SolrNet

Hey Philippe,

I'm trying to implement the BlockjoinResponseParser you posted above, but I'm getting an error on the line

var expandedNode = xml.Element("response")
    .Elements("lst")
    .FirstOrDefault(X.AttrEq("name", "expanded"));

It says X does not exist in the current context. Is there a using directive I need for that?
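
If the X helper is not reachable from your project, the same lookup can be written with a plain LINQ to XML predicate. This is a minimal sketch that assumes X.AttrEq(name, value) does nothing more than test whether an element has an attribute with that name and value. The ExpandedSection class and its Find method are hypothetical names, and only System.Xml.Linq is used.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ExpandedSection
{
    // Returns the <lst name="expanded"> element under <response>,
    // or null when the response has no such section.
    public static XElement Find(XDocument xml)
    {
        var response = xml.Element("response");
        if (response == null)
            return null;

        // The explicit string conversion yields null when the attribute is
        // missing, so elements without a "name" attribute are simply skipped.
        return response
            .Elements("lst")
            .FirstOrDefault(e => (string)e.Attribute("name") == "expanded");
    }
}
```

Inside the parser, the lambda passed to FirstOrDefault can be dropped in where X.AttrEq("name", "expanded") appears, which removes the dependency on the helper.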