Exception occurred while trying to associate many-many entities using ManyAssociation (ElasticSearchIndexException)


Jaydatt Desai

Aug 29, 2014, 9:21:39 AM
to qi4j...@googlegroups.com
Hello Gentlemen,

I was doing some testing with ManyAssociation, adding a large number (~200) of entities to a ManyAssociation property. Index/Query is backed by ElasticSearch.
When I ran this test case it failed with an "ElasticSearchIndexException". This only occurs once the number of associated entities passes some limit (it works perfectly with ~100 entities in this case, but throws the exception with ~200), and that limit varies with entity size.

Details are below:
Exception:
org.qi4j.index.elasticsearch.ElasticSearchIndexException: failure in bulk execution:
[178]: index [qi4j_index], type [qi4j_entities], id [869f4cf5-b258-4aa4-9536-c11b336862c5-0], message [IllegalArgumentException[Document contains at least one immense term in field="_all" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[38 36 39 66 34 63 66 35 2d 62 32 35 38 2d 34 61 61 34 2d 39 35 33 36 2d 63 31 31 62 33 33]...']]
at org.qi4j.index.elasticsearch.ElasticSearchIndexer$Mixin.notifyChanges(ElasticSearchIndexer.java:150)
at org.qi4j.spi.entitystore.StateChangeNotificationConcern$1.commit(StateChangeNotificationConcern.java:44)
at org.qi4j.spi.entitystore.ConcurrentModificationCheckConcern$ConcurrentCheckingEntityStoreUnitOfWork$1.commit(ConcurrentModificationCheckConcern.java:116)

Code:

@Test
public void testManyAssociation() throws Exception {
    UnitOfWork uow = module.newUnitOfWork();
    TestEntity testEntity = module.currentUnitOfWork().newEntity(TestEntity.class);

    // Associate 200 new entities; ~100 works, ~200 triggers the exception.
    for (int i = 0; i < 200; i++) {
        TestEntity2 testEntity2 = module.currentUnitOfWork().newEntity(TestEntity2.class);
        testEntity2.property().set("test");
        testEntity.manyAssociation().add(testEntity2);
    }
    // The exception is thrown on commit, when the indexer is notified.
    uow.complete();
}


public interface TestEntity
    extends EntityComposite
{
    @Optional
    Property<String> property();

    ManyAssociation<TestEntity2> manyAssociation();
}

public interface TestEntity2
    extends EntityComposite
{
    @Optional
    Property<String> property();

    @Optional
    Property<List<Byte>> binaryProperty();
}

Paul Merlin

unread,
Aug 29, 2014, 11:38:41 AM8/29/14
to qi4j...@googlegroups.com
Hey,

Looks like a bug in the ElasticSearch Query engine.

Filed it as https://ops4j1.jira.com/browse/QI-412

Thanks for the test-code.
Will try to fix this asap.



Paul Merlin

Oct 14, 2014, 9:04:30 AM
to Jaydatt Desai, qi4j...@googlegroups.com
Jaydatt,

Finally found some time to investigate this.

An indexed term in Lucene was capped to 32k.
All ManyAssociations of an entity are stored in a term.
If the UTF-8 JSON encoded term is > 32k, ElasticSearch simply cannot index it.
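
To make the limit concrete, here is a minimal sketch of a pre-check on a value's UTF-8 size. The class `TermLimitCheck` and its method names are made up for illustration and are not part of Qi4j, Lucene or ElasticSearch; the 32766 figure is the one quoted in the exception message. The limit applies to encoded bytes, not characters, so non-ASCII text reaches it sooner.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Qi4j, Lucene or ElasticSearch API.
class TermLimitCheck {
    // Maximum term length in bytes, as quoted in the exception message.
    static final int MAX_TERM_BYTES = 32766;

    // Number of bytes the value occupies once encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the value would be too long to be indexed as a single term.
    static boolean exceedsTermLimit(String value) {
        return utf8Length(value) > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        // 20000 copies of U+00E9: 20000 chars, but 2 bytes each in UTF-8.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append('\u00e9');
        }
        String value = sb.toString();
        System.out.println(value.length() + " chars, "
                + utf8Length(value) + " bytes, too long: "
                + exceedsTermLimit(value));
    }
}
```

A check like this could be run in a test against the serialized form of an entity to see how close it gets to the limit before ElasticSearch rejects it.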

This has been fixed in Lucene 4.5 (https://issues.apache.org/jira/browse/LUCENE-4583) and ElasticSearch 1.3.x.

The dependency version has been upgraded in the Qi4j develop branch (tested with 10k associations) and the upgrade will be included in 2.1. For the time being, simply override the ElasticSearch version used in your project with the latest one available (1.3.4).

Note that there are obviously other limits in play; not every use case has been tested. This one was fixed by a simple version upgrade; we'll see whether others need indexing schema changes. Thanks for the report!

Cheers

/Paul

Jaydatt Desai

Oct 14, 2014, 9:10:18 AM
to qi4j...@googlegroups.com, jaydat...@gmail.com, pa...@nosphere.org
Hi Paul,

Thanks a lot for the fix and for updating me. I will take a look at your fix on the develop branch, and I will also switch my project to the latest ElasticSearch.
So, thanks again for the fix...

Paul Merlin

Oct 14, 2014, 9:17:58 AM
to Jaydatt Desai, qi4j...@googlegroups.com
There's no fix on the develop branch except the dependency upgrade and a unit test for 10_000 associations.

https://github.com/Qi4j/qi4j-sdk/commit/b8efcda1446afbf0dde033719d6956542bcec9d8


Bharvi Dixit

Dec 16, 2014, 2:21:09 AM
to qi4j...@googlegroups.com, jaydat...@gmail.com, pa...@nosphere.org
Hi Paul,

I am using ElasticSearch version 1.3.4 but am still getting the exception below for some of the documents while indexing data.

java.lang.IllegalArgumentException: Document contains at least one immense term in field="_all" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[119, 111, 114, 114, 121, 105, 110, 103, 32, 110, 101, 119, 32, 102, 105, 110, 100, 105, 110, 103, 115, 32, 102, 114, 111, 109, 32, 97, 32, 100]...', original message: bytes can be at most 32766 in length; got 42362
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:671)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1254)
        at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:563)
        at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:492)
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:409)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:446)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:535)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:434)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 42362
        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:645)
        ... 16 more


Regards
Bharvi

Niclas Hedhman

Dec 16, 2014, 6:48:22 AM
to Bharvi Dixit, qi4j...@googlegroups.com, Jaydatt Desai, Paul MERLIN
This is related to Lucene's "fields" feature, which we use to map to Property fields in Qi4j. I guess that marking that Qi4j field @Queryable(false) could resolve it.
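
As a rough sketch of what that would look like: the `Queryable` annotation and `Property` interface below are local stand-ins declared only so the snippet compiles on its own (in a Qi4j project they would be the framework's own types), the `Article` entity is hypothetical, and `indexedProperties` merely illustrates the intended effect; it is not Qi4j's indexing code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class QueryableSketch {
    // Local stand-in for the Qi4j annotation, declared here for self-containment.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.TYPE})
    @interface Queryable {
        boolean value() default true;
    }

    // Local stand-in for the Qi4j Property type.
    interface Property<T> {
        T get();
        void set(T value);
    }

    // Hypothetical entity: the large text property is excluded from indexing.
    interface Article {
        Property<String> title();

        @Queryable(false)
        Property<String> body();
    }

    // Illustration only: names of the properties an indexer honouring
    // @Queryable(false) would still index, in sorted order.
    static List<String> indexedProperties(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getMethods()) {
            Queryable queryable = method.getAnnotation(Queryable.class);
            if (queryable == null || queryable.value()) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(indexedProperties(Article.class));
    }
}
```

The trade-off is that a property marked this way can no longer be used in queries, so it only suits fields that are stored but never searched on.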

Niclas



--
Niclas Hedhman, Software Developer
http://www.qi4j.org - New Energy for Java