Hello,
I'm getting the error "caused_by":{"type":"too_many_clauses","reason":"maxClauseCount is set to 1024"} from ES during a simple {"user":"<user_id>"} query to UR.
It's actually similar to this case
The suggested solution there is to use the latest (0.6.0) Docker image, but I already have actionml/harness:latest in my docker-compose.yml, so I'm not sure what to do next.
Also, it's not always reproducible: the query works fine for some users who have far more than 1024 events in total.
Here is the engine config:
{
  "engineId": "MyEngine",
  "engineFactory": "com.actionml.engines.ur.UREngine",
  "dataset": {
    "ttl": "3652 days"
  },
  "sparkConf": {
    "es.index.auto.create": "true",
    "es.nodes": "elasticsearch",
    "es.nodes.wan.only": "true",
    "master": "local",
    "spark.driver.memory": "10g",
    "spark.es.index.auto.create": "true",
    "spark.es.nodes": "elasticsearch",
    "spark.es.nodes.wan.only": "true",
    "spark.executor.memory": "20g",
    "spark.kryo.referenceTracking": "false",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryoserializer.buffer": "500m",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
  },
  "algorithm": {
    "returnSelf": "false",
    "maxQueryEvents": 500,
    "maxIndicatorsPerQuery": 500,
    "maxEventsPerEventType": 9999,
    "maxCorrelatorsPerEventType": 9999,
    "blacklistIndicators": ["like", "watch", "dislike"],
    "indicators": [
      { "name": "like" },
      { "name": "watch" },
      { "name": "dislike" },
      { "name": "watch_title_type", "maxCorrelatorsPerItem": 1 },
      { "name": "watch_top_person" },
      { "name": "watch_director" },
      { "name": "watch_writer" },
      { "name": "watch_main_actor" },
      { "name": "watch_actor" },
      { "name": "watch_production_crew" },
      { "name": "watch_genre" },
      { "name": "watch_keyword" },
      { "name": "watch_company" },
      { "name": "watch_streaming_service" },
      { "name": "watch_original_language" },
      { "name": "watch_with_awards", "maxCorrelatorsPerItem": 1 },
      { "name": "watch_year" },
      { "name": "watch_content_rating" },
      { "name": "watch_rating" }
    ]
  }
}
The dataset is about 600K items. The number of events is huge; I can't give an exact figure, but it is certainly in the tens of millions.
My gut feeling is that the number of secondary indicators should be lowered to get it all working, but I'd appreciate your support.
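To put a rough number on that hunch, here is a back-of-the-envelope sketch in Python. It assumes (I have not confirmed this in the UR source) that each history event used for each indicator ends up as one boolean clause in the ES query, and that maxQueryEvents applies per indicator. The function names are made up for illustration; only the 19 indicators, maxQueryEvents = 500, and the 1024 limit come from the config and the error above.

```python
# Rough worst-case clause estimate for a single UR user query.
# ASSUMPTION (not verified against the UR source): every history event used
# for every indicator becomes one boolean clause in the Elasticsearch query,
# and maxQueryEvents is applied per indicator.

ES_MAX_CLAUSE_COUNT = 1024  # limit reported in the error message


def worst_case_clauses(num_indicators: int, max_query_events: int) -> int:
    """Upper bound: each indicator contributes max_query_events clauses."""
    return num_indicators * max_query_events


def max_events_within_limit(num_indicators: int,
                            limit: int = ES_MAX_CLAUSE_COUNT) -> int:
    """Largest per-indicator event count that keeps the bound under the limit."""
    return limit // num_indicators


num_indicators = 19      # entries in "indicators" in the engine config
max_query_events = 500   # "maxQueryEvents" in the engine config

print(worst_case_clauses(num_indicators, max_query_events))  # 9500
print(max_events_within_limit(num_indicators))               # 53
```

If that assumption holds, the worst case is well above 1024, and a user would only hit the limit when their history is spread across enough indicators, which might explain why some users with more than 1024 events still get results.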
Sergiy