ElasticSearch query fail

Jerard

Mar 26, 2023, 1:20:18 PM3/26/23
to actionml-user
Hello,

I'm facing an error from Elasticsearch during a simple {"user":"<user_id>"} query to the UR: "caused_by":{"type":"too_many_clauses","reason":"maxClauseCount is set to 1024"}.

It's actually similar to this case

The suggested solution there is to use the latest (0.6.0) Docker image, but I already have actionml/harness:latest in my docker-compose.yml, so I'm not sure what to do next...
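One thing worth double-checking (my suggestion; I'm assuming the 0.6.0 tag exists on Docker Hub, based on the version the linked thread mentions): "latest" can point at a stale local image, so it's safer to pin an explicit tag in docker-compose.yml:

```yaml
# docker-compose.yml fragment: pin Harness to an explicit version
# instead of "latest" (tag name assumed from the thread's mention of 0.6.0)
services:
  harness:
    image: actionml/harness:0.6.0
```

Then run "docker compose pull" and "docker compose up -d" so the image on disk actually matches the tag.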

Also, it's not always reproducible: some users who have far more than 1024 events in total get results just fine.
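For reference, the 1024 in the error message is Lucene's default clause limit, exposed in Elasticsearch as the indices.query.bool.max_clause_count setting. Raising it in elasticsearch.yml is a blunt workaround rather than a fix for the oversized query (the value below is just an example, and the ES node needs a restart afterwards):

```yaml
# elasticsearch.yml: raise the bool-query clause limit (Lucene default 1024).
# Example value only; requires a node restart to take effect.
indices.query.bool.max_clause_count: 4096
```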

Here is engine config:

{
  "engineId": "MyEngine",
  "engineFactory": "com.actionml.engines.ur.UREngine",
  "dataset": {
    "ttl": "3652 days"
  },
  "sparkConf": {
    "es.index.auto.create": "true",
    "es.nodes": "elasticsearch",
    "es.nodes.wan.only": "true",
    "master": "local",
    "spark.driver.memory": "10g",
    "spark.es.index.auto.create": "true",
    "spark.es.nodes": "elasticsearch",
    "spark.es.nodes.wan.only": "true",
    "spark.executor.memory": "20g",
    "spark.kryo.referenceTracking": "false",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryoserializer.buffer": "500m",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
  },
  "algorithm": {
    "returnSelf": "false",
    "maxQueryEvents": 500,
    "maxIndicatorsPerQuery": 500,
    "maxEventsPerEventType": 9999,
    "maxCorrelatorsPerEventType": 9999,
    "blacklistIndicators": ["like", "watch", "dislike"],
    "indicators": [
      { "name": "like" },
      { "name": "watch" },
      { "name": "dislike" },
      { "name": "watch_title_type", "maxCorrelatorsPerItem": 1 },
      { "name": "watch_top_person" },
      { "name": "watch_director" },
      { "name": "watch_writer" },
      { "name": "watch_main_actor" },
      { "name": "watch_actor" },
      { "name": "watch_production_crew" },
      { "name": "watch_genre" },
      { "name": "watch_keyword" },
      { "name": "watch_company" },
      { "name": "watch_streaming_service" },
      { "name": "watch_original_language" },
      { "name": "watch_with_awards", "maxCorrelatorsPerItem": 1 },
      { "name": "watch_year" },
      { "name": "watch_content_rating" },
      { "name": "watch_rating" },
    ]
  }
}
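For what it's worth, here is a back-of-envelope estimate of how a query against this config could exceed 1024 clauses. This rests on my assumption about UR internals (not confirmed anywhere): that each indicator contributes up to maxIndicatorsPerQuery correlator terms to the Elasticsearch bool query, rather than that cap applying across all indicators.

```python
# Back-of-envelope clause count for the UR query above.
# Assumption (mine, not confirmed UR internals): each correlator term taken
# from the user's history becomes one clause in the ES bool query, and
# maxIndicatorsPerQuery caps terms per indicator, not overall.
num_indicators = 19        # indicators listed in the engine config
max_per_indicator = 500    # "maxIndicatorsPerQuery" from the config
lucene_limit = 1024        # maxClauseCount from the error message

worst_case = num_indicators * max_per_indicator
print(f"worst-case clauses: {worst_case}, over limit: {worst_case > lucene_limit}")
```

If that assumption holds, it would also explain why the failure depends on which indicators a given user has events for, rather than on their total event count.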

The dataset is about 600K items. The number of events is huge; I can't give an exact figure, but it's tens of millions for sure.

My gut feeling is that the number of secondary indicators should be lowered to get this working, but I'd appreciate your support.

Sergiy