Token Search


Александр Зюзько

Apr 13, 2022, 3:45:23 AM
to ArangoDB
Hello,

The ArangoDB documentation contains a token search example that searches for movies with both "dinosaur" and "park" in the description (https://www.arangodb.com/docs/stable/arangosearch-fulltext-token-search.html).

My case:
The v_roads view definition:
{
  "cleanupIntervalStep": 2,
  "commitIntervalMsec": 1000,
  "consolidationIntervalMsec": 1000,
  "globallyUniqueId": "h7AEB0046C8EF/1716517141",
  "links": {
    "roads": {
      "analyzers": [
        "text_en"
      ],
      "fields": {
        "country": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        },
        "road": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        },
        "location": {
          "analyzers": [
            "hfts_geo_json"
          ]
        },
        "houseNumber": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        },
        "district": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        },
        "crossroads": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        },
        "place": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        },
        "postalCode": {
          "fields": {
            "names": {
              "fields": {
                "name": {}
              }
            }
          }
        }
      },
      "includeAllFields": false,
      "storeValues": "none",
      "trackListPositions": false
    }
  },
  "consolidationPolicy": {
    "type": "tier",
    "segmentsBytesFloor": 2097152,
    "segmentsBytesMax": 5368709120,
    "segmentsMax": 10,
    "segmentsMin": 1,
    "minScore": 0
  },
  "id": "1716517141",
  "primarySort": [],
  "writebufferActive": 0,
  "primarySortCompression": "lz4",
  "storedValues": [],
  "type": "arangosearch",
  "writebufferIdle": 64,
  "writebufferSizeMax": 33554432
}

My query:
FOR doc IN v_roads
  SEARCH ANALYZER(TOKENS("Rua do Sarraipo", "text_en") ALL IN doc.crossroads.names.name, "text_en")
  SORT BM25(doc) DESC, doc.rank DESC
  LIMIT 10
  RETURN {"road": doc.road.names, "crossroad": doc.crossroads.names, "roadId": doc._key}

This query returns an unexpected result (roadId = 13445): none of this document's crossroads.names.name values contains all of the tokens. See result_1.txt and result_2.txt in the attachments.

Could you explain whether this is expected behaviour or a bug?
result_1.txt
result_2.txt

Simran Spiller

Apr 22, 2022, 1:35:16 PM
to ArangoDB
doc.crossroads.names.name refers to this data:

    "crossroads": {
      "names": [
        {
          "name": "Travessa do Salão", ...
        },
        {
          "name": "Rua do Alquebe", ...
        },
        {
          "name": "Travessa do Sarraipo", ...
        },
        { ... }
      ]
    }

Your SEARCH expression looks for the three tokens [ "rua", "do", "sarraipo" ]
among the tokens of all the names combined, [ "travessa", "do", "salao", "rua", "do", "alqueb", "travessa", "do", "sarraipo", ... ],
and it does find all three of them somewhere, even though no single name contains all three.
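To make these matching semantics concrete, here is a small Python simulation (not ArangoDB code). It assumes a simplified, hypothetical `simple_tokens` helper in place of the `text_en` analyzer: it lowercases, removes accents and splits on whitespace, but does not stem, so "Alquebe" stays "alquebe" here. `all_in_any_element` pools the tokens of every name, which models the behaviour of the SEARCH expression described above, while `all_in_single_element` requires one name to contain every token.

```python
import unicodedata


def simple_tokens(text):
    """Hypothetical stand-in for TOKENS(text, "text_en"): lowercase,
    strip accents, split on whitespace. The real text_en analyzer also
    stems words (e.g. "alquebe" -> "alqueb"); that is not modelled here."""
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return folded.split()


def all_in_any_element(query, names):
    # Models the SEARCH behaviour: tokens of every array element are pooled,
    # so each query token may come from a different name.
    pooled = {t for name in names for t in simple_tokens(name)}
    return all(t in pooled for t in simple_tokens(query))


def all_in_single_element(query, names):
    # Models what was intended: at least one name contains all query tokens.
    query_tokens = simple_tokens(query)
    return any(
        all(t in simple_tokens(name) for t in query_tokens) for name in names
    )


# The three names quoted above (the remaining array elements are omitted).
crossroads = ["Travessa do Salão", "Rua do Alquebe", "Travessa do Sarraipo"]
print(all_in_any_element("Rua do Sarraipo", crossroads))     # True
print(all_in_single_element("Rua do Sarraipo", crossroads))  # False
```

The pooled check succeeds because "rua" comes from the second name and "sarraipo" from the third.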

There is currently no way to express that all tokens must appear in a single object, but you can use your current expression to pre-filter documents and add a FILTER that discards the undesired matches, for example:

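LET tokens = TOKENS("Rua do Sarraipo", "text_en")
FOR doc IN v_roads
  // pre-filter: every token occurs somewhere in the names (possibly spread across different names)
  SEARCH ANALYZER(tokens ALL IN doc.crossroads.names.name, "text_en")
  // post-filter: keep only documents where at least one single name contains all tokens
  FILTER LENGTH(doc.crossroads.names[* FILTER tokens ALL IN TOKENS(CURRENT.name, "text_en")]) > 0
  ...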
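The two-stage approach (SEARCH to pre-filter, FILTER to discard false positives) can be modelled in plain Python as well. This is a hypothetical simulation, not ArangoDB's implementation: `simple_tokens` is a simplified stand-in for the `text_en` analyzer (no stemming), and the `docs` dictionary and its keys are made up, with doc1 holding the three names quoted earlier in this reply.

```python
import unicodedata


def simple_tokens(text):
    """Hypothetical stand-in for TOKENS(text, "text_en"): lowercase, strip
    accents, split on whitespace (the real analyzer also stems words)."""
    folded = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in folded if not unicodedata.combining(c)).split()


def search_then_filter(query, docs):
    tokens = simple_tokens(query)
    hits = []
    for key, names in docs.items():
        # Stage 1 (SEARCH): every token occurs in at least one of the names.
        pooled = {t for name in names for t in simple_tokens(name)}
        if not all(t in pooled for t in tokens):
            continue
        # Stage 2 (FILTER): at least one single name holds every token, like
        # LENGTH(names[* FILTER tokens ALL IN TOKENS(CURRENT.name, ...)]) > 0.
        matching = [n for n in names if all(t in simple_tokens(n) for t in tokens)]
        if len(matching) > 0:
            hits.append(key)
    return hits


# Hypothetical documents keyed by _key; only the crossroads names are modelled.
docs = {
    "doc1": ["Travessa do Salão", "Rua do Alquebe", "Travessa do Sarraipo"],
    "doc2": ["Rua do Sarraipo", "Travessa do Salão"],
    "doc3": ["Avenida Central"],
}
print(search_then_filter("Rua do Sarraipo", docs))  # ['doc2']
```

Only doc2 survives: doc1 passes the pooled check but fails the per-name check, and doc3 is rejected by the pre-filter. In ArangoDB, keeping the SEARCH stage matters because it is answered by the view's index, while the FILTER only evaluates the documents that SEARCH returns.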