FT.AGGREGATE performance problems

Paul

unread,

Dec 16, 2019, 9:15:29 AM12/16/19

to RediSearch Discussion Forum

Hi,

We are using Redis Enterprise Cloud, and switching to FT.AGGREGATE from FT.SEARCH has introduced significant performance problems - our instance is using Redis 5.0.4/ Redisearch 1.4.6.

Here is a query that takes 3 seconds to return:

"FT.AGGREGATE" "trips_hash_index" "(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) (@TripStatusID:{1|5|4|6|3})" "SORTBY" "8" "@CreatedDateTimeUTC" "DESC" "@TripStartDate" "ASC" "@TripEndDate" "ASC" "@TripID" "DESC" "LIMIT" "0" "10"

with FT.SEARCH it is quite quick:

"FT.SEARCH" "trips_hash_index" "(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) (@TripStatusID:{1|5|4|6|3})" "SORTBY" "TripID" "DESC" "LIMIT" "0" "10"

And here is what the index looks like:

ft.info trips_hash_index
 1) index_name
 2) "trips_hash_index"
 3) fields
 4)  1) 1) PrimaryKey
        2) type
        3) TEXT
        4) WEIGHT
        5) "1"
        6) SORTABLE
     2) 1) TripID
        2) type
        3) NUMERIC
        4) SORTABLE
     3) 1) UserID
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
     4) 1) TripDescription
        2) type
        3) TEXT
        4) WEIGHT
        5) "1"
     5) 1) TripStartDate
        2) type
        3) NUMERIC
        4) SORTABLE
     6) 1) TripEndDate
        2) type
        3) NUMERIC
        4) SORTABLE
     7) 1) TripTypeID
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
     8) 1) TripMainTypeID
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
     9) 1) TripStatusID
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
    10) 1) CreatedDateTimeUTC
        2) type
        3) NUMERIC
        4) SORTABLE
    11) 1) VisibleToCompanyIDs
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
    12) 1) ParentTripID
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
    13) 1) Supplier
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
    14) 1) TripCarrierTypeID
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
    15) 1) ClientIDs
        2) type
        3) TAG
        4) SEPARATOR
        5) ,
 5) index_options
 6) 1) "NOOFFSETS"
 7) gc_stats
 8) 1) current_hz
    2) "3.0289804935455322"
    3) bytes_collected
    4) (integer) 68723
    5) effectiv_cycles_rate
    6) "0.036071268224674498"
 9) cursor_stats
10) 1) global_idle
    2) (integer) 0
    3) global_total
    4) (integer) 0
    5) index_capacity
    6) (integer) 128
    7) index_total
    8) (integer) 0
11) num_docs
12) (integer) 599715
13) max_doc_id
14) (integer) 600432
15) num_terms
16) (integer) 878752
17) num_records
18) (integer) 4937876
19) inverted_sz_mb
20) "4.6262814823090749e+18"
21) offset_vectors_sz_mb
22) "0"
23) doc_table_size_mb
24) "4.6329217579703337e+18"
25) key_table_size_mb
26) "4.6255695592675082e+18"
27) records_per_doc_avg
28) "8.2337043428962087"
29) bytes_per_record_avg
30) "4.2159641108849231"
31) offsets_per_term_avg
32) "0"
33) offset_bits_per_record_avg
34) "nan"

Any thoughts on what the problem might be?

Thanks!

Filipe C. Oliveira

unread,

Dec 16, 2019, 10:14:07 AM12/16/19

to RediSearch Discussion Forum

Hi there Paul, specifically focusing on FT.AGGREGATE performance recommendations I would recommend:
- using the MAX parameter within your sortby clauses: SORTBY … MAX , since MAX is used to optimized sorting, by sorting only for the n-largest elements. https://oss.redislabs.com/redisearch/Aggregations/#parameters_in_detail
- not using LIMIT in favor of using the cursor api, with the WITHCURSOR keyword, since cursors allow you to consume only part of the response, allowing you to fetch additional results as needed. This is much quicker than using LIMIT with offset, since the query is executed only once, and its state is stored on the server. https://oss.redislabs.com/redisearch/Aggregations/#cursor_api

Can you give it a try and let us know if that solved the issue?

Paul

unread,

Dec 16, 2019, 10:53:02 AM12/16/19

to RediSearch Discussion Forum

Hi Filipe,

I am actually using MAX, but did not include it in the query - the difference between using it and not using is very negligible.

Secondly, CURSORS wouldn't be acceptable in my case because the data changes all the time, so going from one page to the next could have different records because some may have got added, other deleted, other modified.

Do you have any other suggestions? What would be the reason search is so much faster? The only reason I need aggregate is to be able to sort by multiple fields? Is there a different way to accomplish this with search?

Filipe C. Oliveira

unread,

Dec 16, 2019, 11:09:32 AM12/16/19

to RediSearch Discussion Forum

Hi again Paul, given that using the cursor API is not a possible solution here, I think that we should investigate further why the MAX parameter on SORTBY is not having the expected effect on performance improvement ( given that the query is the same on FT.AGGREGATE and FT.SEARCH ). With that in mind, is it possible for you to create an issue on:
https://github.com/RediSearch/RediSearch/issues
with what you've described here, + recreation instruction, and ( if possible ) an rdb so that we can test this for you and provide a solution? The more info we have the faster we can provide an explanation/solution.

Paul

unread,

Dec 16, 2019, 11:44:54 AM12/16/19

to RediSearch Discussion Forum

Hi Filipe,

I created this issue: https://github.com/RediSearch/RediSearch/issues/1016

How would I go about giving you access to our QA environment where you could troubleshoot yourself? There are only about 165k records in that index, but FT.AGGREGATE still takes about a second to return with or without MAX.

Thanks for your help.

Reply all

Reply to author

Forward