User search query times are slow on large user counts

Visto 1.674 veces
Saltar al primer mensaje no leído

Baltabayev Artur (IOC/PAU2)

no leída,
13 jul 2021, 12:11:2613/7/21
a keyclo...@googlegroups.com

Hello Keycloak community,

 

one of our customers discovered that searching a user via the Admin Console is really slow. To be specific, the search takes anywhere from 8 to 13 seconds on a system with around one million users.

After investigating the issue, we discovered that the database query execution time scales with rising user numbers because it uses full text search with surrounding wildcard characters (i.e. ‘%’) which doesn’t work well together with indexes.

 

Here is the request that is triggered when searching for a user in the Admin Console:

/realms/example-realm/users?briefRepresentation=true&first=0&max=20&search='example-user'

 

And this is the mentioned query :

SELECT * FROM USER_ENTITY AS generatedAlias0

WHERE ( generatedAlias0.REALM_ID = 'example-realmid' )

AND ( generatedAlias0.SERVICE_ACCOUNT_CLIENT_LINK IS null )

AND ( ( LOWER(generatedAlias0.USERNAME) LIKE '%example-username%' )

OR ( LOWER(generatedAlias0.EMAIL) LIKE '%example-username%' )

OR ( LOWER(((coalesce(generatedAlias0.FIRST_NAME, '') + ' ')) + coalesce(generatedAlias0.LAST_NAME, '')) LIKE '%example-username%'))

ORDER BY generatedAlias0.USERNAME ASC

 

We would like to provide a way that makes the search fast and independent from the user count at the same time.

The two methods we found to improve the execution time where either removing the trailing wildcard characters from the query, i.e. “ USERNAME LIKE ‘example-username%’ ”,

or performing an exact search without any wildcards, i.e. “ USERNAME = ‘example-username’ “.

This could be achieved by giving the user different options for searching. Those options could be displayed as a dropdown in the UI. Here is a screenshot of how this could look like :

 

 

The options are probably self-explaining. For clarity, the “Starts with” option would execute a query without the trailing wildcards, the “Exact” option would execute a query without wildcards and the

“Includes” option is the current default behavior.

 

Another option would be implementing them as prefixes in the search string (the one you enter in the UI search field) like the already available “id:” rather than a drop down box.

So you would for example search for “startsWith:admin” or “exact:admin” instead of just admin.

However this would make it significantly less usable when searching multiple users in a row, because you would have to type out the prefix every time.

 

What are your thoughts on this topic ? Any feedback is greatly appreciated.

 

Mit freundlichen Grüßen / Best regards

Artur Baltabayev


Bosch IoT Permissions - Product Area User Management (IOC/PAU-PM)
Bosch.IO GmbH | Ziegelei 7 | 88090 Immenstaad | GERMANY |
www.bosch.io
Tel.
+49 7545 202-497 | Fax +49 7545 202-301 | Artur.Ba...@bosch.io

Registered Office: Berlin, Registration Court: Amtsgericht Charlottenburg; HRB 148411 B
Chairman of the Supervisory Board: Dr.-Ing.
Thorsten Lücke; Managing Directors: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

Hynek Mlnarik

no leída,
14 jul 2021, 13:44:3614/7/21
a Baltabayev Artur (IOC/PAU2),keyclo...@googlegroups.com
Hi Artur,

Thank you for the proposal. This idea of distinguishing the exact, prefix and infix searches sounds reasonably.

As you mention, the use of infix searches makes it impossible to reasonably leverage relational database indices. Indices could help if there was prefix / exact search clearly distinguishable from infix search.

The search syntax should be consistent across the UI. There is a related work already in client search where the syntax A:B is used which means "search for value B in attribute A". The syntax from your proposal conflicts with this syntax as the word before colon states operator rather than field / attribute name. Would the following proposal glob-like search serve better?

- abc and abc* would be a prefix search
- *abc* would be an infix search
- "abc" would be an exact search

This syntax would not conflict with the work already done for clients and would allow for extending search by field in users as well as expressing substring / exact searches in clients in the future. At the same time, it would not require a dropdown in the UI which could be confusing once there would be the free text syntax similar to clients implemented for user search.

Side note: Functional index would still be necessary and for this specific use case, I'd rather have an administrator create that one manually and not automatically since liquibase abilities to create conditional indices are limited, and its creation might be time consuming especially with millions of users.

WDYT?

--Hynek
--
You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/70906be9a3564ff5bbb9edb1fdf42559%40bosch.io.

Baltabayev Artur (IOC/PAU2)

no leída,
15 jul 2021, 6:52:5115/7/21
a Hynek Mlnarik,keyclo...@googlegroups.com

Hello Hynek,

 

thank you a lot for the quick response ! We agree with your proposal,

and I’ll start preparing the PR if there are no objections.

 

Note: Sorry I didn’t answer to all on the first response.

 

Mit freundlichen Grüßen / Best regards

Artur Baltabayev


Bosch IoT Permissions - Product Area User Management (IOC/PAU-PM)
Bosch.IO GmbH | Ziegelei 7 | 88090 Immenstaad | GERMANY |
www.bosch.io
Tel.
+49 7545 202-497 | Fax +49 7545 202-301 | Artur.Ba...@bosch.io

Registered Office: Berlin, Registration Court: Amtsgericht Charlottenburg; HRB 148411 B
Chairman of the Supervisory Board: Dr.-Ing.
Thorsten Lücke; Managing Directors: Dr. Stefan Ferber, Dr. Aleksandar Mitrovic, Yvonne Reckling

Pedro Igor Craveiro e Silva

no leída,
15 jul 2021, 7:12:4415/7/21
a Baltabayev Artur (IOC/PAU2),Hynek Mlnarik,keyclo...@googlegroups.com
Any change we start looking at oData to standardize queries in our APIs?

Hynek Mlnarik

no leída,
15 jul 2021, 7:42:5515/7/21
a Pedro Igor Craveiro e Silva,Baltabayev Artur (IOC/PAU2),keyclo...@googlegroups.com
Not at this moment and not as part of this task, oData would require a separate long-term initiative.

Pedro Igor Craveiro e Silva

no leída,
15 jul 2021, 7:50:1715/7/21
a Hynek Mlnarik,Baltabayev Artur (IOC/PAU2),keyclo...@googlegroups.com
Right. But I think we should at least consider using its definitions for queries to make the transition easier.

I think the same goes for the new storage which, if planned with oData in mind, should help a lot to map oData queries to the underlying storage.

Hynek Mlnarik

no leída,
15 jul 2021, 10:24:3815/7/21
a Pedro Igor Craveiro e Silva,Baltabayev Artur (IOC/PAU2),keyclo...@googlegroups.com
I am not following. Do you envision users entering queries using oData syntax when searching for users?

Could you elaborate on what you mean by "planned with oData in mind"? Storage is not by itself exposing any REST interfaces.

Pedro Igor Craveiro e Silva

no leída,
15 jul 2021, 12:41:2415/7/21
a Hynek Mlnarik,Baltabayev Artur (IOC/PAU2),keyclo...@googlegroups.com
On Thu, Jul 15, 2021 at 11:24 AM Hynek Mlnarik <hmln...@redhat.com> wrote:
I am not following. Do you envision users entering queries using oData syntax when searching for users?

Could you elaborate on what you mean by "planned with oData in mind"? Storage is not by itself exposing any REST interfaces.

Yeah, it is not. My point is that we could potentially provide some mechanism for easily mapping OData query syntax to the syntax of the underlying storage. Like some libraries do (e.g.: oData -> JPQL).

Making it a lot easier to standardize our REST APIs and remove redundant code for dealing with parsing queries sent over REST. Perhaps related to the model criteria API you did?
 

Hynek Mlnarik

no leída,
15 jul 2021, 13:29:5315/7/21
a Pedro Igor Craveiro e Silva,Baltabayev Artur (IOC/PAU2),keyclo...@googlegroups.com
On Thu, Jul 15, 2021 at 6:41 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:


On Thu, Jul 15, 2021 at 11:24 AM Hynek Mlnarik <hmln...@redhat.com> wrote:
I am not following. Do you envision users entering queries using oData syntax when searching for users?

Could you elaborate on what you mean by "planned with oData in mind"? Storage is not by itself exposing any REST interfaces.

Yeah, it is not. My point is that we could potentially provide some mechanism for easily mapping OData query syntax to the syntax of the underlying storage. Like some libraries do (e.g.: oData -> JPQL).

Making it a lot easier to standardize our REST APIs and remove redundant code for dealing with parsing queries sent over REST. Perhaps related to the model criteria API you did?

Yup, and if you mean filter definitions from OData and the criteria builder as implemented in the new storage, they are both capturing Boolean expressions upon base predicates, and the operators used in the predicates are reasonably mappable from OData to map storage ones. Map storage does not work with text version of the expression, and thus has no parser, and also map storage supports only operators sensible in the context of Keycloak storage, but that is sufficient for the moment. Thus in that sense the APIs are aligned, parser would need to be created in the REST layer (but that's simple as it is a standard context-free language).

Baltabayev Artur (IOC/PAU2)

no leída,
5 ago 2021, 5:00:325/8/21
a Hynek Mlnarik,Pedro Igor Craveiro e Silva,keyclo...@googlegroups.com

Hello,

 

i just want to let you know that the PR “KEYCLOAK-18727 Improve user search query” is finished and ready to be reviewed. You can find it here : https://github.com/keycloak/keycloak/pull/8346

 

During development some additional things came up, which we commented to the PR

as remarks. Please have a look at them too.

 

As always, any feedback is greatly appreciated !

Responder a todos
Responder al autor
Reenviar
0 mensajes nuevos