10-second delay in mfa-gauth and trusted devices


Jose Luis Martínez del Amor

Jul 12, 2024, 9:40:59 AM
to CAS Developer

Hi,

In our organization, we run CAS 6.5.9.3 with mfa-gauth and trusted devices backed by Redis. We currently have around 15,000 users with MFA activated.

For some time now, users have been experiencing a 10-second delay between submitting the login page and reaching either the MFA code request page (when required) or the redirection to the service. The delay appears to be related to the number of users with MFA active.

After some investigation, we found the lines of code that cause the delay. The CAS log shows:

2024-07-12 12:37:03,358 TRACE [org.apereo.cas.trusted.web.flow.MultifactorAuthenticationVerifyTrustAction] - <Retrieving trusted authentication records for [XXXXX]>

. . .

2024-07-12 12:37:13,875 TRACE [org.apereo.cas.trusted.web.flow.MultifactorAuthenticationVerifyTrustAction] - <Retrieving authentication records for [XXXXX] that matches [****************************]>


The code that prints these log lines is in the doExecute method of org.apereo.cas.trusted.web.flow.MultifactorAuthenticationVerifyTrustAction, and we think the problem is the call to this.storage.get(principal). The “storage” attribute is an instance of org.apereo.cas.trusted.authentication.storage.RedisMultifactorAuthenticationTrustStorage, and we found that this class calls remove() on every query to the storage:


    @Override
    public Set<? extends MultifactorAuthenticationTrustRecord> get(final String principal) {
        remove();
        val keys = RedisUtils.keys(this.redisTemplate, buildRedisKeyForRecord(principal), this.scanCount);
        return getFromRedisKeys(keys);
    }


The remove method runs a SCAN over all entries matching the key pattern RedisMultifactorAuthenticationTrustStorage:*, and for each result issues a GET to fetch the object, deserializes it, and checks the expiration date:


public void remove(final ZonedDateTime expirationDate) {
    RedisUtils.keys(this.redisTemplate, getPatternRedisKey(), this.scanCount).map((redisKey) -> {
        return (List)this.redisTemplate.boundValueOps(redisKey).get();
    }).filter(Objects::nonNull).flatMap(Collection::stream).filter((record) -> {
        return DateTimeUtils.zonedDateTimeOf(record.getExpirationDate()).isBefore(expirationDate);
    }).forEach((record) -> {
        Set<String> recordKeys = (Set)RedisUtils.keys(this.redisTemplate, buildRedisKeyForRecord(record), this.scanCount).collect(Collectors.toSet());
        this.redisTemplate.delete((Collection)Objects.requireNonNull(recordKeys));
    });
}


In newer versions of the code, the remove call is still there. Why is the query method of RedisMultifactorAuthenticationTrustStorage implemented this way? After all, a cleaning mechanism already exists.


We think this delay is unacceptable, and we are considering overriding the class to eliminate the remove call and filtering the results to exclude expired records.
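A minimal sketch of that read-side filtering idea, not the CAS implementation. TrustRecord and TrustRecordFilter are hypothetical stand-ins for MultifactorAuthenticationTrustRecord and the overridden storage class. The sketch assumes every record carries a non-null expiration date and that the records for the principal have already been fetched from Redis:

```java
import java.time.ZonedDateTime;
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for CAS's MultifactorAuthenticationTrustRecord.
final class TrustRecord {
    final String principal;
    final ZonedDateTime expirationDate; // assumed non-null in this sketch

    TrustRecord(String principal, ZonedDateTime expirationDate) {
        this.principal = principal;
        this.expirationDate = expirationDate;
    }
}

// Hypothetical helper: drops expired records in memory instead of
// deleting them from Redis on every read.
final class TrustRecordFilter {
    private TrustRecordFilter() {}

    static Set<TrustRecord> unexpired(Collection<TrustRecord> fetched, ZonedDateTime now) {
        return fetched.stream()
                // Mirrors the remove() condition: a record is expired when
                // its expiration date is before "now", so keep the rest.
                .filter(record -> !record.expirationDate.isBefore(now))
                .collect(Collectors.toSet());
    }
}
```

With this approach the expired entries stay in Redis until the separate cleaning mechanism deletes them, so the read path only touches the keys of the principal being queried.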


Is there a better solution to our problem? Have we overlooked something?


Regards,

Jose Luis



Łukasz Woźniak

Jul 20, 2024, 3:34:36 AM
to Jose Luis Martínez del Amor, CAS Developer
Hi, 

We also run 6.5 in production, and this version has performance problems with Redis. The keys in Redis are too general, so CAS has to check all keys to find the one it needs.

We overrode this class and created something similar as a cron job to clear the trusted devices.
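As an illustration only, moving the cleanup out of the read path and onto a schedule could look roughly like the sketch below. ScheduledTrustCleaner is a hypothetical name, and the cleanup Runnable stands in for whatever code deletes the expired trusted-device records. This is not the code from our deployment or from CAS:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that runs a cleanup task periodically in the
// background instead of on every storage query.
final class ScheduledTrustCleaner {
    private ScheduledTrustCleaner() {}

    static ScheduledExecutorService start(Runnable cleanup, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "trusted-device-cleaner");
            thread.setDaemon(true); // do not keep the JVM alive for cleanup
            return thread;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                cleanup.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so report it and let the next run proceed.
                System.err.println("Cleanup failed: " + e.getMessage());
            }
        }, 0, period, unit);
        return scheduler;
    }
}
```

A call such as ScheduledTrustCleaner.start(task, 1, TimeUnit.HOURS) would run the task once immediately and then one hour after each run finishes; the returned scheduler should be shut down when the application stops.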

There are many performance problems in this version, for example with failure throttling.

Version 7.0.1 has better performance with Redis, but it has many bugs.

Best regards,
Lukas



Jose Luis Martínez del Amor

Jul 20, 2024, 3:36:28 AM
to CAS Developer, Jose Luis Martínez del Amor
Hi,
Overriding the class so that queries no longer perform deletions definitely improved performance, but there was still some lag. This time the delay occurred during the Redis SCAN for the keys of the user's Google MFA credentials.

Because of the way the SCAN command works, several calls are made to Redis while looking for keys matching RedisGoogleAuthenticatorTokenCredentialRepository:username:*. You can reduce the number of calls with the COUNT parameter, which sets the size of the search block: the bigger the block, the fewer calls to Redis. This parameter maps to the following settings in cas.properties:

cas.authn.mfa.gauth.redis.scan-count=100000

cas.authn.mfa.trusted.redis.scan-count=100000

We set COUNT this high so that results are found on the first attempt, which makes SCAN behave like the KEYS command.
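To illustrate why COUNT matters, here is a rough back-of-the-envelope sketch. It assumes each SCAN call examines about COUNT entries of the whole keyspace (not only the matching keys, since MATCH is applied after the entries are examined), so the number of round trips is approximately the keyspace size divided by COUNT. COUNT is only a hint to Redis, so real numbers will differ; ScanRoundTrips is a hypothetical helper, not part of CAS or any Redis client:

```java
// Hypothetical helper estimating how many SCAN calls a full iteration needs.
final class ScanRoundTrips {
    private ScanRoundTrips() {}

    // Approximation: ceil(totalKeys / count), and at least one call,
    // because even an empty keyspace needs one SCAN to find that out.
    static long estimate(long totalKeys, long count) {
        if (totalKeys < 0 || count <= 0) {
            throw new IllegalArgumentException("totalKeys must be >= 0 and count must be > 0");
        }
        return Math.max(1, (totalKeys + count - 1) / count);
    }
}
```

For a hypothetical keyspace of 50,000 keys, ScanRoundTrips.estimate(50_000, 10) gives 5000 calls with the Redis default COUNT of 10, while ScanRoundTrips.estimate(50_000, 100_000) gives 1. The trade-off is that a single SCAN with a very large COUNT keeps Redis busy for longer per call, much as KEYS does.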

I hope this can be useful to someone else.

Regards,
Jose Luis
