--
You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/ded22181-a5f8-4557-b098-2926609d2d55n%40googlegroups.com.
The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.
Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.
By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.
At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Why not just do sync? What you are doing is different than what we do for logout actions, for instance (in terms of calling clients through HTTP)?
--On Thu, Jun 17, 2021 at 11:37 AM Michal Hajas <mha...@redhat.com> wrote:Hello everyone,In the last Keycloak Weekly meeting we haven't finished the discussion regarding the justification for usage of AsyncResponses for resolving SAML Artifact binding artifacts.Currently, we are using AsyncResponse on lines: [1] and [2]. Then in the method [3] we are resuming the response directly in the same thread if the request binding was POST or REDIRECT (we already have the saml message in the request data, no need to wait for anything). However, if the request binding was ARTIFACT, we need to first resolve the saml message from the caller and then we can continue. In that case, we schedule ArtifactResolutionRunnable[4] here: [5]. When Saml message is obtained and processed the request is resumed in AsyncResponseTransaction[6]Do you think it makes sense to process SAML Artifact binding messages this way? Do you know about some disadvantages of AsyncResponse that we should consider?--
You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/ded22181-a5f8-4557-b098-2926609d2d55n%40googlegroups.com.
You received this message because you are subscribed to the Google Groups "Keycloak Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to keycloak-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/CA%2B3s2iT5w0seAp2YnP4pX_LzsoVwHiFPG%2BsbetrvuL44cFWzUw%40mail.gmail.com.
On Thu, Jun 17, 2021 at 5:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.This is right. Using async everywhere leads to every kind of dirty code unless done right from the beginning. It may become hard to share the right state, it may be hard to assure thread safety (cf. the entity manager you also mentioned), if overused it makes things significantly slower. On the other hand, it has its use cases, and I am convinced SAML artifact resolution is one of them.Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.
Throughput has been considered, and in this case up to the level of prevention of DoS type of attacks. The wait time for the external systems responding with SAML artifact can be a matter of seconds or time out after tens of seconds. So if kept in sync processing, by sending relatively few requests, the worker thread pool could be exhausted and the server would be effectively blocked. By offloading SAML artifact processing into another threads pool, the worker threads are kept operating for non SAML artifact requests, and the dedicated SAML artifact thread pool can be tuned independently for this part.
Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?
By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.
At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.
Why not just do sync? What you are doing is different than what we do for logout actions, for instance (in terms of calling clients through HTTP)?This is exactly the other use case which should be done in the background independently, and in independent threads (see deferred [1]). The logout is a better case compared to SAML artifact though, as the former requires existing session so cannot be invoked by arbitrary request, while SAML artifact request does not need any preauthentication - the artifact to be retrieved might be the very first AuthnRequest. Hence sync would open the door for DoS.
On Fri, Jun 18, 2021 at 5:27 AM Hynek Mlnarik <hmln...@redhat.com> wrote:On Thu, Jun 17, 2021 at 5:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.This is right. Using async everywhere leads to every kind of dirty code unless done right from the beginning. It may become hard to share the right state, it may be hard to assure thread safety (cf. the entity manager you also mentioned), if overused it makes things significantly slower. On the other hand, it has its use cases, and I am convinced SAML artifact resolution is one of them.Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.Sorry, a correction here ... I meant "maintain a separate thread pool with JAX-RS".Throughput has been considered, and in this case up to the level of prevention of DoS type of attacks. The wait time for the external systems responding with SAML artifact can be a matter of seconds or time out after tens of seconds. So if kept in sync processing, by sending relatively few requests, the worker thread pool could be exhausted and the server would be effectively blocked. By offloading SAML artifact processing into another threads pool, the worker threads are kept operating for non SAML artifact requests, and the dedicated SAML artifact thread pool can be tuned independently for this part.That is a good point and makes sense for me too. But we should also be able to set the timeout for HTTP requests (and not assume a global timeout for every case) as well as think about the observability, fault-tolerance, and resilience aspects of external calls made by Keycloak, in addition to the thread pool.We should probably start discussing a review of the HttpClient SPI and think about enhancing it to support external/async calls. In this particular case, I can think about leveraging the http client to perform these calls using a specific thread pool (like the one used by saml artifact) together with async responses. The benefit here is that we could leverage capabilities usually provided by these libraries for some of the aspects mentioned above.
Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?If we are always adding the configuration for the thread pool used by the artifact, it does. Not significantly, but IMO we should be opinionated about what we ship by default. For this particular case, I'm not sure if artifact binding will be used very often so I would avoid creating one by default and not use the default executor by default. But an executor that just runs the runnable in the calling thread.
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.
Why not just do sync? What you are doing is different than what we do for logout actions, for instance (in terms of calling clients through HTTP)?This is exactly the other use case which should be done in the background independently, and in independent threads (see deferred [1]). The logout is a better case compared to SAML artifact though, as the former requires existing session so cannot be invoked by arbitrary request, while SAML artifact request does not need any preauthentication - the artifact to be retrieved might be the very first AuthnRequest. Hence sync would open the door for DoS.You are still open to DoS because it is going to depend on how the thread pool is configured. A separate thread pool helps but you are still sharing the same resources. But yeah, it gives at least an option to reduce the risk.IMO, this could be discussed in a design document so that we can think about a solution that can be used not only here but in other places where we need to make external calls.
On Fri, Jun 18, 2021 at 2:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:On Fri, Jun 18, 2021 at 5:27 AM Hynek Mlnarik <hmln...@redhat.com> wrote:On Thu, Jun 17, 2021 at 5:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.This is right. Using async everywhere leads to every kind of dirty code unless done right from the beginning. It may become hard to share the right state, it may be hard to assure thread safety (cf. the entity manager you also mentioned), if overused it makes things significantly slower. On the other hand, it has its use cases, and I am convinced SAML artifact resolution is one of them.Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.Sorry, a correction here ... I meant "maintain a separate thread pool with JAX-RS".Throughput has been considered, and in this case up to the level of prevention of DoS type of attacks. The wait time for the external systems responding with SAML artifact can be a matter of seconds or time out after tens of seconds. So if kept in sync processing, by sending relatively few requests, the worker thread pool could be exhausted and the server would be effectively blocked. By offloading SAML artifact processing into another threads pool, the worker threads are kept operating for non SAML artifact requests, and the dedicated SAML artifact thread pool can be tuned independently for this part.That is a good point and makes sense for me too. But we should also be able to set the timeout for HTTP requests (and not assume a global timeout for every case) as well as think about the observability, fault-tolerance, and resilience aspects of external calls made by Keycloak, in addition to the thread pool.We should probably start discussing a review of the HttpClient SPI and think about enhancing it to support external/async calls. In this particular case, I can think about leveraging the http client to perform these calls using a specific thread pool (like the one used by saml artifact) together with async responses. The benefit here is that we could leverage capabilities usually provided by these libraries for some of the aspects mentioned above.All of these sound sensible to me. I would even go further and question HttpClient itself due to its issues with connection pools that have bitten us several times. Could OkHttp become the new standard? Anything else?
Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?If we are always adding the configuration for the thread pool used by the artifact, it does. Not significantly, but IMO we should be opinionated about what we ship by default. For this particular case, I'm not sure if artifact binding will be used very often so I would avoid creating one by default and not use the default executor by default. But an executor that just runs the runnable in the calling thread.I agree, and in this case, there is no new thread pool created by default. When needed, it can be configured as any other dedicated thread pool accessible via ExecutorsSpi. The default behaviour is - if the saml-artifact-pool thread pool is configured, use that one, otherwise use the default thread pool. It does not exactly align with the proposed behaviour as it is not executed on the calling thread. That by itself seems to be an interesting idea. On the other hand, I'd like to avoid having two implementations, one for sync and one for async mode, as this would open room for errors. Would AsyncResponse logic work on the same thread? Also, how to ensure that in sync there is some prevention measure of DoS explained above?
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.This looks like a good idea. The only concern is how to guarantee that the same thread which is calling the session.close() has also been the one which opened the session? IOW how to guarantee that in the case of an async response being processed in another thread, that the transaction is properly committed even for non-thread safe objects like the entity manager?
To view this discussion on the web visit https://groups.google.com/d/msgid/keycloak-dev/CAMvXD%3DEivHP%3D4uf3HLUBNH3CD%3DLxCYkCq4HX0taWD4vgqf2kKQ%40mail.gmail.com.
On Fri, Jun 18, 2021 at 12:32 PM Hynek Mlnarik <hmln...@redhat.com> wrote:On Fri, Jun 18, 2021 at 2:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:On Fri, Jun 18, 2021 at 5:27 AM Hynek Mlnarik <hmln...@redhat.com> wrote:On Thu, Jun 17, 2021 at 5:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.This is right. Using async everywhere leads to every kind of dirty code unless done right from the beginning. It may become hard to share the right state, it may be hard to assure thread safety (cf. the entity manager you also mentioned), if overused it makes things significantly slower. On the other hand, it has its use cases, and I am convinced SAML artifact resolution is one of them.Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.Sorry, a correction here ... I meant "maintain a separate thread pool with JAX-RS".Throughput has been considered, and in this case up to the level of prevention of DoS type of attacks. The wait time for the external systems responding with SAML artifact can be a matter of seconds or time out after tens of seconds. So if kept in sync processing, by sending relatively few requests, the worker thread pool could be exhausted and the server would be effectively blocked. By offloading SAML artifact processing into another threads pool, the worker threads are kept operating for non SAML artifact requests, and the dedicated SAML artifact thread pool can be tuned independently for this part.That is a good point and makes sense for me too. But we should also be able to set the timeout for HTTP requests (and not assume a global timeout for every case) as well as think about the observability, fault-tolerance, and resilience aspects of external calls made by Keycloak, in addition to the thread pool.We should probably start discussing a review of the HttpClient SPI and think about enhancing it to support external/async calls. In this particular case, I can think about leveraging the http client to perform these calls using a specific thread pool (like the one used by saml artifact) together with async responses. The benefit here is that we could leverage capabilities usually provided by these libraries for some of the aspects mentioned above.All of these sound sensible to me. I would even go further and question HttpClient itself due to its issues with connection pools that have bitten us several times. Could OkHttp become the new standard? Anything else?Yeah, OkHttp can be an alternative. Not sure though if we are going to need something else to address some other aspects. It is also available from Quarkus and used by some of its extensions.Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?If we are always adding the configuration for the thread pool used by the artifact, it does. Not significantly, but IMO we should be opinionated about what we ship by default. For this particular case, I'm not sure if artifact binding will be used very often so I would avoid creating one by default and not use the default executor by default. But an executor that just runs the runnable in the calling thread.I agree, and in this case, there is no new thread pool created by default. When needed, it can be configured as any other dedicated thread pool accessible via ExecutorsSpi. The default behaviour is - if the saml-artifact-pool thread pool is configured, use that one, otherwise use the default thread pool. It does not exactly align with the proposed behaviour as it is not executed on the calling thread. That by itself seems to be an interesting idea. On the other hand, I'd like to avoid having two implementations, one for sync and one for async mode, as this would open room for errors. Would AsyncResponse logic work on the same thread? Also, how to ensure that in sync there is some prevention measure of DoS explained above?I think the code from Michal is already processing on the same thread when not doing artifact binding.
There other ways to prevent and reduce the impact that does not require any application-level logic, but on the network level. Not different than other places that DoS can happen.
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.This looks like a good idea. The only concern is how to guarantee that the same thread which is calling the session.close() has also been the one which opened the session? IOW how to guarantee that in the case of an async response being processed in another thread, that the transaction is properly committed even for non-thread safe objects like the entity manager?The AsyncResponse basically delays the closing of responses. As we keep the reference to the session, we should be able to get any contextual data and providers created during the session lifetime from the listener.
On Tue, Jun 22, 2021 at 4:15 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:On Fri, Jun 18, 2021 at 12:32 PM Hynek Mlnarik <hmln...@redhat.com> wrote:On Fri, Jun 18, 2021 at 2:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:On Fri, Jun 18, 2021 at 5:27 AM Hynek Mlnarik <hmln...@redhat.com> wrote:On Thu, Jun 17, 2021 at 5:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.This is right. Using async everywhere leads to every kind of dirty code unless done right from the beginning. It may become hard to share the right state, it may be hard to assure thread safety (cf. the entity manager you also mentioned), if overused it makes things significantly slower. On the other hand, it has its use cases, and I am convinced SAML artifact resolution is one of them.Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.Sorry, a correction here ... I meant "maintain a separate thread pool with JAX-RS".Throughput has been considered, and in this case up to the level of prevention of DoS type of attacks. The wait time for the external systems responding with SAML artifact can be a matter of seconds or time out after tens of seconds. So if kept in sync processing, by sending relatively few requests, the worker thread pool could be exhausted and the server would be effectively blocked. By offloading SAML artifact processing into another threads pool, the worker threads are kept operating for non SAML artifact requests, and the dedicated SAML artifact thread pool can be tuned independently for this part.That is a good point and makes sense for me too. But we should also be able to set the timeout for HTTP requests (and not assume a global timeout for every case) as well as think about the observability, fault-tolerance, and resilience aspects of external calls made by Keycloak, in addition to the thread pool.We should probably start discussing a review of the HttpClient SPI and think about enhancing it to support external/async calls. In this particular case, I can think about leveraging the http client to perform these calls using a specific thread pool (like the one used by saml artifact) together with async responses. The benefit here is that we could leverage capabilities usually provided by these libraries for some of the aspects mentioned above.All of these sound sensible to me. I would even go further and question HttpClient itself due to its issues with connection pools that have bitten us several times. Could OkHttp become the new standard? Anything else?Yeah, OkHttp can be an alternative. Not sure though if we are going to need something else to address some other aspects. It is also available from Quarkus and used by some of its extensions.Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?If we are always adding the configuration for the thread pool used by the artifact, it does. Not significantly, but IMO we should be opinionated about what we ship by default. For this particular case, I'm not sure if artifact binding will be used very often so I would avoid creating one by default and not use the default executor by default. But an executor that just runs the runnable in the calling thread.I agree, and in this case, there is no new thread pool created by default. When needed, it can be configured as any other dedicated thread pool accessible via ExecutorsSpi. The default behaviour is - if the saml-artifact-pool thread pool is configured, use that one, otherwise use the default thread pool. It does not exactly align with the proposed behaviour as it is not executed on the calling thread. That by itself seems to be an interesting idea. On the other hand, I'd like to avoid having two implementations, one for sync and one for async mode, as this would open room for errors. Would AsyncResponse logic work on the same thread? Also, how to ensure that in sync there is some prevention measure of DoS explained above?I think the code from Michal is already processing on the same thread when not doing artifact binding.That's correct, it remained the same as it used to be for POST and Redirect bindings.There other ways to prevent and reduce the impact that does not require any application-level logic, but on the network level. Not different than other places that DoS can happen.Apologies, I got lost. Could you please elaborate more?
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.This looks like a good idea. The only concern is how to guarantee that the same thread which is calling the session.close() has also been the one which opened the session? IOW how to guarantee that in the case of an async response being processed in another thread, that the transaction is properly committed even for non-thread safe objects like the entity manager?The AsyncResponse basically delays the closing of responses. As we keep the reference to the session, we should be able to get any contextual data and providers created during the session lifetime from the listener.Let me rephrase the question, I think we diverged here. There is no benefit in creating an async response and then processing it synchronously. So the logic building the AsyncResponse can be (and usually is) run in another thread than the one that initially processed the request and created the Keycloak session and providers within. Some of the providers are not thread safe, specifically DefaultJpaConnectionProvider which contains reference to EntityManager instance. Could the DefaultJpaConnectionProvider instance from the original session be safely used in the AsyncResponse code running in another thread?
On Tue, Jun 22, 2021 at 11:56 AM Hynek Mlnarik <hmln...@redhat.com> wrote:On Tue, Jun 22, 2021 at 4:15 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:On Fri, Jun 18, 2021 at 12:32 PM Hynek Mlnarik <hmln...@redhat.com> wrote:On Fri, Jun 18, 2021 at 2:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:On Fri, Jun 18, 2021 at 5:27 AM Hynek Mlnarik <hmln...@redhat.com> wrote:On Thu, Jun 17, 2021 at 5:26 PM Pedro Igor Craveiro e Silva <pigor.c...@gmail.com> wrote:The short story is that what I learned when using it is that you should balance the benefits and not blindly assume that async response is going to give you what you expect.This is right. Using async everywhere leads to every kind of dirty code unless done right from the beginning. It may become hard to share the right state, it may be hard to assure thread safety (cf. the entity manager you also mentioned), if overused it makes things significantly slower. On the other hand, it has its use cases, and I am convinced SAML artifact resolution is one of them.Did you run any benchmark to see how it affects runtime in terms of throughput and memory? IIRC, you have a separate thread pool from JAX-RS, and the overhead due to context switch should be considered. But if you are sure that it works and improve things for your use case, go for it.Sorry, a correction here ... I meant "maintain a separate thread pool with JAX-RS".Throughput has been considered, and in this case up to the level of prevention of DoS type of attacks. The wait time for the external systems responding with SAML artifact can be a matter of seconds or time out after tens of seconds. So if kept in sync processing, by sending relatively few requests, the worker thread pool could be exhausted and the server would be effectively blocked. By offloading SAML artifact processing into another threads pool, the worker threads are kept operating for non SAML artifact requests, and the dedicated SAML artifact thread pool can be tuned independently for this part.That is a good point and makes sense for me too. But we should also be able to set the timeout for HTTP requests (and not assume a global timeout for every case) as well as think about the observability, fault-tolerance, and resilience aspects of external calls made by Keycloak, in addition to the thread pool.We should probably start discussing a review of the HttpClient SPI and think about enhancing it to support external/async calls. In this particular case, I can think about leveraging the http client to perform these calls using a specific thread pool (like the one used by saml artifact) together with async responses. The benefit here is that we could leverage capabilities usually provided by these libraries for some of the aspects mentioned above.All of these sound sensible to me. I would even go further and question HttpClient itself due to its issues with connection pools that have bitten us several times. Could OkHttp become the new standard? Anything else?Yeah, OkHttp can be an alternative. Not sure though if we are going to need something else to address some other aspects. It is also available from Quarkus and used by some of its extensions.Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?If we are always adding the configuration for the thread pool used by the artifact, it does. Not significantly, but IMO we should be opinionated about what we ship by default. For this particular case, I'm not sure if artifact binding will be used very often so I would avoid creating one by default and not use the default executor by default. But an executor that just runs the runnable in the calling thread.I agree, and in this case, there is no new thread pool created by default. When needed, it can be configured as any other dedicated thread pool accessible via ExecutorsSpi. The default behaviour is - if the saml-artifact-pool thread pool is configured, use that one, otherwise use the default thread pool. It does not exactly align with the proposed behaviour as it is not executed on the calling thread. That by itself seems to be an interesting idea. On the other hand, I'd like to avoid having two implementations, one for sync and one for async mode, as this would open room for errors. Would AsyncResponse logic work on the same thread? Also, how to ensure that in sync there is some prevention measure of DoS explained above?I think the code from Michal is already processing on the same thread when not doing artifact binding.That's correct, it remained the same as it used to be for POST and Redirect bindings.There other ways to prevent and reduce the impact that does not require any application-level logic, but on the network level. Not different than other places that DoS can happen.Apologies, I got lost. Could you please elaborate more?Sure. Prevention can also be done through tools or even services provided by some cloud providers to protect layers 3, 4, or even 7 through a WAF.
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.This looks like a good idea. The only concern is how to guarantee that the same thread which is calling the session.close() has also been the one which opened the session? IOW how to guarantee that in the case of an async response being processed in another thread, that the transaction is properly committed even for non-thread safe objects like the entity manager?The AsyncResponse basically delays the closing of responses. As we keep the reference to the session, we should be able to get any contextual data and providers created during the session lifetime from the listener.Let me rephrase the question, I think we diverged here. There is no benefit in creating an async response and then processing it synchronously. So the logic building the AsyncResponse can be (and usually is) run in another thread than the one that initially processed the request and created the Keycloak session and providers within. Some of the providers are not thread safe, specifically DefaultJpaConnectionProvider which contains reference to EntityManager instance. Could the DefaultJpaConnectionProvider instance from the original session be safely used in the AsyncResponse code running in another thread?See https://docs.oracle.com/javaee/7/api/javax/servlet/AsyncListener.html. You might also want to look at older versions of the KeycloakSessionFilter (which no longer exists but the history is there).
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.This looks like a good idea. The only concern is how to guarantee that the same thread which is calling the session.close() has also been the one which opened the session? IOW how to guarantee that in the case of an async response being processed in another thread, that the transaction is properly committed even for non-thread safe objects like the entity manager?The AsyncResponse basically delays the closing of responses. As we keep the reference to the session, we should be able to get any contextual data and providers created during the session lifetime from the listener.Let me rephrase the question, I think we diverged here. There is no benefit in creating an async response and then processing it synchronously. So the logic building the AsyncResponse can be (and usually is) run in another thread than the one that initially processed the request and created the Keycloak session and providers within. Some of the providers are not thread safe, specifically DefaultJpaConnectionProvider which contains reference to EntityManager instance. Could the DefaultJpaConnectionProvider instance from the original session be safely used in the AsyncResponse code running in another thread?See https://docs.oracle.com/javaee/7/api/javax/servlet/AsyncListener.html. You might also want to look at older versions of the KeycloakSessionFilter (which no longer exists but the history is there).How does AsyncListener guarantee thread-stickiness and cover the use case above?
Exact memory impact has not been measured. Do you think it could affect the memory footprint significantly, i.e. tens of MB or more?If we are always adding the configuration for the thread pool used by the artifact, it does. Not significantly, but IMO we should be opinionated about what we ship by default. For this particular case, I'm not sure if artifact binding will be used very often so I would avoid creating one by default and not use the default executor by default. But an executor that just runs the runnable in the calling thread.I agree, and in this case, there is no new thread pool created by default. When needed, it can be configured as any other dedicated thread pool accessible via ExecutorsSpi. The default behaviour is - if the saml-artifact-pool thread pool is configured, use that one, otherwise use the default thread pool. It does not exactly align with the proposed behaviour as it is not executed on the calling thread. That by itself seems to be an interesting idea. On the other hand, I'd like to avoid having two implementations, one for sync and one for async mode, as this would open room for errors. Would AsyncResponse logic work on the same thread? Also, how to ensure that in sync there is some prevention measure of DoS explained above?I think the code from Michal is already processing on the same thread when not doing artifact binding.That's correct, it remained the same as it used to be for POST and Redirect bindings.There other ways to prevent and reduce the impact that does not require any application-level logic, but on the network level. Not different than other places that DoS can happen.Apologies, I got lost. Could you please elaborate more?Sure. Prevention can also be done through tools or even services provided by some cloud providers to protect layers 3, 4, or even 7 through a WAF.In general this is correct, though lower than 7 do not apply to SAML/ARTIFACT but to network topology, and 7 is only level for HTTP. So it could be usable only for clients using the artifact resolution service in Keycloak - yet this is resolved in sync mode [1]. For level 7 filtering of SAML endpoint, all SAML requests (including plain POST and REDIRECT, irrespective of ARTIFACT) would be affected. WAFs able to distinguish types of SAML messages (AuthnRequest vs ArtifactResolution) exist but are far from simple to be configured.The use case for the async thread pool is for the other direction, where Keycloak is reaching out for an artifact resolved by an external service, and waiting for the external service to respond.Still I'd like to understand the point you are making here WRT usage of the async thread pool and the DoS. It seems to me that you suggest employing an advanced WAF and not using AsyncResponse rather than using AsyncResponse and have DoS under control of a single-line configuration in server executors config which is the current state, is this correct? If not, could you please state the point?Yes, I don't think that the strategy herein discussed is the more appropriate to prevent DoS (and its different forms). IMO, no matter what you do, you will always end up with additional security barriers other than using this executor to relieve the worker thread. So, what is the point of introducing this approach? Shouldn't we expect a much more robust/smart defense from a WAF (and from additional barriers that can prevent other forms of DoS too)?I don't want to block your plans, I'm just expressing my opinion though. I might be wrong and missing the real benefits from the approach you guys are proposing. As I said, the idea makes sense but I'm not sure how far it would help in practice. And I do think we should rather focus on discussing first how to properly make external calls to external services.And yeah, you are solving the problem of worker thread pool exhaustion but not the problem of system resource consumption. Even though you have a separate and configurable thread pool specific for this purpose. Again, I might be wrong but that is my first impression.
People should be able to switch accordingly whether or not they want to use sync vs async. And use accordingly to their needs. Thus my question on how much you gain in terms of throughput. For some, sync should be fine.By switching to another thread, contextual data might be lost too because you are running in a different thread. For JPA, this is also a problem as the entity manager is not thread-safe. You end up having to do thins like https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L1342.This is acceptable for this use case.Sure, it works. But it is one of the things that made me give up on using async responses. The code is very unnatural and susceptible to errors when the code you are relying on changes to require additional data from the context. IMO, it also makes it hard to maintain.At the beginning of authz I tried to use async responses to process policies in parallel. That is why we have support for async in our filter. It did not work very mainly due to contextual data being lost and issues with JPA.Btw, you shouldn't need to manually handle the transaction for async responses if you support that in our filer. I just realized that we removed support for async responses from there, but it was enough to properly handle async responses and delay closing sessions by using a listener. Looks more natural and more generic than the approach taken based on a custom transaction. See https://github.com/keycloak/keycloak/blob/8cd56b37d36c868ff52a351b1e736027bf62aeed/services/src/main/java/org/keycloak/protocol/saml/SamlService.java#L390.We did that a long time ago. Although Bill was not very confident about the usage :) And he was right, at least for the use cases I had in mind.This looks like a good idea. The only concern is how to guarantee that the same thread which is calling the session.close() has also been the one which opened the session? IOW how to guarantee that in the case of an async response being processed in another thread, that the transaction is properly committed even for non-thread safe objects like the entity manager?The AsyncResponse basically delays the closing of responses. As we keep the reference to the session, we should be able to get any contextual data and providers created during the session lifetime from the listener.Let me rephrase the question, I think we diverged here. There is no benefit in creating an async response and then processing it synchronously. So the logic building the AsyncResponse can be (and usually is) run in another thread than the one that initially processed the request and created the Keycloak session and providers within. Some of the providers are not thread safe, specifically DefaultJpaConnectionProvider which contains reference to EntityManager instance. Could the DefaultJpaConnectionProvider instance from the original session be safely used in the AsyncResponse code running in another thread?See https://docs.oracle.com/javaee/7/api/javax/servlet/AsyncListener.html. You might also want to look at older versions of the KeycloakSessionFilter (which no longer exists but the history is there).How does AsyncListener guarantee thread-stickiness and cover the use case above?It does not guarantee thread-sickness as it usually executes in another thread. The approach here creates a session for each execution so you should have no issues with JPA.
As I mentioned in the beginning the issues around thread-safety of EM were one of the problems I faced.The listener approach might be handy in case we are sure that we can safely reuse the session in another thread. Hard to achieve due to limitations in some providers.Btw, do you expect the new storage to somehow help with this?