Hello everyone,
we have recently started working on OLMIS-4728 which tackles an issue that Malawi has started experiencing after updating to OpenLMIS 3.3. This is once again related to concurrent updates.
In simple words - requisition update can currently overwrite most
of the requisition properties (including status) if it's run
concurrently with other requests that change requisition state.
This is especially noticeable for updates that take longer time
(several seconds). Our current mechanism to prevent concurrent
updates is based on the timestamp property that is manually set
and validated during updates (the received timestamp must match
the database one). This works, but only for a case that doesn't
involve any concurrency. For concurrent requests, this validation
can pass for multiple requests and all the updates will be
successfully carried, since the timestamps will match until one of
the updates commits to the database.
The main problem therefore is that we are validating this at the application logic level. The proposed solution is moving this to the database level and introducing optimistic locking in one of the following ways:
1. Use optimistic concurrency control using @Version annotation.
A requisition would have a new property called "version". This
field is automatically updated by Hibernate as it's modified. Both
incremental numbers and timestamps are supported, but Hibernate
recommends dedicated version number property
(https://docs.jboss.org/hibernate/orm/4.0/devguide/en-US/html/ch05.html#d0e2251).
This essentially adds the version property to the "where" clause
of the database update query and in case no rows are affected, a
org.hibernate.StaleStateException is returned, indicating the
version has changed in the meantime. This ensures that no two
updates with the same version will ever be commited into the
database. We would still keep our timestamp validation in order to
fail faster if the timestamp mismatch is detected. The
version-based validation would primarily protect against
concurrent updates. This also means it wouldn't need to be exposed
to the REST API. This approach is also well documented here:
https://dzone.com/articles/version-based-optimistic
2. Implement custom post-update trigger
We would implement a custom trigger that is fired after each
update on the requisition and validate the timestamps there. This
means that we would remove updating the timestamp from the
application logic level and move it to the post-update trigger as
well (only if the validation passed). This would allow us to
re-use the timestamp field that we already have. We could also
keep the check on the application level to fail faster if the
mismatch is already detected. Custom trigger gives us more
flexibility, but may also require more work.
"Why do you have concurrent requests in Malawi?"
I can see this question coming and I've been looking/asking for
the reason as well. My best guess as of now is a mix of poor
Internet connection + users refreshing and re-saving/re-submitting
requisitions when it's taking too long. Whatever the reason is,
our API should be able to handle concurrent requests (even due to
the fact that multiple users can work on the same requisition).
I've also done a basic proof of concept for proposed solution #1
- I've added the version field to the requisition, added
artificial 5 seconds wait in the update endpoint and was able to
get the error as expected. Of course more testing and work is
required if we decide to go with this solution.
Sebastian
Brudziński
Senior Software Developer / Team
Leader
sbrud...@soldevelo.com
--
You received this message because you are subscribed to the Google Groups "OpenLMIS Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev...@googlegroups.com.
To post to this group, send email to openlm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/8469f3a4-b54e-1047-dafd-5097ea2d1094%40soldevelo.com.
For more options, visit https://groups.google.com/d/optout.
Thanks for the input, Josh.
I'd agree that we don't need to think or consider pessimistic
locking for now.
On idempotent requests, that would definitely be helpful as well. I think we need something more than a random UUID generated at the page load as the idempotency-key. I'm not sure yet what would work better and possibly include some context though... I'll follow up on that. Two immediate thoughts at this point: How would we want to handle the case with two requests and the same idempotency-key? Would we immediately return an error code (eg. 409 Conflict) for the second and following requests? Also, would the idempotency-key be required in the requests or just supported? It looks like Stripe API simply supports idempotency but doesn't enforce it, which sounds like a good solution for us as well. We could use it on UI, but still give other clients option to call the API without generating random numbers.
It sounds like we would want to implement both optimistic locking and idempotent requests as a part of OLMIS-4728, am I understanding this correctly? As for the optimistic locking solution - are we happy with the incremental version field on the requisition object?
Best regards,
Sebastian.
The cached idempotency-key associated with the requisition object
would be sufficient, but we would need to consider exceptional
cases. As you mention, we would need to hear back from the API to
know that the key has been used and is no longer valid. If we
refresh the page or close the website completely, we will never
hear back from the API and we will keep the cached key. The API
request, however, will eventually get processed, causing the key
to no longer be valid (provided the response is 200 OK). I think
our UI client would need a way to somehow recognize that the
requisition has changed and that a new key needs to be generated.
Agreed that any validation errors don't need to mark the idempotency-key as used.
Best regards,
Sebastian.
Also, a follow-up thought on idempotent requests is whether we should mark the keys as used at the very beginning of the request and then immediately free them, in case the response wasn't 2xx. If we only mark keys as used when the request has finished executing, we aren't really solving anything, since the endpoints would still be accepting new requests with the same key as they are executing (and therefore have concurrent requests). An alternative would be to check whether the key exists at the end of the endpoint execution, rather than at the beginning, and rollback the whole transaction with an exception if the key exists. Thoughts?
Best regards,
Sebastian.
Thanks Josh, this all makes sense.
One doubt I still have is how we should handle a case when we
receive a second request (same Idempotency-Key) when the first one
is still being executed. If we were to immediately return 409
(without Location set, since the request hasn't finished yet),
this may give client a wrong impression that something went wrong
and it's safe to retry the same with another Idempotency-Key.
It seems that we need a way to distinguish between a 409 response caused by another request in progress and a 409 from a completely processed request. Would you agree? This could potentially be custom response header or different error message.
Best regards,
Sebastian.
Best,Josh
To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev+unsubscribe@googlegroups.com.
Thanks Josh,
I think we can start with the version that doesn't involve saving
the response for now and easily expand our idempotent request
handling with that feature at a later point if desired. We haven't
picked our key storage tool yet, but I think it would be cool to
finally introduce Redis. Perhaps we can make more use of it at a
later point in time as well (caching?).
Best regards,
Sebastian.
To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev...@googlegroups.com.
To post to this group, send email to openlm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/2c0c699e-f53a-406c-b2b5-4e4ef697a1b9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Sebastian
Brudziński
Senior Software Developer / Team
Leader
sbrud...@soldevelo.com
To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/2c0c699e-f53a-406c-b2b5-4e4ef697a1b9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Sebastian Brudziński
Senior Software Developer / Team Leader
sbrud...@soldevelo.com