Extracting reference data to its own service


Sebastian Brudziński

Jul 29, 2016, 9:41:29 AM
to openlm...@googlegroups.com
Hello.

Since we would like to start working on OLMIS-763 soon
(https://openlmis.atlassian.net/browse/OLMIS-763 Extract reference data
from Requisition Service into its own service), I'd like to start a
discussion about handling the relationships between reference data and
other services. Since reference data lives in the requisition service
for now, the relationship between, e.g., Requisition and Facility is as
simple as defining a field of the given type in the other entity. This
will obviously need to change once the reference data is extracted to
its own service, and I'm wondering how we should proceed with
refactoring those relationships. One option is to reference them by
UUID. In that case, the service using the reference data would be
responsible for retrieving the actual object by requesting the instance
with the given UUID. Something fancier could involve retrieving the
instances from reference data inside getters. This would make it more
transparent to the service implementation, as the logic in the service
could just retrieve a related object and the magic - that is, the object
retrieval - would take place under the covers.
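A minimal Python sketch of the two options above, assuming hypothetical `Facility`, `ReferenceDataClient`, `Requisition` and `LazyRequisition` types (none of these names come from the OpenLMIS codebase), with the reference data service replaced by an in-memory dict:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    """Hypothetical reference data Facility."""
    id: uuid.UUID
    name: str


class ReferenceDataClient:
    """Hypothetical client; an in-memory dict stands in for REST calls."""

    def __init__(self, facilities):
        self._facilities = {f.id: f for f in facilities}

    def find_facility(self, facility_id):
        return self._facilities[facility_id]


@dataclass
class Requisition:
    # Option 1: store only the UUID of the related facility.
    facility_id: uuid.UUID


def facility_name_for(requisition, client):
    # The consuming service resolves the UUID explicitly when it needs the object.
    return client.find_facility(requisition.facility_id).name


class LazyRequisition:
    """Option 2: a getter that resolves the UUID under the covers."""

    def __init__(self, facility_id, client):
        self.facility_id = facility_id
        self._client = client

    @property
    def facility(self):
        # Every access performs a lookup that the caller cannot see.
        return self._client.find_facility(self.facility_id)
```

In the first option the lookup is visible at the call site; in the second, `requisition.facility` reads like a plain field access even though, with a real client, it could hit another service.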

Please let me know your thoughts on this. I'm wondering if you have a
similar vision for handling the relationships with reference data.

Best regards,
Sebastian Brudziński.

Rich Magnuson

Aug 1, 2016, 6:17:18 PM
to Sebastian Brudziński, openlm...@googlegroups.com
I hope the tech committee and other team members can offer their thoughts on this one.

The strategy Sebastian outlined seems sensible, as does the getter idea.

This topic raises the question of caching, which the 'Building Microservices' book dedicates several pages to. There are lots of options, and we should start talking about what makes sense. Let's start the discussion here, and we'll see whether we need to create further research stories. I don't think it should block the separation, though.

Rich
--
You received this message because you are subscribed to the Google Groups "OpenLMIS Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openlmis-dev...@googlegroups.com.
To post to this group, send email to openlm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openlmis-dev/579B5D06.3020403%40soldevelo.com.
For more options, visit https://groups.google.com/d/optout.

Darius Jazayeri

Aug 2, 2016, 11:08:27 AM
to Rich Magnuson, Sebastian Brudziński, openlm...@googlegroups.com
(About to get on a plane)

I think it's definitely worth building a shared library that can be used in any consuming microservice to cache the reference data.

-Darius


--

Darius Jazayeri
Principal Architect - Global Health
ThoughtWorks

Jake Watson

Aug 2, 2016, 11:57:00 AM
to Rich Magnuson, Sebastian Brudziński, openlm...@googlegroups.com
Another thing to factor into your thinking is which caching strategy you
will (or won't) use for this data.



Josh Zamor

Aug 3, 2016, 5:40:46 PM
to OpenLMIS Dev, rich.m...@villagereach.org, sbrud...@soldevelo.com
Agreed that a shared library is a good route to go to reduce the work that each service would otherwise go through in terms of defining a type for a reference data representation.

Before we dive fully into caching or talk more specifically about strategies for resiliency or performance, I'd like to clarify what we mean by the getter solution. I think one way to think about how data should be referenced from another service is to keep in mind the goal of separation of concerns. That is, if we're going to provide a shared library of types to store reference data locally within another service, I wouldn't think that we'd store everything about a piece of reference data. If one end of the spectrum of separation of concerns is storing only a reference data entity's UUID, then the other end is likely representing every field of the remote reference data in the local shared type.

I'd like to propose that the middle ground should lean toward minimal rather than complete, because it gives:
  • higher degree of separation of concerns
  • more obvious what is cached locally compared to what must be fetched explicitly
  • 80% of use-cases likely can be served by some very simple representations


I do think that most usages take a simple form, and therefore we could start with this simple pattern:

- UUID

- DisplayName


This makes it clear that any reference data I might store as a service is a persisted entity in reference data (UUID), and then gives common metadata that most entities have: a display name aimed at the user. Moving beyond this, we might add displayOrder for when we typically want to show items in drop-down lists or as part of form elements; it can also hide reference data's implementation of display order when the resulting set combines different entities' display orders (e.g. product category and facility type approved product on the requisition form). We could also easily reason about and implement equals() and Comparable. This would need some more thought, but I do think we should keep the representations that are universally shared and referenced simpler overall, as it'll keep our expectations clear, simplify our general strategy for caching and, most importantly, ensure we're tending toward a higher degree of separation of concerns - a key goal of the re-architecture.
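A sketch of such a minimal shared type in Python, assuming a hypothetical `ReferenceDataStub` carrying only the UUID, display name and optional display order discussed above. Equality is by UUID (two stubs name the same persisted entity), while sorting for display uses display order and then name:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class ReferenceDataStub:
    """Hypothetical minimal representation shared across services."""
    id: uuid.UUID
    display_name: str
    display_order: int = 0

    def __eq__(self, other):
        # Two stubs refer to the same entity exactly when their UUIDs match.
        if not isinstance(other, ReferenceDataStub):
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

    def __lt__(self, other):
        # Display ordering: by display order, then by name as a tie-breaker.
        if not isinstance(other, ReferenceDataStub):
            return NotImplemented
        return (self.display_order, self.display_name) < (
            other.display_order, other.display_name)
```

Like a Java `compareTo` that is inconsistent with `equals`, ordering here deliberately ignores identity; that may or may not be the desired contract.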


When we really need more complete representations, I think we'd be explicit about that type in the context (service) that it's needed, and not necessarily as a shared library among all services.


Best,

Josh


Darius Jazayeri

Aug 4, 2016, 5:57:27 AM
to Josh Zamor, OpenLMIS Dev, Rich Magnuson, Sebastian Brudziński
Hi Josh,

My default assumption is that we just cache the complete over-the-wire REST representations of reference data in the other services.

I hear you that having a simplistic minimal representation will simplify in some ways, but in other ways it makes things more complex, since services now need to deal with multiple different representations of the same objects.

Are there specific types of reference data whose representations are so complex, or whose data are so voluminous, that we shouldn't just "store everything about a piece of reference data"?

-Darius


Josh Zamor

Aug 8, 2016, 3:56:17 PM
to OpenLMIS Dev, josh....@villagereach.org, rich.m...@villagereach.org, sbrud...@soldevelo.com
Hi Darius,

Yup, let me clarify.  There'd been questions in other discussions about making it easy to load object graphs for ease of use and so I'd wanted to clarify how the "getter" solution would work.  In my experience I've found that trying to transfer object graphs between contexts is usually a false economy.

So if reference data had an association between classes, e.g. a Facility has approved Programs and each of those programs has Facility Type Approved Program Products, I wouldn't encourage us to build the general interface for retrieving a facility so that it also returns representations of those other associated entities. Further, I wouldn't expect the shared library of reference data types to support the object graphs - rather, just the individual objects.

This isn't to say that there won't be interfaces that load multiple types of entities all at once, of course. Just that a more minimal cached representation often gets most of the value for the least complexity.

Paweł Gesek

Sep 2, 2016, 10:29:26 AM
to openlm...@googlegroups.com

Hello,

Just wanted to bump this and clarify some things, since we are moving the reference data to its own separate service and are currently connecting the two. Do we have a final decision on how reference data should be represented on the requisition side?

From what I understand the idea is to store the id and display-name of the reference data object (for example a program) in the requisition database. We will have an automagic solution on the side of requisitions, so that calling getProgram() on a requisition will retrieve that program from the reference data service.

Am I understanding this correctly?

Regards,
Paweł

Darius Jazayeri

Sep 2, 2016, 11:52:11 AM
to Paweł Gesek, OpenLMIS Dev
I still think it's not actually less complex to cache only a minimal representation on the consumer side, because that leads to more complexity in the 10% of cases where you actually need more than just the name of a piece of metadata.

My recommendation is that in the consuming service, we fetch and cache the standard REST representation of all the metadata it needs. These standard representations shouldn't contain the full object graph, but rather links to other metadata. (E.g. facility would have a list of links to programs, but not the details of those programs. So if the requisition service cares about both, it would fetch and cache both facilities and programs. If another service only cares about the facility names, it just caches facilities, and never tries to follow the link to programs.)
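A rough sketch of this recommendation, assuming hypothetical JSON shapes in which a facility carries only `{"id": ...}` references to its programs. `fake_fetch` stands in for HTTP GETs against the reference data service, and `RepresentationCache` is an illustrative name, not an existing library:

```python
import json

# Hypothetical over-the-wire representations: the facility holds references
# to its programs, not nested program objects.
FACILITY_JSON = ('{"id": "f1", "name": "Example Facility",'
                 ' "programs": [{"id": "p1"}, {"id": "p2"}]}')
PROGRAM_JSON = {
    "p1": '{"id": "p1", "name": "Program A"}',
    "p2": '{"id": "p2", "name": "Program B"}',
}


def fake_fetch(resource_type, resource_id):
    # Stand-in for an HTTP GET against the reference data service.
    if resource_type == "facilities":
        return FACILITY_JSON
    return PROGRAM_JSON[resource_id]


class RepresentationCache:
    """Caches standard representations per resource type, keyed by id."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable(resource_type, id) -> JSON string
        self._cache = {}           # (resource_type, id) -> parsed dict
        self.remote_calls = 0

    def get(self, resource_type, resource_id):
        key = (resource_type, resource_id)
        if key not in self._cache:
            self.remote_calls += 1
            self._cache[key] = json.loads(self._fetch(resource_type, resource_id))
        return self._cache[key]


def program_names(cache, facility_id):
    # A consumer that cares about programs follows the references explicitly.
    facility = cache.get("facilities", facility_id)
    return [cache.get("programs", ref["id"])["name"] for ref in facility["programs"]]
```

A service that only needs facility names calls `cache.get("facilities", ...)` and never fetches or caches programs.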

(Pawel, are you suggesting that Requisition would have getProgramStub() that hits the cache and gives you just name+description, and getProgram() which makes a REST call to get the full object?)

-Darius


Josh Zamor

Sep 2, 2016, 3:03:17 PM
to OpenLMIS Dev, pge...@soldevelo.com
Hi Darius, you said what I was trying to say better: so long as the "normal" representations of reference data don't typically go down a rabbit hole, caching the full representation is good. My minimal recommendation above was trying to convey that we shouldn't drift in the full-object-graph direction.

Hi Pawel, my suggestion to cache minimally is based on the concern above about the representations. I'm still concerned about this automagic, RPC-sounding method on a Requisition domain object that loads a Program from reference data over the wire. That's more difficult to reason about and harder to test than it's worth. Also good to think about here: there is reference data in the Requisition that won't be "cached", but rather "snapshotted" for the purposes of historical record. For example, a product's name and description: even if they change in reference data, they don't change for requisitions that have already been initiated.
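To make the cache/snapshot distinction concrete, a minimal sketch assuming hypothetical `Product` and `LineItemSnapshot` types: the line item copies the product's name and description when the requisition is initiated, so later edits in reference data do not affect it.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical reference data product; it may be edited later."""
    id: str
    name: str
    description: str


@dataclass(frozen=True)
class LineItemSnapshot:
    # Values copied at initiation time. They are part of the historical
    # record and are deliberately NOT refreshed from reference data.
    product_id: str
    product_name: str
    product_description: str


def initiate_line_item(product):
    return LineItemSnapshot(product.id, product.name, product.description)
```

A cached copy, by contrast, would be expected to follow the latest reference data.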

Thanks for keeping this discussion moving all.

Best,
Josh

Weronika Ciecierska

Sep 6, 2016, 11:23:24 AM
to OpenLMIS Dev, pge...@soldevelo.com
Hi everyone,
We are currently working on connecting the reference data service with the requisition service.
In our opinion, the idea to fetch and cache the standard REST representation with links to other metadata is a good one, but some parts remain unclear to us.
We also think that the getter solution is transparent to use. I think we can keep part of this idea and have a getter method that calls a method in a service to retrieve the cached reference data object, or fetches the object if it is not cached.

Our questions regarding the caching strategy:

1) Should objects in the consuming service (e.g. requisition) reference objects from the reference data service by UUID? Do we want only one cached entity with a given UUID, or should we cache multiple versions of a given entity?

If we always want only one version of a given entity, we can create a table called e.g. "reference_data_objects" in the consuming service database
and a "referenceDataObject" entity that would contain the UUID and the standard REST representation.
That would allow us to check whether the representation object with a given UUID is present in the consuming service database: if yes, use it; if not, fetch and cache it.
If we want to cache multiple versions of one entity (e.g. some objects want to use the old name of a Facility, and other, newer objects want to use the updated name),
we would have to add our own UUID to the reference data object in the consuming service and reference this object by our own UUID.

2) How should we know when to use the cached version of an object, and when to update it or retrieve a new version and save it as a new entity?

Should we have methods to get the cached version and to get the latest one (updating the cache), and call them in the code when needed? Or do we assume that we won't need to update cached objects?
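The single-version approach from question 1 could look roughly like the following read-through cache. It assumes SQLite as a stand-in for the consuming service's database, a table named as in the proposal (`reference_data_objects`), and a `fetch_remote` callable that performs the actual REST call; the class name and schema are hypothetical.

```python
import json
import sqlite3


class ReferenceDataObjectStore:
    """Hypothetical read-through cache over a reference_data_objects table."""

    def __init__(self, conn, fetch_remote):
        self._conn = conn
        self._fetch_remote = fetch_remote   # callable(uuid_str) -> dict
        conn.execute(
            "CREATE TABLE IF NOT EXISTS reference_data_objects ("
            " id TEXT PRIMARY KEY,"
            " representation TEXT NOT NULL)"
        )

    def get(self, object_id):
        row = self._conn.execute(
            "SELECT representation FROM reference_data_objects WHERE id = ?",
            (object_id,),
        ).fetchone()
        if row is not None:
            # Present locally: use the cached representation.
            return json.loads(row[0])
        # Not present: fetch from the reference data service, then cache it.
        representation = self._fetch_remote(object_id)
        self._conn.execute(
            "INSERT INTO reference_data_objects (id, representation) VALUES (?, ?)",
            (object_id, json.dumps(representation)),
        )
        self._conn.commit()
        return representation
```

This sketch never refreshes an entry once cached, which is exactly the open question 2 raises.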


I also think it would be good to include generic methods to fetch and cache reference data objects in the template-service, since they will probably be needed in almost all services, not only requisition.

Best regards,
Weronika

Darius Jazayeri

Sep 6, 2016, 6:24:47 PM
to Weronika Ciecierska, OpenLMIS Dev, Paweł Gesek
Hi Weronika,

Like Josh, I have concerns about automagic getter methods; they would make simple things easier, but they also allow complex things to happen unexpectedly. My initial reaction is to say don't do them, but maybe it's worth a spike/example of what the code would look like with and without them. Can you go multiple levels deep? E.g. in the example we talked about above, can you do facility.getProgram().getFacilityTypeApprovedProgramProducts()?

A specific issue is that if calling facility.getProgram() may sometimes trigger a local database call, and may sometimes trigger a REST call, then you can't write good unit tests for any code that calls it.
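One way around this, sketched in Python under the assumption of a hypothetical `FacilityLookup` interface, is to make the dependency on reference data explicit, so a unit test can pass a stub and assert exactly which lookups happen:

```python
from typing import Protocol


class FacilityLookup(Protocol):
    """Hypothetical port to the reference data service."""

    def facility_name(self, facility_id: str) -> str: ...


def requisition_title(facility_id: str, lookup: FacilityLookup) -> str:
    # The dependency is visible in the signature, so tests control it fully.
    return f"Requisition for {lookup.facility_name(facility_id)}"


class StubLookup:
    """Test double that records every call instead of touching a DB or REST."""

    def __init__(self, names):
        self.names = names
        self.calls = []

    def facility_name(self, facility_id: str) -> str:
        self.calls.append(facility_id)
        return self.names[facility_id]
```

Whether the production implementation hits a local cache or makes a REST call then no longer matters to the unit under test.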

About the "snapshot" requirement:

I would think that if storing references to past versions of reference data is important, then this should be solved once in the reference data service itself, rather than offloading that complexity to every other consuming service. (E.g. if it's important to know what a product used to be called last month when the requisition was placed, then I presume we'll need the same thing for orders, and for shipments, etc.)

So I suggest solving this requirement in the reference data service itself, by having "versions" for all the metadata, each version identified by a UUID (as well as the product itself having a UUID). Then, in the requisition service you'd be saving the UUID reference to a "product version" instead of a product.
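A minimal in-memory sketch of this versioning idea, with hypothetical names throughout: each save in the reference data service creates a new `ProductVersion` with its own UUID, and a consuming service keeps the version UUID it saw at the time.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductVersion:
    version_id: uuid.UUID   # identifies this particular version
    product_id: uuid.UUID   # stable identity of the product across versions
    name: str


class VersionedProductStore:
    """Hypothetical reference-data-side store where every edit is a new version."""

    def __init__(self):
        self._versions = {}   # version_id -> ProductVersion
        self._latest = {}     # product_id -> version_id of the newest version

    def save(self, product_id, name):
        version = ProductVersion(uuid.uuid4(), product_id, name)
        self._versions[version.version_id] = version
        self._latest[product_id] = version.version_id
        return version

    def latest(self, product_id):
        return self._versions[self._latest[product_id]]

    def by_version(self, version_id):
        return self._versions[version_id]
```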

About knowing when to use cached versions:

I assume that calls between services are cheap (they're over a fast network), so I would expect the other services to periodically and frequently refresh their cached versions of reference data, so that they can update them before any user request actually requires it. If we're fetching all of a particular type of reference data at once, one approach would be to make the ETag on a call to .../products the timestamp of the last update to any product.
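A sketch of that refresh loop, assuming a hypothetical products endpoint whose ETag is the last-update timestamp. It follows standard HTTP conditional-request semantics (the client sends its previous ETag in `If-None-Match`, and the server answers `304 Not Modified` with no body when nothing changed), but the server is simulated in-process rather than called over HTTP:

```python
from dataclasses import dataclass, field


@dataclass
class FakeProductEndpoint:
    """Simulated GET for all products; ETag = timestamp of the last update."""
    products: dict = field(default_factory=dict)
    last_updated: str = "2016-09-06T00:00:00Z"

    def get(self, if_none_match=None):
        if if_none_match == self.last_updated:
            return 304, self.last_updated, None   # Not Modified: no body
        return 200, self.last_updated, dict(self.products)


class ProductCacheRefresher:
    """Meant to be called periodically, e.g. by a scheduler (not shown)."""

    def __init__(self, endpoint):
        self._endpoint = endpoint
        self.etag = None
        self.products = {}

    def refresh(self):
        status, etag, body = self._endpoint.get(if_none_match=self.etag)
        if status == 200:
            # Something changed (or first load): replace the local copy.
            self.products = body
            self.etag = etag
        return status
```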

-Darius


Darius Jazayeri

Sep 7, 2016, 6:59:13 PM
to Weronika Ciecierska, OpenLMIS Dev, Paweł Gesek
Hi All,

Thanks for the discussion so far! Josh, Jake, and I spoke about this today, and agreed to a plan for the immediate future. Basically, let's be agile/iterative and ruthlessly simplify what must be done in the first iteration; we'll defer a lot of the complexity to future iterations.

Specifically, please do the first iteration of this work with the following choices:
  • The requisition service stores just UUID references to reference data.
  • No caching (yet). In the first pass, make a request over the wire every time you need a piece of reference data.
  • No snapshots/versioning (yet).
  • No automagic getters (yet?)
These simplifications will allow devs to play this story much faster, while focusing on only what's truly necessary to separate the services, and write appropriate tests. And then we can bring back the complexity piece by piece.
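A minimal sketch of this first iteration in Python: the requisition holds only UUIDs, and every lookup goes over the wire with no cache. The base URL, the `{base_url}/{resource}/{uuid}` path layout and the class names are assumptions for illustration, not the actual reference data API; the `opener` parameter defaults to `urllib.request.urlopen` and exists only so the network call can be stubbed in tests.

```python
import json
import urllib.request
import uuid
from dataclasses import dataclass


@dataclass
class Requisition:
    # First iteration: only UUID references to reference data are stored.
    facility_id: uuid.UUID
    program_id: uuid.UUID


class ReferenceDataService:
    """No caching: every lookup is a fresh request to the reference data service."""

    def __init__(self, base_url, opener=urllib.request.urlopen):
        self._base_url = base_url.rstrip("/")
        self._opener = opener   # injectable so tests can avoid the network

    def find_one(self, resource, resource_id):
        # Hypothetical URL layout: {base_url}/{resource}/{uuid}
        url = f"{self._base_url}/{resource}/{resource_id}"
        with self._opener(url) as response:
            return json.load(response)
```

Each `find_one` call issues a new request, matching the "no caching (yet)" choice; a cache could later sit behind the same method without changing callers.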

Does this clarify and unblock?

-Darius

PS- For later iterations, someone should put stories in the backlog about:
  • Ensuring that if I edit the name of a product today, a requisition from last month still refers to the old product name. (could be handled by snapshotting the state of a piece of reference data, or introducing reference data versioning; should be considered alongside any requirements about audit trail)
  • Caching copies of reference data within the requisition service.
Also, after there's a concrete example of code that is annoying to write without the getters, we can talk about a specific proposal, and whether the extra testing complexity introduced by automagic getters is worth it.

Weronika Ciecierska

Sep 8, 2016, 3:59:37 AM
to OpenLMIS Dev, wciec...@soldevelo.com, pge...@soldevelo.com
Hi Darius! Thank you for the clarification; all points sound reasonable and are clear to us.
One more thing is the shared library matter. I think it should also be done in the first iteration, so that the consuming service knows what type to expect from the reference data service.
What do you think about it?

Regards,
Weronika




Darius Jazayeri

Sep 8, 2016, 6:00:22 PM
to Weronika Ciecierska, OpenLMIS Dev, Paweł Gesek
Hi Weronika,

In the first iteration you should not write a reusable shared library. I presume we'll do this later (and that might involve refactoring/repackaging some code you do write in the first iteration).

This article mentions some approaches to avoid writing an entire java class hierarchy purely to be able to deserialize, since typically most web service clients only need a small number of the available properties: http://martinfowler.com/articles/refactoring-document-load.html
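In the spirit of that approach (closely related to what is often called a "tolerant reader"), a client can parse the response into a generic structure and pick out only the properties it uses. The function name and the field names below are assumptions about the facility representation, not its actual schema:

```python
import json


def read_facility_summary(document: str) -> dict:
    """Extract only the fields this client needs and ignore everything else.

    "id", "name" and "active" are assumed field names for illustration.
    """
    data = json.loads(document)
    return {
        "id": data["id"],
        "name": data["name"],
        # Tolerate an optional field instead of failing when it is absent.
        "active": data.get("active", True),
    }
```

Unknown properties added to the representation later are simply ignored, so no client-side class hierarchy has to track them.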

(Unless the reference data service already contains DTO-style java classes that we can repackage into a library without repeating every class, and then reuse on the client side. I'm not familiar enough with the code to know if this would work now.)

-Darius




