Merging data from multiple data sources


efr...@twitter.com

Dec 5, 2017, 1:33:28 PM
to sangria-graphql
We have an admittedly exotic setup here and I need some advice.
We're merging two Sangria Schemas.

Schema A: Hand-coded Schema that uses the sangria.derive macros. Values are fetched from thrift services and marshalled into Scala objects.
Schema B: Dynamic schema that represents the values as nested Map[String, Any]. Field values are resolved by name from the Context.value Map.
The goal is to eventually remove the hand-coded Schema A and have Schema B deliver the same data.

We start with Schema A, walk down from the Query object, and merge ObjectTypes when they're compatible.
 
I've been using a MappedSequenceLeafAction to merge the results of the Actions from both schemas for an ObjectType.
Something like 

resolve = ctx => MappedSequenceLeafAction(
  Seq(objectTypeFromA.resolve(ctx), objectTypeFromB.resolve(ctx)), 
  { case Seq(l, r) => MergedValue(l, r) }
)

So the data in the Context.value field for a merged ObjectType is a MergedValue(left: Any, right: Any) 

The resolve function for the ObjectType in Schema A is wrapped with a short function that extracts the left side of the MergedValue and passes it to the macro-derived resolve function.
If the ObjectType contains fields that originated in Schema B, the resolve function for those fields already contains code to unpack the MergedValue.right field and use that.
Schema B fields use the last part of the Context.path to look up the value in the Map[String, Any] contained in the right field.

This all works well with one level of nesting. The problem comes up when fields from Schema B are two-levels deep inside an ObjectType that has been merged.
When we resolve the value for the field representing the merged ObjectType we hand the "left" side into the original resolve function.
When the field from Schema B is nested inside of this ObjectType we've lost the right side. All we have is the marshalled Scala object representing the ObjectType.
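To make the failure mode concrete, here is a minimal sketch in plain Scala with no Sangria types. It assumes, as in the merged-schema example, that left holds Schema A's marshalled object and right holds Schema B's map, and that the map is already scoped to the current object. The names resolveUserA and resolveFromB are hypothetical stand-ins for the wrapped resolve functions.

```scala
object MergedValueSketch {
  // Mirrors the MergedValue(left: Any, right: Any) described above.
  final case class MergedValue(left: Any, right: Any)

  final case class User(foo: String)
  final case class Viewer(user: User)

  // Schema A style (hypothetical): unwrap the left side and hand the plain
  // Scala object to the derived resolver. The right side is dropped here.
  def resolveUserA(value: Any): Any = value match {
    case MergedValue(viewer: Viewer, _) => viewer.user
    case viewer: Viewer                 => viewer.user
    case other                          => other
  }

  // Schema B style (hypothetical): look the field up by name in the right-side map.
  def resolveFromB(value: Any, fieldName: String): Option[Any] = value match {
    case MergedValue(_, right: Map[_, _]) =>
      right.asInstanceOf[Map[String, Any]].get(fieldName)
    case _ => None // no MergedValue left, so Schema B data is unreachable
  }
}
```

A Schema B field directly on Viewer resolves, but once resolveUserA has returned a bare User, a lookup for bar has nothing to read from.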

I can think of two ways to fix this:

1. Bake the ability to unpack the MergedValue into the macro-derived sangria schema. Then use the MergedValues *everywhere*.
I'm not sure how to modify the resolve functions though and it makes an already complicated code path even more hairy.

2. Put the results from any field fetched from Schema B into a Map that's stored in the QueryContext itself. 
Then we don't need the MergedValue at all anymore. Schema B fields would traverse the nested maps (or a similarly convenient data-structure) based on the path in Context.path.
The problem with this is that I'm not sure if there's a hook available in Sangria that would allow me to do this. 
I've looked into using a sangria.middleware.Middleware or a QueryReducer but there's no hook for "the data has been resolved, but the fields haven't yet".
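The traversal part of option 2 could look roughly like the sketch below. It is plain Scala and assumes the path has already been reduced to a list of field names; a real Sangria path can also contain list indices, which this ignores. The lookup function is a hypothetical helper, not a Sangria API.

```scala
import scala.annotation.tailrec

object PathLookupSketch {
  // Walks nested Map[String, Any] values one path segment at a time.
  // Returns None as soon as a segment is missing or a non-map is reached early.
  @tailrec
  def lookup(data: Any, path: List[String]): Option[Any] = path match {
    case Nil => Some(data)
    case key :: rest =>
      data match {
        case m: Map[_, _] =>
          m.asInstanceOf[Map[String, Any]].get(key) match {
            case Some(next) => lookup(next, rest)
            case None       => None
          }
        case _ => None
      }
  }
}
```

This only covers reading the data back out; where the map gets populated is the open question about hooks.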
 
All of the Schema B fields are resolved via a custom DerferredResolver so I'm not sure if an UpdateCtx action can be used. 
I'd need to update the context after the deferred values were fetched but fields hadn't yet been assigned values.

Sorry for the long post. You're a star if you've gotten this far ;)

Erik

Example:

################
# Schema A
type User {
  foo: String
}

# The logged-in user
type Viewer {
  user: User
}

query {
  viewer: Viewer!
}

################
# Schema B
type User {
  foo: String
  bar: String
}

type Viewer {
  user: User
  someOtherViewerData: String
}

query {
  viewer: Viewer!
}

##############
# Merged Schema
type User {
  foo: String  # From both; prefer A for now, eventually B
  bar: String  # From Schema B, but called after User.resolve has stripped the MergedValue to satisfy the macro resolvers
}

# The logged-in user
type Viewer {
  user: User                   # Fetch User from both the Schema A and Schema B providers. Resolvable when ctx.value = MergedValue(l: User, r: Map("viewer" -> Map("user" -> ...., "someOtherViewerData" -> "hello")))
  someOtherViewerData: String  # From Schema B
}

query {
  viewer: Viewer!  # No real backing data. The logged-in user is derived from data in the Context
}

Oleg Ilyenko

Dec 5, 2017, 6:15:28 PM
to sangria-graphql
Hi Erik,

Thanks a lot for a detailed explanation! I think I understood the issue. 

In general, I would prefer the first solution since it does not rely on the context and works based on pure value propagation (relying too much on the context value can be error-prone). The thing about macros is that they are often designed for a specific use-case, and it's hard to re-use them for a slightly different scenario. That said, the macro itself is relatively small (about 300 LOC) and most of its code is there for the sake of configuration and customizability. It is also quite easy to introduce new variations of the macro that are use-case specific. An example would be `deriveContextObjectType`: it is very similar to the standard macro but adds extra logic in the resolve functions.

I think it makes sense to create a variation of the macro just for this special use-case, at least we can try. In fact, I experimented a bit and implemented just this in a separate branch, just to demonstrate the idea. It would be great if you could check this test and share your opinion:


Nice things about this approach:

* It is recursive and it propagates the wrapped type that holds extra information
* If the field is not supposed to be wrapped (e.g. it is a simple scalar value), no wrapping takes place (only unwrapping). This is determined based on a type class, so it should be type-safe (at least I think it is, I haven't spent much time on it)
* It does not rely on the execution path or the context value
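As a rough illustration of these properties (this is not the macro from the branch, only a hand-written sketch in plain Scala), a type class can decide per field type whether the result is re-wrapped with the extra data or returned bare. FieldResult, resolveField, and the way extras are narrowed by field name are all assumptions made for the example.

```scala
object WrappedResolveSketch {
  final case class MergedValue[T](value: T, extras: Map[String, Any])

  // Hypothetical type class: how a field's result is passed on to child fields.
  trait FieldResult[A] {
    def propagate(a: A, extras: Map[String, Any]): Any
  }

  trait LowPriorityFieldResult {
    // Fallback for leaf values such as scalars: return them unwrapped.
    implicit def leaf[A]: FieldResult[A] = new FieldResult[A] {
      def propagate(a: A, extras: Map[String, Any]): Any = a
    }
  }

  object FieldResult extends LowPriorityFieldResult {
    // Instance for object types: keep the wrapper so nested fields still see the extras.
    def wrapped[A]: FieldResult[A] = new FieldResult[A] {
      def propagate(a: A, extras: Map[String, Any]): Any = MergedValue(a, extras)
    }
  }

  final case class User(foo: String)
  object User {
    implicit val fieldResult: FieldResult[User] = FieldResult.wrapped[User]
  }
  final case class Viewer(user: User)

  // Roughly what a derived resolve function would do: unwrap the parent, read
  // the field, narrow the extras to that field, and let the type class decide.
  def resolveField[P, A](parent: MergedValue[P], name: String)(get: P => A)(
      implicit fr: FieldResult[A]): Any = {
    val childExtras = parent.extras.get(name) match {
      case Some(m: Map[_, _]) => m.asInstanceOf[Map[String, Any]]
      case _                  => Map.empty[String, Any]
    }
    fr.propagate(get(parent.value), childExtras)
  }
}
```

Resolving user on a wrapped Viewer yields a wrapped User that still carries its part of the extras, while resolving foo on that User yields a plain String.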

It's just an example, but maybe it can help you. You can try to copy-paste the macro code into your project (just one or two files, the macro itself is self-contained and quite isolated from the rest of the codebase) and make the adjustments for your use-case. Hopefully, my example can provide a starting point, though it is not finished. Maybe this variation of the macro can be a worthy inclusion in the library at some point, but I'm not yet 100% sure. It could also be shipped in a separate library.

Let me know what you think about it :)

Cheers,
Oleg

efr...@twitter.com

Dec 6, 2017, 2:37:38 PM
to sangria-graphql
I'll give it a shot, thanks!

efr...@twitter.com

Dec 7, 2017, 4:55:09 PM
to sangria-graphql
Oleg, thanks so much for this. I think it's going to work well for us.

There's one wrinkle: we're implementing the Node interface for refetching. Now that the GraphQL type for Foo is MergedValue[Foo], Sangria isn't able to determine that Foo implements Node.

A sanitized stack trace:

   Unhandled exception: 
sangria.execution.UndefinedConcreteTypeError: Can't find appropriate subtype of an interface type 'Node' for value of class 'com.company.graphql.common.types.Foo' at path 'node'. Possible types: Foo (defined for 'com.company.project.graphql.MergedValue'), ....
at sangria.execution.Resolver.resolveValue(Resolver.scala:911)
at sangria.execution.Resolver.resolveValue(Resolver.scala:807)
at sangria.execution.Resolver$$anonfun$48$$anonfun$apply$33.apply(Resolver.scala:609)
at sangria.execution.Resolver$$anonfun$48$$anonfun$apply$33.apply(Resolver.scala:604)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:253)
at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:251)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)

Would we need to modify sangria type resolution to recognize this?

Oleg Ilyenko

Dec 7, 2017, 6:04:01 PM
to sangria-graphql
Great, glad that it was useful! I think it should be pretty easy to fix the issue with polymorphic types. In general, you can always use `myObjectType.withInstanceCheck((value, thisClass, thisGraphqlType) ⇒ ... /* your check whether value is the instance of this particular type*/)`. Since the actual domain class is wrapped, it has lost its identity: the type that was captured is the wrapper type, not the actual entity type. So the default check gets confused.

One of the ways to solve this is to capture the entity type:

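import scala.reflect.{ClassTag, classTag}

case class MergedValue[T : ClassTag](value: T, extras: Map[String, Any]) {
  // Runtime class of the entity type captured when the wrapper was created
  lazy val entityClass = classTag[T].runtimeClass

  def isOfMatchingType = entityClass.isAssignableFrom(value.getClass)
}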

and then use this information to override the instance check:

deriveWrappedObjectType[Repo, MergedValue, Product]()
  .withInstanceCheck((value, _, _) ⇒ value.asInstanceOf[MergedValue[_]].isOfMatchingType)

You can also bake this into the macro itself, just to avoid repeating it for every type. You can also use other criteria to implement this check (especially considering that inside of the macro you have full type information about the entity and wrapper types), but in any case, you need to use `withInstanceCheck` to tell the library whether the provided `value` (accessed via the `Node` interface) can be handled by this particular `ObjectType`.
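As one example of such other criteria, the check can compare the unwrapped value against the class each candidate type was defined for. The sketch below is plain Scala and does not use Sangria; Candidate, candidateFor, and typeNameFor are hypothetical stand-ins for an `ObjectType`, its instance check, and the interface's subtype lookup.

```scala
import scala.reflect.{ClassTag, classTag}

object InstanceCheckSketch {
  final case class MergedValue[T](value: T, extras: Map[String, Any])

  // Hypothetical stand-in for an ObjectType: a name plus an instance check.
  final case class Candidate(name: String, accepts: Any => Boolean)

  // Builds a check that looks through the wrapper before testing the class.
  def candidateFor[T: ClassTag](name: String): Candidate = {
    val cls = classTag[T].runtimeClass
    Candidate(name, {
      case MergedValue(inner, _) => cls.isInstance(inner)
      case other                 => cls.isInstance(other)
    })
  }

  // Mimics picking the first possible type whose check accepts the value.
  def typeNameFor(candidates: Seq[Candidate], value: Any): Option[String] =
    candidates.find(_.accepts(value)).map(_.name)

  final case class Repo(id: String)
  final case class User(id: String)
}
```

With this shape, both wrapped and bare entities are matched to the same candidate, and a value that no candidate accepts yields None.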

Erik Froese

Dec 8, 2017, 2:36:24 PM
to sangria-graphql
Thanks again Oleg! I baked it into the deriveWrappedObjectType macro.

One more thing...

I had to do a bit of ugly hacking on our Node interface itself so I could pull the id out of the value. 
Previously the value type for the interface below was our Node trait. In order to extract the id I had to change it to Any and match on it to support MergedValue[T]'s and T's.
Is there something cleaner I could be doing?

val NodeType: InterfaceType[QueryContext, Any] = InterfaceType(
  "Node",
  "An entity with a globally unique ID",
  fields[QueryContext, Any](
    Field("id", IDType, Some("Globally unique identifier"),
      resolve = ctx => ctx.value match {
        case MergedValue(node: Node, _) => makeGlobalId(node)
        case node: Node => makeGlobalId(node)
      }
    )
  )
)

Oleg Ilyenko

Dec 13, 2017, 6:10:47 AM
to sangria-graphql
The `Node` trait is more of a convenience for some of the use-cases. The library code depends on `Identifiable` and `IdentifiableNode` type classes, so it should be possible to describe `MergedValue` as a "node" by providing an implicit instance of either of these type classes.
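
The type-class idea can be sketched in plain Scala as follows. HasId here is a local stand-in written for this example and is only assumed to resemble the shape of `Identifiable`; the real traits and their signatures should be checked in the library. globalId is a simplified, hypothetical replacement for makeGlobalId.

```scala
object NodeIdSketch {
  final case class MergedValue[T](value: T, extras: Map[String, Any])

  // Local stand-in for an id-providing type class (an assumption, not the library trait).
  trait HasId[T] {
    def id(value: T): String
  }

  object HasId {
    // Any MergedValue[T] has an id whenever the wrapped T has one.
    implicit def mergedHasId[T](implicit inner: HasId[T]): HasId[MergedValue[T]] =
      new HasId[MergedValue[T]] {
        def id(merged: MergedValue[T]): String = inner.id(merged.value)
      }
  }

  final case class Repo(id: String)
  object Repo {
    implicit val repoHasId: HasId[Repo] = new HasId[Repo] {
      def id(repo: Repo): String = repo.id
    }
  }

  // Simplified global id; no pattern match on Any is needed.
  def globalId[T](typeName: String, value: T)(implicit ev: HasId[T]): String =
    s"$typeName:${ev.id(value)}"
}
```

With an instance derived for the wrapper, the id field can stay typed instead of falling back to Any.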