Dear Mongo users,
I would like to analyze a MongoDB database that makes intensive use of DBRefs to link different parts of a record back to a central document; sometimes these links form a multilevel hierarchy. In total, about 20 collections in the database are linked via DBRefs. The mongo-spark configuration requires specifying a single collection, and as far as I can tell there is no way to resolve these references within the library. I also tried adding the plain Java driver, but some of its classes do not implement the Serializable interface, so they cannot be used inside Spark closures. After trying different approaches and spending quite some time searching for a solution, I ended up with a rather odd workaround: I created a REST service that resolves the links and builds the JSON of the whole object, and I call it for each record in Spark:
JavaRDD<String> baseCountsRDD = rdd.map(record -> {
    String id = record.get("about", String.class);
    String jsonString = client.getRecord(id);
    return analyzer.analyze(jsonString);
});
Since the database is large, this REST layer adds significant overhead to the system. What I would like to achieve is something like this:
JavaRDD<String> baseCountsRDD = rdd.map(record -> {
    String jsonString = resolver.resolveLinks(record);
    return analyzer.analyze(jsonString);
});
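For context, the link-resolution logic I have in mind is roughly the following sketch. It uses plain Java maps standing in for BSON documents, and a pluggable lookup function standing in for the database fetch, so that the actual driver (and its serialization issues) stays out of the picture; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DBRefResolver {

    // Recursively walks a document tree and replaces every DBRef-shaped
    // node ({"$ref": <collection>, "$id": <id>}) with the document that
    // the lookup function returns for "<collection>/<id>". The returned
    // document is resolved again, so multilevel hierarchies are followed.
    static Object resolve(Object node, Function<String, Map<String, Object>> lookup) {
        if (node instanceof Map) {
            Map<?, ?> m = (Map<?, ?>) node;
            if (m.containsKey("$ref") && m.containsKey("$id")) {
                Map<String, Object> target = lookup.apply(m.get("$ref") + "/" + m.get("$id"));
                return resolve(target, lookup);
            }
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : m.entrySet()) {
                out.put(String.valueOf(e.getKey()), resolve(e.getValue(), lookup));
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(resolve(item, lookup));
            }
            return out;
        }
        return node; // scalar value: keep as-is
    }
}
```

In Spark, I imagine the lookup function would be backed by a per-partition client (created inside mapPartitions, so nothing non-serializable is captured in the closure), but that is exactly the part I have not found a clean way to do with mongo-spark.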
Unfortunately, DBRef does not appear anywhere in the mongo-spark code base, and the documentation and all the examples are based on a single collection.
Do you have any ideas I could try?
Best,
Péter