Hi Janaka,
Sounds like an exciting project!
I'm the maintainer of the MongoDB Hadoop connector. I can tell you that the connector does have to map some MongoDB-specific types onto types that are native to another system (e.g. Hive and Pig), which is likely one problem that you'll have to solve when integrating MongoDB with Tajo. You can see how the connector deals with transforming these types here:
https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage#serialization-and-deserialization. In this case, Hive already has some notion of "nested types," so the transformation here is fairly straightforward.
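To make the idea concrete, here is a rough sketch (in Python, and not the connector's actual code; the real mapping table is on the wiki page above) of how BSON-ish values in a sample document might be mapped onto Hive column types, with subdocuments becoming Hive STRUCTs so the nesting is preserved:

```python
import datetime

# Assumed scalar mapping for illustration only; the connector's real
# type table lives in its serialization/deserialization docs.
SCALAR_MAP = {
    int: "INT",
    float: "DOUBLE",
    str: "STRING",
    bool: "BOOLEAN",
    datetime.datetime: "TIMESTAMP",
}

def hive_type(value):
    """Infer a Hive type declaration from one sample value."""
    if isinstance(value, dict):
        # Subdocuments become Hive STRUCTs, preserving nesting.
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        # Arrays become Hive ARRAYs; assume a homogeneous element type.
        inner = hive_type(value[0]) if value else "STRING"
        return f"ARRAY<{inner}>"
    return SCALAR_MAP.get(type(value), "STRING")

doc = {"name": "Ada", "age": 36, "address": {"city": "London", "zip": "NW1"}}
print(hive_type(doc))
# STRUCT<name:STRING,age:INT,address:STRUCT<city:STRING,zip:STRING>>
```

The hard part in practice is everything this sketch glosses over: heterogeneous arrays, fields that change type from document to document, and types with no clean analogue on the other side.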
I have no experience with Tajo whatsoever, so I can't give much advice on that front. However, you might look into what other storage engines for Tajo do when their source/sink is a non-relational data source. Do the storage engines allow the user to configure every detail of how data transformations are applied (e.g. do they require that users declare fields/types in advance)? What kinds of assumptions do the storage engines make (e.g. do they assume that every document in a collection looks roughly the same)?
MongoDB also has a "BI" (Business Intelligence) connector, which has to map MongoDB documents onto a relational structure. Perhaps the documentation for this will give some inspiration or guidance:
https://docs.mongodb.org/bi-connector/schema-configuration/.
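As a hypothetical illustration of the relational-mapping problem (this is my own sketch, not how the BI connector is implemented), one common approach is to flatten nested documents into dotted column names, which is similar in spirit to the schemas described in that documentation:

```python
def flatten(doc, prefix=""):
    """Flatten a nested document into a single-level column -> value dict."""
    row = {}
    for key, value in doc.items():
        col = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into subdocuments, joining field names with dots.
            row.update(flatten(value, prefix=col + "."))
        else:
            row[col] = value
    return row

doc = {"name": "Ada", "address": {"city": "London", "geo": {"lat": 51.5}}}
print(flatten(doc))
# {'name': 'Ada', 'address.city': 'London', 'address.geo.lat': 51.5}
```

Arrays are the trickier case; a relational mapping typically turns them into a separate child table keyed back to the parent row, which is worth thinking about early in your design.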
Best of luck on your project! It sounds like a lot of fun. Please feel free to ask for help anytime.
Luke