We have two databases: variome and annotationsource. We first load data into the variome database, then use the information stored in the annotationsource tables to annotate it, and store the corresponding annotated fields in the variome table.
Our annotationsource data have the following format (e.g. table as1):
{_id: {chr: 1, pos: 12345}, xxx: xxx,... feature: [{field1: value1, field12: value12,...}...]}
Our annotated variome table data are of the format
{_id: {chr: 1, pos: 12345}, xxx: xxx, ... annotation: { as1: [{field1: value1, field12: value12,...}...], as2: [{...}]}}
So annotation.as1[0] in the variome table is a copy of feature[0] from table as1.
Our original variome table size is 1,468,515,568; after annotation it is 11,938,505,952, so the original table is only about 12% of the annotated table. There could be significant savings if we used DBRefs instead of copying the annotation data over.
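As a quick check on these figures, here is a small sketch of the arithmetic. It uses only the two sizes quoted above; the variable names are just for illustration. The difference is an upper bound on what could be saved, since the stored references would themselves take some space.

```javascript
// Sizes as reported above (same unit for both figures).
const originalSize = 1468515568;   // variome table before annotation
const annotatedSize = 11938505952; // variome table after annotation

// Share of the annotated table taken up by the original data.
const ratio = originalSize / annotatedSize;

// Space added by the copied annotation fields.
const annotationOverhead = annotatedSize - originalSize;

console.log(`original / annotated = ${(ratio * 100).toFixed(1)}%`); // 12.3%
console.log(`added by annotation  = ${annotationOverhead}`);        // 10469990384
```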
We were thinking of storing the annotated data in the following format:
{_id: {chr: 1, pos: 12345}, xxx: xxx, ... annotation: {as1: [{$ref: 'as1', $id: {chr: 1, pos: 12345}}, ...], as2: [{$ref: 'as2', ...}]}}
That is, we would store only the _id of the matching annotationsource document, and during a search refer back to annotationsource table as1 for the actual data.
Our searches are of the form db.variome_table.find({"annotation.as1.field1": /as100/}).
We would like to know whether MongoDB will automatically follow the references to the annotationsource documents for all rows during such a query, so that we do not have to fetch() each document ourselves. We would also like to know whether the Java driver can resolve these references back to the original source automatically, instead of us doing it in the client.
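To make clear what "doing it in the client" would mean for us, here is a minimal sketch of the two-step lookup we are hoping to avoid. It runs against plain in-memory arrays rather than a real MongoDB server. The sample documents and the helper names keyOf, findSourceIds and findVariants are made up for illustration and are not part of any driver API.

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
const as1 = [
  { _id: { chr: 1, pos: 12345 }, feature: [{ field1: "as100-a" }] },
  { _id: { chr: 1, pos: 67890 }, feature: [{ field1: "other" }] },
];
const variome = [
  { _id: { chr: 1, pos: 12345 },
    annotation: { as1: [{ $ref: "as1", $id: { chr: 1, pos: 12345 } }] } },
  { _id: { chr: 1, pos: 67890 },
    annotation: { as1: [{ $ref: "as1", $id: { chr: 1, pos: 67890 } }] } },
];

// Turn a {chr, pos} key into a string so it can be stored in a Set.
const keyOf = (id) => `${id.chr}:${id.pos}`;

// Step 1: search the annotation source itself, roughly what a query
// such as db.as1.find({"feature.field1": /as100/}) would return.
function findSourceIds(source, pattern) {
  return new Set(
    source
      .filter((doc) => doc.feature.some((f) => pattern.test(f.field1)))
      .map((doc) => keyOf(doc._id))
  );
}

// Step 2: keep the variome documents whose stored references
// point at one of the matching annotation source documents.
function findVariants(variants, sourceName, ids) {
  return variants.filter((doc) =>
    (doc.annotation[sourceName] || []).some(
      (ref) => ref.$ref === sourceName && ids.has(keyOf(ref.$id))
    )
  );
}

const hits = findVariants(variome, "as1", findSourceIds(as1, /as100/));
console.log(hits.map((doc) => doc._id));
```

With this sample data only the variant at chr 1, pos 12345 is returned. Against a real server, each step would be a separate query, which is the extra round trip we would like MongoDB or the Java driver to handle for us if possible.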
I hope I have described the situation clearly. Thank you very much in advance for your help!