Message from discussion Mongoimport of large data
Nicolas
May 23 2012, 1:48 pm
From: Nicolas <dugue....@gmail.com>
Date: Wed, 23 May 2012 10:48:01 -0700 (PDT)
Local: Wed, May 23 2012 1:48 pm
Subject: Re: Mongoimport of large data
Thanks for your answer, Jesse.

Yes, we are aware of the size limit. That's why we don't currently
use the "in" array with 3.5M integers. We only use the "out" array,
whose maximum size is 500,000 integers.
So we're still wondering why there is an error; it should fit.

Our graph is directed and contains cycles.
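Whether a 500,000-element "out" array fits can be checked against the BSON spec that Jesse cites: each array element is stored as a type byte, the array index written as a NUL-terminated decimal string, and the value (4 bytes for a 32-bit int, 8 for a 64-bit int or double). The Python sketch below is not from the thread; its function names are hypothetical, and it counts only the array itself, so a real document's `_id`, `id` and field names add a few more bytes.

```python
# Rough BSON size check for a numeric array (sketch; helper names are hypothetical).
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # MongoDB's 16 MB per-document limit


def total_key_digits(n):
    """Sum of the decimal lengths of the array keys "0", "1", ..., str(n - 1)."""
    total, start, digits = 0, 0, 1
    while start < n:
        end = min(n, 10 ** digits)  # keys in [start, end) all have `digits` digits
        total += (end - start) * digits
        start, digits = end, digits + 1
    return total


def bson_array_size(n, value_size=4):
    """Encoded size in bytes of a BSON array holding n numbers.

    value_size: 4 for int32, 8 for int64 or double.
    Each element = 1 type byte + key digits + 1 NUL + value bytes;
    the array itself adds a 4-byte length prefix and a trailing NUL.
    """
    return 4 + 1 + n * (1 + 1 + value_size) + total_key_digits(n)


def max_array_length(limit=MAX_BSON_DOCUMENT_SIZE, value_size=4):
    """Largest n whose array alone fits in `limit` bytes (binary search)."""
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if bson_array_size(mid, value_size) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    for n in (500_000, 3_500_000):
        for size, label in ((4, "int32"), (8, "double")):
            b = bson_array_size(n, size)
            print(f"{n:>9,} x {label:<6}: {b:>11,} bytes, fits: {b <= MAX_BSON_DOCUMENT_SIZE}")
    print("max int32 elements:", max_array_length())
```

Under these assumptions, 500,000 int32 values take about 5.9 MB (about 7.9 MB as doubles), well under the limit, while 3.5 million int32 values take about 44 MB. The array alone tops out at roughly 1.37 million int32 elements, which is in the same range as Jesse's "about 1M" estimate.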

On 23 May, 19:02, "A. Jesse Jiryu Davis" <je...@10gen.com> wrote:

> Be aware of the 16MB-per-document limit -- if you have 3.5M integers in an
> array, each int will require a type header (1 byte), an index (about 8
> bytes, if your indexes are in the millions), and the int (4 bytes),
> resulting in around 16 bytes to store each int. So you can't store more
> than about 1M integers in a single document's array.
> See http://bsonspec.org/#/specification

> This may be the reason for your "Assertion: 10263:unknown error reading
> file mongodb " message.

> How is your graph structured? Is it a tree, a directed acyclic graph, or
> can it be cyclic?

> On Wednesday, May 23, 2012 12:11:03 PM UTC-4, Nicolas wrote:

> > Hi everybody,

> >    We are trying to store a graph in MongoDB.
> >    We want to store our graph as an adjacency list: each node has an
> > identifier and a list of the nodes it is linked to.
> >    { id : identifier_of_the_node,
> >     out : [ list_of_identifier_of_the_nodes_he_s_linkedto]
> >    }

> >    However, the list of nodes stored in the "out" field can be really
> > large: 500,000 nodes, for example.
> >    So we don't know how to load our graph into MongoDB efficiently.
> >    Using mongoimport lets us load our data (JSON format) really
> > fast, but it does not seem to work with a large "out" array: User
> > Assertion: 10263:unknown error reading file mongodb
> >    And using the Java API, it's too slow...
> >    How can we do it?
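By default mongoimport reads one JSON document per line, so the adjacency list above can be written out in that shape before importing. The sketch below assumes the graph is available as (source, target) pairs of integer node ids; the function name is hypothetical, and it does nothing about the large-array error itself.

```python
import json
from collections import defaultdict


def write_adjacency_jsonl(edges, out_file):
    """Write one {"id": ..., "out": [...]} document per line (sketch).

    edges: iterable of (source, target) integer pairs (assumed input format).
    out_file: a text file object opened for writing.
    Nodes with no outgoing edges are not written.
    Returns the number of documents written.
    """
    out_lists = defaultdict(list)
    for src, dst in edges:
        out_lists[src].append(dst)
    count = 0
    for node_id in sorted(out_lists):
        doc = {"id": node_id, "out": out_lists[node_id]}
        # Compact separators keep each document on a single short line.
        out_file.write(json.dumps(doc, separators=(",", ":")) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    import io

    buf = io.StringIO()
    write_adjacency_jsonl([(1, 2), (1, 3), (2, 3), (3, 1)], buf)
    print(buf.getvalue(), end="")
```

A file produced this way could then be loaded with something like `mongoimport --db graph --collection nodes --file nodes.json`, where the database, collection and file names are placeholders.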

> >    Moreover, if we want to add an "in" field to our object to know
> > which nodes are linked to the current node:
> >     { id : identifier_of_the_node,
> >     out : [ list_of_identifier_of_the_nodes_he_s_linkedto],
> >     in : [ list_of_identifier_of_the_nodes_which_are_linked_to_this_one]
> >     }
> >    we may have 3.5 million integers in the "in" array.
> >    Is it possible to store it in MongoDB? I saw that we could use
> > GridFS for heavy documents... should we use it? Can we do Map/Reduce
> > on GridFS files, and can we query them easily and quickly?

> > Thanks a lot
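The "in" field asked about above is the reverse of "out": node v's "in" list holds every node u whose "out" list contains v. A minimal sketch of deriving it and flagging nodes whose "in" array alone would exceed the 16 MB document limit, reusing the per-element BSON arithmetic from Jesse's reply and assuming int32 values (all names are hypothetical):

```python
from collections import defaultdict

MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16 MB per-document limit


def bson_array_size(n, value_size=4):
    """Encoded size in bytes of a BSON array of n numbers (int32 by default)."""
    size = 5  # 4-byte length prefix + trailing NUL
    digits, start = 1, 0
    while start < n:
        end = min(n, 10 ** digits)
        # each element: type byte + key digits + key NUL + value
        size += (end - start) * (1 + digits + 1 + value_size)
        start, digits = end, digits + 1
    return size


def build_in_lists(out_lists):
    """Invert {node: [targets]} into {node: [sources]}."""
    in_lists = defaultdict(list)
    for src, targets in out_lists.items():
        for dst in targets:
            in_lists[dst].append(src)
    return dict(in_lists)


def oversized_nodes(lists, limit=MAX_BSON_DOCUMENT_SIZE):
    """Node ids whose array alone would not fit in one document.

    A real document also carries _id, id and field names, so the
    practical threshold is slightly lower than this check assumes.
    """
    return [node for node, ids in lists.items() if bson_array_size(len(ids)) > limit]


if __name__ == "__main__":
    out_lists = {1: [2, 3], 2: [3], 3: [1]}
    in_lists = build_in_lists(out_lists)
    print(in_lists)
    print(oversized_nodes(in_lists))
```

With 3.5 million sources pointing at one node, that node's "in" array would be flagged, while a 500,000-element array would not.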


 