BulkInsert on a ShardedDocumentStore - can't get it working...

David Callander

Aug 28, 2015, 1:16:43 AM
to RavenDB - 2nd generation document database
I can't get BulkInsert working on a ShardedDocumentStore. It seems like it should be simple: my code compiles, but at run time it throws:

Unhandled Exception: System.NotSupportedException: Sharded document store doesn't have a database commands. you need to explicitly use the shard instances to get access to the database commands
   at Raven.Client.Shard.ShardedDocumentStore.get_AsyncDatabaseCommands() in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Shard\ShardedDocumentStore.cs:line 113
   at Raven.Client.Document.BulkInsertOperation..ctor(String database, IDocumentStore documentStore, DocumentSessionListeners listeners, BulkInsertOptions options, IDatabaseChanges changes) in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Document\BulkInsertOperation.cs:line 55
   at Raven.Client.Shard.ShardedDocumentStore.BulkInsert(String database, BulkInsertOptions options) in c:\Builds\RavenDB-Stable-3.0\Raven.Client.Lightweight\Shard\ShardedDocumentStore.cs:line 293

Is there some bug I'm not aware of (despite extensive searching), or is there something else specific I should be doing here to try and use BulkInsert on a set of sharded databases?

I'm using VS2013 and RavenDB build 3690. (I was using 2.5 on .NET 4 and got the same error, so I copied my code into a new VS2013 install with the latest client DLLs.)

I can get sharding working, using a standard insert e.g. 
... define sharding stuff
var _documentStore = new ShardedDocumentStore(shardStrategy);
var _session = _documentStore.OpenSession(_dbName);
... for loop ...
_session.Store(LucFile);

I can get BulkInsert working, with no sharding, e.g.
var _documentStore = new DocumentStore { Url = "http://dev04-server:4053/" };
var _bulk = _documentStore.BulkInsert(_dbName);
... for loop ...
_bulk.Store(LucFile); 
 
But putting the two together, as follows, doesn't work as outlined initially...
... define sharding stuff
var _documentStore = new ShardedDocumentStore(shardStrategy);
var _bulk = _documentStore.BulkInsert(_dbName);
... for loop ...
_bulk.Store(LucFile);
 
P.S. Using the standard insert isn't an acceptable option. I have around 300 million documents, and the standard insert runs into a memory problem after about 50-100k inserts. The only workaround is tearing down and re-initialising the documentStore, which gives terrible performance (well under 1k rows/sec, even with no indexes).

Oren Eini (Ayende Rahien)

Aug 28, 2015, 5:58:10 AM
to ravendb
Please use the latest build (3780); we have an implementation of sharded bulk insert there.




David Callander

Aug 30, 2015, 8:11:12 PM
to RavenDB - 2nd generation document database
OK, so with build 3780 my code compiles and runs, but nothing ends up in the databases. Here's my insert loop:

                using (var _session = _documentStore.ShardedBulkInsert(_dbName))
                {
                    // Read until end of input or the first empty line.
                    String line;
                    while (!String.IsNullOrEmpty(line = reader.ReadLine()))
                    {
                        lucFile = JObject.Parse(line);
                        _session.Store(lucFile);
                        completed += 1;
                        // Report throughput every 25,000 documents.
                        if ((completed % 25000) == 0)
                        {
                            Console.WriteLine("Completed {0}, in {1} => {2} rows/second", completed, sw.Elapsed, completed / sw.Elapsed.TotalSeconds);
                        }
                    }
                    //_session.SaveChanges(); // doesn't exist for ShardedBulkInsert
                    Console.WriteLine("Completed {0}, in {1} => {2} rows/second", completed, sw.Elapsed, completed / sw.Elapsed.TotalSeconds);
                }

As noted in the code above, there is no SaveChanges() method for ShardedBulkInsert. When doing a BulkInsert into a non-sharded DB, I'd always call SaveChanges at the end of my loop to make sure the data is all saved. Is that the problem, or is it something else?

Grisha Kotler

Aug 31, 2015, 4:48:22 AM
to rav...@googlegroups.com
There is no SaveChanges in the non-sharded bulk insert.

What is your shard strategy?



Oren Eini (Ayende Rahien)

Aug 31, 2015, 7:25:23 AM
to ravendb
Also, can you send the SAZ output from Fiddler? 

Oren Eini (Ayende Rahien)

Aug 31, 2015, 7:52:41 AM
to ravendb
Note that there is a test for it that shows how it works:

David Callander

Sep 2, 2015, 10:07:47 PM
to RavenDB - 2nd generation document database
It seems everything went into the System database, not the one I specified with ShardedBulkInsert(_dbName).

I haven't tried setting the default database and re-running, but it seems like a bug to me that it ignored the explicit _dbName I passed in.

Grisha Kotler

Sep 3, 2015, 10:09:22 AM
to rav...@googlegroups.com
http://issues.hibernatingrhinos.com/issue/RavenDB-3860

As a temporary workaround, you need to specify the full URL for each document store:

var shards = new Dictionary<string, IDocumentStore>
{
    {"Shard1", new DocumentStore {Url = SERVER1_URL + "databases/" + DATABASE_NAME_ON_SHARD1}},
    {"Shard2", new DocumentStore {Url = SERVER2_URL + "databases/" + DATABASE_NAME_ON_SHARD2}}
};
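
The concatenation in the workaround only produces a valid URL if each server URL ends with a slash (as in "http://dev04-server:4053/"). A minimal sketch of a helper that tolerates either form is shown below. The class name ShardUrls and the method ForDatabase are hypothetical and not part of the RavenDB client; the "/databases/<name>" layout is taken from the workaround above.

```csharp
using System;

// Hypothetical helper, not part of the RavenDB client API.
public static class ShardUrls
{
    // Builds "<server>/databases/<name>" whether or not serverUrl
    // already ends with a trailing slash.
    public static string ForDatabase(string serverUrl, string databaseName)
    {
        if (string.IsNullOrWhiteSpace(serverUrl))
            throw new ArgumentException("Server URL is required.", "serverUrl");
        if (string.IsNullOrWhiteSpace(databaseName))
            throw new ArgumentException("Database name is required.", "databaseName");

        // Strip any trailing slashes so exactly one separator is emitted.
        return serverUrl.TrimEnd('/') + "/databases/" + databaseName;
    }
}
```

Under those assumptions, each shard entry could then be written as new DocumentStore {Url = ShardUrls.ForDatabase(SERVER1_URL, DATABASE_NAME_ON_SHARD1)}, which avoids a silently malformed URL when a server address is configured without the trailing slash.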

