Inserting Data into Cassandra using C# Driver


Paul Palumbo

Aug 13, 2013, 9:52:19 AM
to csharp-dr...@lists.datastax.com
How am I supposed to be inserting records into a Cassandra Database using the C# Driver?
 
I'm doing:

for (int i = 0; i < 25000; ++i)
    context.GetTable<KittyMovie>().AddNew(
        new KittyMovie() { Movie = "Serenity" + i.ToString(), Director = "Joss Whedon" + i.ToString(), MainActor = "Nathan Fillion" + i.ToString(), Year = 2005 },
        EntityTrackingMode.DetachAfterSave);

That adds 25000 records. I then commit the data like this:
var taskSaveMovies = Task.Factory.FromAsync(context.BeginSaveChangesBatch, context.EndSaveChangesBatch, TableType.Standard, ConsistencyLevel.Default, null);
taskSaveMovies.Wait();
 
How is this different from:
context.SaveChanges(SaveChangesMode.Batch);
 
Also, in either of these cases, after I do this large Insert and then attempt to do another insert into another table using the same context, it seems to take a long time even when I'm only doing one INSERT. When I trace the code, it seems my previous inserts are being cached and reexamined when I do these new inserts. In particular, the MutationTracker::_table list seems to contain my previous INSERTS and the AppendChangesToBatch operation takes a long time reexamining this list.
 
When I switch to the following, later inserts are no longer slow, but this save operation itself takes a long time:
context.SaveChanges(SaveChangesMode.OneByOne);
 
 
The issue seems to be that EntityTrackingMode.DetachAfterSave is not honored when doing a Batch operation but is honored in a OneByOne operation (in MutationTracker::SaveChangesOneByOne). Am I understanding this correctly?
 
Can somebody explain the best way for me to insert many records?
 
Paul P.

Paweł Kapłański

Aug 18, 2013, 4:38:42 PM8/18/13
to csharp-dr...@lists.datastax.com
Hi Paul,

- When inserting a large set of data I recommend using SaveChangesMode.Batch. In this mode all changes are sent to the Cassandra node in a single query. With the OneByOne mode, the driver executes a separate query for each insert. That can have some benefits when dealing with a large number of nodes, because the RoundRobin policy spreads the queries so that each one is performed by a different Cassandra node.

- context.SaveChanges performs the Begin + End + Wait operations itself, so there is no difference from Task.Factory.FromAsync + Wait.

- I believe you have found a bug: "EntityTrackingMode.DetachAfterSave not being used when doing a Batch operation but is being used in a OneByOne operation (in MutationTracker::SaveChangesOneByOne)." Please report it at https://datastax-oss.atlassian.net/browse/CSHARP. It seems easy to fix.

- You can also try another approach to data manipulation that does not need a context: Session.CreateBatch();

Regards
Pawel
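
For readers who want to try the Session.CreateBatch() route mentioned above, here is a minimal sketch of what it might look like. It is not taken from the thread: the KittyMovie mapping, the contact point 127.0.0.1 and the keyspace name demo_keyspace are assumptions made for illustration, and the Cassandra.Data.Linq calls (GetTable, CreateIfNotExists, Insert, CreateBatch, Append, Execute) are written as I recall them from the driver's LINQ samples, so check them against the driver version you have installed. It needs a reachable Cassandra node to run.

```csharp
using Cassandra;
using Cassandra.Data.Linq;

// Hypothetical mapping: the real KittyMovie class is not shown in the thread.
public class KittyMovie
{
    [PartitionKey]
    public string Movie { get; set; }
    public string Director { get; set; }
    public string MainActor { get; set; }
    public int Year { get; set; }
}

public static class BatchInsertSketch
{
    public static void Main()
    {
        // Assumed contact point and keyspace; replace with your own.
        var cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
        var session = cluster.Connect("demo_keyspace");

        var table = session.GetTable<KittyMovie>();
        table.CreateIfNotExists();

        // Collect the INSERT commands in one batch, with no context
        // and therefore no entity tracking involved.
        var batch = session.CreateBatch();
        for (int i = 0; i < 100; ++i)
        {
            batch.Append(table.Insert(new KittyMovie
            {
                Movie = "Serenity" + i,
                Director = "Joss Whedon" + i,
                MainActor = "Nathan Fillion" + i,
                Year = 2005
            }));
        }

        // Send the whole batch to the cluster as a single request.
        batch.Execute();

        cluster.Shutdown();
    }
}
```

Because nothing is added to a context here, there is no MutationTracker list left behind for later saves to walk through, which is why this route may sidestep the slowdown described in the question.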