Help: YCSB tests with MongoDB 3.0.6 throw "Client view of cluster state is {type=Unknown, servers=[{address=:27017" error


prasad2019

Sep 3, 2015, 8:21:43 PM
to mongodb-user
The YCSB tar file downloaded from https://github.com/brianfrankcooper/YCSB/releases/download/0.3.0/ycsb-0.3.0.tar.gz works fine; it connects to the Mongo server from the client machine, though it shows the following warning message:

Cluster created with settings {hosts=[X.X.X.X:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
Updating cluster description to {type=UNKNOWN, servers=[{address=X.X.X.X:27017, type=UNKNOWN, state=CONNECTING}]}


It connects to the MongoDB server from the YCSB client, and I am getting good performance with the WiredTiger storage engine.

 But I wanted to see better performance using the YCSB code from https://github.com/10gen-labs/YCSB (due to the batch size being changed from 1 to 100).

It compiles with no issues, but when I execute workloadA it throws this error message:

"Timed out after 10000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=1:27017, type=Unknown, state=Connecting"


I even hard-coded the MongoDB server IP address so that it would pick up the IP, but in spite of that I get the same error message.

mongodb/src/main/java/com/yahoo/ycsb/db/MongoDbClient.java

I was checking to see if you ran into similar issues in the past and how you overcame them. Any pointers will be greatly appreciated.


Thanks
Prasad


Rob Moore

Sep 3, 2015, 9:31:17 PM
to mongodb-user
Prasad -

I cannot help you with the 10gen-labs repo, but the 0.3.0 version of YCSB supports batched inserts via the mongodb.batchSize parameter.
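For example, roughly like this on the command line (host, paths, and database name here are placeholders; double-check the exact property name and capitalization against the MongoDB binding's README for your YCSB version):

```shell
# Load phase with batched inserts; X.X.X.X and all paths are placeholders.
./bin/ycsb load mongodb -s \
    -P workloads/workloada \
    -p mongodb.url=mongodb://X.X.X.X:27017/ycsb \
    -p mongodb.batchSize=100
```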

You can see all of the supported options here:

HTH,
Rob.

prasad2019

Sep 4, 2015, 6:53:29 PM
to mongodb-user
Hi Rob,

  Thanks for the pointer. I did use the batchSize parameter for the YCSB load tests, and the results went down.


Documents    Ops/Sec  YCSB Threads  AvgLatency(us)  95thPctLatency(ms)  99thPctLatency(ms)  Remarks
30 Million   61004    64            1036            0                   1                   Batchsize=Default
30 Million   50119    64            1236            2                   3                   Batchsize=100

I also tried setting the read preference to primaryPreferred with "db.getMongo().setReadPref('primaryPreferred')", and replaced the MongoDB driver file, going from 3.0.2 to the latest 3.0.3.

I am following this blog from Asya to get my baseline right before I start doing all the YCSB tests.


Since it is not clear whether this blog has compression set to "snappy" or none, I tried changing the compression with the command below and ran into mongod startup issues:

numactl --interleave=all mongod --storageEngine wiredTiger --wiredTigerJournalCompressor none --journal --config /etc/mongod.conf 
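For what it's worth, the same compressors can also be set in the config file instead of on the command line. A sketch of the MongoDB 3.0 YAML option names (verify against the docs for your exact version):

```yaml
# /etc/mongod.conf fragment (MongoDB 3.0 YAML format)
storage:
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      journalCompressor: none    # snappy | none | zlib
    collectionConfig:
      blockCompressor: snappy    # snappy | none | zlib
```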


mongod.log shows this error message:

“ log_read: Compressed record with no configured compressor: WT_ERROR: non-specific WiredTiger error”





Thanks
Prasad

Rob Moore

Sep 5, 2015, 11:34:26 AM
to mongodb-user
On Friday, September 4, 2015 at 6:53:29 PM UTC-4, prasad2019 wrote:
  Thanks for the pointer. I did use the batchSize parameter for the YCSB load tests, and the results went down.


Documents    Ops/Sec  YCSB Threads  AvgLatency(us)  95thPctLatency(ms)  99thPctLatency(ms)  Remarks
30 Million   61004    64            1036            0                   1                   Batchsize=Default
30 Million   50119    64            1236            2                   3                   Batchsize=100

Can you describe your setup a little better? What is the MongoDB configuration? How many servers? Where is the client in relation to the servers? What version of MongoDB?

On paper, batching in a higher-latency environment should improve throughput, but I have never seen the data to prove it. Asya's blog does not talk about using it, and the configuration for the feature talks about improving the "load" of the dataset, not the run.

Also, to match Asya's performance you need to make sure you shrink the YCSB document size from the default of ~1 KB to 151 bytes. In the workload file, set/add:

fieldcount=1
fieldlength=100
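As a rough sanity check on those numbers (the ~51-byte overhead constant below is my own assumption, back-solved from the 151-byte figure in this thread, not taken from the YCSB source):

```python
# Back-of-envelope estimate of a YCSB document's size:
# field data = fieldcount * fieldlength, plus a constant for the
# key and field names (assumed ~51 bytes here, not measured).
def estimated_doc_bytes(fieldcount, fieldlength, overhead=51):
    return fieldcount * fieldlength + overhead

print(estimated_doc_bytes(1, 100))    # 151 -- the small-document workload
print(estimated_doc_bytes(10, 100))   # 1051 -- roughly the ~1 KB default
```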
 
I also tried setting the read preference to primaryPreferred with "db.getMongo().setReadPref('primaryPreferred')", and replaced the MongoDB driver file, going from 3.0.2 to the latest 3.0.3

FYI - you can set the read preference via the MongoDB URI:
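For instance, a connection string of this shape (host and database are placeholders; readPreference is the standard MongoDB connection-string option name):

```
mongodb://X.X.X.X:27017/ycsb?readPreference=primaryPreferred
```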

  

“ log_read: Compressed record with no configured compressor: WT_ERROR: non-specific WiredTiger error”


I suspect that switching compression will require you to wipe out the data files.
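A minimal sketch of what "wipe out the data files" could look like, assuming a dbPath of /var/lib/mongo (check storage.dbPath in your /etc/mongod.conf before running anything):

```shell
# WARNING: destroys all data under dbPath; only safe for throwaway YCSB data.
sudo systemctl stop mongod                 # or stop the hand-started mongod
sudo rm -rf /var/lib/mongo/*               # assumed dbPath; verify first
numactl --interleave=all mongod --storageEngine wiredTiger \
    --wiredTigerJournalCompressor none --journal --config /etc/mongod.conf
```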


prasad2019

Sep 5, 2015, 6:58:28 PM
to mongodb-user
Rob,

 I have 4 physical servers, each with 28 cores, 256 GB RAM, and SSD storage. MongoDB 3.0.6 runs as a replica set with one primary and two secondaries, and the YCSB client is triggered from the fourth server.

 Another interesting point I observed: with batchSize=1 the data-disk location sees around 400 MB/sec of writes and the log around 30 to 40 MB/sec, with 12 to 15% CPU utilization and a consistent 60K ops/sec.

Whereas with batchSize=100 the data-disk writes range from 50 to 450 MB/sec, the log from 10 to 20 MB/sec, CPU from 4 to 15%, and throughput from 15K to 72K ops/sec. Bottom line: with batchSize=100 the performance varies quite a lot.

Regarding the YCSB document size (1 KB vs. 151 bytes), that's a good point I had overlooked. I will try the change and check it out.


Finally, can you please point me to how to switch compression? I also didn't understand "wipe out the data": does it mean drop the default database, change the compression, and load it again?



Thanks
Prasad

Asya Kamsky

Sep 9, 2015, 6:22:00 PM
to mongodb-user
Hi Prasad:

As Rob pointed out, you can do batch inserts on either branch. In fact, the 10gen-labs branch was created because YCSB seemed to have gone very dormant as a project, but recently they became very active again, merging pull requests, removing old crufty stuff that hasn't worked, and in general revitalizing the project. Since Rob has submitted a number of improvements similar to what we created in the labs version, it's probably better to use the master for regular YCSB testing. Our long-term plan is to use our own branch to add functionality that is missing in YCSB: the ability to use more realistic data (not random byte streams), secondary indexes and queries by secondary indexes, and more flexibility in the types of reads and writes.

When we originally added batch inserts, it was to reduce the time it takes to wipe and repopulate a cluster before running some sort of mixed workload. Depending on the configuration of the cluster, batching could gain anywhere from 2x to 6x in my experience. But how much you gain depends heavily on where the bottleneck is for the single-document insert case. If you're already writing as much as the server can write, then that's your limit; but if the server is underloaded because the back-and-forth on the network is slowing things down, batching can help hugely.
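Asya's bottleneck point can be sketched with a toy latency model (all numbers below are illustrative assumptions, not measurements from this thread):

```python
# Toy model: time per batch = one network round trip + per-document server work.
# When the round trip dominates, batching amortizes it; when server work
# dominates, batching barely helps.
def ops_per_sec(batch_size, rtt_ms, server_ms_per_doc):
    batch_time_ms = rtt_ms + batch_size * server_ms_per_doc
    return batch_size / batch_time_ms * 1000

# Network-bound case: 1 ms RTT, 0.01 ms of server work per document.
print(ops_per_sec(1, 1.0, 0.01))     # ~990 ops/sec
print(ops_per_sec(100, 1.0, 0.01))   # ~50000 ops/sec: batching helps hugely

# Server-bound case: same RTT, 1 ms of server work per document.
print(ops_per_sec(1, 1.0, 1.0))      # ~500 ops/sec
print(ops_per_sec(100, 1.0, 1.0))    # ~990 ops/sec: the server is the limit
```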

Hope that helps,
Asya


prasad2019

Sep 10, 2015, 5:00:13 PM
to mongodb-user
Hi Asya,

   Thanks, I realized that too. After a couple of tests the system CPU was just 15%, so there was plenty of room; it was the I/O that was getting maxed out (450 MB/sec writes) during the data load. To confirm this I am planning to add additional disks, to see whether raising the batch size from the default of 1 helps speed up data loading.

Now I am encountering a checksum error on one of the data files. Because of this the primary MongoDB instance crashes, and the two replicas are not catching up with the primary. I was hoping that after a primary restart, with journaling on, it would correct itself with no need to run the "repairDatabase" command.

"2015-09-10T13:03:45.678-0600 E STORAGE  [conn47] WiredTiger (0) [1441911825:678595][11777:0x7f7a6aa06700], file:collection-12--8249752617800779708.wt, cursor.next: read checksum error for 32768B block at offset 24421470208: calculated block checksum of 179228071 doesn't match expected checksum of 2102742815"
"2015-09-10T13:03:45.678-0600 E STORAGE  [conn47] WiredTiger (0) [1441911825:678683][11777:0x7f7a6aa06700], file:collection-12--8249752617800779708.wt, cursor.next: collection-12--8249752617800779708.wt: encountered an illegal file format or internal value"
"2015-09-10T13:03:45.678-0600 E STORAGE  [conn47] WiredTiger (-31804) [1441911825:678705][11777:0x7f7a6aa06700], file:collection-12--8249752617800779708.wt, cursor.next: the process must exit and restart: WT_PANIC: WiredTiger library panic"

Can you please help me with how to recover from this block checksum error? Is there a block repair command that I can use?


Thanks
Prasad

Asya Kamsky

Sep 11, 2015, 3:41:35 AM
to mongodb-user
This strongly suggests an issue with the underlying storage layer: if the file on disk gets corrupted, there isn't any way journaling can help.

Since this is YCSB data you can just wipe out the directory and repopulate. That would be my suggestion, but I would verify the file system first.

Asya


prasad2019

Sep 14, 2015, 7:41:16 PM
to mongodb-user
Hi Asya,

   Thanks, it was an issue with the XFS file system on RHEL 7.1. I have now repopulated with an increased dataset, from 30 million to 300 million documents. Data loading went fine, but I realized later that the replica sets were not able to catch up with the data-loading rate (writing 180 MB/s of data; this time it was hard disk with RAID 10, whereas with SSD it was 420 MB/s) with the default oplogSize of 1024 MB. To sync up, I brought down the mongod instances and dropped the datapath directories on the replica-set servers. With oplogSize changed to 5 GB on both primary and replicas, the replica set caught up with the primary (including the index build) for a dataset of 358 GB in 1 hour 51 minutes (against 3 hours with oplogSize=1024 MB, even with SSD).

So having a larger oplog size made a lot of difference. I am starting to like these internal operations; that explains why MongoDB's engine ranking is going up rapidly (http://db-engines.com/en/ranking).
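For reference, the oplog size can also be set in the config file rather than on the mongod command line (the replica set name below is a placeholder; note that resizing an existing oplog requires recreating it, or a full resync as done here):

```yaml
# /etc/mongod.conf fragment (MongoDB 3.0 YAML format)
replication:
  oplogSizeMB: 5120     # ~5 GB; default is sized from free disk space
  replSetName: rs0      # placeholder replica set name
```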


BTW, I wanted to validate these YCSB test results with MongoDB. How do I do that? Any pointers?



Thanks
Prasad





Asya Kamsky

Sep 15, 2015, 12:23:56 PM
to mongodb-user
BTW, I wanted to validate these YCSB test results with MongoDB. How do I do that? Any pointers?

One of the problems with YCSB is that the data is random and arbitrary, so if by "validate" you mean checking which (or how many) documents are updated, or the correctness of updates, that is not very easy to do.

This is partly why I recommend creating your own benchmark, similar to your actual application requirements; that makes it easier to validate the results...

If you meant something else by validate, please clarify with more details.

Asya


prasad2019

Sep 16, 2015, 5:05:20 PM
to mongodb-user
Asya,

   I meant having folks from MongoDB look at the YCSB test results for validation. This would help make sure we have accurate YCSB test results with MongoDB best practices implemented.

  Let me know how to proceed on this.



Thanks
Prasad