Stale Data vs numloadthreads


Sayat Satybaldiev

Nov 25, 2013, 3:50:43 AM
to BG-Social...@googlegroups.com
While benchmarking MongoDB I found that BG reports a lot of stale data even though I specify writeConcern strict for Mongo. After a lot of trial and error I found that the numloadthreads parameter is mainly responsible for the stale data in my workloads. I'm wondering whether such a correlation might really exist, or whether I'm doing something wrong in my benchmarking?

I'm using the latest BGClient and MongoDB 2.4.8

In the first experiment I set numloadthreads to 10, and in the second to 1. I recreated the schema and repopulated the data for each run.

These are my main parameters:

THREAD_COUNT=10
INSERT_IMAGE=true
MAX_EXEC_TIME=600 # 200 for the first run, 600 for the second
USER_COUNT=10000
RES_PER_USER=100
FRIEND_PER_USER=100


PopulateData(){
    # I change only numloadthreads (to 10) in the next experiments
    java -cp $BG_HOME/build/bg.jar:$BG_HOME/db/MongoDB/lib/* edu.usc.bg.BGMainClass \
    onetime -load -db mongoDB.MongoDbClient \
    -p mongodb.url=$HOST_IP:27017 -p insertimage=$INSERT_IMAGE \
    -p mongodb.writeConcern=strict -p mongodb.database=$DB_NAME \
    -p usercount=$USER_COUNT -p useroffset=0 -p confperc=1 \
    -p resourcecountperuser=$RES_PER_USER -p friendcountperuser=$FRIEND_PER_USER \
    -p numloadthreads=1 \
    -P $BG_HOME/workloads/populateDB
}


Workload(){
    # numloadthreads is also changed to 10 here
    java -cp $BG_HOME/build/bg.jar:$BG_HOME/db/MongoDB/lib/* edu.usc.bg.BGMainClass \
    onetime -t -db mongoDB.MongoDbClient \
    -P $BG_HOME/workloads/SymmetricHighUpdateActions -s \
    -p mongodb.url=$HOST_IP:27017 -p threadcount=$THREAD_COUNT -p mongodb.writeConcern=strict \
    -p mongodb.database=$DB_NAME -p exportfile=export_file.log \
    -p maxexecutiontime=$MAX_EXEC_TIME -p initapproach=deterministic \
    -p usercount=$USER_COUNT -p useroffset=0 -p confperc=1 \
    -p resourcecountperuser=$RES_PER_USER -p friendcountperuser=$FRIEND_PER_USER -p numloadthreads=1 \
    -P $BG_HOME/workloads/populateDB
}

Results of the first run:

        -- 10 secs: Reads are still being validated... NumReadOpsProcessed till now:235833
        -- 10 secs: Reads are still being validated... NumPruned till now:222783
        -- Done reading read files...
        -- ReadValidationDuration(ms):10002
[0, 20000] =96.18736383442265%
[20000, 40000] =96.06322117715504%
[40000, 60000] =96.08162058258169%
[60000, 80000] =97.06757331066723%
[80000, 100000] =98.05886036318097%
[100000, 120000] =97.87045252883763%
[120000, 140000] =97.60956175298804%
[140000, 160000] =98.11946902654867%
[160000, 180000] =97.40259740259741%
[180000, 200000] =96.36363636363636%
0.0% of reads observed the value of updates before 1300.0 milliseconds from the completion of the update
        TotalReadOps = 458616 ,staleReadOps=30729 ,staleness Perc (gran:user)=0.06700376785807735


Results of the second run:

       -- 10 secs: Reads are still being validated... NumReadOpsProcessed till now:575422

       -- 10 secs: Reads are still being validated... NumPruned till now:264130

       -- 20 secs: Reads are still being validated... NumReadOpsProcessed till now:905851

       -- 20 secs: Reads are still being validated... NumPruned till now:418016

       -- Done reading read files...

       -- ReadValidationDuration(ms):20003

[0, 60000] =100.0%

[60000, 120000] =100.0%

[120000, 180000] =100.0%

[180000, 240000] =100.0%

[240000, 300000] =100.0%

[300000, 360000] =100.0%

[360000, 420000] =100.0%

[420000, 480000] =100.0%

[480000, 540000] =100.0%

[540000, 600000] =100.0%

0.0% of reads observed the value of updates before 1300.0 milliseconds from the completion of the update

        TotalReadOps = 1323867 ,staleReadOps=0 ,staleness Perc (gran:user)=0.0
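
For reference, the "staleness Perc" value in both runs is consistent with the plain ratio staleReadOps / TotalReadOps, so it reads as a fraction rather than a percentage. A small sketch that redoes this arithmetic from the two summary lines (stale_pct is a hypothetical helper name, not part of BG):

```shell
# Hypothetical helper, not part of BG: prints stale reads as a percentage
# of all validated reads.
# $1 = staleReadOps, $2 = TotalReadOps
stale_pct() {
    awk -v s="$1" -v t="$2" 'BEGIN { printf "%.4f\n", 100 * s / t }'
}

stale_pct 30729 458616    # first run  -> 6.7004
stale_pct 0 1323867       # second run -> 0.0000
```

So roughly 6.7% of the reads in the first run were flagged as stale, against none in the second.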


Sumita Barahmand

Nov 25, 2013, 5:59:02 PM
to BG-Social...@googlegroups.com
Hi,

The parameter name for the thread count in the load phase is "threadcount", not "numloadthreads".
So for the load phase use -p threadcount=x, and in the benchmark phase use -p numloadthreads=x -p threadcount=y, where y is the number of threads emulating actions against the data store.

Also make sure you use -loadindex instead of -load so the index structures will be created.
Sumita
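
The advice in this reply could be sketched roughly as follows. This is a dry run that only prints the phase-specific flags; LOAD_THREADS and ACTION_THREADS are hypothetical variable names, and it assumes x is meant to be the same value in both phases, as the reply's notation suggests.

```shell
# Hypothetical variable names, not BG parameters.
LOAD_THREADS=10      # x: threads used to populate the data store
ACTION_THREADS=10    # y: threads emulating actions during the benchmark

# Load phase: -loadindex instead of -load, thread count passed as threadcount.
load_flags() {
    echo "onetime -loadindex -db mongoDB.MongoDbClient -p threadcount=$LOAD_THREADS"
}

# Benchmark phase: numloadthreads carries x, threadcount carries y.
workload_flags() {
    echo "onetime -t -db mongoDB.MongoDbClient -p numloadthreads=$LOAD_THREADS -p threadcount=$ACTION_THREADS"
}

load_flags
workload_flags
```

These flags would go after the java -cp ... edu.usc.bg.BGMainClass part of the original commands, together with the remaining -p and -P options from the first post.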


--
You received this message because you are subscribed to the Google Groups "BG Social Benchmark" group.
To unsubscribe from this group and stop receiving emails from it, send an email to BG-SocialBenchm...@googlegroups.com.
Visit this group at http://groups.google.com/group/BG-SocialBenchmark.
For more options, visit https://groups.google.com/groups/opt_out.

Sayat Satybaldiyev

Nov 25, 2013, 6:03:57 PM
to BG-Social...@googlegroups.com
Hi Sumita,

Thank you for the reply! That's tricky, I didn't know that. I'll retry the experiments with these changes.
--
Best Regards,
Sayat Satybaldiyev


Sayat Satybaldiev

Nov 26, 2013, 1:36:55 AM
to BG-Social...@googlegroups.com
Hi Sumita,

Thanks again for your help! I had mixed up "threadcount" and "numloadthreads", and that mix-up during the workload was what caused the stale data.