cfq Tunables

Zach (malaroth)

Apr 5, 2013, 11:27:08 PM

to

So... tentatively I've got some numbers on cfq. I'll go back and run default numbers through Christensen io app but these are the io numbers I'm getting with just basic tweaks to start with

Screenshot_2013-02-20-07-29-48.png

Zach (malaroth)

Feb 20, 2013, 9:17:48 AM

to francos...@googlegroups.com

Scratch that back to the drawing board

Zach (malaroth)

Feb 22, 2013, 4:54:27 PM

to francos...@googlegroups.com

Cfq is gonna have to wait a while. I'm getting SystemUI closes, reboots, and a frozen screen right after reboot, so I'm leaving it be for now.

franciscofranco1990

Feb 22, 2013, 7:52:34 PM

to francos...@googlegroups.com

That doesn't seem to be caused by the IO scheduler.

Zach (malaroth)

Feb 22, 2013, 8:04:27 PM

to francos...@googlegroups.com

That's strange, because I haven't changed anything except to tweak cfq.

Zach (malaroth)

Mar 2, 2013, 8:33:36 PM

to francos...@googlegroups.com

http://www.linux-mag.com/id/7572/
Cfq explained

Chris (osm0sis)

Mar 3, 2013, 7:48:23 AM

to francos...@googlegroups.com

Oh boy oh boy. So are we going for cfq over bfq? I seem to recall Francisco patching bfq for flash media way back and I definitely preferred it for awhile until Francisco went back and fixed deadline. On the other hand people say it's got too much overhead. cfq has all the same tunables though, so is it also in the same boat?

Zach (malaroth)

Mar 3, 2013, 9:46:32 AM

to francos...@googlegroups.com

Bfq is a variation on cfq. I had this in mind when I posted the other day about HDD rpm and wear and tear. I didn't quite realize that I was leading into this, but it's on the same page. We don't have the seek penalties with flash storage that we do with older drives, so my question is: are the values still applicable to the function performed, and can we gain anything by nudging them from their current position? Or are they superfluous, and this one set of values tied to the allocation equations has nothing in common with something else that might work better? Cfq and bfq should perform the same, with only the "slices" being a different portion.

Chris (osm0sis)

Mar 3, 2013, 8:26:34 PM

to

I went back and looked at them again and bfq has a number of other variables over cfq that I guess are why many call it bloated with too much overhead. I wonder if Francisco could make the same tweaks he did to bfq to cfq. That would be a good place to start with it if so. :)

I also noticed between 4.2.1 and 4.2.2 some of the default cfq settings changed, and actually became closer to the bfq versions of each.

slice_sync went from 100 to 93, slice_async from 40 to 39, slice_idle from 10 to 7, and fifo_expire_sync from 120 to 125.

I also just realized we don't have bfq on the N7 anymore, so might as well stick with cfq since it's on them all as Google's default and as such I think has the most potential as well.

Zach (malaroth)

Mar 4, 2013, 7:35:50 AM

to francos...@googlegroups.com

I didn't realize bfq wasn't on grouper but I figured if we started with cfq then bfq would be simple. So... y'all have been busy while I slept. Gotta catch up. At work right now but I'm uploading the new defaults to my sig, 950io only, I straight copied your script and deleted mine. And I'd lowered sleep on my script because it doesn't take 25 sec for my phone to boot up LOL so a sleep 25 I'm sitting there waiting. I may on some occasions get 3 full revolutions of Franco boot ani but normally it's 2

Chris (osm0sis)

Mar 4, 2013, 2:33:51 PM

to francos...@googlegroups.com

init.d happens in the background so shouldn't increase the boot time itself. 25 is required for grouper.

Chris (osm0sis)

Mar 5, 2013, 5:03:28 PM

to

bfq is apparently better than cfq with the way it handles things, and both are criticized for overhead.

Here's what I dug up on xda so far to try. Note these are a consensus of various scripts out there to give us some options, not a script itself. ;)

#general
echo 0 > /sys/block/mmcblk0/queue/rotational; #done
echo 1 > /sys/block/mmcblk0/queue/rq_affinity; #done

echo 0 > /sys/block/mmcblk0/queue/iostats; #disable iostats to reduce overhead - idea by kodos96

echo 256 > /sys/block/mmcblk0/queue/read_ahead_kb; # Optimize for readahead

echo 0 > /sys/block/mmcblk0/queue/nomerges; #merges are good
 
echo 2048 > /sys/block/mmcblk0/queue/nr_requests;
echo 1024 > /sys/block/mmcblk0/queue/nr_requests; # for starters: keep it sane
echo 512 > /sys/block/mmcblk0/queue/nr_requests;
echo 128 > /sys/block/mmcblk0/queue/nr_requests; #default

#deadline
echo 4 > /sys/block/mmcblk0/queue/iosched/fifo_batch;
echo 4 > /sys/block/mmcblk0/queue/iosched/writes_starved;

#cfq
echo 16 > /sys/block/mmcblk0/queue/iosched/quantum;
echo 8 > /sys/block/mmcblk0/queue/iosched/quantum; #default

echo 1000000000 > /sys/block/mmcblk0/queue/iosched/back_seek_max;
echo 0 > /sys/block/mmcblk0/queue/iosched/back_seek_max;
echo 16384 > /sys/block/mmcblk0/queue/iosched/back_seek_max; #default

echo 1 > /sys/block/mmcblk0/queue/iosched/back_seek_penalty; #removed the penalty for back-seeks
echo 2 > /sys/block/mmcblk0/queue/iosched/back_seek_penalty; #default

echo 3 > /sys/block/mmcblk0/queue/iosched/slice_idle; #Lowered the idle wait
echo 1 > /sys/block/mmcblk0/queue/iosched/slice_idle;
echo 0 > /sys/block/mmcblk0/queue/iosched/slice_idle;
echo 7 > /sys/block/mmcblk0/queue/iosched/slice_idle; #default

echo 1 > /sys/block/mmcblk0/queue/iosched/low_latency; #default, low latency mode of cfq
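
Before trying any of the options above, it can help to record what the device is currently using. The following is a minimal sketch, assuming a POSIX shell and the mmcblk0 path used in this thread; dump_tunables and QUEUE_DIR are names made up for illustration, not part of any script mentioned here.

```shell
#!/bin/sh
# Print the active scheduler and every iosched tunable for one block device.
# QUEUE_DIR is an assumption: override it for another device or a dry run.
QUEUE_DIR="${QUEUE_DIR:-/sys/block/mmcblk0/queue}"

dump_tunables() {
    dir="$1"
    # The active scheduler is shown in [brackets], e.g. "noop deadline [cfq]".
    [ -r "$dir/scheduler" ] && echo "scheduler: $(cat "$dir/scheduler")"
    for f in "$dir"/iosched/*; do
        # Skip the unexpanded glob when the directory is missing or empty.
        [ -f "$f" ] || continue
        echo "${f##*/} = $(cat "$f")"
    done
}

dump_tunables "$QUEUE_DIR"
```

Saving that output before and after a change gives a simple record of which values were actually in effect for a given test.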

Someone in the f.K N4 thread had slice_idle 0 back_seek_max 0 and back_seek_penalty 1 and said it felt good. But I don't entirely understand how removing the penalty and then telling it that it can't seek accomplishes something useful. Increasing the max makes more sense since flash storage has no limit to head distance.

Edit: UNLESS 0 basically = infinity. That would be handy and exactly what we want for back_seek_max. From reading Zach's link above, back_seek_penalty 1 does remove the penalty and is a definite go.

Lowering the idle wait before switching from an empty sync queue sounds smart too, but again 0 doesn't make much sense to me, you'd think some wait would still be good logically so that you aren't jumping around on something that requires synchronous requests. 1 also sounds low, so my money's on 3 for now, or maybe splitting the difference at 2.

Higher sounds better for quantum to avoid the tubes getting clogged, but most scripts I saw don't touch that. From the documentation it sounds like raising quantum might also require a higher slice_async_rq to make sure async doesn't get left behind. No idea what would be smart for these.

Chris (osm0sis)

Mar 5, 2013, 4:55:38 PM

to

I'm wondering if we should reenable merges, since cfq operates with a "dual elevator" of synchronous and asynchronous request queues being switched between, and merges would help keep those relevant. I am also sort of wondering if the rq_affinity and nomerges tweaks are exacerbating the Play Store lag problem. Hopefully one of those guys in the GN thread can provide some useful feedback on this.

Zach (malaroth)

Mar 5, 2013, 8:09:48 AM

to francos...@googlegroups.com

I like having the single script... Chris you're great at creating scripts... what about an # "scheduler" = "cfq, bfq" set nomerges 0. Same with rq_affinity?

Chris (osm0sis)

Mar 5, 2013, 4:50:45 PM

to

Zadeis is indicating it might be our nomerges policy that is affecting the Play Store.. so he likes nomerges 1 which enables simple merges only; that or he just still had his read_ahead_kb to 1024. ;)

Here is some good documentation: http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt

nomerges (RW), nr_requests (RW) and rq_affinity (RW), general + group_idle

Edit: More:

http://www.mjmwired.net/kernel/Documentation/block/cfq-iosched.txt

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/ch06s04.html

http://lwn.net/Articles/395769/

Still trying to find something that tells me what back_seek_max 0 will do though. The redhat guide gives some nice stuff for SSD which should also apply to Flash. :)

Chris (osm0sis)

Mar 5, 2013, 5:18:47 PM

to

Okay, here are some based on the above reading. I've tested them on both devices and they seem good to get the ball rolling with:

# cfq tweaks
echo 64 > /sys/block/mmcblk0/queue/iosched/quantum;
echo 120 > /sys/block/mmcblk0/queue/iosched/fifo_expire_sync;
echo 250 > /sys/block/mmcblk0/queue/iosched/fifo_expire_async;
echo 1000000000 > /sys/block/mmcblk0/queue/iosched/back_seek_max;
echo 1 > /sys/block/mmcblk0/queue/iosched/back_seek_penalty;
echo 93 > /sys/block/mmcblk0/queue/iosched/slice_sync;
echo 39 > /sys/block/mmcblk0/queue/iosched/slice_async;
echo 16 > /sys/block/mmcblk0/queue/iosched/slice_async_rq;
echo 0 > /sys/block/mmcblk0/queue/iosched/slice_idle;
echo 10 > /sys/block/mmcblk0/queue/iosched/group_idle;
echo 1 > /sys/block/mmcblk0/queue/iosched/low_latency;

There are some differences of scale between the devices again, so some of these resolve to different outcome values (120N7 -> 125GN; 93GN -> 100N7; 39GN -> 40N7; 10N7 -> 15GN). I'm still not sure about 0 vs. 1000000000 for back_seek_max, not sure if it's just placebo or not at this point.
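
Since a written value can come back different on each device, as noted above, one way to see what actually took effect is to read the file back after writing it. This is a minimal sketch; set_and_check is a hypothetical helper, not something from the script discussed here.

```shell
#!/bin/sh
# Write a value to a tunable, then read it back and report what was kept.
# set_and_check is a hypothetical helper name used only for this example.
set_and_check() {
    file="$1"
    want="$2"
    if [ ! -w "$file" ]; then
        echo "skip: $file not writable"
        return 1
    fi
    echo "$want" > "$file"
    got="$(cat "$file")"
    if [ "$got" = "$want" ]; then
        echo "ok: ${file##*/} = $got"
    else
        echo "resolved: ${file##*/} asked $want, got $got"
    fi
}

# Example (needs root on a device running cfq):
# set_and_check /sys/block/mmcblk0/queue/iosched/slice_sync 93
```

A "resolved" line would flag exactly the kind of 93 -> 100 difference described above, so a shared script can be checked on each device.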

I messed with nomerges and some of the other general tweaks just now too using deadline as a guide:

echo 1024 > /sys/block/mmcblk0/queue/nr_requests;
echo 512 > /sys/block/mmcblk0/queue/read_ahead_kb;
echo 2 > /sys/block/mmcblk0/queue/rq_affinity;
echo 1 > /sys/block/mmcblk0/queue/nomerges;
echo 0 > /sys/block/mmcblk0/queue/iostats;

I went with nomerges 1 since it will allow some merges. My launcher seems to like it too but it requires more testing from everyone, I'm sure. :)

Zach (malaroth)

Mar 5, 2013, 7:26:16 PM

to francos...@googlegroups.com

Well, the extremely high back_seek_max basically means it won't penalize until it reaches that far of a seek. And with flash memory there's no distance to seek, so no penalty... gotta configure this new ROM for init.d scripts but then I'll be back testing.

Chris (osm0sis)

Mar 5, 2013, 9:32:22 PM

to

Yup, if 0 meant infinity then that would be ideal, but setting it extremely high should be functionally the same.

I found the upper limit to back_seek_max is actually 2147483647. (2 × 1024³) - 1. So if 0 doesn't work that way we've got as close to infinity as we can get.
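
The arithmetic behind that limit can be checked in the shell. A small sketch, assuming the shell evaluates arithmetic in at least 64 bits (common, but not guaranteed on every device):

```shell
#!/bin/sh
# (2 x 1024^3) - 1, which is also the largest signed 32-bit integer (2^31 - 1).
limit=$(( 2 * 1024 * 1024 * 1024 - 1 ))
echo "$limit"   # prints 2147483647
```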

I've also lowered slice_async_rq to 2.

Steve (Gingerbread Man)

Mar 6, 2013, 8:44:48 AM

to francos...@googlegroups.com

So just to be sure we are all singing off the same hymn sheet, these are the correct values for CFQ?

# cfq tweaks
echo 64 > /sys/block/mmcblk0/queue/iosched/quantum;
echo 120 > /sys/block/mmcblk0/queue/iosched/fifo_expire_sync;
echo 250 > /sys/block/mmcblk0/queue/iosched/fifo_expire_async;
echo 1000000000 > /sys/block/mmcblk0/queue/iosched/back_seek_max;
echo 1 > /sys/block/mmcblk0/queue/iosched/back_seek_penalty;
echo 93 > /sys/block/mmcblk0/queue/iosched/slice_sync;
echo 39 > /sys/block/mmcblk0/queue/iosched/slice_async;

echo 2 > /sys/block/mmcblk0/queue/iosched/slice_async_rq;

Chris (osm0sis)

Mar 6, 2013, 2:23:10 PM

to francos...@googlegroups.com

Almost. 1000000000 to 2147483647, and change the general tweaks at the top of the script to the ones I posted. :)

Chris (osm0sis)

Mar 14, 2013, 7:16:31 PM

to

I'll post this here too so we don't get too fragmented. Basically what we need to get started is the following variables worked out with the GN:

nr_requests (1024, 512, 256, 128),
back_seek_max (2147483647, 1000000000, 0),
group_idle (15, 7, 0),
and quantum (64, 32, 16, 8).

Start with the script. Then please work on one of those variables at a time to try and decide which value you find is the best. I went with my initial feeling on nr_requests (1024 felt best to me with some quick testing), but I'm really not sure about the other 3 tunables.

I'm also interested to know if disabling iostats has any effect on I/O benchmarking apps/results. ;)

Edit: Reuploaded the script on 3/7/13 thanks to Khrushy and the guys in the xda thread for noticing nomerges was still being overwritten by Franco's setting. Also added rotational 0 just so we're sure that's set correctly on all ROMs.

Edit 2: Removed script.

Zach (malaroth)

Mar 8, 2013, 6:40:46 PM

to francos...@googlegroups.com

I'll grab quantum though you'll have to let me know about variables on the n7

Chris (osm0sis)

Mar 8, 2013, 7:01:33 PM

to francos...@googlegroups.com

group_idle is the only one that would be different. It works just like read_idle did with row. So once we figure out what's good on the GN we can just see which neighboring value works best and find a common value that resolves to both. :)

Zach (malaroth)

Mar 8, 2013, 7:50:46 PM

to francos...@googlegroups.com

Running group_idle 23 so far so good. I'll run this for a day or two then bump down to 15 then 7 then try 31 for S&G's

Steve (Gingerbread Man)

Mar 9, 2013, 5:11:26 PM

to francos...@googlegroups.com

From what I can tell with my current testing, cfq doesn't feel any faster or slower than row. I know that's not really particularly helpful, but it's just a summary of my findings from my normal / general usage patterns.

Zach (malaroth)

Mar 9, 2013, 8:42:59 PM

to francos...@googlegroups.com

What we need though is for you to run normal cfq for a bit... then see if the new numbers are faster or slower than default cfq. Where you prefer row, others prefer cfq

Chris (osm0sis)

Mar 9, 2013, 9:42:10 PM

to

Yeah upping the quantum might have been a shitty idea. I went back to defaults and it was faster. I think those general queue tweaks I added really improved the defaults even. And try bfq, it fucking flies!

Anyway, I agree. We should start from defaults again and examine each of the logical ideas I had for the tunables so far.

Chris (osm0sis)

Mar 10, 2013, 1:30:44 PM

to

group idle is definitely also part of what was slowing it down. Back to 0 for that for sure. Having a hard time seeing a difference for most of the others though.

Chris (osm0sis)

Mar 10, 2013, 1:27:31 PM

to

group idle at 0 with slice idle 0 still seems to be a problem, I'm back at slice idle 7 and it seems slightly better again.

I lowered my max and played around with nr_requests and a few other things. What I got from it was nr_requests should either be 1024 or maybe 4096, but I'm leaning toward 1024 still, and that back_seek_max actually performs better at 1000000000 than 0 or 16384 or 2147483647 or even 1073741824. Simpler calculations perhaps?

So right now I'm at 8 / 125 / 250 / 1000000000 / 1 / 93 / 39 / 2 / 7 / 0 / 1
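
For anyone reading that shorthand, the sketch below expands a slash-separated list into name=value pairs. It assumes the values are given in the same order as the "cfq tweaks" script earlier in the thread (quantum first, low_latency last); expand_cfq and CFQ_NAMES are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Tunable names in the order used by the "cfq tweaks" script in this thread.
# Assumption: the shorthand list follows that same order.
CFQ_NAMES="quantum fifo_expire_sync fifo_expire_async back_seek_max back_seek_penalty slice_sync slice_async slice_async_rq slice_idle group_idle low_latency"

# expand_cfq pairs each name with the matching value from a
# slash-separated list and prints one "name=value" line per tunable.
expand_cfq() {
    # Turn "8 / 125 / 250" into "8 125 250" so the shell can split it.
    values="$(echo "$1" | tr -d ' ' | tr '/' ' ')"
    set -- $values
    for name in $CFQ_NAMES; do
        [ $# -gt 0 ] || break
        echo "$name=$1"
        shift
    done
}

expand_cfq "8 / 125 / 250 / 1000000000 / 1 / 93 / 39 / 2 / 7 / 0 / 1"
```

Under that ordering assumption the list reads as slice_idle=7 and group_idle=0, which matches the description in the post above.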

Steve (Gingerbread Man)

Mar 10, 2013, 4:15:33 PM

to francos...@googlegroups.com

This set of values feels smoother for launcher behaviour than the previous one. I've not long loaded this set onto my N7. Thanks Chris

Zach (malaroth)

Mar 11, 2013, 11:30:15 AM

to francos...@googlegroups.com

I haven't "felt" changes so much as I did with deadline and row... so I've been running benchmark and tuning. Here's consistent results

Screenshot_2013-03-11-10-21-12.png

Screenshot_2013-03-11-10-20-50.png

Zach (malaroth)

Mar 14, 2013, 3:13:58 PM

to francos...@googlegroups.com

Other than benchmarks I'm not feeling any significant change drastic enough to say --THESE values are winner--

Chris (osm0sis)

Mar 14, 2013, 7:17:57 PM

to

I agree. Benchmarks don't offer a lot of direction here either. At least not with the ones we're testing in the GN thread. All the results are within the range of normal variability if you run the test over again multiple times. You'd probably have to run it 100 or 1000 times and take the averages to know anything. But honestly it's probably not worth it, since if the averages are as close as they seem likely to be, the gains from tweaking will be minor as well. :/
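
To put a number on "normal variability", repeated scores for one setting can be reduced to a mean and a standard deviation. A minimal sketch using awk; score_stats is a hypothetical helper and the scores are made-up placeholders, not results from this thread.

```shell
#!/bin/sh
# Mean and sample standard deviation of one benchmark score per line on stdin.
score_stats() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) { print "need at least 2 runs"; exit 1 }
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0   # guard against rounding error
             printf "runs=%d mean=%.1f sd=%.1f\n", n, mean, sqrt(var)
         }'
}

# Placeholder scores for illustration only.
printf '%s\n' 750 760 740 755 745 | score_stats   # prints runs=5 mean=750.0 sd=7.9
```

If two settings have means that differ by less than roughly one standard deviation, a handful of runs is unlikely to tell them apart.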

I've got 950iosettings hosted here now so we don't have to keep hunting around here for an attachment: http://d-h.st/users/osm0sis

Chris (osm0sis)

Mar 21, 2013, 2:21:32 AM

to francos...@googlegroups.com

pkg really liked 16GB for bsm (back_seek_max) on his 32GB GN. I didn't like 8GB on my 16GB GN over something larger, and pkg didn't find anything larger to be as good as 16GB, so we split the difference at 12GB and we both seem to agree.

12582912 for bsm ties up all the easy ones, so now on to phase two to see if there's anything we can improve with the sync and async values. :)
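
The 12582912 figure follows from back_seek_max being specified in KB, per the kernel's cfq-iosched documentation linked earlier in the thread. A quick check of the conversion:

```shell
#!/bin/sh
# back_seek_max is given in KB, so 12 GB = 12 * 1024 * 1024 KB.
gb=12
echo $(( gb * 1024 * 1024 ))   # prints 12582912
```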

Chris (osm0sis)

Mar 23, 2013, 4:57:05 PM

to francos...@googlegroups.com

The latest for testing from the GN thread:

On Thursday, March 21, 2013 11:46 PM UTC-3, Khrushy wrote:

I actually just did that with slice_sync/async and there's definitely something to be found in this one.
Ballpark halved both values to slice_sync = 54 and slice_async = 23 and saw about a 3% improvement in I/O. More importantly though, it was damn consistent with it, and the low values for both read/write were consistently better than the low values using the previous settings.
I don't have time now to push this any further, so I'm going to run it for the rest of the day with these settings to see how it feels in use, then take a look at it again later to see if there's a sweeter point than just "halve it and see" :p
Edit:
Oh, and I found this on IO Schedulers - has info on Deadline, Anticipatory and CFQ - some interesting reading that's actually 99% plain english if people want to check it out.
https://www.dropbox.com/s/g0ua5u3wx2x6vqs/IO%20Schedulers.pdf

Paul (pkgnex)

Mar 24, 2013, 6:15:01 PM

to francos...@googlegroups.com

I tried halving the "slice" values, except I forgot which ones they were and halved both the "fifo_expire" values instead. When I did this, my phone ran like a raped ape (fast).

Realizing I halved the wrong parameters, I also halved the "slice" values, and now my phone runs like crap - really choppy. I'll try undoing the "fifo_expire" changes and will report back. Maybe just halving one or the other is the answer, or maybe there is an interrelation that is important between them...

Stu (Khrushy)

Mar 24, 2013, 7:47:02 PM

to

I'm no longer totally sure on the slice values. I like the "feel" of slightly lower slice_sync/async (70/31), but the difference is so minor it could just be confirmation bias (I'm seriously considering underclocking the phone while testing because everything is so smooth at stock settings that it's almost impossible to notice any differences).

In testing over the weekend I found that as the free space on my SD card decreased, using the stock values performed better and the lower sync/async values performed worse - but the difference here is tiny.

Following on from Paul's fifo values - I concur that halving both the slice and fifo values together was worse. Halving just the fifo values made no discernible difference to me. I'm currently running stock slice and fifo - depending on how busy the market is today I'm going to progressively test each one singly and work my way from there.

Has anyone seen any documentation on exactly how fifo expire works? I mean the name is pretty self explanatory, but I'd like to be certain.

Chris (osm0sis)

Mar 24, 2013, 8:43:12 PM

to francos...@googlegroups.com

I posted a couple of links farther up here that might help. Halving the fifo ones did feel faster to me. Since we've all been running a lot of benchmarks it would help to run Lag Fix (fstrim) to make sure we're getting good results.

Paul (pkgnex)

Mar 24, 2013, 10:21:31 PM

to francos...@googlegroups.com

Here's a good CFQ read:

http://www.linux-mag.com/id/7572/

At least the first page is awesome. 2nd page is meh... I had higher hopes after the great descriptions. Maybe one of you guys posted it and that's how I found it.

Halving the fifos is definitely not placebo. I'll try Khrushy's slice values (I had been going significantly lower than that when my phone started acting all choppy before).

Going to be pretty busy tomorrow, and it looks like we're in the middle of an update blitz anyway (with Franco working to figure out some IO issue he thinks he induced), so I may wait a day or two for things to settle before going crazy on the CFQ tuning again.

Stu (Khrushy)

Mar 24, 2013, 10:28:55 PM

to francos...@googlegroups.com

I read the links from Chris a few posts up, I'll check out that one now too. Cheers man.

I just found out my lead developer is off for 3 days, so I'm going to have some free time for a few days it appears.

Stu (Khrushy)

Mar 25, 2013, 12:41:25 AM

to

So PK's latest settings are this:

quantum = 8
fifo_expire_sync = 62
fifo_expire_async = 125
back_seek_max = 12582912
back_seek_penalty = 1
slice_sync = 70
slice_async = 31
slice_async_rq = 2
slice_idle = 0
group_idle = 7
low_latency = 1

For comparison - my testing today has yielded the following - we both seem to be going in the same direction on a lot of values:

quantum = 8
fifo_expire_sync = 101
fifo_expire_async = 125
back_seek_max = 12582912
back_seek_penalty = 1
slice_sync = 93
slice_async = 54
slice_async_rq = 2
slice_idle = 0
group_idle = 0
low_latency = 1

I found a much bigger difference in lowering expire_async than expire_sync, and after playing with those values, I ended up putting the slice_sync/async back to previous settings of 93/54 - minimal change in throughput, but it just seemed more consistent.

Paul (pkgnex)

Mar 25, 2013, 7:59:22 AM

to francos...@googlegroups.com

Since we've really got it down to the last 4 settings we're playing with (fifo_expires and slice_sync/async), is anyone game for a small designed experiment to at least capture the benchmark data semi-scientifically?

I'll need two or three volunteers to do some runs with various combinations of values for the 4 items we're still working on, in your benchmark tool of choice (just needs to be an i/o test!).

I can design the experimental array and do the post-processing. We'll need to all agree on what values to try - I'd suggest stock, half stock, and 1.5*stock, but half stock on slice_async has me worried. Maybe we can do stock and +/- 1/3?
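
As a rough illustration of the "stock and +/- 1/3" idea, the sketch below computes the three levels for each of the four tunables, rounded to the nearest integer. The stock values are the GN figures quoted earlier in the thread; levels is a hypothetical helper, and the set points actually used in the experiment may differ from what this arithmetic gives.

```shell
#!/bin/sh
# Print -1/3, stock and +1/3 levels for a tunable, rounded to nearest integer.
levels() {
    name="$1"
    stock="$2"
    low=$(( (4 * stock + 3) / 6 ))    # round(stock * 2/3)
    high=$(( (8 * stock + 3) / 6 ))   # round(stock * 4/3)
    echo "$name low=$low stock=$stock high=$high"
}

levels fifo_expire_sync 125
levels fifo_expire_async 250
levels slice_sync 93
levels slice_async 39
```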

Let me know if anyone else thinks this would be useful and if you have an evening this week to volunteer and send me some numbers.

Chris (osm0sis)

Mar 25, 2013, 8:00:26 AM

to francos...@googlegroups.com

Awesome. I'm in.

Zach (malaroth)

Mar 25, 2013, 8:32:34 AM

to francos...@googlegroups.com

I'm game. Let me know what settings I need to halve and if vmcache/mount/lmk/cron/zipalign will interfere and I'll take them out.

Stu (Khrushy)

Mar 25, 2013, 8:53:26 AM

to francos...@googlegroups.com

Let's just do -1/3 +1/3 on the slice values to start with. I think you and I both tried half on them and decided it was too much right PK?

Chris (osm0sis)

Mar 25, 2013, 9:34:08 AM

to francos...@googlegroups.com

Zach you should probably disable those extra things so you aren't creating any intervening variables. We'll get to vm in good time, no need to toy too hard with it now and cloud our cfq progress.

Stu (Khrushy)

Mar 25, 2013, 10:15:53 AM

to francos...@googlegroups.com

So, everyone goes back to r372 stock VM settings while testing?

What about other non-kernel set variables?

Paul (pkgnex)

Mar 25, 2013, 10:27:22 AM

to francos...@googlegroups.com

I think we all need to be on 372. You can leave your personal vm settings as-is... just don't change them between tests. The factorial array should average out any non-tested static unchanged variable effects anyway.

I'll (hopefully) post the experimental design and testing instructions this evening. If you guys could turn it around by the end of the week, that would be a good goal, I think. Then we can take the best predicted result from the benchmark data and try running that way for a few days of real-world usage as a confirmation of sorts.

Zach (malaroth)

Mar 25, 2013, 10:28:45 AM

to francos...@googlegroups.com

I just took my vmsettings out completely. Along with all the rest. So if I go down 1/3 on all 4 I wind up with the values of 8/85/171/12582912/1/62/31/2/0/0/1
Did I do it right? Some were fractional so I either rounded up or down.

Paul (pkgnex)

Mar 28, 2013, 11:20:04 PM

to

Here's the data entry form with instructions, guys.

This will entail 32 trial runs, so read the instructions twice, then take your time. I hope the effort is worth it.

To keep it (relatively) simple, we're running a 2-factor array. I chose the variable set points 33% below the stock value and 33% above the stock value for each parameter.

The end result of this will be good data on which direction is best for each parameter, how much bang-for-the-buck changing each parameter yields, if any of the values are interrelated (i.e. synergistic effect), and if so, how strongly. I'll do the data processing and post the results within a day or so of having all the data back (target by the end of the week or sooner), along with conclusions on what the next steps might be.

Thanks for your help!
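
For readers without the spreadsheet, the sketch below enumerates every low/high combination of the four tunables under test, which gives 2^4 = 16 settings. That would account for the 32 trial runs if each setting is tested twice, but that reading is an assumption. "lo" and "hi" are placeholders for the set points in the data entry form, which are not reproduced here, and factorial_array is a hypothetical name.

```shell
#!/bin/sh
# Enumerate all low/high combinations of four tunables: 2 levels ^ 4 factors.
factorial_array() {
    run=0
    for fes in lo hi; do
        for fea in lo hi; do
            for ss in lo hi; do
                for sa in lo hi; do
                    run=$(( run + 1 ))
                    echo "$run fifo_expire_sync=$fes fifo_expire_async=$fea slice_sync=$ss slice_async=$sa"
                done
            done
        done
    done
}

factorial_array
```

Because every combination appears once, each tunable is at its low level in half the rows and at its high level in the other half, which is what lets the effects be separated afterwards.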

Paul (pkgnex)

Mar 25, 2013, 5:40:39 PM

to francos...@googlegroups.com

CRAP!

I derped the file! Please use this one. The last columns for slice async should be 23 and 54.

cfq_tunable_doe_data_entry_form_and_instructions.xlsx

Paul (pkgnex)

Mar 25, 2013, 6:51:21 PM

to francos...@googlegroups.com

Here's a teaser with just half my data... Don't rush to conclusions, just want you guys to see what the output will look like (there's actually more, this is just the first 2 pages with main effects and simple cross-products). Once I get all the data and normalize for the various benchmark tools, it will look different.

cfq_data_teaser.pdf

Zach (malaroth)

Mar 25, 2013, 6:59:13 PM

to francos...@googlegroups.com

I'm using basic quadrant and benchmark and tuning pro. If I have to switch to ant I will but lemme know

Paul (pkgnex)

Mar 25, 2013, 7:13:23 PM

to francos...@googlegroups.com

That's fine. Just let me know what you used, and boil it down to a single number for each cell (at least for now). I'm going to "normalize" the difference between apps used anyway. If you want to use both quadrants and tuner pro (64 tests), that would be awesome!

I think I'm going to do 32 with antutu (done) and androbench (likely tomorrow). The more data, and the more different benchmark tools, the better!

Paul (pkgnex)

Mar 25, 2013, 10:46:05 PM

to francos...@googlegroups.com

Very early to say too much (only have data from myself and Khrushy), but the DOE is crazy interesting already (to a stats nerd like me, at least).

So far, it's looking like changing any of the factors individually does not make a huge difference on average i/o speed (i.e. the "main effects" are minor). This kind of tracks with our gut feeling - and osm0sis' outright statement on XDA - that there is a lot of noise and not much of a signal in the benchmark data.

However, looking at the interaction (synergy) effects shows they have a much larger amplitude and the cross-plots are nearly orthogonal. I've seen this a time or two before, and at first it is kind of disconcerting because it's not the "textbook" case. What it usually means is not (unfortunately) that tuning two factors will get you great results, but rather that you can really screw things up if you tweak two of the parameters in the wrong direction simultaneously! Or, more succinctly, it looks like this DOE may tell us more definitively what NOT to do, rather than give us a sea-changing optimum configuration. There will still be a best combination, it just won't be a huge difference over several other possibilities.
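
To make "main effects" versus "interaction effects" concrete, here is a minimal two-factor sketch. Each input row holds the coded level of factor A, the coded level of factor B (-1 for low, 1 for high) and a score. The scores are invented to mimic the pattern described above (small main effects, large interaction) and are not data from this experiment; effects is a hypothetical helper.

```shell
#!/bin/sh
# Effects in a 2-level design: mean score at +1 minus mean score at -1.
# For the interaction, the "level" is the product of the two coded levels.
effects() {
    awk '{ a = $1; b = $2; y = $3
           sa += a * y; sb += b * y; sab += a * b * y; n++ }
         END {
             half = n / 2
             printf "A=%.1f B=%.1f AB=%.1f\n", sa / half, sb / half, sab / half
         }'
}

# Made-up scores: A  B  score
printf '%s\n' "-1 -1 800" "1 -1 700" "-1 1 680" "1 1 810" | effects
# prints A=15.0 B=-5.0 AB=115.0
```

In this toy data, raising A helps only when B is also raised and hurts otherwise, so the lines on a cross-plot would cross even though neither factor does much on its own.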

Time (and more data) will tell if that is indeed the case this time around, though.

I need a beer.

Chris (osm0sis)

Mar 25, 2013, 11:08:26 PM

to francos...@googlegroups.com

Paul you deserve a beer! I'll do it with Androbench after I'm done these 2 night shifts. Could it be worth tackling deadline again with this systematic approach too now that we're not so sure about it?

Paul (pkgnex)

Mar 25, 2013, 11:16:40 PM

to francos...@googlegroups.com

We can probably look at deadline, although I'm not sure how much there is to gain. Maybe after we're done with CFQ we'll think there's a lot left to gain. I think we only had 3 or 4 factors that were in question in Deadline, too, right?

The reason I ask is that having any more than 4 factors at a time means one of two things - larger matrices (and therefore more test runs), or the interaction effects get confounded with the main effects "in the matrix" and you lose them. I generally prefer not to confound the interactions, and after my preliminary look at the CFQ factors we're collecting data on, I REALLY recommend against it for our i/o scheduler parameter trials. However, you can always test 3 or 4 factors, then test another 3 or 4 separately if we have 6 or more to look at in Deadline. I think we'll learn enough from CFQ that we can group larger numbers of tunables together in a way so as to minimize the interactions.

Chris (osm0sis)

Mar 26, 2013, 1:04:56 AM

to francos...@googlegroups.com

Yeah, there are only 5 tunables in total for deadline so we could definitely factor one out. front_merges maybe seems like an obvious choice since it's a Boolean.

Paul (pkgnex)

Mar 26, 2013, 7:27:19 AM

to francos...@googlegroups.com

Sounds good. We can attack Deadline in the same fashion as we currently are with CFQ, then. I'll send out instructions for Deadline after we are done with CFQ.

Zach (malaroth)

Mar 26, 2013, 7:55:03 AM

to francos...@googlegroups.com

What numbers are you looking for with the tested values? On ant I can give you my io score but on benchmark and tuning it is totally different and I don't think I can break it down into "751" like I get on ant. On benchmark and tuning my io operations scores like 2762 so that's kinda throwing me off. I'm going to save this -1/3 set of values and test with ant every couple hours. No consecutive testing.

Paul (pkgnex)

Mar 26, 2013, 8:03:24 AM

to francos...@googlegroups.com

Don't worry about the magnitude of the numbers, I'm going to normalize them all together anyway so it will be apples to apples. If the 2762 is just for io operations, that should be fine. I do need to know if bigger is better or smaller is better in that benchmark, though. Hopefully bigger is better, but if it's the other way around I can invert the numbers and re-normalize them.
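
The post doesn't say which normalization is used, so the following is only one possible approach: divide each score by the mean score of its own benchmark tool, so that 1.000 means "average for that tool" regardless of scale. It assumes bigger is better for every tool. The tool names and scores are placeholders, and normalize is a hypothetical helper.

```shell
#!/bin/sh
# Read "tool score" lines and print each score relative to its tool's mean.
normalize() {
    awk '{ tool[NR] = $1; score[NR] = $2; sum[$1] += $2; cnt[$1]++ }
         END {
             for (i = 1; i <= NR; i++) {
                 mean = sum[tool[i]] / cnt[tool[i]]
                 printf "%s %.3f\n", tool[i], score[i] / mean
             }
         }'
}

# Placeholder scores for two tools with very different scales.
printf '%s\n' "antutu 750" "antutu 770" "tuning 2700" "tuning 2800" | normalize
```

After this step a score of 0.987 from one tool and 0.982 from another can be compared directly, which raw values like 751 and 2762 cannot.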

Thanks.

Zach (malaroth)

Mar 26, 2013, 8:09:15 AM

to francos...@googlegroups.com

Yes benchmark and tuning bigger is better. Cool... I can break down just the io scores on both. I don't have office suite or anything like it so mine will be a plain txt doc. Sorry bout that. Hopefully we'll know this weekend. Now can I test consecutively different values? I.e. each of the 8 different sets 1-8, record values, wait awhile and test again? I'll be setting them on boot with init.d so you'll have a fresh boot with each test.

Paul (pkgnex)

Mar 26, 2013, 12:05:17 PM

to francos...@googlegroups.com

That will work, but you probably don't need to boot between each test. Sounds painful!

Zach (malaroth)

Mar 26, 2013, 12:59:47 PM

to francos...@googlegroups.com

For anyone that can't do a spreadsheet or doesn't want to, here's a text file with all the numbers and slots for test 1 and test 2, doubled for multiple benchmark programs:
http://db.tt/OHce3hwl

Zach (malaroth)

Mar 26, 2013, 10:39:19 PM

to francos...@googlegroups.com

I've run 24 of 32 tests. I'll run the last 8 in the morning. To note, with Christensen's Benchmark and Tuning I'm able to run solely an io test, whereas with AnTuTu I have to run a full test to get the io operations. Don't know if that matters. I'll redo the Benchmark and Tuning runs with the full test if necessary. I just have the 4 171 variables to run a second test on each program after a waiting period and then I'll post my final numbers.

Zach (malaroth)

Mar 26, 2013, 10:44:48 PM

to francos...@googlegroups.com

Oops... OK I'm only half done. I've only run 2 tests on 2 programs. Guess I'll have the other 40 by tomorrow night hopefully

Stu (Khrushy)

Mar 27, 2013, 12:18:15 AM

to francos...@googlegroups.com

With Antutu you can just deselect the CPU, 2D and 3D tests before you start the test. You should come up with a number somewhere in the ballpark of 800 by just running the database and sd io tests together.

Zach (malaroth)

Mar 27, 2013, 7:05:09 PM

to francos...@googlegroups.com

Umm.... something broke on my phone. I dunno. All of a sudden my tests are way way way off. Like I was getting numbers in the range of 2300 on Benchmark and Tuning. Now it's 800. The app did not update

Paul (pkgnex)

Mar 27, 2013, 7:12:01 PM

to francos...@googlegroups.com

Could you have hit a thermal throttle point? Maybe try rebooting, then running lagfix (fstrim) like Os has recommended, then rebooting again and retrying?

Other than that, I can't really think of anything. Maybe take a day or so off, uninstall and reinstall the app and try again in a few days?

I'm sure you've already thought of all of this.

If you were close to done, I could probably use the numbers you had completed before your phone went psychotic.

Zach (malaroth)

Mar 27, 2013, 7:30:46 PM

to francos...@googlegroups.com

Here's 3 tests done on 2 programs
http://db.tt/GwJYV8pX

Zach (malaroth)

Mar 27, 2013, 7:58:12 PM

to francos...@googlegroups.com

Actually... come to think of it, it was after I cleared f.Ku app data that the numbers switched. They were consistent until then. Now they're consistently in the 800s. So I wonder if Christensen changed something to be more consistent with AnTuTu. He and Franco are real life friends so it's not out of the question that Franco's app could have that change.

Paul (pkgnex)

Mar 28, 2013, 6:41:26 AM

to francos...@googlegroups.com

Got the data, Zach. Thanks.

I think Chris was going to pitch in with some, too, and he was the only other person to commit. I'll compile the data we have, but wait for the last of the data before a final report.

Chris (osm0sis)

Mar 28, 2013, 11:26:10 PM

to

Yup. Done my night shifts and had my sleep day so I'll be getting on this later on today.

Edit: Forgot everything was closed tomorrow so I was running around doing errands all day. I'll do my trials over the course of the day tomorrow. Thanks for your patience fellas.

Chris (osm0sis)

Mar 29, 2013, 4:12:56 PM

to

Email sent. The data appears to be all over the place even though I tried to keep it as consistent as possible: battery between 40 and 20%, no other apps in memory but f.Ku and AndroBench, cleared result history beforehand, turned off thermal throttle and kept an eye on heat anyway (never went higher than 47 degrees, so the throttle shouldn't be involved).

Good thing I got this done when I did. I just got called in for another night shift haha.

Paul (pkgnex)

Apr 1, 2013, 10:42:46 AM

to francos...@googlegroups.com

CFQ Tunable DOE Results:

First, thanks to Malaroth, Khrushy, and Osm0sis for getting me data. I know running benchmarks over and over is so boring it borders on painful.

I compiled everyone's data and ran the analysis (formal ANOVA results attached).

Here's my conclusion - in retrospect, I think it makes sense: Changing any one individual parameter doesn't have a significant effect on the benchmark results. However, lowering or raising certain pairs of factors together has a larger effect - I'm not sure how noticeable this effect would be, but I'm convinced it is real and not just noise in the data. It turns out that what I was doing before was all wrong - I was raising or lowering the "fifo_exp_*" or "slice_*" parameters together as pairs, and I was getting mixed results objectively and subjectively. The thing the DOE showed to be most significant is that the "*_sync" or "*_async" pairs should move together.

The best predicted combination is to lower both fifo_expire_sync and slice_sync, and raise both fifo_expire_async and slice_async. Reading back through the CFQ documentation, it sounds like this would tend to make the scheduler favor reads more over writes, which I believe is where Franco and the rest of you guys already went with Deadline and RoW.

Based on the results, the best combo would be: (fifo_exp_sync, fifo_exp_async, slice_sync, slice_async) = 85, 335, 62, 54

Remember, however, that we only went +/- 33% for our levels. There's nothing to say we couldn't go a little further or do other minor logical tweaks (even numbers, some even being multiples of each other, etc). I think pushing the limits to 51, 500, 51, 125 would be the most extreme we should try... I actually tried these numbers this morning and it both benchmarked slightly worse and felt more laggy than the 85/335/62/54 number set.
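
To make the "+/- 33% levels" idea concrete, here's a minimal sketch of how 8 test sets could be generated from 4 two-level factors. It assumes a half-fraction design where the fourth factor's level is the product of the first three (a common textbook choice); the thread doesn't say which design was actually used. The stock values are the ones quoted later in the thread, and the function name `half_fraction` is made up. The numbers it produces won't exactly match the ones posted here, which differ by a few ms.

```python
from itertools import product

# Stock CFQ values as quoted in this thread (ms).
STOCK = {
    "fifo_expire_sync": 125,
    "fifo_expire_async": 250,
    "slice_sync": 93,
    "slice_async": 39,
}


def half_fraction(stock, delta=0.33):
    """Return 8 runs: each factor at stock -33% or stock +33%.

    Assumption: the fourth factor's coded level is the product of the
    first three, which halves a full 16-run design to 8 runs.
    """
    names = list(stock)
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        coded = (a, b, c, a * b * c)
        runs.append(
            {n: round(stock[n] * (1 + delta * s)) for n, s in zip(names, coded)}
        )
    return runs
```

Each returned dict is one set of values to apply and benchmark; eight runs are enough to estimate the four main effects, though some interactions get confounded with each other in a design this small.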

So, I would recommend everyone start with the 85/335/62/54 values, mess around with any minor changes you want to for a few days, and then let me know what you guys think. Hopefully, we can close this out by this weekend - so we can do the same for Deadline (if there is still an appetite for that), and/or be prepared for Franco's next update.

Let me know if anyone has any questions.

cfq_benchmark_doe_results.docx

Chris (osm0sis)

Apr 1, 2013, 3:20:58 PM

to francos...@googlegroups.com

Nice! Thanks Paul! I'll run these for a bit to see how they feel. Zach has some secret settings he's been working on too so it'll be interesting to see how they compare with your analysis.

Chris (osm0sis)

Apr 1, 2013, 4:12:05 PM

to francos...@googlegroups.com

Decided to play around a bit and I'm seeing pretty good results with 85/250/125/70 here.

Paul (pkgnex)

Apr 1, 2013, 4:18:04 PM

to francos...@googlegroups.com

Those definitely sound reasonable. The CFQ tunable effects aren't that big, so if you find something you like with nice round numbers, it should be fine.

If someone wants to second Chris' numbers, we can all try them for a few days and then make them "final".

Chris (osm0sis)

Apr 1, 2013, 5:14:59 PM

to francos...@googlegroups.com

Nah I wouldn't offer those up as final. Just spitballing. As much as I hate cfq because of how negligible the differences are, I don't want to rush it. Besides, Zach was getting crazy AnTuTu scores with his secret ones he was working on, so hopefully those are reproducible.

Paul (pkgnex)

Apr 1, 2013, 6:16:29 PM

to francos...@googlegroups.com

OK, Zach, you've officially been called out!

If you have some secret CFQ tunable settings that are giving you awesome results, now would be a good time to share them to see if it's reproducible on anyone/everyone else's device(s)!

Zach (malaroth)

Apr 1, 2013, 11:05:36 PM

to francos...@googlegroups.com

I lost the numbers :'( I can't reproduce them but they were awesome. I just tweaked tons of times to find em. I'll try again

Chris (osm0sis)

Apr 1, 2013, 11:38:17 PM

to francos...@googlegroups.com

Bro! Should have told me when you had the chance! ;P

Stu (Khrushy)

Apr 2, 2013, 12:43:44 AM

to francos...@googlegroups.com

By my reading, by lowering the sync settings and raising the async ones you're favouring synchronous reads/writes over async ones (as opposed to favouring either reads or writes). Conceptually I have an issue with that given this is about usability, not total throughput, but we'll see how it goes ^^

Been running your recommendations all day, and I'm going to run it for another day before I make my final judgement, but I'm currently leaning towards what I had previously, which was 171 / 171 / 125 / 23 because it gave consistently high read speeds.

Anyway, I'll report back in another day ^^

Paul (pkgnex)

Apr 2, 2013, 7:06:48 AM

to francos...@googlegroups.com

My reading seemed to indicate that, on average, most reads are synchronous requests, and many more writes are asynchronous than not, but you're right that they are not a 100% match by any means. I think the difference between the DOE numbers and your favorites would have to be minor, based on both the data and all the things I've tried. I just don't think CFQ responds that drastically to changing the tunables. I'm open for anything reasonable for CFQ at this point, if more than 1 or 2 people can agree on a number set!

Zach (malaroth)

Apr 2, 2013, 7:30:14 AM

to francos...@googlegroups.com

What little progress we made with cfq I think is all that will be made. Unless someone wants to come up with a new algorithm to assign time slices to processes, we've done all we can. That's the nature of "completely fair queuing". About the same with bfq. If someone wants to email Paolo Valente and ask him for the source on how he wrote bfq, we could rewrite it for solid state. Other than that, great job on what HAS been accomplished on cfq, awesome job and continued success on deadline and row, and let's get some badass numbers for vm!

Stu (Khrushy)

Apr 2, 2013, 7:57:29 AM

to francos...@googlegroups.com

Yeah, so I've been running the DOE suggestions all of today, and I honestly don't think I can tell any difference between any of the "not obviously spiking" settings.

I think Zach's right. CFQ has come about as far as it can already - which is to say it's very smooth, all the time.

Chris (osm0sis)

Apr 2, 2013, 2:02:45 PM

to francos...@googlegroups.com

Yeah. Totally burnt out on cfq. So if we can find something that at least doesn't feel shitty to all of us I think that'll be fine.

Paul (pkgnex)

Apr 2, 2013, 2:44:49 PM

to francos...@googlegroups.com

That makes two of us. I'm burnt out on CFQ, because it doesn't seem to be responding to any more tweaks. As Zach said, it's smooth, decent, dependable, but I think we've got it to the best place we reasonably can. What we have so far is:

Stock: 125/250/93/39

DOE results: 85/335/62/54 (limited to stock +/- 33%, and there was no huge statistical advantage to these)

Os's recent like: 85/250/125/70

Stu's favs: 171/171/125/23

I honestly think we can mix-and-match any of these and nobody could tell a difference in a blind test. I'm almost in favor of just leaving it "stock", since that is almost an average of all 4 sets anyway!
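
As a quick check on the "stock is almost an average" remark, here's a small sketch that averages the four sets listed above, column by column. The dictionary keys and the function name `column_means` are just labels made up for this example; the numbers are the ones from this post.

```python
# (fifo_exp_sync, fifo_exp_async, slice_sync, slice_async) per set, in ms.
SETS = {
    "stock": (125, 250, 93, 39),
    "doe": (85, 335, 62, 54),
    "os": (85, 250, 125, 70),
    "stu": (171, 171, 125, 23),
}


def column_means(sets):
    """Average each tunable across all the sets."""
    return tuple(sum(col) / len(col) for col in zip(*sets.values()))


print(column_means(SETS))  # (116.5, 251.5, 101.25, 46.5)
```

Against stock at 125/250/93/39, the averages come out within about 8 ms on every tunable, which is consistent with the "almost an average" observation.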

Chris (osm0sis)

Apr 2, 2013, 2:49:22 PM

to francos...@googlegroups.com

I still kind of like 70/120/93/39 too, seems slightly quicker in feel than stock. Again, all hard to tell.

The general queue tweaks and other cfq tunables we adjusted seem to have made the majority of the difference that can be made for it.

Paul (pkgnex)

Apr 2, 2013, 2:53:44 PM

to francos...@googlegroups.com

That was my initial favorite, too, but like you said it's really impossible to tell for sure. That, and the only thing the DOE really showed was that if you changed variable 1 or 2 without making a corresponding change to 3 or 4 (respectively), you are actually very slightly degrading the (benchmark) performance!

Chris (osm0sis)

Apr 2, 2013, 3:22:21 PM

to

Things always seem a little more dramatic with feel on the N7, so perhaps Joaquin, Steve and I could try out those combos or anything else that strikes our fancy for those tunables within reason (following the DOE results perhaps) in the next few days on the N7 and see if we can come to a consensus. Then we try those on the GN and inevitably they make no difference and then we'll be done. :P

Basically, if it works on the fat cow that is grouper, it should work everywhere.

Stu (Khrushy)

Apr 2, 2013, 10:05:39 PM

to francos...@googlegroups.com

Been running the 70/120/93/39 settings today. It's a tiny change, but I think I like it. Tossing my not-super-bothered-either-way hat into the ring behind that one.

Paul (pkgnex)

Apr 3, 2013, 7:14:26 PM

to

Guys, do me a quick favor. I know everyone's burnt out on CFQ, and I hope the Deadline DOE results have you trying some things out, but sometime over the next few days, try these for CFQ (other parameters unchanged from 95iosettings script):

fifo_exp_sync 70

fifo_exp_async 125

slice_sync 125

slice_async 54

The reason is that since everyone seems to be liking 70 and 125 for the first two (lowered from stock), the best predicted response from the DOE would be to actually raise the last two a little.

I'm going to take a break from deadline and run this for a day or so and let you know what I think.
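
If anyone would rather script this than set the values by hand, here's a minimal sketch. It assumes the device is `mmcblk0` (that name is a guess and may differ per device) and uses the full tunable names that CFQ exposes under `queue/iosched/` in sysfs (`fifo_expire_*` rather than the `fifo_exp_*` shorthand above). The function name `apply_tunables` and the `sysfs_root` parameter are made up for illustration; writing under `/sys` needs root.

```python
from pathlib import Path

# The four trial values from this post, under their full sysfs names (ms).
CFQ_TRIAL = {
    "fifo_expire_sync": 70,
    "fifo_expire_async": 125,
    "slice_sync": 125,
    "slice_async": 54,
}


def apply_tunables(values, device="mmcblk0", sysfs_root="/sys"):
    """Write each tunable and return what the files read back afterwards.

    Assumption: the device name and the sysfs_root parameter are
    illustrative; sysfs_root only exists so the function can be
    pointed at a scratch directory for a dry run.
    """
    iosched = Path(sysfs_root) / "block" / device / "queue" / "iosched"
    applied = {}
    for name, value in values.items():
        node = iosched / name
        node.write_text(f"{value}\n")
        # Read back, since the stored value may not match what was written.
        applied[name] = node.read_text().strip()
    return applied
```

Calling `apply_tunables(CFQ_TRIAL)` as root would apply the set; comparing the returned values with what was written shows whether anything got adjusted along the way.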

Stu (Khrushy)

Apr 3, 2013, 7:38:57 PM

to

Giving it a go now. First look seems to give ever so slightly better results, mostly noticeable on writes. I'll run it for a chunk of today to check it out.

Chris (osm0sis)

Apr 3, 2013, 8:00:58 PM

to

Going to try those out in a bit. But I thought the DOE results said the corresponding *sync/*async should move in the same direction.. so shouldn't the slice numbers be decreased, not increased?

Steve (Gingerbread Man)

Apr 3, 2013, 8:19:21 PM

to francos...@googlegroups.com

70/120/93/39 (which turns out at 100/40 on n7) is scoring higher in terms of writes; just now I got 26.1 write, 30.4 read. I ran the SD test several times for each and didn't do an average, but this set had more consistently higher scores.

Paul (pkgnex)

Apr 3, 2013, 8:33:48 PM

to francos...@googlegroups.com

Yeah. I REALLY screwed up my description of the CFQ interaction effects, huh?

What I should have said (or at least meant to) is that lowering or raising the first two (fifo_expire) together is counter-productive, as is raising or lowering the last two (slice) parameters in concert. It turns out that ALL the factors in CFQ have weak inverse interactions with each other (which is why it is pissing all of us off to tune it, I think). Every time you adjust one, you "should" move the other three in the opposite direction. Then when you go to tune the next number, you have to un-adjust or over adjust the previous ones. It's circular, maddening, and almost none of it has any significant effect anyway.

It happened that the "predicted best response" within the ranges we tested was yielded by setting both the sync values to the low level and the async values to the high level, but unfortunately this doesn't mean that if you are playing with one number that you should move its partner in the same direction!

Clear as mud? I'm really sorry I didn't explain it correctly the first time around.

So... since we're starting to coalesce around lowering the first two numbers, it turns out that the last two should actually be increased... the effect will be small, but at least in the direction the study points to.
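
For anyone who wants to see what an "inverse interaction" means in numbers, here's a toy sketch of how main and interaction effects are usually estimated from a two-level design. The responses in `TOY_RUNS` are invented purely for illustration and are not from our benchmark data; the factor names `a` and `b` and the function name `effect` are also made up.

```python
# Toy two-factor design with coded levels (-1 = low, +1 = high).
# The scores are INVENTED for illustration; they are not thread data.
TOY_RUNS = [
    ({"a": -1, "b": -1}, 95),
    ({"a": -1, "b": 1}, 105),
    ({"a": 1, "b": -1}, 105),
    ({"a": 1, "b": 1}, 95),
]


def effect(runs, *factors):
    """Mean response where the product of the coded levels is +1,
    minus the mean response where that product is -1.

    One factor gives a main effect; two factors give their interaction.
    """
    plus, minus = [], []
    for levels, score in runs:
        sign = 1
        for f in factors:
            sign *= levels[f]
        (plus if sign > 0 else minus).append(score)
    return sum(plus) / len(plus) - sum(minus) / len(minus)
```

With this toy data, `effect(TOY_RUNS, "a")` is 0 while `effect(TOY_RUNS, "a", "b")` is -10: neither factor matters on its own, but moving both in the same direction costs score, which is the pattern described above.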

Paul (pkgnex)

Apr 3, 2013, 8:37:11 PM

to francos...@googlegroups.com

Steve,

You're saying 70/120/93/39 is testing better than 70/120/120/54? Just checking.