Odd profiling result

Jon

unread,

Nov 1, 2011, 11:24:35 AM11/1/11

to TheCodeShop

With this setup:

* built with DevKit 4.5.2
* installing to `C:\ruby-test\193`
* AQtime project configured with modules `C:\ruby-test\193\bin
\ruby.exe` and `C:\ruby-test\193\bin\msvcrt-ruby191.dll`
* AQtime project configured with `Run -> Parameters... ->
Parameters:` value of `C:\projects\measurements-git\workloads
\core_rd_filelines_lf.rb` (all others empty)

I get the following results when running measurements
`core_rd_filelines_lf.rb` workload which reads a 500,000 line test
file in ASCII mode three times.

http://www.mediafire.com/imgbnc.php/b423fddd8461d9f16e014de374995e389fddd646b3593b8e7581d034e23f849d6g.jpg

Given that (a) AQtime warned that it wasn't able to instrument many
`msvcrt-ruby191.dll` routines (like `rb_get_path` from `file.o` due to
`no ret instruction`), (b) AQtime ran for a very long time before
generating results, and (c) gave results like the above screenshot,
I'm not yet convinced the source build instructions are correct. It
may also be something in my configuration, or the results may truly be
valid.

How do these results compare with what you're seeing?

Jon

---
http://thecodeshop.github.com | http://jonforums.github.com/
twitter: @jonforums

Dušan D. Majkić

unread,

Nov 1, 2011, 3:04:22 PM11/1/11

to theco...@googlegroups.com

> With this setup:

I get same here. This is where it gets milions of calls (acording to AQTime):

file: io,c

rb_io_getline_1(...)

-> appendline(...) which is called in while().

-> that calls more_char(...)

-> that calls() fillc_buf which re-encodes string
with rb_econv_convert().

what is strange is that number of hits (conversions) is too big, like
multiple times for every single char in file.
It doesn't happen with "rb" file open mode.

There is a side note that on Linux, binary is default, always on mode.
And on Windows,
you have to insist on binary mode, or else msvcrt.dll will mangle with
results.
There is a way to make binary mode default for mingw msvcrt - you
include one extra file in Makefile.
(Dindn't check what ruby core does there, or this will break anything)

Is there a way to check this on Linux, with similar list of function hits?

I'll try somehow to step through code. That should make clear what is
going on there.

Jon

unread,

Nov 1, 2011, 3:23:35 PM11/1/11

to theco...@googlegroups.com

> I get same here. This is where it gets milions of calls (acording to AQTime):
>
> file: io,c
>
> rb_io_getline_1(...)
>
> -> appendline(...) which is called in while().
>
> -> that calls more_char(...)
>
> -> that calls() fillc_buf which re-encodes string
> with rb_econv_convert().
>
> what is strange is that number of hits (conversions) is too big, like
> multiple times for every single char in file.
> It doesn't happen with "rb" file open mode.

Unoptimized encoding code path when in "r" mode on Windows?

I bet ruby-core didn't consider this case and that's why it's not optimized.

> There is a side note that on Linux, binary is default, always on mode.
> And on Windows,
> you have to insist on binary mode, or else msvcrt.dll will mangle with
> results.
> There is a way to make binary mode default for mingw msvcrt - you
> include one extra file in Makefile.

Sounds good, details?

I was about to dig into `file.c` or `io.c` and #ifdef in binary mode for `File.open` and potentially others. But your idea is much cleaner.

Let's try it and see what, if anything breaks :)

> Is there a way to check this on Linux, with similar list of function hits?

strace, ltrace or similar?

I've got strace and ltrace on my Arch VM but haven't used either on this issue.

Roger Pack

unread,

Nov 1, 2011, 3:30:04 PM11/1/11

to theco...@googlegroups.com

> There is a side note that on Linux, binary is default, always on mode.
> And on Windows,
> you have to insist on binary mode, or else msvcrt.dll will mangle with
> results.

By mangle you mean converting "\r\n" to "\n" I presume?
-r

Dušan D. Majkić

unread,

Nov 1, 2011, 3:39:44 PM11/1/11

to theco...@googlegroups.com

> I was about to dig into `file.c` or `io.c` and #ifdef in binary mode for `File.open` and potentially others. But your idea is much cleaner.
>
> Let's try it and see what, if anything breaks :)

I do this xx_init() part in Redis port

/* Force binary mode on all files */
_fmode = _O_BINARY;
_setmode(_fileno(stdin), _O_BINARY);
_setmode(_fileno(stdout), _O_BINARY);
_setmode(_fileno(stderr), _O_BINARY);

But MingGW have binmode.o in lib directory, and all you need to do is
link it as first. It does the same thing - sets global _fmode to
_O_BINARY, but linking it as first guarantees that it is set for all
linked code.

> By mangle you mean converting "\r\n" to "\n" I presume?

Yes.

Jon

unread,

Nov 1, 2011, 3:40:37 PM11/1/11

to theco...@googlegroups.com

> > There is a way to make binary mode default for mingw msvcrt - you
> > include one extra file in Makefile.
>
> Sounds good, details?
>
> I was about to dig into `file.c` or `io.c` and #ifdef in binary mode for `File.open` and potentially others. But your idea is much cleaner.
>
> Let's try it and see what, if anything breaks :)

I bet you're going to do something like http://www.cygwin.com/cygwin-ug-net/using-textbinary.html

:)

Jon

unread,

Nov 1, 2011, 3:44:12 PM11/1/11

to theco...@googlegroups.com

Fantastic!

What do you think of pulling the link operation up into the configure.in and Makefile stuff as part of the existing mingw behavior?

Jon

unread,

Nov 1, 2011, 5:35:02 PM11/1/11

to TheCodeShop

> > > I was about to dig into `file.c` or `io.c` and #ifdef in binary mode for `File.open` and potentially others. But your idea is much cleaner.
>
> > > Let's try it and see what, if anything breaks :)
>
> > I do this xx_init() part in Redis port
>
> > /* Force binary mode on all files */
> > _fmode = _O_BINARY;
> > _setmode(_fileno(stdin), _O_BINARY);
> > _setmode(_fileno(stdout), _O_BINARY);
> > _setmode(_fileno(stderr), _O_BINARY);
>
> > But MingGW have binmode.o in lib directory, and all you need to do is
> > link it as first. It does the same thing - sets global _fmode to
> > _O_BINARY, but linking it as first guarantees that it is set for all
> > linked code.
>
> Fantastic!
>
> What do you think of pulling the link operation up into the configure.in and Makefile stuff as part of the existing mingw behavior?

Let's update `configure.in` to support a new `--enable-binmode` flag
and get more experience with this and see what breaks.

For my usage, as long binary doesn't break core behavior in odd ways
on Windows, I'd be OK. But there's Windows use cases where the change
of default behavior wouldn't work so well. This appears to be an easy
fix to implement, but it just masks what looks to be the real problem
that Dusan summarized in his first response; too many encoding
conversions in 'r' mode on Windows and currently unknown on Linux.

Dusan:

1) do you want to update `configure.in` for `--enable-binmode` or
shall I?
2) are you confident that AQtime and our current build setup is
giving valid profiling results?

Jon

---

Dušan D. Majkić

unread,

Nov 1, 2011, 5:46:43 PM11/1/11

to theco...@googlegroups.com

> Let's update `configure.in` to support a new `--enable-binmode` flag
> and get more experience with this and see what breaks.

It will not work, since binary encoding is set in a different way in io.c.
I pushed changes to set binary mode as default.

They are on my ruby branch on thecodeshop.

With this chenge, if you File.open() with 'r' it is translated to 'rb'.
If you want textmode, you must use 't'.

All tests pass, and 17sec on core_rd_filelines_crlf is gone.

We should check if this brake anything else(?)

Jon

unread,

Nov 1, 2011, 6:15:36 PM11/1/11

to TheCodeShop

On Nov 1, 5:46 pm, Dušan D. Majkić <dmaj...@gmail.com> wrote:
> > Let's update `configure.in` to support a new `--enable-binmode` flag
> > and get more experience with this and see what breaks.
>
> It will not work, since binary encoding is set in a different way in io.c.
> I pushed changes to set binary mode as default.
>
> They are on my ruby branch on thecodeshop.
>
> With this chenge, if you File.open() with 'r' it is translated to 'rb'.
> If you want textmode, you must use 't'.
>

Just fixed my bad local repo config and pulled your branch. I see now.

[remote "codeshop"]
...
fetch = +refs/heads/dmajkic/*:refs/remotes/codeshop/dmajkic/*

> All tests pass, and 17sec on core_rd_filelines_crlf is gone.
>
> We should check if this brake anything else(?)

Nice. I'm assuming there's tests for the following and they're OK, but
I wonder how methods like these (and from Enumerable) work with the
new binmode default while working on a CRLF file?

IO#each_line
IO#gets
IO#readline[s]
IO#pos=
IO#seek

What about appending to an existing file that's CRLF? If you forget to
write with 't' mode you'll likely end up with a file that is both CRLF
and LF delimited?

Enough speculation...going to build your branch tonight!

Good stuff, thanks.

Jon

---

Roger Pack

unread,

Nov 1, 2011, 6:43:49 PM11/1/11

to theco...@googlegroups.com

We should check if this brake anything else(?)

Try reading from a file that has "\r\n" line endings...it should break from the default File.read, but I'm ok with that :)
-r

Dušan D. Majkić

unread,

Nov 1, 2011, 6:47:17 PM11/1/11

to theco...@googlegroups.com

> Try reading from a file that has "\r\n" line endings...it should break from
> the default File.read, but I'm ok with that :)

We should check how ruby skips re-encoding on Linux. That is really
the slow part.
It could be that it simply defaults to binary, but we should check.

At least, we now have something to look at.

Jon

unread,

Nov 2, 2011, 10:06:29 AM11/2/11

to theco...@googlegroups.com

> > Try reading from a file that has "\r\n" line endings...it should break from
> > the default File.read, but I'm ok with that :)
>
> We should check how ruby skips re-encoding on Linux. That is really the slow part.
> It could be that it simply defaults to binary, but we should check.

Yes, and see if we can find a way bail out of the re-encoding cycles at the earliest possible point when on Windows.

We should dive into the current Encoding tests and see if they contain anything which uses "\r\n" or whether everything is "\n". If not, let's add `Test::Unit::TestCase` tests that both check for correct behavior but also make it more apparent that the current implementation needs to be optimized. This would be a good submission to ruby-core.

BTW, I did some quick benchmarking with JRuby on both Arch and Win7 and JRuby's performance for both "\r\n" and "\n" is essentially the same. I'll publish later, but what this tells me is that the JRE has more stable and consistent code paths than MRI wrt this issue.

> At least, we now have something to look at.

Absolutely. Later today I'm going to upload a proof-of-concept 7z archive to our download page on GitHub.

FYI, I did a full build of Dušan's dmajkic/ruby_1_9_3 last night and successfully ran "make test && make test-all TESTS='openssl fiddle psych'" :)

Looks like we're tackling the right classes of issues (require performance, api selection, io performance) and I want to get a few more eyes testing what we've got so far. The proof-of-concept archive will be the dmajkic/ruby_1_9_3 branch + Yura's latest `require` optimization patch + a tweak to work better with pik.

Specifically, I'd like to see us get behavior and performance feedback in the following areas:

* Rails
* BDD/TDD (cucumber, rspec, minitest)
* GUI (gtk2 and tcl/tk)
* XML and other file/data munging apps

Reply all

Reply to author

Forward