Mosh is A Fast R6RS Scheme interpreter.
Homepage:
http://mosh.monaos.org/
Reference Manual:
http://mosh.monaos.org/
Developed by:
Higepon, kokosabu, herumi and .mjt.
About This Release
------------------
Mosh becomes R6RS compliant.
Passed all of "R6RS test suite" which is written by PLT Scheme
project.
Run like following.
mosh --loadpath=r6rs-test-suite r6rs-test-suite/tests/r6rs/run-via-
eval.sps
Added new reference manual.
http://mosh.monaos.org/
Supported build on Windows with Visual C++ by herumi.
Supported build on Mingw by .mjt.
Added Socket API.
See Manual and example/irc-client.ss
Added FFI (Foreign Function Interface).
See Manual and test/ffi.scm.
Implemented R6RS Numric tower.
Implemented R6RS Port I/O.
Implemented R6RS Simple I/O.
Imroved reader and scanner.
Added Alex Shinn's match library.
Added Tiny CLOS object system ported to R6RS by Christian Sloma.
Added MySQL API.
Added SRFIs written by Derick Eddington.
(srfi :0 cond-expand)
(srfi :1 lists)
(srfi :2 and-let*)
(srfi :6 basic-string-ports)
(srfi :8 receive)
(srfi :9 records)
(srfi :11 let-values)
(srfi :13 strings)
(srfi :14 char-sets)
(srfi :16 case-lambda)
(srfi :19 time)
(srfi :23 error)
(srfi :26 cut)
(srfi :27 random-bits)
(srfi :31 rec)
(srfi :37 args-fold)
(srfi :38 with-shared-structure)
(srfi :39 parameters)
(srfi :41 streams)
(srfi :42 eager-comprehensions)
(srfi :43 vectors)
(srfi :48 intermediate-format-strings)
(srfi :61 cond)
(srfi :64 testing)
(srfi :67 compare-procedures)
(srfi :78 lightweight-testing)
(srfi :98 os-environment-variables)
(srfi :99 records)
Added many optimizations.
Instruction combination.
Gloc lookup.
SSE flags.
Efficient Fasl.
Updated Boehm GC to 7.1.
---
MINOWA Taro(Higepon)
http://www.monaos.org/
http://code.google.com/p/mosh-scheme/
http://www.facebook.com/people/Taro_Minowa_Higepon/649065487
http://friendfeed.com/higepon
I just compiled it here (./configure && make) and ran a simple test:
(import (rnrs))
(define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
(and (fact 100000) #t)
But it gave me an annoying stack of countless:
GC Warning: Repeated allocation of very large block (appr. size
28672):
May lead to memory leak and poor performance.
Moreover, I had to kill it because it was quickly eating all memory
and about to prey on swap as well.
OTOH, perhaps it's my fault. I only have:
namekuseijin@nameku:~/dev/scheme$ ls /usr/lib/libgmp*
/usr/lib/libgmp.a /usr/lib/libgmp.so.3.4.2 /usr/lib/libgmpxx.so.4
/usr/lib/libgmp.la /usr/lib/libgmpxx.a /usr/lib/libgmpxx.so.
4.0.2
/usr/lib/libgmp.so /usr/lib/libgmpxx.la
/usr/lib/libgmp.so.3 /usr/lib/libgmpxx.so
So I:
ln -s /usr/lib/libgmp.so.3.4.2 ~/lib/libgmp.so.4.2
and LD_LIBRARY_PATH=~/lib ./configure
so that I could overcome the need for downloading and compiling the
latest GMP available... ^_^
It worked well enough for quite short facts, like (fact 100) on the
repl though.
So, I couldn't really test the GMP. OTOH, I tried some looping:
(import (rnrs) (rnrs mutable-strings))
(define (go n)
(define v (make-vector (+ 1 n)))
(do ((i 0 (+ 1 i))) ((< n i))
(do ((j 0 (+ 1 j))) ((< i j))
(do ((k 0 (+ 1 k))) ((< j k))
(vector-set! v k k)))))
(define (go-dbg n)
(define v (make-vector (+ 1 n)))
(do ((i 0 (+ 1 i))) ((< n i))
(do ((j 0 (+ 1 j))) ((< i j))
(do ((k 0 (+ 1 k))) ((< j k))
(write i)(write j)(write k)(newline)))))
(if (eq? 'dbg (read))
(go-dbg (read))
(go (read)))
namekuseijin@nameku:~/dev/scheme$ time echo go 200 | mosh loop/
loop.scm > /dev/null
real 0m0.148s
user 0m0.144s
sys 0m0.004s
namekuseijin@nameku:~/dev/scheme$ time echo dbg 200 | mosh loop/
loop.scm > /dev/null
real 0m2.547s
user 0m2.356s
sys 0m0.188s
For a comparison, here's plt-scheme:
namekuseijin@nameku:~/dev/scheme$ time echo go 200 | ~/plt/bin/plt-
r6rs loop/loop.scm > /dev/null
real 0m0.388s
user 0m0.356s
sys 0m0.028s
namekuseijin@nameku:~/dev/scheme$ time echo dbg 200 | ~/plt/bin/plt-
r6rs loop/loop.scm > /dev/null
real 1m25.141s
user 1m19.581s
sys 0m5.540s
So, yes, it looks pretty fast! :)
Sorry, your fact needs large memory not only on Mosh, but also on
other Scheme (e.g. Ypsilon).
I think Mosh should have software limit of heap like Ypsilon (default
64MB).
> So, yes, it looks pretty fast! :)
:-)
You can run gambit benchmarks like following.
mosh bench/run-mosh.scm
Cheers.
Not to impose but please consider writing up
a brief description of the implementation
with special attention to what makes it fast
and to what the low-level (sub-scheme) interface
to the run-time system is like. E.g. what are
the essential characteristics of memory mgt. in
this implementation? what are the calling conventions
(and how do they relate to the C stack / flow of control)?
etc.
-t
Sure. But I don't see those messages in other implementations.
Besides, they don't eat all memory.
For instance, for the same 100000!:
namekuseijin@nameku:~/dev/scheme$ time ~/plt/bin/plt-r6rs fact/fact.scm
real 0m39.862s
user 0m36.034s
sys 0m3.460s
or with gambit:
namekuseijin@nameku:~/dev/scheme/fact$ cp fact.scm tmp.scm
namekuseijin@nameku:~/dev/scheme/fact$ sed 's/(import .*)/\(declare
\(standard-bindings\) \(block\)\)/' tmp.scm > fact-gambc.scm
namekuseijin@nameku:~/dev/scheme/fact$ gsc -link fact-gambc.scm
namekuseijin@nameku:~/dev/scheme/fact$ gcc -O3 fact-gambc.c
fact-gambc_.c -o fact-gambc -lgambc
namekuseijin@nameku:~/dev/scheme/fact$ time ./fact-gambc
real 0m32.052s
user 0m31.734s
sys 0m0.320s
Now, mzscheme almost to the same level as gambit! Now that is amazing! :D
> You can run gambit benchmarks like following.
>
> mosh bench/run-mosh.scm
I missed that. :P
> Now, mzscheme almost to the same level as gambit!
MzScheme should be faster than what it's showing. Basically,
MzScheme uses libgmp (iirc), but so does ikarus. On the same
machine (MacBook Intel Core2Duo 2GHz), Ikarus is 6~7 times
faster than both, which is unexpected (I compared the answers
just to make sure Ikarus is not cheating).
$ ikarus
Ikarus Scheme version 0.0.4-rc1+, 64-bit (revision 1772, build 2009-05-03)
Copyright (c) 2006-2009 Abdulaziz Ghuloum
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
> (time (and (fact 100000) #t))
running stats for (and (fact 100000) #t):
1194 collections
6974 ms elapsed cpu time, including 970 ms collecting
7221 ms elapsed real time, including 1005 ms collecting
9932311040 bytes allocated
#t
> ^D
$ mzscheme
Welcome to MzScheme v4.1 [3m], Copyright (c) 2004-2008 PLT Scheme Inc.
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
> (time (and (fact 100000) #t))
cpu time: 45577 real time: 46396 gc time: 9529
#t
> ^D
$ gsi
Gambit v4.0.1
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
> (time (and (fact 100000) #t))
(time (and (fact 100000) #t))
56927 ms real time
56234 ms cpu time (45247 user, 10987 system)
15321 collections accounting for 4615 ms real time (2809 user, 1739
system)
9979093064 bytes allocated
no minor faults
no major faults
#t
>
Holy baloney! You're are right!
namekuseijin@nameku:~$ ikarus
Ikarus Scheme version 0.0.3
Copyright (c) 2006-2008 Abdulaziz Ghuloum
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r (f (- n 1) (* r n)))))
> (time (and (fact 100000) #t))
running stats for (and (fact 100000) #t):
2405 collections
4988 ms elapsed cpu time, including 664 ms collecting
4990 ms elapsed real time, including 691 ms collecting
9931119600 bytes allocated
#t
It's the same if the number is printed, except you have to use an
external cronometer or else you'll miss the timing that goes before the
printed output!
This is even faster than sbcl!
namekuseijin@nameku:~$ sbcl
This is SBCL 1.0.11.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (defun fact (n &optional (r 1))
(if (< n 2) r
(fact (1- n) (* r n))))
FACT
* (time (and (fact 100000) t))
Evaluation took:
9.696 seconds of real time
9.192575 seconds of user run time
0.504031 seconds of system run time
[Run times include 0.904 seconds GC run time.]
0 calls to %EVAL
0 page faults and
9,931,045,056 bytes consed.
T
Saint GMP, Batman!
> $ gsi
> Gambit v4.0.1
>
> > (define (fact n)
> (let f ((n n) (r 1))
> (if (< n 2) r
> (f (- n 1) (* r n)))))
> > (time (and (fact 100000) #t))
> (time (and (fact 100000) #t))
> 56927 ms real time
> 56234 ms cpu time (45247 user, 10987 system)
> 15321 collections accounting for 4615 ms real time (2809 user, 1739
> system)
> 9979093064 bytes allocated
> no minor faults
> no major faults
> #t
I was really surprised at this so I had to check it out for myself! gsi
truly gives about the same time as code compiled by gsc and gcc!
BTW, guys, this is an extensive paper on a number of algorithms for
factorial that give much better performance:
http://www.cs.berkeley.edu/~fateman/papers/factorial.pdf
You facts will never be the same again... bwahahahaa
% mosh
mosh>(import (mosh))
#<unspecified>
mosh>(define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
#<unspecified>
mosh>(time (and (fact 100000) #t))y
;;40.426059 real 40.186512 user 0.100006 sys
% mzscheme
Welcome to MzScheme v4.1.3 [3m], Copyright (c) 2004-2008 PLT Scheme
Inc.
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
(time (and (fact 100000) #t))
> cpu time: 61291 real time: 61475 gc time: 10828
namekuseijin's case can be GC problem.
Mosh uses Boehm GC. And it needs proper configuration flags for each
operating systems.
Will you please show me your "uname -ar" ?
Thomas Lord wrote:
> Not to impose but
> please consider writing up
> a brief description of the implementation
>with special attention to what makes it fast
>and to what the low-level (sub-scheme) interface
>to the run-time system is like.
Sure.
Mosh uses Psyntax written by Abdulaziz Ghuloum for frontend.
It helps to expand R6RS library system and syntax-case.
Good library thanks.
Backend is a simple stack VM with many optimizations.
Eg.
- procedure inlining
- constant folding
- jump destination peephole optimaztion
- instruction unification
- direct threded code
- gloc
Source files http://code.google.com/p/mosh-scheme/source/browse/trunk
- instruction.scm: Mosh's instruction set.
- VM-Run.cpp: VM loop.
- compiler.scm: compiler
- vm.scm: subset of VM written in C++
It uses Boehm GC for memory manament.
Cheers.
On May 3, 11:03 pm, higepon <hige...@gmail.com> wrote:
> Sorry I tried my Mosh 0.1.0 on my Ubuntu 9.04 (libgmp.so.3.4.4)
I'm on an "ancient" Ubuntu 8.04 on Intel Q6600.
I tried again, this time on mosh repl itself, but had to kill it
again:
GC Warning: Repeated allocation of very large block (appr. size
53248):
May lead to memory leak and poor performance.
Killed
At first I thought it might be because of my lousy trick of faking to
have the latest libgmp by ln -s /usr/lib/libgmp.so.3.4.2, but I just
did the same with ikarus, compiled it with the same old lib, and it
runs fine.
> namekuseijin's case can be GC problem.
> Mosh uses Boehm GC. And it needs proper configuration flags for each
> operating systems.
> Will you please show me your "uname -ar" ?
namekuseijin@nameku:~/dev/scheme/loop$ uname -ar
Linux nameku 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009
i686 GNU/Linux
BTW, perhaps this discussion should be better suited for something
like:
http://code.google.com/p/mosh-scheme/issues/list
You're right.
Will you please register the issue?
If you would register, the issue tracking system will notify you when
I comment or change the status of bug.
Cheers.
Done.
Please use '--heap-limit' flag to run fact program on Ypsilon. :)
$ time ypsilon --heap-limit=384 fact.scm
real 0m35.847s
user 0m36.166s
sys 0m0.256s
Yoshikatsu Fujita
I can finally compare! :)
IronScheme takes about 55 seconds, still not major fast, but getting
there!
I dont understand however why so much memory is needed in Ypsilon/
Mosh.
During the evaluation, extra memory (beyond what IronScheme uses by
default) never went higher than 16MB. Odd 9000 GC's though...
Cheers
leppie
BTW, does .NET provide bignums?
$ time ypsilon fact.scm
real 0m29.678s
user 0m54.735s
sys 0m0.260s
Thank you again. :)
-- fujita
No, I am using an implementation (a bad one at it) they provided with
the DLR.
Word is that .NET 4.0 will include a decent BigInteger class.
> $ time ypsilon fact.scm
> real 0m29.678s
> user 0m54.735s
Is that running on 2 CPU's (cores)?
Yes, that is on Intel Core2 2.33GHz.
Ypsilon uses concurrent GC by default, so 'user' is generally higher
than 'real' on multi-core CPU.
Since fact.scm allocates many many bignums, the GC thread is running
nearly all the time during calculation, and two busy cores bump 'user'
close to 2x (1.84 in this case) of 'real'.
-- fujita
I haven't looked at your code, but I suppose you allocate bigints with
gc_malloc() and Bohem complains that tracing through these large
memory blocks is expensive (since it doesn't know what regions of a
block are pointers it needs to examine it all).
I presume that could be solved by replacing gc_malloc() with
gc_malloc_atomic() for bigints.
namekuseijin wrote:
> > Will you please register the issue?
> Done.
Thank you!
fujita-y wrote:
> Please use '--heap-limit' flag to run fact program on Ypsilon. :)
Cool option. It offers safe and stable runnig. :-)
leppie wrote:
> I can finally compare! :)
Yeah. IronScheme is an older brother of Mosh and a rival.
On Mosh 0.0.1, Biginteger library GMP is not faster version (not asm
version).
It may be slower than on Linux.
I will fix this next release.
BTW, backend of IronScheme (DLR) may have JIT. So it can run very
faster?
Vend wrote:
> I presume that could be solved by replacing gc_malloc() with
> gc_malloc_atomic() for bigints.
Good point. I tried to use gc_malloc_atomic.
And it runs well w/o any warnings. :)
I'm not sure all of allocations in GMP is pointer free or not.
But with gc_malloc_atomic, Mosh passes all tests.
I will realease the fixed version of Mosh tonight.
Cheers.
"GMP may use allocated blocks to hold pointers to other allocated
blocks. This will limit the assumptions a conservative garbage
collection scheme can make. " -
http://gmplib.org/manual/Custom-Allocation.html
You can't use gc_malloc_atomic() for internal GMP allocations, it
would introduce leaks.
If there is some way of calling the destructor when deallocating a C++
class, then you should leave the default memory allocation functions
of GMP and call mpz_clear() in the destructor of your Bignum class.
> blocks. This will limit the assumptions a conservative garbage
> You can't use gc_malloc_atomic() for internal GMP allocations, it
> would introduce leaks.
Thank you.
> If there is some way of calling the destructor when deallocating a C++
> class, then you should leave the default memory allocation functions
> of GMP and call mpz_clear() in the destructor of your Bignum class
We can do this like.
class Bignum : public gc_cleanup // destructor will be called
I've try above, but same warings are issued.
I'll check whether the size of GC_malloc-ed is correct or not.
Best regards.
> collection scheme can make. " -http://gmplib.org/manual/Custom-Allocation.html
On 32bit architecture, using GMP Bignum creates many "false pointer",
which seem to be pointer for GC but is internal representation of
Bignum (array of word).
It will take long time to fix.
Cheers.
But if you don't redefine GMP allocation function shouldn't GMP object
be ignored by the tracer?
Yes.
I tried it today.
But much slower, because GMP calls realloc too many times.
Doesn't it do that if garbage collection is used?
Relesed Mosh 0.1.2
http://code.google.com/p/mosh-scheme/downloads/list
What I've done for the problem are following.
(1) For GMP, use malloc/realloc/free instead of GC_malloc/realloc/
free.
This never make "false pointer".
(2) Cleanup the malloc-ed mpz_t with destructor of Bignum class
(extends gc_cleanup).
(3) hook the realloc, call GC_gcollect on right timing.
I also cleanup the code and reduce memory allocations for Bignum.
Mosh becomes much faster I think. :-)
*** Mosh takes about 9sec.
% mosh -v
Mosh R6RS scheme interpreter, version 0.1.2
sewashi% mosh
mosh>(define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
#<unspecified>
mosh>(time (and (fact 100000) #t))
;;9.126263 real 8.396524 user 0.676042 sys
#t
*** Ikarus takes about 6.5 sec
% ikarus
Ikarus Scheme version 0.0.4-rc1+, 64-bit (revision 1773, build
2009-05-13)
Copyright (c) 2006-2009 Abdulaziz Ghuloum
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
> (time (and (fact 100000) #t))
running stats for (and (fact 100000) #t):
1194 collections
6536 ms elapsed cpu time, including 648 ms collecting
6568 ms elapsed real time, including 637 ms collecting
9932311040 bytes allocated
Thank you for your kind advice.
Cheers.
---
MINOWA Taro(Higepon)
http://www.monaos.org/
http://code.google.com/p/mosh-scheme/
http://www.facebook.com/people/Taro_Minowa_Higepon/649065487
http://friendfeed.com/higepon
Sweet! That's one fast interpreter!
Can you do a timing with Chez Petite too, for comparison's sake? :)
Great, man! I'll download it once I'm home... :)
Thanks.
> Can you do a timing with Chez Petite too, for comparison's sake? :)
Here is Petite Chez Scheme (threaded version)
% petite
Petite Chez Scheme Version 7.4
Copyright (c) 1985-2007 Cadence Research Systems
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
> (time (and (fact 100000) #t))
(time (and (fact 100000) ...))
635 collections
23133 ms elapsed cpu time, including 116 ms collecting
23429 ms elapsed real time, including 118 ms collecting
9931612296 bytes allocated, including 9928743496 bytes reclaimed
On my machine Iron Scheme takes about 80 seconds, while Mzscheme
freezes for a long time then returns an error.
Welcome to MzScheme v4.0.2 [3m], Copyright (c) 2004-2008 PLT Scheme
Inc.
> (define (fact n)
(let f ((n n) (r 1))
(if (< n 2) r
(f (- n 1) (* r n)))))
> (fact 7)
5040
> (time (and (fact 100000) #t))
cpu time: 55255 real time: 55639 gc time: 7468
#t
Core 2 Duo here on Windows at work. I remember I also benchmarked it
at home Linux with a more up-to-date mzscheme.
You're using the latest download? I believe the PLT guys were
alerting people to report strange behavior with the latest release,
it's going multicore or something and may not be without bugs...