For fun I've written md5sum using IMCC, and have attached my first cut.
It will need some further testing, but for the most part it works well.
There are some limitations:
* 200K file limit imposed when reading in the file
* 512MB(?) limit imposed by a shortcut in the algorithm
* Haven't tested on 64-bit processors
* Haven't tested on big-endian machines
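A possible explanation of the "512MB(?)" figure, assuming the shortcut is keeping the message length in bits in a 32-bit integer: MD5 (RFC 1321) appends the bit length as a 64-bit little-endian value, and a 32-bit bit count wraps at 2^32 bits = 2^29 bytes = 512 MiB. The function names below are hypothetical, and this is a C sketch rather than the attached IMCC code.

```c
#include <stdint.h>

/* Bit length kept in 32 bits (the assumed shortcut): unsigned arithmetic
 * wraps modulo 2^32, i.e. once the input reaches 2^29 bytes = 512 MiB. */
uint32_t md5_bitlen32(uint32_t nbytes)
{
    return nbytes * 8u;
}

/* Bit length kept in 64 bits, as RFC 1321 specifies. */
uint64_t md5_bitlen64(uint64_t nbytes)
{
    return nbytes * 8u;
}

/* Store the 64-bit bit length for the final block, least significant
 * byte first. */
void md5_store_length(unsigned char out[8], uint64_t nbytes)
{
    uint64_t bits = nbytes * 8u;
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (8 * i));
}
```

With a 64-bit length field, the limit goes away for any realistic file size.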
It would be nice to make sure that it works on all architectures as it
would fit well into the parrot test suite; it would be good for both
benchmarking and functionality testing.
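On the big-endian question: MD5 reads each 64-byte block as sixteen 32-bit little-endian words, and emits the digest words least significant byte first. A common portable approach, sketched here in C with hypothetical names (not taken from the attached code), assembles each word from individual bytes, so the result does not depend on host byte order.

```c
#include <stdint.h>

/* Decode a 64-byte block into sixteen little-endian 32-bit words.
 * Shifting bytes into place gives the same words on any host. */
void md5_decode_block(uint32_t words[16], const unsigned char block[64])
{
    for (int i = 0; i < 16; i++) {
        const unsigned char *p = block + 4 * i;
        words[i] = (uint32_t)p[0]
                 | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16)
                 | ((uint32_t)p[3] << 24);
    }
}

/* Encode one digest word least significant byte first. */
void md5_encode_word(unsigned char out[4], uint32_t w)
{
    out[0] = (unsigned char)(w);
    out[1] = (unsigned char)(w >> 8);
    out[2] = (unsigned char)(w >> 16);
    out[3] = (unsigned char)(w >> 24);
}
```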
The only problems have been:
* Using macros which call one another seemed to hang if argument names
were reused
(There have been discussions on the future of macros, I know)
* /Possible/ IMCC register allocation bug which I need to
reinvestigate (see earlier post)
(May actually have been the following JIT problem?)
* Results from the i386 JIT can be corrupt and need investigating (lots
of FFs)
* I seem to get ICU/string errors on some binary files
* Some checksum discrepancies which I need to investigate
Cheers,
Nick
> for fun I've written md5sum using IMCC, and have attached my first cut.
Wow.
> * Results from i386 JIT can be corrupt and needs investigating (lots
> of FFs)
Yep. Seems to be a problem with this line in swap:
$I13 = $I13 >>> 24
i.e. JITed lsr/shr seem to be broken or (partly) swapped.
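Why swapped shifts would show up as "lots of FFs": a logical shift right (`>>>`, lsr) fills with zeros, while an arithmetic shift right (shr) copies the sign bit. For a word with its top bit set, the arithmetic shift fills the high bytes with 0xFF. This is a C illustration only, with hypothetical helper names. Right-shifting a negative signed value is implementation-defined in C, so the arithmetic version is written out explicitly.

```c
#include <stdint.h>

/* Logical shift right: zero-fills from the left. */
uint32_t lsr32(uint32_t x, unsigned n)
{
    return x >> n;              /* unsigned operand: always zero-filled */
}

/* Arithmetic shift right (0 <= n < 32): copies the sign bit in from the left. */
uint32_t asr32(uint32_t x, unsigned n)
{
    if (n == 0)
        return x;
    uint32_t shifted = x >> n;
    if (x & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> n);  /* set the top n bits to one */
    return shifted;
}
```

For example, shifting 0xA1B2C3D4 right by 24 gives 0xA1 logically but 0xFFFFFFA1 arithmetically.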
> Nick
leo
Fixed.
leo
Fixed before I even got to look at it. You're brilliant, thanks,
Nick
On Tuesday 27 April 2004 16:58, Nick Glencross wrote:
> for fun I've written md5sum using IMCC, and have attached my first cut.
Cool :-)
Should we add it as a library?
jens
That should be easy, shouldn't it?
I'll have a go at devising some tests too. If anyone can let me know
whether it works on big endian/64 bit architectures, and even better
provide me with a fix, then that would be really cool. [Both should work,
I hope, although some ANDing with 0xffffffff might be needed to
work on 64 bit.]
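A sketch of where that masking would typically be needed, assuming the 32-bit MD5 arithmetic runs in 64-bit integer registers: additions must wrap modulo 2^32, and a 32-bit rotate must discard bits shifted past bit 31. The helper names are hypothetical.

```c
#include <stdint.h>

/* Addition modulo 2^32 in a 64-bit integer: the mask drops the carry. */
uint64_t add32(uint64_t a, uint64_t b)
{
    return (a + b) & 0xffffffffu;
}

/* 32-bit rotate left (0 < s < 32) in a 64-bit integer. Without the final
 * mask, bits shifted past bit 31 would linger in the upper half. */
uint64_t rotl32(uint64_t x, unsigned s)
{
    x &= 0xffffffffu;
    return ((x << s) | (x >> (32 - s))) & 0xffffffffu;
}
```

On a 32-bit build the masks do nothing, so the same code should behave identically on both word sizes.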
Nick
> for fun I've written md5sum using IMCC, and have attached my first cut.
Yet another follow-up. Some enhancements and nitpicks.
- buffer should be an IntvalArray (or still an IntList)
(I've copied the splice over to intlist.c)
- extracting the chars out of the string can use this opcode:
#$S0 = str[$I5]
substr_r $S0, str, $I5, 1
which reuses $S0
Both give a considerable performance improvement.
- getting argv is better written as:
.sub _main
.param pmc args
leo
> * 200K file limit imposed when reading in the file
The C<read> opcode truncates files at 64K. Needs fixing.
And we need C<stat> - at least the filesize.
E.g.
read S0, P0 # slurp whole file
And of course: the string really shouldn't get created as UTF-8 in the
first place and then get downscaled().
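At the C level, the idea of slurping a whole file can be sketched as: ask for the size with stat, allocate that much, and keep reading until it all arrives, since a single read call may return fewer bytes than requested. This is a hypothetical helper (`slurp_file`), not Parrot's implementation of the C<read> opcode, and it assumes a POSIX system for `fstat` and `fileno`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Read an entire file into a malloc'd buffer; returns NULL on failure. */
unsigned char *slurp_file(const char *path, size_t *len_out)
{
    struct stat st;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fstat(fileno(fp), &st) != 0 || st.st_size < 0) {
        fclose(fp);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    unsigned char *buf = malloc(size ? size : 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t got = 0;
    /* Loop on short reads instead of assuming one call reads everything. */
    while (got < size) {
        size_t n = fread(buf + got, 1, size - got, fp);
        if (n == 0)
            break;              /* EOF or error */
        got += n;
    }
    fclose(fp);
    *len_out = got;
    return buf;
}
```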
> * /Possible/ IMCC register allocation bug which I need to
> reinvestigate (see earlier post)
Any signs of that?
> Nick
leo
On Wednesday 28 April 2004 14:31, Nick Glencross wrote:
> Time for an update.
>
> I've now split the code into a library, an example and a test.
>
> * runtime/parrot/include/Digest_MD5.imc (_md5sum and _md5_print calls)
runtime/parrot/library/Digest/MD5.imc would IMO be better.
You should put the functions into a separate namespace with
.namespace["Digest::MD5"]
to avoid problems with duplicated function names.
I would also remove the underscore from "_md5sum" and "_md5_print".
If you load the code with load_bytecode, you can call the sub with
$P0 = find_global "Digest::MD5", "md5sum"
$P0( ... )
which is AFAIK the official way to call external functions.
> * examples/assembly/md5sum.imc
> * imcc/t/syn/md5.t
>
> I don't know if the test lives in the most appropriate directory.
It has nothing to do with imcc.
Library tests will go into t/library: t/library/Digest/MD5.t
> Investigating why some checksums were coming out wrong, I've now seen
> that 'read' operation only reads at max 64K of data, so I've now limited
> the file size that the example runs on.
>
> Also, I tried using the @MAIN pragma, this seemed to break with JIT
> (I've left it in the example harness, but not the tests). Should I have
> expected this to work? If not, I'll remove it from the example.
>
> Could I be getting close to having this included with parrot?
Of course!
Leo, what is the status of the pending changes WRT library paths?
Can I help somehow?
> Cheers,
>
> Nick
jens
> I've now split the code into a library, an example and a test.
Good.
> * runtime/parrot/include/Digest_MD5.imc (_md5sum and _md5_print calls)
> * examples/assembly/md5sum.imc
> * imcc/t/syn/md5.t
> I don't know if the test lives in the most appropriate directory.
Rather not. t/lib or such. Jens?
> Also, I tried using the @MAIN pragma, this seemed to break with JIT
> (I've left it in the example harness, but not the tests). Should I have
> expected this to work? If not, I'll remove it from the example.
Should work, but is broken. Dunno yet why.
> Could I be getting close to having this included with parrot?
Yep. We really should start with some lib directory structure though.
> Cheers,
> Nick
leo
t/lib works for me.
> > Could I be getting close to having this included with parrot?
>
>Yep. We really should start with some lib directory structure though.
I think I have a spec. I'll fire it off.
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
> .namespace["Digest::MD5"]
Ack
> Leo, what is the status of the pending changes WRT library paths?
Mainly a missing portable way to find the library. *But* searching in
C<runtime/parrot/include> is working for current platforms, so we can
do the same for /library. We just need to abstract it a bit and put a
wrapper into the platform file(s). It can always be refined
gradually.
> Can I help somehow?
We would need a more general version of dynext.c:get_path() [1] that just
tests for existence of the file:
STRING * Parrot_get_runtime_path(interp, file_type, filename)
or some such. File type is one of [dynamic/shared extension, include_file,
library] currently.
The search order should eventually be configurable; for now, the current
directory and runtime/parrot/* should do.
> jens
leo
[1] and imcc/imcc.l:include_file()
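A minimal sketch of such a lookup, under loud assumptions: plain C strings stand in for Parrot STRINGs, the enum and the per-type directory names (in particular `runtime/parrot/dynext`) are hypothetical, and the search order is fixed to the current directory followed by runtime/parrot/<subdir>. Existence is tested by simply trying to open the file, because there is no portable existence check.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical file types, mirroring the three kinds listed above. */
enum runtime_file_type { RT_DYNEXT, RT_INCLUDE, RT_LIBRARY };

/* Hypothetical per-type directories under runtime/parrot. */
static const char *subdir_for(enum runtime_file_type t)
{
    switch (t) {
    case RT_DYNEXT:  return "runtime/parrot/dynext";
    case RT_INCLUDE: return "runtime/parrot/include";
    default:         return "runtime/parrot/library";
    }
}

/* Generic existence test: try to open the file. */
static int file_exists(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Writes the first existing candidate path into buf; returns 1 if found. */
int get_runtime_path(enum runtime_file_type type, const char *filename,
                     char *buf, size_t bufsize)
{
    const char *dirs[2] = { ".", subdir_for(type) };
    for (size_t i = 0; i < 2; i++) {
        int n = snprintf(buf, bufsize, "%s/%s", dirs[i], filename);
        if (n < 0 || (size_t)n >= bufsize)
            continue;           /* candidate did not fit in buf */
        if (file_exists(buf))
            return 1;
    }
    return 0;
}
```

Making the search order configurable would amount to replacing the fixed `dirs` array with a list read from configuration.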
Thanks all,
Nick
> Also, I tried using the @MAIN pragma, this seemed to break with JIT
Brr. Fixed. The bug had been hanging around for a long time, waiting for
mostly-integer code with @MAIN not at byte_code[0].
The JIT optimizer, which also looks at branches, didn't look at anything
before _main, so no registers got reloaded after branch targets.
Thanks for md5sum
leo
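A toy model of that failure mode, not Parrot's JIT code: each instruction optionally branches to an absolute index, and an optimizer that records branch targets (so it can reload registers there) misses any target reached only from code before the entry point if it starts scanning at the entry point. The function and its array representation are hypothetical.

```c
#include <stddef.h>

/* branch_to[i] is the target index of instruction i, or -1 for no branch.
 * Marks every target found by scanning from `start` and returns how many
 * distinct targets were marked. */
size_t mark_branch_targets(const int *branch_to, size_t n_ops,
                           size_t start, unsigned char *is_target)
{
    size_t count = 0;
    for (size_t i = 0; i < n_ops; i++)
        is_target[i] = 0;
    for (size_t i = start; i < n_ops; i++) {
        int t = branch_to[i];
        if (t >= 0 && (size_t)t < n_ops && !is_target[t]) {
            is_target[t] = 1;
            count++;
        }
    }
    return count;
}
```

With the entry point at index 2 and a branch at index 1 targeting index 4, scanning from 2 misses target 4, while scanning from 0 finds it.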
I can implement it in bytecode, but then this bytecode needs to be linked with
parrot. The advantage is of course that it is then very easy to extend the
mechanism.
> The searchorder should be configurable finally, for now current directory
> and runtime/parrot/* should do it.
Is runtime/parrot/include and runtime/parrot okay for the moment?
Code that currently uses load_bytecode "library/foo.imc" will continue to
work, even if we move the library directory to runtime/parrot/library.
> [1] and imcc/imcc.l:include_file()
How can I regenerate the corresponding C files?
It looks like there's no Makefile rule for it; is that intentional?
jens
There is no portable way to test if a file exists. A generic version
could just try to open it (which BTW can protect from races between
access(2) and open). Platform specific versions can do a better job if
needed.
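The race point can be sketched like this: rather than checking with access(2) and opening later (the file could vanish or be replaced in between), try the candidates in order and hand back the stream that actually opened. `open_first` is a hypothetical helper, not an existing Parrot function.

```c
#include <stdio.h>

/* Return the first candidate that opens, storing its index in *which.
 * Because the caller gets the open stream, it reads exactly the file
 * that was found; there is no check-then-open window. */
FILE *open_first(const char *const *candidates, size_t n, size_t *which)
{
    for (size_t i = 0; i < n; i++) {
        FILE *fp = fopen(candidates[i], "rb");
        if (fp != NULL) {
            if (which != NULL)
                *which = i;
            return fp;
        }
    }
    return NULL;
}
```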
>> The searchorder should be configurable finally, for now current directory
>> and runtime/parrot/* should do it.
> Is runtime/parrot/include and runtime/parrot okay for the moment?
Yep. We can always extend it. We need something for ICU though, e.g.
keep headers and blib/lib but move the libs into runtime/parrot/icu.
>> [1] and imcc/imcc.l:include_file()
> How can I regenerate the corresponding C files?
> I looks like there's no Makefile rule for it, intentionally not?
You have to run Configure.pl --maintainer to get the appropriate Makefile
rules. And you need a working flex.
> jens
leo