I was looking for a Common LISP implementation of MD5, but found none. So I wrote a silly port of MD5 in CL. ftp://Samaris.tunes.org/pub/lang/cl/fare/md5.lisp On CMUCL, the result runs about 5.5 times slower than the equivalent optimized C code; that said, on CMUCL, it can also call the external md5sum utility for fast processing of large files or streams. It purports to abide by (and extend) the MD5 interface defined in ACL6.
Excellent things that helped: CL macros. Horrible things that get in the way: CL characters, and lack of builtin modular operators.
The problem is that CL purports to be a high-level language, but actually provides gratuitously incompatible and subtly unusable access to what ought to be low-level constructs. The world has standardized on low-level byte streams as the universal medium for communication of data, including text. Yet, CL strings are based on pseudo-high-level characters that are not portably interoperable with worldly text, much less efficiently. That there can be direct support for >8 bit characters and for character attributes is great, but, particularly when efficient portable text-processing is meant, the only way is using (unsigned-byte 8), and suddenly, all builtin support for any text-processing at all vanishes. ACL6's SIMPLE-STREAMs are a definite step in the right direction, and I hope other vendors will adopt similar interfaces.
CommonLISPers often diss Scheme for being such a small language, which forces development of incompatible implementation-specific extensions for any interesting work. Well, we have to face the fact that in today's world, CL is also a small language by this criterion. Much much smaller than C, SML, OCAML, Haskell, Mercury, Oz, Perl, Python, or whatever.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] [ TUNES project for a Free Reflective Computing System | http://tunes.org ] Racism consists in attributing to genetics what is due to memetics. -- Faré
I experienced similar behavior when I ported a C version of md5 to Lisp, in my case Lispworks. I tried a variety of declarations and compiler safety settings, and profiled it as well. I also asked about the results here, and as I recall, the cause was related to the relative slowness of CL in bit-twiddling register width integers. The issue seemed to hinge on dpb returning values rather than affecting storage cells directly, instead causing a lot of setf activity in the innermost portions of the algorithm's loop. Fixnum optimizations might help, but nconc-style variations of dpb would likely be a faster way to approach the problem.
I found macros to be helpful, but I didn't run into any particular trouble with CL characters. I used (read-sequence) into a simple-array and it seemed to work out well enough. Profiling suggested the vast majority of time was spent in the md5 code.
Bit-twiddling may well be slow in Lisp, but there is the other side of the question. In a recent project, I would have gladly accepted an order of magnitude decrease in performance to get Lisp's rational and bignum support instead of slaving my way around the inadequacy of C's 32 bit ints and double floats. I suspect the algorithm would be pretty much as fast or faster in Lisp than in C on the given hardware. It would certainly have been simpler.
> So I wrote a silly port of MD5 in CL. > ftp://Samaris.tunes.org/pub/lang/cl/fare/md5.lisp > On CMUCL, the result runs about 5.5 times slower than the equivalent optimized > C code; that said, on CMUCL, it can also call the external md5sum utility > for fast processing of large files or streams. > It purports to abide by (and extend) the MD5 interface defined in ACL6.
> Excellent things that helped: CL macros. > Horrible things that get in the way: CL characters, > and lack of builtin modular operators.
* Francois-Rene Rideau <f...@tunes.org> | I was looking for a Common LISP implementation of MD5, but found none.
Several implementations are available for those who ask, some as fast as those written in C, and about as low-level as C versions, too.
| Horrible things that get in the way: CL characters, and lack of builtin | modular operators.
Huh? You probably mean that you open a file with element-type character and since you get what you asked for, but not what you wanted, you blame the language instead of your own incompetence at expressing your wishes. If you really want to read a file of 8-bit bytes, specify it, and you get exactly what you want.
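As an illustration of specifying 8-bit bytes when opening a file, here is a minimal sketch in portable Common Lisp. The helper name READ-FILE-OCTETS is made up for this example and does not come from any of the implementations discussed in the thread.

```lisp
;; Sketch: read a whole file as octets rather than characters.
;; READ-FILE-OCTETS is a hypothetical helper name, not a library function.
(defun read-file-octets (pathname)
  "Return the contents of PATHNAME as a vector of (unsigned-byte 8)."
  (with-open-file (in pathname :direction :input
                               :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length in)
                               :element-type '(unsigned-byte 8)))
           ;; READ-SEQUENCE returns the index of the first element
           ;; that was not updated.
           (end (read-sequence octets in)))
      (subseq octets 0 end))))
```

The same :ELEMENT-TYPE argument works for output streams, so octets can be written back unchanged with WRITE-SEQUENCE.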
| The problem is that CL purports to be a high-level language, but actually provides gratuitously incompatible and subtly unusable access to what ought to be low-level constructs.
Huh? This sounds like a disgruntled programmer more than a language problem. Perhaps we would be able to judge for ourselves if you posted the actual _code_ you wrote to arrive at these weird conclusions?
| The world has standardized on low-level byte streams as the universal | medium for communication of data, including text.
Which is a mistake, since they run into an enormous amount of trouble with supporting more than one character encoding. The Unix model is one of those "simpler than possible" models that do not actually work when pushed too hard. That some operations require punning on the lowest level of representation and that this is not only possible, but easy in C, is not necessarily a good thing. Such representational issues should be explicit. Even C++ has discovered the truth in this, these days.
What has stopped you from using "low-level byte streams" in _your_ code?
| Yet, CL strings are based on pseudo-high-level characters that are not portably interoperable with worldly text, much less efficiently.
Whatever that means. I think you are simply seriously confused and would have come a lot further if you had asked for help or read _all_ the fine documentation before you became so frustrated, but that seems to be incompatible with the tunes to which some people's egos play.
| That there can be direct support for >8 bit characters and for character attributes is great, but, particularly when efficient portable text-processing is meant, the only way is using (unsigned-byte 8), and suddenly, all builtin support for any text-processing at all vanishes.
The myopia suffered by people who think 8 bits is sufficient is probably never going to be discovered as long as they stare at their data from a distance of only 1 inch. Just because you think you need a byte for a particular operation does not mean that you do, nor that anything else should conform to this particular requirement.
Look, you are not doing text processing when you process bytes. It works in some environments and under some assumptions, but if you are dealing with text, you deal with characters, not their coding, and if you deal with bytes, you are not dealing with characters. It is that simple. Since the C mindset is so insidious and unconscious with those who suffer from it, including some long-time (not Common) Lisp programmers,
| CommonLISPers often diss Scheme for being such a small language, which | forces development of incompatible implementation-specific extensions for | any interesting work. Well, we have to face the fact that in today's | world, CL is also a small language by this criterion. Much much smaller | than C, SML, OCAML, Haskell, Mercury, Oz, Perl, Python, or whatever.
What does this mean? Your conclusions seem to be drawn from a lot of bad experience, but there is no way to determine whether that is due to your incompetence or to whatever it is you conclude it must have been, blaming the language for your problems. Just post the evidence: the code, and let people help you figure out what the real problem is.
/// -- Norway is now run by a priest from the fundamentalist Christian People's Party, the fifth largest party representing one eighth of the electorate. -- Carrying a Swiss Army pocket knife in Oslo, Norway, is a criminal offense.
Francois-Rene Rideau <f...@tunes.org> writes:
> The problem is that CL purports to be a high-level language, but actually provides gratuitously incompatible and subtly unusable access to what ought to be low-level constructs. The world has standardized on low-level byte streams as the universal medium for communication of data, including text. Yet, CL strings are based on pseudo-high-level characters that are not portably interoperable with worldly text, much less efficiently. That there can be direct support for >8 bit characters and for character attributes is great, but, particularly when efficient portable text-processing is meant, the only way is using (unsigned-byte 8), and suddenly, all builtin support for any text-processing at all vanishes. ACL6's SIMPLE-STREAMs are a definite step in the right direction, and I hope other vendors will adopt similar interfaces.
You've done the work on md5 and I haven't, so this is guesswork, but my first thought about where the problem lies is that a lot of these algorithms assume efficient 32 bit operations, which in CL will cons. Given sufficient declarations the character->(unsigned-byte 8) conversions with char-code should be negligible.
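A minimal sketch of the character->(unsigned-byte 8) conversion with char-code mentioned above. STRING-TO-OCTETS is a hypothetical name, and the sketch assumes that the implementation's character codes below 256 agree with the encoding expected on the wire, which the standard does not guarantee.

```lisp
;; Sketch: convert a string to octets through CHAR-CODE.
;; STRING-TO-OCTETS is a hypothetical helper name.
(defun string-to-octets (string)
  "Map each character of STRING to its CHAR-CODE, insisting it fits in 8 bits."
  (declare (type string string))
  (let ((octets (make-array (length string)
                            :element-type '(unsigned-byte 8))))
    (dotimes (i (length string) octets)
      (let ((code (char-code (char string i))))
        ;; CHAR-CODE values are implementation-dependent, so refuse
        ;; anything that cannot be stored in a single octet.
        (when (> code 255)
          (error "Character ~S does not fit in one octet." (char string i)))
        (setf (aref octets i) code)))))
```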
-- Lieven Marchand <m...@wyrd.be> She says, "Honey, you're a Bastard of great proportion." He says, "Darling, I plead guilty to that sin." Cowboy Junkies -- A few simple words
* Lieven Marchand <m...@wyrd.be>
| You've done the work on md5 and I haven't, so this is guesswork, but my first thought about where the problem lies is that a lot of these algorithms assume efficient 32 bit operations, which in CL will cons. Given sufficient declarations the character->(unsigned-byte 8) conversions with char-code should be negligible.
MD5 seems to be designed with hardware implementations in mind. This is not a bad idea, but it tends to make software implementations weirder.
I have written an MD5 function in Allegro CL, using several low-level features (which are still at a higher level than C) and decided to split the 32-bit numbers in two 16-bit parts. Emacs has done the same thing, but in a slightly different way. I wanted to add support for a bitwise rotate function that would end up using such instructions if available, but it is probably easier to write MD5 in assembly on each platform than to optimize it better. In the application I used this, MD5 hashes were a serious bottleneck, so it had to be better than using FFI to a C function.
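For reference, a bitwise rotate of the kind mentioned above can be written portably as follows. This is only a sketch: the names UB32 and ROL32 are assumptions made for the example, and whether a compiler turns it into a hardware rotate instruction depends on the implementation.

```lisp
;; Sketch: portable 32-bit rotate-left.
;; UB32 and ROL32 are names assumed for this example.
(deftype ub32 () '(unsigned-byte 32))

(declaim (inline rol32))
(defun rol32 (x n)
  "Rotate the 32-bit value X left by N bits, where 0 <= N < 32."
  (declare (type ub32 x) (type (integer 0 31) n))
  ;; The bits shifted out past bit 31 are dropped by LDB and
  ;; brought back in at the bottom by the right shift.
  (logior (ldb (byte 32 0) (ash x n))
          (ash x (- n 32))))
```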
After having worked with the MD5 functions on and off for a month or so, I came to conclude that it _should_ be written in assembly, and started to do that, but it turned out to be extremely time-consuming work. The Intel processors are shy too many registers, so all the fun in trying to make it superfast was replaced by increasing frustration over the design of those processors. Still, initial estimates indicated that a hand- tuned assembly version would be about twice as fast as the code that gcc produced for the naive implementation found in the RFC, so it would be worth it at significant cost.
I also think the new streams design from Franz Inc makes for a good way to deal with the 64-byte buffers. It is wrong to try to work with MD5 at the character level, and the typical use of MD5 is to ensure that some data stream is intact. If the application that consumes the stream can do both MD5 hashing and its real work at the same time, and can roll back the work if the MD5 hash turns out bad, much will be saved compared to making two passes, since we must assume that the MD5 hash is usually good.
> | I was looking for a Common LISP implementation of MD5, but found none. > Several implementations are available for those who ask, some as fast as > those written in C, and about as low-level as C versions, too.
Ask whom? If someone was willing to publish code for others to use, he'd already have done it. Or at least announced it on a webpage. So far, the only advertised Common LISP implementation of MD5 is Franz' ACL6's - which doesn't quite fulfill my needs, and is embedded in their application rather than available as portable common lisp. So good for them - no good for me. The ACL version is about twice as slow as C. Mine, using CMUCL, is about 5.5 times slower (but it calls an external C program for files or large strings).
> | Horrible things that get in the way: CL characters, and lack of builtin > | modular operators. > Huh? You probably mean that you open a file with element-type character > and since you get what you asked for, but not what you wanted, you blame > the language instead of your own incompetence at expressing your wishes.
No, I blame the language for not allowing me to express my wishes. I can, certainly, open files in octet mode - mind you, that's precisely what I do. But then, I'm not interoperable with all the body of character-based text, SEXP or (shudder) XML processing. It is certainly possible to build translation layers from one to the other, but it's clumsy, inefficient, not portable, and underspecified.
Certainly, each implementation specifies (more or less precisely) what happens when you use code-char and such, but then, you have the same kind of incompatible extension hell as in Scheme. Once again, reading their specification, I happen to like the recent things done by Franz (SIMPLE-STREAM), but it's unhappily not directly applicable to me.
For performance, lack of supported modular integer operators is also a big problem.
> Perhaps we would be able to judge for ourselves if you posted > the actual _code_ you wrote to arrive at these weird conclusions?
I consider it bad practice to post large code files on USENET. I posted the URL to that code, which ought to be enough for anyone interested. I repeat it here (and add a second one), in case you missed it: ftp://Samaris.tunes.org/pub/lang/cl/fare/md5.lisp http://tunes.org/cgi-bin/cvsweb/fare/fare/lisp/md5.lisp [I made very minor cleanups and enhancements since yesterday; those who downloaded, beware that two of the transient revisions that I committed a few hours ago (1.3 and 1.4) had buggy typos - sorry; latest released (1.5) is ok]
> | The world has standardized on low-level byte streams as the universal > | medium for communication of data, including text. > Which is a mistake, since they run into an enormous amount of trouble > with supporting more than one character encoding.
It is not a mistake - it is the natural thing to do in a world of proprietary black-box devices and software.
> The Unix model is one of those "simpler than possible" models that > do not actually work when pushed too hard.
I agree, but this is beside the point. I hate UNIX about as much as one can (does the UNIX haters mailing-list still exist?) - and I use it daily.
> What has stopped you from using "low-level byte streams" in _your_ code?
I did it. But it's not portably interoperable with the character-based SEXP code that I have. Using MD5 to portably support code version tagging, etc., becomes "interesting". By no means impossible. Just a PITA.
> Look, you are not doing text processing when you process bytes.
No I'm not. And sometimes, I want to do only one. Sometimes, I want to do only the other. Sometimes, I am happy with the character processing done by my implementation (though it's not portable). Sometimes, I need precise control on the processing that happens (e.g. because I'm precisely transcoding stuff from one protocol to another). And sometimes, I want to do both byte-processing and text-processing at once on the same stream (at once: switching from one to the other, or even doing both character processing AND md5sum'ing on the same chunk).
In the latter cases, any implicit character processing done by the implementation is an abstraction inversion, to me.
> if you are dealing > with text, you deal with characters, not their coding,
Not even. I could be dealing with words, with sentences, with layout elements. In such contexts, characters are too low-level and not what I like. Yet I don't resent CL not standardizing on high-level protocols -- it's things that can easily be implemented on top of them. However, wrongly standardizing on low-level things while adding restrictions to them is an abstraction inversion and it's wrong - it's the language getting in the way rather than helping.
One reason C has success is because it has few abstraction inversions (it does have some, with the implicit call stack and unavailability of user-defined safe-points with respect to temporary variable allocation). Open single-implementation languages (ocaml, perl, python, etc.) can also be fixed with respect to any abstraction inversion that may creep in - and indeed you'll find that they have little that gets in the way.
> | CommonLISPers often diss Scheme for being such a small language, which > | forces development of incompatible implementation-specific extensions for > | any interesting work. Well, we have to face the fact that in today's > | world, CL is also a small language by this criterion. Much much smaller > | than C, SML, OCAML, Haskell, Mercury, Oz, Perl, Python, or whatever. > What does this mean?
I'm sorry I don't speak Norwegian (yet). It means precisely what it says. Despite the superiority of LISP on some matters (macros, dynamism, object system), it is an inferior language on many other matters (static safety, efficiency, modularity, concurrency/distribution, resource control, modular numbers, interface to the real-world, openness, etc.). If you pick a particular implementation and consider it your world, you may gain back some of it - but then you'll see that you're no better than Scheme, and that the weight of the LISP standard is something that drags you back rather than helps you forth. [This reminds me of the situation of FORTH, too.]
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] [ TUNES project for a Free Reflective Computing System | http://tunes.org ] If you could ask a unique question to a computer during a Turing test, what would you ask? -- Douglas Hofstadter, Metamagical Themas
Lieven Marchand <m...@wyrd.be> writes: > You've done the work on md5 and I haven't so this is guess work, > but my first thought where the problem would lie is that a lot of these > algorithms assume efficient 32 bit operations, which in CL will cons.
Indeed, the *performance* problem is lack of efficient modular integer operations, with all the consing, unconsing, size-checking, etc., that goes with it.
The character issue is much worse: it's a *semantic* problem. It might not be in the speed bottleneck of this particular code (still, a typical character-based application doing MD5 in *portable* Common LISP would have at least 4 or 5 layers of buffers, just on the LISP side -- which does slow things down). But it's a major PITA to code around this nasty abstraction inversion.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] [ TUNES project for a Free Reflective Computing System | http://tunes.org ] Motto for a society of free individuals: LIBERTY + RESPONSABILITY = PROPERTY
| If someone was willing to publish code for others to use, he'd already | have done it. Or at least announced it on a webpage.
That has actually been done. I will certainly not blame you for not finding things on the Net, however. That is why asking those wetware search engines called "humans" is still a very good idea.
| So far, the only advertised Common LISP implementation of MD5 is Franz' | ACL6's - which doesn't quite fulfill my needs, and is embedded in their | application rather than available as portable common lisp.
It is not an implementation in Common Lisp, but written in C as a low-level system function. There seem to be serious performance improvements in the upcoming 6.1 release.
| The ACL version is about twice as slow as C.
Well, that much is fairly odd, considering that it is written in C. I only noticed that it consed like mad because it used C space for a copy of every string, and that C space was neither freed nor garbage collected.
| No, I blame the language for not allowing me to express my wishes.
I can sympathize with this. I frequently blame the planet earth for exhibiting more gravitation than I wish it had, and I wish magic worked, too, because I certainly do not want to do so much _work_ all the time.
The wishes of good engineers are subjugated to the reality in which they need their wishes to come true. The wishes of really bad engineers are completely disconnected from any reality. I think yours are of the latter kind.
| I can, certainly, open files in octet mode - mind you, that's precisely | what I do. But then, I'm not interoperable with all the body of | character-based text, SEXP or (shudder) XML processing.
You really need to clue in that MD5 does not operate on characters. That some languages have no idea what a character is, but treat it like a small integer is really not something you can blame either MD5 or Common Lisp for. If you really need a stream to exhibit characters while you do MD5 processing on its underlying byte stream, you can do that by creating a new stream class that reads from those 64-byte buffers that you read from the input source and give to MD5. It is really not that hard if you want to _work_ with the language and avoid "wishing" for things that are incompatible with the language you use.
| It is certainly possible to build translation layers from one to the | other, but it's clumsy, inefficient, not portable, and underspecified.
MD5 is specified to work on blocks of 64 bytes. How you get from what you have to 64-byte blocks is really a question of quality of programmer and implementation. I still think you are simply massively incompetent.
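To make the 64-byte block structure concrete, here is a sketch of the padding step that RFC 1321 prescribes before the message is cut into blocks: one #x80 octet, zero octets up to 56 modulo 64, then the message length in bits as a 64-bit little-endian number. The function name MD5-PAD is an assumption for this example, and a streaming implementation would pad only the final block instead of copying the whole message.

```lisp
;; Sketch: MD5 message padding as described in RFC 1321.
;; MD5-PAD is a hypothetical helper name.
(defun md5-pad (octets)
  "Return a fresh octet vector: OCTETS followed by MD5 padding."
  (let* ((len (length octets))
         ;; Zero octets needed so that len + 1 + zeros = 56 (mod 64).
         (zeros (mod (- 55 len) 64))
         (total (+ len 1 zeros 8))
         (padded (make-array total :element-type '(unsigned-byte 8)
                                   :initial-element 0))
         ;; Message length in bits, truncated to 64 bits.
         (bit-length (ldb (byte 64 0) (* 8 len))))
    (replace padded octets)
    (setf (aref padded len) #x80)
    ;; Store the length with the least significant octet first.
    (dotimes (i 8 padded)
      (setf (aref padded (+ len 1 zeros i))
            (ldb (byte 8 (* 8 i)) bit-length)))))
```

The result always has a length that is a multiple of 64, so it can be handed to the block function 64 octets at a time.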
| Certainly, each implementation specifies (more or less precisely) what | happens when you use code-char and such, but then, you have the same kind | of incompatible extension hell as in Scheme.
Wrong.
| Once again, reading their specification, I happen to like the recent | things done by Franz (SIMPLE-STREAM), but it's unhappily not directly | applicable to me.
Well, what makes it impossible for you to take their good ideas and implement them on your own?
| For performance, lack of supported modular integer operators is also a | big problem.
I actually agree with this in general, but for MD5, just break the 32-bit integers in half and operate on 16-bit values, instead. I assure you that no significant performance loss is caused by this, and the algorithm does not increase in complexity because of it. This is a fairly simple engineering tradeoff that good engineers will do to get the work done and bad engineers will refuse to do because they wish they did not have to.
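A sketch of what operating on 16-bit halves could look like for the addition step. This is not the code described above, which was not posted; ADD32-HALVES is a name invented for the example. Every intermediate value fits in 18 bits, so it stays a fixnum in any implementation.

```lisp
;; Sketch: 32-bit modular addition on 16-bit halves.
;; ADD32-HALVES is a hypothetical name.
(defun add32-halves (x-hi x-lo y-hi y-lo)
  "Add two 32-bit values given as 16-bit halves.
Return the high and low halves of the sum modulo 2^32 as two values."
  (declare (type (unsigned-byte 16) x-hi x-lo y-hi y-lo))
  (let* ((lo (+ x-lo y-lo))                 ; at most 17 bits
         (hi (+ x-hi y-hi (ash lo -16))))   ; add the carry from the low half
    (values (logand hi #xFFFF) (logand lo #xFFFF))))
```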
| I consider it bad practice to post large code files on USENET.
Well, if your MD5 function is a large code file, then you have even more problems.
| I posted the URL to that code, which ought to be enough for anyone | interested.
There is no way to ascertain that what is pointed to by a URL will remain the same after it has been criticized. We have seen how some massively dishonest _frauds_ on this newsgroup have altered the text of published URLs (even without updating the version) in order to make their critics look bad. Post the code and it will be very hard to "update" it.
I find it rather odd that you cannot distill your problems down to a few simple cases. Everybody can look elsewhere for the full context, but it _should_ be possible to show people your problems with an appropriate excerpt. A good bug report contains a distilled example. A bad bug report contains a million lines of rotten code with a single line "this code is perfect, but your compiler does not conform to my wishes".
| > | The world has standardized on low-level byte streams as the universal | > | medium for communication of data, including text. | > Which is a mistake, since they run into an enormous amount of trouble | > with supporting more than one character encoding. | It is not a mistake - it is the natural thing to do in a world of | proprietary black-box devices and software.
Huh? You have a special knack for non sequiturs. TEXT IS NOT BYTES. You _really_ need to understand this. Failing that, you will run into all sorts of problems, and there will be no end to your complaints, as I suspect you have already noticed.
| I did it. But it's not portably interoperable with the character-based | SEXP code that I have.
Really? Tell you what, when I wrote my MD5 functions for Allegro CL 5.0, I stuffed it between the I/O system and the reader, and I reset and grab the md5 hashes while reading characters from the stream. If I can do this in Allegro CL, so can you in CMUCL. If you naively implement the fairly stupid C model used for MD5, sending blocks of 64 "bytes" down to a new function, instead of grabbing the input buffers of the stream, and run into problems that you do not look into seriously in order to find a better way, you are simply incompetent at what you do and should not blame anyone else, _especially_ not the language.
| Using MD5 to portably support code version tagging, etc., becomes | "interesting". By no means impossible. Just a PITA.
I have no idea what you are trying to talk about.
| No I'm not. And sometimes, I want to do only one. Sometimes, I want to | do only the other. Sometimes, I am happy with the character processing | done by my implementation (though it's not portable). Sometimes, I need | precise control on the processing that happens (e.g. because I'm | precisely transcoding stuff from one protocol to another). And | sometimes, I want to do both byte-processing and text-processing at once | on the same stream (at once: switching from one to the other, or even | doing both character processing AND md5sum'ing on the same chunk).
I honestly fail to see the problem. In my view, it takes a fairly dense programmer to fail to deal with these things intelligently. If you need both byte stream and character stream, as you would do in HTTP, there are two ways of doing that: so-called "bivalent" streams, from which you can read both bytes and characters, but which introduces serious problems in maintaining state information about non-trivial character codings, or some means to switch the type of the stream between byte and character, which communicates to the lower levels what you intend to do from now on.
If you need precise control, ask for it. Your system does _not_ do a lot of weird magic that you have no control over. Just trust me on this, OK? Go read the fine documentation and discover for yourself that you _can_ trust the implementation.
| In the latter cases, any implicit character processing done by the | implementation is an abstraction inversion, to me.
Yes, to you, because you have _already_ inverted the model. You do _not_ convert from bytes to characters to bytes in order to do MD5 hashing on the bytes -- the danger of losing the original byte values is too high no matter how you do things. You have to get at the bytes where they are actually found, just after they have been read, and just before they are written. Anything else is pretty damn stupid.
| > if you are dealing with text, you deal with characters, not their | > coding, | Not even. I could be dealing with words, with sentences, with layout | elements. In such contexts, characters are too low-level and not what I | like.
I am sure you think this is relevant to something. Could you try to make it a bit more clear what it might be relevant to? No, never mind.
| Yet I don't resent CL not standardizing on high-level protocols -- it's things that can easily be implemented on top of them. However, wrongly standardizing on low-level things while adding restrictions to them is an abstraction inversion and it's wrong - it's the language getting in the way rather than helping.
Have you at all considered that _you_ might be wrong in any of this? If not, I would actually like to hear your arguments for why your way of seeing things is correct. I am getting tired of your non sequiturs and the randomness of your "conclusions".
| I'm sorry I don't speak Norwegian (yet). It means precisely what it says.
Oh, geez, you really _are_ a typically French retard. Well, thank you for proving that my guess that your whining is due to your incredible incompetence and the arrogance that only rabid ignorants who have no desire to listen to anyone other than the voices in their head. You really made it clear that you lack the
Francois-Rene Rideau <fare+NOS...@tunes.org> writes:
> The character issue is much worse: it's a *semantic* problem. It might not be in the speed bottleneck of this particular code (still, a typical character-based application doing MD5 in *portable* Common LISP would have at least 4 or 5 layers of buffers, just on the LISP side -- which does slow things down). But it's a major PITA to code around this nasty abstraction inversion.
md5 is defined on a stream of octets. You should calculate it at that level. That later on that stream of octets is converted into something with meaning at a higher level, such as characters in a certain encoding, is neither relevant nor an abstraction inversion.
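A sketch of that ordering: hash the octets first, and only afterwards give them meaning as characters. DIGEST-THEN-DECODE and its DIGEST-FN argument are hypothetical; the caller supplies whatever digest function is available. The decoding assumes one octet per character and that CODE-CHAR maps codes below 256 to real characters, which holds on common implementations but is not required by the standard.

```lisp
;; Sketch: compute a digest at the octet level, then decode to characters.
;; DIGEST-THEN-DECODE and DIGEST-FN are hypothetical names; DIGEST-FN
;; stands for any function that takes a vector of octets.
(defun digest-then-decode (octets digest-fn)
  "Call DIGEST-FN on OCTETS, then decode OCTETS into a string.
Return the digest and the string as two values."
  (let ((digest (funcall digest-fn octets))
        ;; Assumes a one-octet-per-character encoding such as ISO 8859-1.
        (text (map 'string #'code-char octets)))
    (values digest text)))
```

Because the digest is taken before any decoding, a lossy or implementation-specific character conversion cannot change the hash.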
Francois-Rene> Lieven Marchand <m...@wyrd.be> writes: >> You've done the work on md5 and I haven't so this is guess work, >> but my first thought where the problem would lie is that a lot of these >> algorithms assume efficient 32 bit operations, which in CL will cons. Francois-Rene> Indeed, the *performance* problem is lack of efficient modular integer Francois-Rene> operations, with all the consing, unconsing, size-checking, etc., Francois-Rene> that goes with it.
Well, you can get rid of the consing size-checking, etc., by doing, as Erik says, 16-bit chunks.
Or lie to the compiler so that your function
  (defsubst ub32-add/2 (x y)
    "Return 32-bit modular sum of 32-bit integers X and Y."
    (declare (type ub32 x y))
    (enforce-ub32 (+ x y)))
becomes something like
  (defun ub32-add/2 (x y)
    (declare (type ub32 x y))
    (the ub32 (+ x y)))
With the right speed and safety, this will probably be converted to a single 32-bit add instruction, which is what you wanted.
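Note that (the ub32 (+ x y)) has undefined consequences when the true sum needs 33 bits. A variant that states the truncation explicitly is sketched below. The UB32 type is given the definition it presumably has in the original code, which is an assumption, and UB32-ADD is a name made up for the example. Whether a given compiler reduces it to a single add instruction is implementation-dependent.

```lisp
;; Sketch: 32-bit modular addition without a false type assertion.
;; The definition of UB32 is assumed; UB32-ADD is a hypothetical name.
(deftype ub32 () '(unsigned-byte 32))

(declaim (inline ub32-add))
(defun ub32-add (x y)
  "Return (X + Y) mod 2^32."
  (declare (type ub32 x y)
           (optimize (speed 3)))
  ;; LDB keeps only the low 32 bits, so the result is correct even
  ;; when the mathematical sum does not fit in 32 bits.
  (ldb (byte 32 0) (+ x y)))
```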
For 16-bit chunks, something like the following might work (barely untested):
Erik Naggum <e...@naggum.net> writes: > | Ask whom? > This newsgroup, for instance.
Ok. Do you (or someone else in this newsgroup) have some MD5 code that you (they) are willing to publish without any licensing issues?
That said, the remark I made "I was looking for a Common LISP implementation of MD5, but found none" just was context, not a reproach to you or to anyone. You seem to be looking for things to argue over.
> It is not an implementation in Common Lisp, but written in C as a > low-level system function.
I was imprecise in my phrasing. I meant "implementation available to Common LISP programmers". This was indeed not enough to satisfy my standards (and even less so considering that this implementation is not available to me but for testing purposes). I suppose I could have also written a FFI to C for CMUCL or CLISP, but I didn't feel like doing both, and wanted my code to run on both, as well as on Genera, ThinLisp, etc. In other words, I purported to support the (maybe illusive) idea that there is a usable portable Common LISP language.
[Skipped ad-hominem attacks]
> You really need to clue in that MD5 does not operate on characters.
Who said it did? (actually, the ACL6 interface does, but that's why I extended it to work on octet arrays, too). Still, it operates on streams that I also have to interpret as characters, in a different context (notably so as to use, e.g. WRITE, FORMAT, and all those text processing libraries). CL forces me to do double buffering (actually, treble, or more), which sucks because of coding nightmares even more than for the slowness.
> That some languages have no idea what a character is, but treat it like a > small integer is really not something you can blame either MD5 or Common > Lisp for.
I actually don't blame languages that treat characters as small integers. They provide limited functionality indeed, but at least, they provide some functionality you can portably build upon. I do blame CL for providing an implementation-dependent notion of character that is both too low-level to portably do any interesting thing with it, and too high-level to allow for portable interaction with standard low-level protocols.
> creating a new stream class that reads from those 64-byte buffers > that you read from the input source and give to MD5.
Sure. That's why I was talking about double, treble, whatever, buffering, which is overhead in programming time as well as in running time and space, and qualifying that as "clumsy, inefficient, not portable, and underspecified".
> | Once again, reading their specification, I happen to like the recent > | things done by Franz (SIMPLE-STREAM), but it's unhappily not directly > | applicable to me.
> Well, what makes it impossible for you to take their good ideas and > implement them on your own?
Sure I can. (Well, someone else seems to be doing that for CMUCL). However, so it would be no more portable CL, and so that it remains "ported CL", I'd have to do it on each of the implementations I use (although most of it could hopefully be done in portable CL).
> but for MD5, just break the 32-bit integers in half and operate on 16-bit > values, instead.
Yes, I have thought about this. However, I began with 32-bit because it was more natural, and hoped CMUCL could do well with it. It could do better. I was happy enough with the implementation to decide to publish it before optimizing it, if I ever do (but then instead of writing ugly implementation-specific LISP - some of my friends call that "bugware" - I'd rather write efficient assembly and FFI code, and/or develop a special-purpose subLISP compiler that would do that for me).
> I assure you that no significant performance loss is caused by this,
Thanks for the tip.
> and the algorithm does not increase in complexity because of it.
Sure, but the discussion matters only if we're already into constant factors.
> | I consider it bad practice to post large code files on USENET. > Well, if your MD5 function is a large code file, then you have even more > problems.
It is small for a code file (700 lines, including lots of comments), but I consider it large and noisy by USENET post standards.
> There is no way to ascertain that what is pointed to by a URL will remain > the same after it has been criticized. We have seen how some massively > dishonest _frauds_ on this newsgroup have altered the text of published > URLs (even without updating the version) in order to make their critics > look bad. Post the code and it will be very hard to "update" it.
Wow, you seem to have really low expectations about people, at least on USENET. Well, if this is the kind of thing you fear, I have no shame posting the MD5 of my current CVS release 1.8: dd512185e07d1aeeab11454568eb14e9 md5.lisp Unless I can break MD5, or modify bits on every USENET archive, there is no way I can now retract my code. The file I had when I originally posted was not yet on CVS, and I didn't keep a copy (one of the tens of people who downloaded might have it), but apart from a few type declarations, added documentation and reordering of definitions, it is essentially the same as the current one. Notice that now that the code is on CVS, you can browse old versions at any moment - so even in a year, you can check that the MD5 is the one I said. http://tunes.org/cgi-bin/cvsweb/fare/fare/lisp/md5.lisp
> I find it rather odd that you cannot destill your problems down to a few > simple cases.
Here's my problem #1 in a simple case: (ub32-add x y z) Here's my problem #2 in a simple case: (char-code c) Sometimes, a few paragraphs of explanation are as simple as it can get.
> A good bug report contains a destilled example.
Words are a good vector for distilled ideas.
> A bad bug report contains a million lines of rotten code [...]
Which is another reason why I chose not to post my code on USENET.
> with a single line "this code is perfect, > but your compiler does not conform to my wishes".
My code is not perfect, and part of the reason it isn't is that the language has deficiencies.
> | > | The world has standardized on low-level byte streams as the universal > | > | medium for communication of data, including text. > | > Which is a mistake, since they run into an enormous amount of trouble > | > with supporting more than one character encoding. > | It is not a mistake - it is the natural thing to do in a world of > | proprietary black-box devices and software.
> Huh? You have a special knack for non sequiturs. TEXT IS NOT BYTES.
You have a knack for straw man arguments. Of course, text is not bytes. The point is, it is easy to agree on bytes, and impossible to agree on text. Since "binary-compatible" proprietary software and hardware cannot be mended after-the-fact to account for a moving low-level extensional representation for the high-level intensional agreement on what text is, proprietary vendors can but standardize on low-level protocols. So once again, of course, text is not bytes. But any world-standard protocol for communicating text will be based on bytes.
> Really? Tell you what, when I wrote my MD5 functions for Allegro CL 5.0, > I stuffed it between the I/O system and the reader, and I reset and grab > the md5 hashes while reading characters from the stream. If I can do > this in Allegro CL, so can you in CMUCL.
Sure I can. But I don't want to do it for CMUCL, SBCL, CLISP, Genera, ACL, ThinLisp, and every other implementation. In other words, if I pick one implementation and stick to it, I'd be fine - but then, so would I picking a Scheme implementation, OCAML, or a system
> If you need precise control, ask for it. Your system does _not_ do a lot > of weird magic that you have no control over. Just trust me on this, OK?
Sure. But each system does its own non-weird magic, in an incompatible way. That's called non-portability. I'm not criticizing ACL, CMUCL, CLISP, Genera, or anything - each does its stuff correctly. I'm just criticizing CL, and there unhappily seems to be no one to ask about fixing that. (Well, actually, raising the problem in c.l.l will hopefully attract the attention of implementers on the problem and its known solutions, so they might provide efficient APIs to multivalent strings/streams and modular integer operations).
> You have to get at the bytes where they are > actually found, just after they have been read, and just before they are > written. Anything else is pretty damn stupid.
Sure. Who said otherwise? This just means more buffering to handle (costs development, performance, space, semantic gap, etc.) Actually, it calls for doing everything with bytes, and nothing with characters (unless wrapped in a macro for immediate conversion), in any portable program sensitive to such problems -- except that this precludes the use of many text-processing libraries, which is the abstraction inversion.
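To make the "bytes everywhere, characters only at the edges" idea concrete, here is a small sketch of the kind of conversion helpers it implies. The names `ascii-string-to-octets` and `octets-to-ascii-string` are hypothetical, and the code assumes that CHAR-CODE and CODE-CHAR agree with ASCII for the characters involved. That holds on the implementations named in this thread as far as I know, but the standard does not promise it, which is exactly the portability complaint.

```lisp
(defun ascii-string-to-octets (string)
  "Return a fresh (unsigned-byte 8) vector with the character codes of STRING.
ASSUMPTION: CHAR-CODE matches ASCII here; the standard does not guarantee it."
  (let ((octets (make-array (length string)
                            :element-type '(unsigned-byte 8))))
    (dotimes (i (length string) octets)
      (let ((code (char-code (char string i))))
        ;; Refuse anything outside 7-bit ASCII rather than guess an encoding.
        (assert (< code 128) () "Non-ASCII character at index ~D" i)
        (setf (aref octets i) code)))))

(defun octets-to-ascii-string (octets)
  "Inverse of ASCII-STRING-TO-OCTETS, under the same ASCII assumption."
  (map 'string #'code-char octets))
```

Every call copies the data once more, which is the extra buffering being complained about: the hash wants the octet vector, while WRITE, FORMAT and friends want the string.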
> | > if you are dealing with text, you deal with characters, not their > | > coding, > | Not even. I could be dealing with words, with sentences, with layout > | elements. In such contexts, characters are too low-level and not what I > | like. > I am sure you think this is relevant to something. Could you try to make > it a bit more clear what it is might be relevant to? No, never mind.
Your claim all throughout was that text-processing was not about bytes. Well, it isn't about characters either, you know. For instance, I am currently writing a small lisp program that dumps annotated diagrams in LaTeX+MetaPost, which can be considered text-processing, and, mind you, nowhere in it does the concept of character even appear. Does that mean that a language ought to not standardize on characters, since they are NEVER an interesting human-level concept, and instead, standardize on TeX-style glyphs and graphic or symbolic structures? No. Languages should standardize on bricks with
...
Raymond Toy <t...@rtp.ericsson.se> writes: > >>>>> "Francois-Rene" == Francois-Rene Rideau <fare+NOS...@tunes.org> writes:
> Francois-Rene> Lieven Marchand <m...@wyrd.be> writes: > >> You've done the work on md5 and I haven't so this is guess work, > >> but my first thought where the problem would lie is that a lot of these > >> algorithms assume efficient 32 bit operations, which in CL will cons. > Francois-Rene> Indeed, the *performance* problem is lack of efficient modular integer > Francois-Rene> operations, with all the consing, unconsing, size-checking, etc., > Francois-Rene> that goes with it.
> Well, you can get rid of the consing size-checking, etc., by doing, as > Erik says, 16-bit chunks.
> Or lie to the compiler so that your function
> (defsubst ub32-add/2 (x y)
>   "Return 32-bit modular sum of 32-bit integers X and Y."
>   (declare (type ub32 x y))
>   (enforce-ub32 (+ x y)))
>
> becomes something like
>
> (defun ub32-add/2 (x y)
>   (declare (type ub32 x y))
>   (the ub32 (+ x y)))
> With the right speed and safety, this will probably be converted to a > single 32-bit add instruction, which is what you wanted.
Probably not. The commercial lisps generally use two or three low bits as a tag (and set them to zero), so although the compiler may emit a 32-bit add instruction, it is really performing a 29 bit add.
> This shouldn't cons except for the result and probably runs faster > than what you have because it doesn't have to call out to a generic + > routine.
The problem is that an (unsigned-byte 32) won't fit in a boxed value on a 32-bit machine. Your ub32-add/2 will cons a bignum when you ash the high sum by 16 bits.
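Whether a full (unsigned-byte 32) fits in an immediate (unboxed, tagged) fixnum depends on the implementation and the word size, so it is worth checking rather than assuming. A small sketch, with hypothetical function names, using only standard constants:

```lisp
(defun fixnum-width ()
  "Number of magnitude bits in the largest fixnum of this implementation."
  (integer-length most-positive-fixnum))

(defun ub32-fits-in-fixnum-p ()
  "True if every (unsigned-byte 32) value is a fixnum on this implementation."
  (<= #xFFFFFFFF most-positive-fixnum))

(defun report-fixnum-range ()
  "Print what this implementation offers; the output varies by platform."
  (format t "~&fixnum magnitude bits: ~D~%ub32 fits in a fixnum: ~A~%"
          (fixnum-width)
          (if (ub32-fits-in-fixnum-p) "yes" "no")))
```

On a 32-bit Lisp with two or three tag bits the second function returns false, and 32-bit intermediates become bignums unless the compiler can keep them unboxed in registers.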
> [ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] > [ TUNES project for a Free Reflective Computing System | http://tunes.org ] > The fundamental class division in any society is not between rich and poor, or > between farmers and city dwellers, but between tax payers and tax consumers. > -- David Boaz, CATO Institute
Ordinarily I refrain from making politics comments on c.l.l., but a quote from somebody from the CATO institute merits at least a sarcastic remark. Especially when such Think-Tank essentially work as a pundit for the rich tax payers who do not want to pay taxes on behalf of the poor tax consumers.
There! I said it!
Cheers
-- Marco Antoniotti ======================================================== NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488 719 Broadway 12th Floor fax +1 - 212 - 995 4122 New York, NY 10003, USA http://bioinformatics.cat.nyu.edu "Hello New York! We'll do what we can!" Bill Murray in `Ghostbusters'.
Marco Antoniotti <marc...@cs.nyu.edu> writes: > You all have been warned! > Francois-Rene Rideau <fare+NOS...@tunes.org> writes: > > [ François-René ÐVB Rideau | Reflection&Cybernethics | > > http://fare.tunes.org ] > > The fundamental class division in any society is not between rich > > and poor, or between farmers and city dwellers, but between tax > > payers and tax consumers. -- David Boaz, CATO Institute > Ordinarily I refrain from making politics comments on c.l.l., but a > quote from somebody from the CATO institute merits at least a > sarcastic remark. Especially when such Think-Tank essentially work > as a pundit for the rich tax payers who do not want to pay taxes on > behalf of the poor tax consumers. > There! I said it!
That doesn't prevent it from being a reasonably pithy saying, and even if all you suggest is true, doesn't prevent it from having some truth. If there's a serious and persistent imbalance between the tax burdens of different groups of people, that _will_ lead to divisiveness, regardless of whether CATO folk are to be regarded for good or for ill.
.. Which vaguely ties to Lisp what with the old quote about Lispers knowing the value of everything, but the cost of nothing ... -- (reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa")) http://www.cbbrowne.com/info/sap.html Talk a lot, don't you?
>So I wrote a silly port of MD5 in CL. > ftp://Samaris.tunes.org/pub/lang/cl/fare/md5.lisp >On CMUCL, the result runs about 5.5 times slower than the equivalent optimized >C code; that said, on CMUCL, it can also call the external md5sum utility
By the way, is this C code maximally portable ISO C? Let's see it.
>for fast processing of large files or streams. >It purports to abide by (and extend) the MD5 interface defined in ACL6.
>Excellent things that helped: CL macros. >Horrible things that get in the way: CL characters, >and lack of builtin modular operators.
What modular operators are missing? The standard truncate function returns an integer quotient and remainder. The logand function can be used as a means of reducing modulo a power of two: (logand #xABCD #xFF) ==> #xCD
Looking at your code, I think I understand what you mean. In some places in the computation, it's necessary to reduce a result modulo #x100000000. This is because the algorithm is defined in terms of machine arithmetic, specifically to be efficiently implementable in machine languages.
Note that a C program which assumes the presence of a type that is exactly 32 bits wide is not maximally portable, so if you rely on the addition of two unsigned longs, for instance, to do the reduction for you, your code is not portable. In maximally portable C, you have to do the & 0xFFFFFFFF operation to reduce a result modulo 32 bits.
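In Lisp the same explicit reduction looks like this. It is a minimal portable sketch rather than anything taken from md5.lisp; the names `ub32-add-masked` and `ub32-rotate-left` are invented for the example, and whether a given compiler turns these into single machine instructions is implementation-dependent.

```lisp
(deftype ub32 () '(unsigned-byte 32))

(defun ub32-add-masked (x y)
  "Sum of X and Y reduced modulo 2^32 with an explicit mask."
  (declare (type ub32 x y))
  (logand (+ x y) #xFFFFFFFF))

(defun ub32-rotate-left (x n)
  "Rotate the 32-bit value X left by N bits, 0 <= N <= 31."
  (declare (type ub32 x)
           (type (integer 0 31) n))
  (logand #xFFFFFFFF
          (logior (ash x n)            ; bits shifted up (may exceed 32 bits)
                  (ash x (- n 32)))))  ; bits that wrap around to the bottom
```

The results are always correct because Lisp integers do not overflow. The cost is that the unmasked intermediate `(+ x y)` can need 33 bits, and `(ash x n)` even more, which is where consing creeps in when the compiler cannot prove the final width.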
jrm> Raymond Toy <t...@rtp.ericsson.se> writes:
>>
>> (defun ub32-add/2 (x y)
>>   (declare (type ub32 x y))
>>   (the ub32 (+ x y)))
>>
>> With the right speed and safety, this will probably be converted to a
>> single 32-bit add instruction, which is what you wanted.
jrm> Probably not. The commercial lisps generally use two or three low jrm> bits as a tag (and set them to zero), so although the compiler may jrm> emit a 32-bit add instruction, it is really performing a 29 bit add.
Yes, this is true in general, but I'm assuming most commercial lisps also support 32-bit integer types. That might not be true, and then you are hosed. CMUCL does support 32-bit integer types so for
(defun foo (x y)
  (declare (type (unsigned-byte 32) x y)
           (optimize (speed 3) (safety 0)))
  (the (unsigned-byte 32) (+ x y)))
You get something like
<stuff to get x and y into 32-bit integer form from possibly boxed values>
6B0: L1:   MOV   %NL1, %NL0
6B4:       ADD   %NL2, %NL0          ; No-arg-parsing entry point
<stuff to return this 32-bit result as a boxed answer>
>> This shouldn't cons except for the result and probably runs faster >> than what you have because it doesn't have to call out to a generic + >> routine.
jrm> The problem is that an (unsigned-byte 32) won't fit in a boxed value jrm> on a 32-bit machine. Your ub32-add/2 will cons a bignum when you ash jrm> the high sum by 16 bits.
With CMUCL, I'm pretty sure that the consing happens when it tries to return the final 32-bit result. No consing happens for the 16-bit shift of the high sum, because the compiler understands unboxed 32-bit integers.
k...@ashi.footprints.net (Kaz Kylheku) writes: > What a coincidence! It happens that earlier today, I typed ``md5.lisp'' > into Google and it came up with this:
Nice. I didn't try this combination in Google. Anyway, what gives is the MD5 implementation from CL-HTTP. I had forgotten about that one. Still, I can't use it because of license problems and prefer to write code than grovel to have licenses changed (it seems that whatever you do, people will diss you for it). Thanks for the tip, though.
> By the way, is this C code maximally portable ISO C? Let's see it.
The md5.c file I had, a derivative of the reference implementation, was the fairly portable one used in Erlang. I don't remember if ISO C mandates two's complement arithmetics (the earlier standard didn't; I remember rumors that this one does). If it does, then it is maximally portable. If not, then indeed it requires and'ing with 0xFFFFFFFF (which in practice is optimized away by all C compilers that matter).
That said, since the 1980's, 99.99% of the world have standardized on hardware that does two-complement 8/16/32/64 bit arithmetics. The only recent exception I know is Chuck Moore's F21 (20/21-bit two-complement architecture). Even an Ivory LISPM has 32-bit fixnums. (Followup-To: comp.arch ?). Moreover, various formal and informal standards that extend the ISO C standard mandate 32-bit types. In other words, the world isn't just the work of one committee - it's what everyone does. Well, in the CL world, there is no more committee, and implementations don't support any standard for modular arithmetics.
> What modular operators are missing? The standard truncate function returns > an integer quotient and remainder. The logand function can be used > as a means of reducing modulo a power of two: (logand #xABCD #xFF) ==> #xCD
Yes, and "I could also do the same even with a Turing machine". The question was one of builtin support for efficient such operators.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] [ TUNES project for a Free Reflective Computing System | http://tunes.org ] "Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? Or have we found angels in the forms of kings to govern him? Let history answer this question." -- Thomas Jefferson, First Inaugural Address
On Thu, 01 Nov 2001 15:30:22 GMT, Erik Naggum wrote: > | Once again, reading their specification, I happen to like the recent > | things done by Franz (SIMPLE-STREAM), but it's unhappily not directly > | applicable to me. > Well, what makes it impossible for you to take their good ideas and > implement them on your own?
Oh well, I guess it's a good time to announce that I'm implementing simple-streams for CMUCL. It's far from usable as yet, but I'll put something on http://www.actrix.gen.nz/users/mycroft/cl.html over the weekend. If anyone wants to help out, please let me know.
-- The power of accurate observation is commonly called cynicism by those who have not got it. -- George Bernard Shaw (setq reply-to (concatenate 'string "Paul Foley " "<mycroft" '(#\@) "actrix.gen.nz>"))
* Francois-Rene Rideau <fare+NOS...@tunes.org> | Ok. Do you (or someone else in this newsgroup) have some MD5 code | that you (they) are willing to publish without any licensing issues?
No. Part of the reason I do not publish my code anymore is that there are too many incompetents and assholes out there that I specifically do _not_ want to get access to my work if they do not compensate me for it. After all, you are whining and complaining and posting a bunch of crap because you are unwilling to do the work on your own, so giving _you_ this code without any licensing terms seems like a really bad idea.
| You seem to be looking for things to argue over.
Wrong. For some people who think it is their birthright to be imprecise and sloppy and disrespectful, being taken seriously and expected to be precise is sometimes extremely provocative. You seem to be one of those.
| [Skipped ad-hominem attacks]
Well, gee, your own are apparently OK, while you have a problem with mine. That kind of double standard _really_ annoys me. If you do not want what you call ad hominem attacks (but you are still imprecise -- look it up), simply avoid them yourself -- if _you_ cannot do this or do not _want_ to do it, shut the fuck up about what anyone else does, OK?
| I suppose I could have also written a FFI to C for CMUCL or CLISP, but I | didn't feel like doing both of it, and wanted my code to run on both, as | well as on Genera, ThinLisp, etc.
So design a common FFI framework! I have designed and implemented one for my own needs, but I find the number of disgusting losers who would benefit from it if I published my code to be too high. Sharing should be rewarding, not felt as a loss. I choose to withhold what I do from the public because of the number of assholes who would never respect my rights as an owner, but fortunately, they are honest enough to scream bloody murder if someone wants them to sign a license agreement. Withholding from the general public is in fact the _only_ option an author has in these days of "open source" and "free software". I find that many of the problems people have are in fact solved in an hour or two, but I have great fun solving problems for their own sake (and anyone who enjoys mathematics knows what I mean), but the fun that used to be found in seeing somebody else use my code vanished years ago with the aggressiveness of the demands of the consumers of open source and free software -- there used to be a sense of gratitude and respect towards those who solved your problems for you and gave you code, but this is replaced by lazy fucks who demand that others solve their problems for them and get pissed off if somebody else refuses to give them their work for free. This gives me solid reason to believe that certain problems are problems only to incompetents and it also gives me a good way to determine if certain "problems" are posted by trolls who do not really want a solution to them. There are lots of people on the Net who only want to get sympathy for their plight, not actually solve their problems. That kind of whining wimp ticks me off, and if I can help one of them lose their job or fail an exam by not answering their stupid requests for somebody else's work, that is good!
| In other words, I purported to support the (maybe illusive) idea that | there is a usable portable Common LISP language.
There is, and you have not been betrayed because you need to do some hard work to get something you want. In fact, that sense of betrayal you feel is that same irrational attitude problem that comes from believing in a "perfection" that somebody else owes to you.
| > You really need to clue in that MD5 does not operate on characters.
| Who said it did?
You did, when you complained about the character concept.
| Still, it operates on streams that I also have to interpret as | characters, in a different context (notably so as to use, e.g. WRITE, | FORMAT, and all those text processing libraries).
Well, considering that I have actually done precisely the same thing and did not have _any_ of your problems, what does that tell me about you? Sometimes, the only problem _is_ the person, but you do not believe me, and _stupidly_ think everything is an ad hominem attack. Clue in, dude, an ad hominem attack is an attack on the arguments via the person, it is _not_ an attack on the person for something that person does wrong. Arguing that you cannot trust someone because he has a bad credit record is ad hominem, sending his unpaid bills to a collection agency is not. Using the fact that somebody has been caught with a Swiss Army pocket knife in Oslo against them in a debate over efficient Java is ad hominem, arguing that it was not particularly nice to lie to the police about the reason he carried it so he would not get into serious trouble is not. But because you are not very precise, you will probably not understand this difference. So I do not argue that you cannot be trusted or your arguments suck because you are not very precise or pay much attention to detail -- I simply point out that you are imprecise and do not pay much attention to detail, and then I expect you to _fix_ that, but since you are that stupid antagonistic, imprecise dude who seems to think that "you seem to be looking for things to argue over" is a much better approach to the criticism, you will not quite grasp what you are being told in any case. That kind of arrogance is something that those who suffer from it will exhibit in many ways, and the belief that their code is perfect (or any flaws somebody else's fault) goes with it.
| CL forces me to do double buffering (actually, treble, or more), which | sucks because of coding nightmares even more than for the slowness.
No, the language does _not_ force you to do that, damnit. Your very own incompetence and extremely annoying _arrogance_ forces you to do that. Just be smarter. Think more, whine less, work better.
| > That some languages have no idea what a character is, but treat it | > like a small integer is really not something you can blame either MD5 | > or Common Lisp for. | I actually don't blame languages that treat characters as small integers.
Huh? Nobody said you did, either. Do you argue against straw men, now?
| I do blame CL for providing an implementation-dependent notion of | character that is both too low-level to portably do any interesting thing | with it, and too high-level to allow for portable interaction with | standard low-level protocols.
I think you need to explain your understanding of the character type, because this sounds like the rantings of an incredibly incompetent person who has not paid a lot of attention to detail in his life. The _fact_ is that the implementation-dependent aspects of the character are precisely those that manifest themselves in the coding and representation which you want to get access to directly. How you can fail to understand this is truly amazing. I wonder when you were satisfied with your bogus belief in "character flaw" as a good explanation for your problems. (If you pardon the pun -- I simply could not resist. :)
| That's why I was talking about double, treble, whatever, buffering, which | is overhead in programming time as well as in running time and space, and | qualifying that as "clumsy, inefficient, not portable, and | underspecified".
That is how you would implement them. Your implementation is only partly a consequence of the language, and only a small part, because you have contributed a lot more than the language has. I get really annoyed by people who think that their algorithm and implementation is perfect and if it does not work the way they want, it is somebody else's fault. You are obviously that kind of person, and there is a serious flaw with your thinking that you need to fix before you are willing and able to listen to counter-arguments. As long as you protect your notion of perfect implementation, you will never get anywhere.
| Sure I can. (Well, someone else seems to be doing that for CMUCL). | However, so it would be no more portable CL, and so that it remains | "ported CL", I'd have to do it on each of the implementations I use | (although most of it could hopefully be done in portable CL).
Yes, it seems that you are one of those perennially dissatisfied people who do not understand that the _only_ way you can arrive at portable _interfaces_ to some common functionality is to write non-portable, implementation-dependent code that implements them efficiently in a particular Common Lisp system. In fact, I am frankly _stunned_ by the lack of insight among people who want _both_ portable interfaces _and_ portable implementations of such interfaces. That is simply not how things work. Common Lisp itself is a prime example: It is a supremely portable language, yet requires massively implementation-dependent code to implement correctly. Actually, it seems that it takes a particular willingness to think things through that many people lack to figure out that what comes out elegant and beautiful at the other end of a long production line has been through a very _dirty_ process. Like, do you think the factory that makes office furniture is as clean as the offices their furniture is sold to, or that writing an elegant poem about love and happiness did not require the usual incredible amount of pain that getting a poem "right" requires? No, creating something beautiful takes a lot of ugly, painful work. If you are not up to that, you really have no business _demanding_ that other people do it for you, or complaining that they have or do not. It is that demanding attitude that makes me _not_ want to give people like you an easy way out of your
* Raymond Toy | With the right speed and safety, this will probably be converted to a | single 32-bit add instruction, which is what you wanted.
But it might box the result when returning it, unless the function call is inlined or declared to return unboxed results. E.g., Allegro CL offers a means to do this which is pretty intricate, but which yields significant speed-ups in some applications. I have only seen it used, not used it myself, but if this is an issue, do ask and investigate.
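The portable half of that advice is the standard INLINE declamation. The sketch below reuses the name `ub32-add/2` from earlier in the thread but swaps in an explicit LOGAND mask instead of `(the ub32 ...)`, so the result is defined even when the sum overflows 32 bits; `ub32-add/3` is a hypothetical caller. Whether the intermediate value really stays unboxed once inlined is up to each compiler, and the Allegro CL mechanism mentioned above is not shown here.

```lisp
(deftype ub32 () '(unsigned-byte 32))

;; Ask the compiler to expand calls in place, so no function-call
;; boundary forces the 32-bit result into a boxed representation.
(declaim (inline ub32-add/2))
(defun ub32-add/2 (x y)
  "32-bit modular sum of X and Y."
  (declare (type ub32 x y)
           (optimize (speed 3)))
  (logand (+ x y) #xFFFFFFFF))

(defun ub32-add/3 (x y z)
  "32-bit modular sum of three values, built from the inlined binary add."
  (declare (type ub32 x y z)
           (optimize (speed 3)))
  (ub32-add/2 (ub32-add/2 x y) z))
```

The declamation has to precede the DEFUN for the compiler to record the inline expansion. Looking at `(disassemble 'ub32-add/3)` is the only reliable way to see what a particular implementation actually made of it.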
/// -- Norway is now run by a priest from the fundamentalist Christian People's Party, the fifth largest party representing one eighth of the electorate. -- Carrying a Swiss Army pocket knife in Oslo, Norway, is a criminal offense.
Paul Foley <mycr...@actrix.gen.nz> writes: > On Thu, 01 Nov 2001 15:30:22 GMT, Erik Naggum wrote:
> > | Once again, reading their specification, I happen to like the recent > > | things done by Franz (SIMPLE-STREAM), but it's unhappily not directly > > | applicable to me.
> > Well, what makes it impossible for you to take their good ideas and > > implement them on your own?
> Oh well, I guess it's a good time to announce that I'm implementing > simple-streams for CMUCL. It's far from usable as yet, but I'll put > something on http://www.actrix.gen.nz/users/mycroft/cl.html over the > weekend. If anyone wants to help out, please let me know.
Thanks, Paul, for stepping up to this task. Too bad we will just miss each other while I am on vacation down-under. :-(
(Paul and I have been discussing his implementation over the past month or so, and he has made very quick progress on it. He has had advance access to our 6.1 streams document, and that is now available on our website. I also have a chapter on encapsulation which should become available in a new version of that document, once it has been massaged by our documentation expert. Look for the new version within a week.)
-- Duane Rettig Franz Inc. http://www.franz.com/ (www) 1995 University Ave Suite 275 Berkeley, CA 94704 Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)
In article <3213650592348...@naggum.net>, Erik Naggum wrote: >* Francois-Rene Rideau <fare+NOS...@tunes.org> >| Ok. Do you (or someone else in this newsgroup) have some MD5 code >| that you (they) are willing to publish without any licensing issues?
> No. Part of the reason I do not publish my code anymore is that there > are too many incompetents and assholes out there that I specifically do > _not_ want to get access to my work if they do not compensate me for it.
Similarly, people who apply the GNU license to their programs don't want assholes to take code and spin it into a proprietary program that carries use restrictions, doesn't permit redistribution, has no source code and so on. So you see, it boils down to your definition of ``asshole'', which is generally a subjective specialization of ``someone who does things I very strongly disapprove of''.
Erik Naggum <e...@naggum.net> writes: > Part of the reason I do not publish my code anymore is that there > are too many incompetents and assholes out there that I specifically do > _not_ want to get access to my work if they do not compensate me for it.
With such an attitude, I think it is useless for you to wonder why people do not ask you anything.
> | [Skipped ad-hominem attacks] > Well, gee, your own are apparently OK, while you have a problem with > mine. That kind of double standard _really_ annoys me.
I have tried hard not to make any such attack in the current thread. I apologize if there is anything that could be construed as such in my messages.
> So design a common FFI framework!
That's indeed a good thing to do.
> I have designed and implemented one for my own needs, > but I find the number of disgusting losers who would > benefit from it if I published my code to be too high. > Sharing should be rewarding, not felt as a loss.
It sure should be. I regret that you do not feel it is. If you are only ready to help people better than yourself, you'll find that the only people whom you accept to help are those who don't need your help.
> if I can help one of them lose their job or fail an exam by not answering their stupid requests for somebody else's work, that is good!
I think you are wasting your precious time with them. Just ignore them; you will thus do yourself (and them) a favor. You seem to me to react as if you took perfection as a granted ideal, such that ethical behaviour consists in destroying whatever is imperfect. I think that, on the contrary, perfection is an unreached ideal, such that ethical behaviour consists in building more perfect things.
> | In other words, I purported to support the (maybe illusive) idea that there is a usable portable Common LISP language.
> There is, and you have not been betrayed because you need to do some hard work to get something you want.
I do not feel "betrayed" - nobody owes me anything. I feel that there is room for improvement, the perception of which can be usefully discussed in such a community forum as comp.lang.lisp.
> Clue in, dude, an ad hominem attack is an attack on the arguments via the person, it is _not_ an attack on the person for something that person does wrong.
Granted. Whereas you are only very rude and quick to ignore or dismiss arguments (rather than actually attack them) once you decided that someone was a bad person.
> I simply point out that you are imprecise
And I thank you for it.
> and do not pay much attention to detail, and then I expect you to _fix_ that,
I will try to.
> | CL forces me to do double buffering (actually, treble, or more), which sucks because of coding nightmares even more than for the slowness.
> No, the language does _not_ force you to do that, damnit.
Indeed, if I manage to avoid characters altogether (and thus not use any character-based library), or else use non-portable extensions (and lose some interoperability) I can avoid double-buffering. This is a worthwhile engineering tradeoff to consider. But I argue that this is a current flaw in the language.
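A minimal sketch of the "avoid characters altogether" route, using only standard CL: the file is opened with an `(unsigned-byte 8)` element type and read block by block into one reusable octet buffer, so no character decoding or second copy is involved. The name `map-octet-blocks` and the default block size are made up for illustration and are not taken from md5.lisp.

```lisp
(defun map-octet-blocks (function pathname &key (block-size 4096))
  "Call FUNCTION with (BUFFER END) for each block of octets in PATHNAME.
BUFFER is reused between calls; only its first END elements are valid."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((buffer (make-array block-size :element-type '(unsigned-byte 8))))
      ;; READ-SEQUENCE returns the index of the first element not updated,
      ;; so a result of 0 means end of file.
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (funcall function buffer end)))))
```

A digest routine would be passed as FUNCTION and fed each block in turn. The price, as noted above, is that none of the string functions apply to such a buffer.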
> | > That some languages have no idea what a character is, but treat it like a small integer is really not something you can blame either MD5 or Common Lisp for.
> | I actually don't blame languages that treat characters as small integers.
> Huh? Nobody said you did, either. Do you argue against straw men, now?
I didn't say you did. I tried to clarify and argue my opinion, which from your post I construed to be opposite to yours.
> The _fact_ is that the implementation-dependent aspects of the character are precisely those that manifest itself in the coding and representation which you want to get access to directly.
No. I do not want to access implementation-dependent aspects of the character. I want to access protocol-dependent aspects of the character, in an implementation-independent way. Sometimes, the protocol and the implementation agree, and then I like to be able to take advantage of it; sometimes they do not, and I find it a pity that it makes reuse of code designed to manipulate characters a nuisance.
> I get really annoyed by people who think that their algorithm and implementation is perfect and if it does not work the way they want, it is somebody else's fault. You are obviously that kind of person,
I know many flaws in my code (it notably does much more consing than it should, which currently makes it very slow on big data - but for big data I call /usr/bin/md5sum, anyway, so that's not a bottleneck to me right now - on the other hand, the way I call an external program might or might not leak memory, which is not a concern to me right now but might become one in the future).
> the _only_ way you can arrive at portable _interfaces_ to some common functionality is to write non-portable, implementation-dependent code that implements them efficiently in a particular Common Lisp system.
Well, the fact that something implementation-dependent has to be done for efficiency is precisely what an "abstraction inversion" is about when you try to do it portably. I argue it's a flaw in a language; a flaw that may be part of a valid engineering tradeoff in the design and implementation of said language, but a flaw nonetheless. A lot of useful code is written in LISP, C, C++, Java, OCAML, SML, Python, Perl, etc., that is written in a portable way and provides portable interfaces efficiently. But various languages have various flaws; CL has a lot of interesting features that other languages have not, and it also has flaws that other languages have not; this means that the potential non-portability issues or abstraction inversions come at different places in various languages. Tradeoff? Maybe. But it is interesting to discuss these tradeoffs.
> creating something beautiful takes a lot of ugly, painful work.
Indeed. But the pain is not the goal, the beauty is. Fruitful discussions help determine how to achieve more beauty with less pain.
> Using implementation-specific things to implement a common/standard interface to some functionality is not wrong, it is in fact right,
Indeed. *Having to do it* is the wrong, and *doing it* is the right that hopefully fixes that wrong. The cost of doing it is the measure of the wrong. Better ways of doing it alleviate the wrong, worse ways increase it. Discussing is a way to find better ways.
> 700 lines!? Geez. Using implementation-specific features in Allegro CL, the core algorithm is 100 lines. Setting up and preparing for this short and fast algorithm takes another 150 lines.
I get approximately the same code size. The rest is documentation, and a collection of functions to portably mock up the implementation-specific features you use, until each implementation has them. I also add lots of declarations in an attempt to keep the CMUCL compiler happy.
> Have you ever wondered how all these implementations sprung into being from nowhere when people like you do not want to implement things they say they want?
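A rough sketch of that kind of mock-up, for illustration only: a reader conditional selects a vendor primitive where one exists and a portable fallback elsewhere. The feature `:hypothetical-vendor-octets` and the function `vendor:string-to-octets` are invented names, not any real implementation's API, and the fallback assumes every character code fits in 8 bits.

```lisp
(defun text-octets (string)
  "Return the characters of STRING as a fresh (unsigned-byte 8) vector."
  ;; Hypothetical fast path.  The reader skips this form unless the
  ;; (invented) feature is present, so the invented package is never
  ;; looked up on implementations that lack it.
  #+hypothetical-vendor-octets
  (vendor:string-to-octets string)
  ;; Portable fallback.  Assumes CHAR-CODE is below 256 for every
  ;; character in STRING, which the standard does not guarantee.
  #-hypothetical-vendor-octets
  (map '(vector (unsigned-byte 8)) #'char-code string))
```

Callers see the same interface either way; only the body differs per implementation, which is where the declarations and vendor-specific tuning would go.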
Uh? My post was precisely about code I was publishing. I admit I should be back to coding -- wasted more than enough time posting on USENET.
> Ever wondered how all of this "free software" depends on people's willingness to implement things that sometimes are _boring_ and which some lazy fuck like yourself does not want to do?
It's not boring when it's the Right Thing(tm) (or rather, the joy overcomes the bore).
> It is not a question of picking one implementation and sticking with it, but of actually using an implementation to get what you want.
Maybe you should explain the difference.
> interface that does not expose the implementation-dependent features
In the case of modular integers that's easy. In the case of doing the Right Thing(tm) with characters, that's difficult. For other flaws in CL (modularity, concurrency, etc.), I doubt it is possible.
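For the modular-integer case, a portable interface can be sketched in a few lines of standard CL. The names `u32`, `u32+` and `rol32` are made up here and are not claimed to match md5.lisp; whether a given compiler turns them into unboxed 32-bit machine arithmetic is implementation-dependent.

```lisp
(deftype u32 () '(unsigned-byte 32))

(declaim (inline u32+ rol32))

(defun u32+ (a b)
  "Add A and B modulo 2^32."
  (declare (type u32 a b))
  ;; LDB keeps only the low 32 bits of the (possibly 33-bit) sum.
  (ldb (byte 32 0) (+ a b)))

(defun rol32 (x n)
  "Rotate the 32-bit word X left by N bits, for 0 <= N < 32."
  (declare (type u32 x) (type (integer 0 31) n))
  ;; Bits shifted out on the left come back in on the right.
  (ldb (byte 32 0)
       (logior (ash x n) (ash x (- n 32)))))
```

The interface exposes nothing implementation-dependent; an implementation with native modular operators could replace the bodies without touching callers.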
> You seem to believe that as soon as it is necessary to do something like this, you have found a flaw in the language.
Yes. It might not be a big flaw (depending on what it is), but it is a flaw nonetheless.
> | > You have to get at the bytes where they are actually found, just after they have been read, and just before they are written. Anything else is pretty damn stupid.
> | Sure. Who said otherwise?
> [...] You have strongly implied "otherwise", you insufferable numbnut.
Inasmuch as I might not always be precise enough in what I say, you read everything with negative prejudice and then become rude.
> [...] All this requires is that you are willing to study your implementation or talk to the vendor and get at the underlying buffer.
"All this requires" for a portable program is thus a lot of work in each implementation to achieve desired functionality efficiently. So this particular flaw is fixable (and being fixed, it seems). It's still a flaw.
Back to coding,
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] [ TUNES project for a Free Reflective Computing System | http://tunes.org ] I made it a rule to forbear all direct contradictions to the sentiments of others, and all positive assertion of my own. I even forbade myself the use of every word or expression in the language that imported a fixed opinion, such as "certainly", "undoubtedly", etc. I adopted instead of them "I conceive", "I apprehend", or "I imagine" a thing to be so or so; or "so it appears to me at present".
When another asserted something that I thought an error, I denied myself the pleasure of contradicting him abruptly, and of showing him immediately some absurdity in his proposition. In answering I began by observing that in certain cases or circumstances his opinion would be right, but in the present case there appeared or seemed to me
Francois-Rene Rideau <fare+NOS...@tunes.org> writes:
> Erik Naggum <e...@naggum.net> writes:
> > | CL forces me to do double buffering (actually, treble, or more), which sucks because of coding nightmares even more than for the slowness.
> > No, the language does _not_ force you to do that, damnit.
> Indeed, if I manage to avoid characters altogether (and thus not use any character-based library), or else use non-portable extensions (and lose some interoperability) I can avoid double-buffering. This is a worthwhile engineering tradeoff to consider. But I argue that this is a current flaw in the language.
No, this seems rather more like a misunderstanding on your part. You're half understanding that using characters is a Bad Idea; you have to get the rest of the way, and conclude that using characters is not only a Bad Idea, but also pointless.
-- (concatenate 'string "aa454" "@freenet.carleton.ca") http://www.cbbrowne.com/info/sap.html "Oh, I've seen copies [of Linux Journal] around the terminal room at The Labs." -- Dennis Ritchie