Christopher Stacy <cst...@spacy.Boston.MA.US> writes: > >>>>> On Sat, 08 Sep 2001 22:39:22 GMT, Erik Naggum ("Erik") writes:
> Typically, the contracts of full-time employees do not allow "spare time": > all ideas and programs that you write are the intellectual property of > the company, regardless of whether it happened during office hours.
I know that's the case in the US, but is it also the case in the rest of the world?
In fact, in the US, it can be even worse. I was once asked to sign a contract that stated that all ideas I had and all programs I wrote would be the intellectual property of the company EVEN AFTER THE EMPLOYMENT ENDED and with no limit. I would essentially be an intellectual slave for the rest of my life. I refused to sign, but all the other employees had already signed.
-- Robert Strandh
--------------------------------------------------------------------- Greenspun's Tenth Rule of Programming: any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp. ---------------------------------------------------------------------
> Typically, the contracts of full-time employees do not allow "spare > time": all ideas and programs that you write are the intellectual > property of the company, regardless of whether it happened during office > hours. But there are "lots of" people who cheat on that.
That must be an American invention. Over here in Europe, I even think such contracts are illegal at the EU level. Then there are students, who frequently earn money on the side and do all kinds of weird stuff in exchange for whatever passes for recognition. But there is no doubt that well-funded work flows better than un-funded work.
Robert STRANDH wrote: > In fact, in the US, it can be even worse. I was once asked to sign a > contract that stated that all ideas I had and all programs I wrote > would be the intellectual property of the company EVEN AFTER THE > EMPLOYMENT ENDED and with no limit. I would essentially be an > intellectual slave for the rest of my life. I refused to sign, but > all the other employees had already signed.
Is this contract enforceable in US courts? It looks unconstitutional.
>> In fact, in the US, it can be even worse. I was once asked to sign a >> contract that stated that all ideas I had and all programs I wrote >> would be the intellectual property of the company EVEN AFTER THE >> EMPLOYMENT ENDED and with no limit. I would essentially be an >> intellectual slave for the rest of my life. I refused to sign, but >> all the other employees had already signed.
> Is this contract enforceable in US courts? It looks unconstitutional.
IANAL, but it almost certainly is unenforceable. In fact, most companies' employment agreements here in the states are overly broad. Of course, since most of the cases never go to court, the point is somewhat moot. The companies want a big stick in the offhand case that an employee starts his own business that competes with his original employer or acts as a drain on the employer's minions or customers.
As long as one is careful to make sure that (for the most part) the original company's resources aren't being used to create the new business and you don't start raiding the original company's employees or customers, everyone looks the other way. And watch out going to work for a competitor (though, usually, the company sues the other company, not you - deeper pockets, ya know :).
As with most things legal, there are horror stories but, usually, these are quite rare. In short, you sign these things to get the job and they are almost always ignored.
Bulent Murtezaoglu <b...@acm.org> writes: > >>>>> "TFB" == Thomas F Burdick <t...@famine.OCF.Berkeley.EDU> writes: > [...] > TFB> They're living in Berkeley, perhaps? $700 isn't unusual for > TFB> rent around here.
> Even for shared housing?
Yep. A 1-bedroom usually costs $1200-1400/mo since rent control went away. Living in a co-op or at the other end of Oakland will save you a bunch of money, but $500 isn't so different from $700.
> TFB> And we're a public university, so a whole > TFB> lot of us don't exactly have many means. 35 hours as a > TFB> university student?
> 35 hours cumulative was what I was thinking. You do get summers though > correct? Is this undergraduate or graduate anyway? What kind of > facilities does the school offer? If it is undergraduate, one thing > you might want to try is to hook up with a research project in the > grad school as a programmer, it might at least get you some access to > better facilities and might pay minimum wage + a few bucks.
Oh, undergraduate. Graduate students can usually scrape up funding from grants. And if you can attach yourself to their projects, you can ride along that. I was just trying to illustrate the situation for an undergrad who's doing it alone, or with another undergrad. I've actually managed to connive my way into a situation where I have access to quite a bit of room on two Solaris servers and a campus job where I can work on my own projects about 50% of the time.
[...]
> TFB> My point is just that most people's assumptions of college > TFB> students' budgets are *waaaay* off.
> Ok, I take your point. I did not mean to offend (I don't think I did > but better be safe and say it). Again, if this is undergrad just hang in > there, it'll be over. If it's grad school, well what can I tell you ... > all the more reason to write the damn thing up (best) or badger your > advisor for decent funding (futile, but you'll hear about the best option > again).
Oh, no offense taken. I'm intentionally quite vocal about the subject because there are a lot of people (especially here in Cali) who remember when we had something of a social democracy and don't realize how much it's changed.
On a good note, from the second-hand experience of several people I know who tried to do significant research as undergrads without a prof sponsoring, most software companies are very accommodating and understand that students have no money, even if universities are flush with it. So coming up with $700 for software for research is generally a non-issue. Just $200 for a box to run it on, which is a lot easier.
In article <xcvlmjpf3ds....@famine.OCF.Berkeley.EDU>, Thomas F. Burdick wrote: >mjo...@ipx.frottage.org (Mark Hulme-Jones) writes:
>> For CL this means things like hash tables, vectors, structs and >> classes. All I'm saying is that it would be even better if Regexps >> and Sockets were included too.
>Do regular expressions really need a separate Lisp specification? >They're already standardized (by POSIX). Now, most RE implementations >ignore this standard completely, but there's always the implementation >in the C library (on unix, anyhow). This is actually quite nice when >writing in C or C++. Sure, CL would want a different interface to the >RE language, but the variety of reasonable possibilities should be >small enough to make porting between them trivial.
The reason I thought it's worth specifying is not so much to ensure the same interface from vendors (though this would be an obvious bonus), but to make sure that they provided regexps at all. It'd be nice to wave the CL spec at people and say, "See, it supports regexps, so tell me again what it is Perl can offer me over CL?"
* Tim Moore wrote: > Interesting theory, but it seems to have a couple of weaknesses. First, > with all that money floating around, you'd think that for-pay software > would have a great shot too, perhaps even an advantage. After all, for a > great many corporate applications, the cost of software is down in the > noise.
It did, but it wasn't newsworthy, so it didn't get the same hype. It's not earth-shattering news that Oracle made bucketloads of money shipping a commercial product for instance.
> Second, I'd like to see an example of *one* piece of software that was > written by someone who should have actually been doing something else at > his high-paying, new economy job. Plenty of free software got written by > people who were trying to solve a problem and decided to share, but you > seem to be suggesting a different scenario.
I'm sure there are examples of such, but this is not what I meant. What I meant was that because all sorts of companies saw themselves knee-deep in money it started to look reasonable to simply give things away, and people were employed to do that, with no real investigation as to where the profit would come from. Of course this is far from the only thing these companies were doing with no investigation as to where the profit would come from.
Finally, in regard to my original post, please note that I was not trying to argue about whether free software is good or bad, I was simply trying to make an observation about the economics of the recent high-tech bubble and how it might affect free software and people's attitudes to it. It is unfortunate that it has become almost impossible to have rational discussions of free software because many people are so emotionally involved. This is the reason why I'm not going to post further in this thread: I hope I've made the point I wanted to make, at least.
> The reason I thought it's worth specifying is not so much to ensure the > same interface from vendors (though this would be an obvious bonus), but > to make sure that they provided regexps at all. It'd be nice to wave > the CL spec at people and say, "See, it supports regexps, so tell me > again what it is Perl can offer me over CL?"
What is the problem with using a widely available regular expression package written in Common Lisp? (I am actually really curious, and do not ask this rhetorically.)
Incidentally, Perl can offer a million morons who are willing to do stupid chores that no thinking person would ever consider worthwhile. This keeps all kinds of progress in the software industry back, because instead of improving the ways software produces logs (to produce log entries with sufficiently good keys that they could be considered to be in some normal form, suitable for database-like access paradigms), or configuration file formats (to perhaps arrive at a common language, even a _rationalized_ XML would suffice), everyone knows that they do not have to think about any of these things -- some Perl programmer with a very narrow focus and little clue will always glue things together for you. Common Lisp cannot compete in that market.
In article <3209116573841...@naggum.net>, Erik Naggum wrote: >* Mark Hulme-Jones >> The reason I thought it's worth specifying is not so much to ensure the >> same interface from vendors (though this would be an obvious bonus), but >> to make sure that they provided regexps at all. It'd be nice to wave >> the CL spec at people and say, "See, it supports regexps, so tell me >> again what it is Perl can offer me over CL?"
> What is the problem with using a widely available regular expression > package written in Common Lisp? (I am actually really curious, and do > not ask this rhetorically.)
I don't have a problem using a library, but when you're trying to convince people of the virtues of Common Lisp and they say things like "Does it support regexps?", and you answer, "Yes, with a library", I find you often get a negative reaction (regexps are actually one of the poorer examples, sockets and threading are better ones). I don't know what people's problem with libraries is. It reminds me of the 'Paradox of the Active User' paper [1] that talked about users who flatly refuse to learn to use their software properly (eg. not installing easily available editor macros, or downloading and installing 3rd party libraries).
> Incidentally, Perl can offer a million morons who are willing to do > stupid chores that no thinking person would ever consider worthwhile. > This keeps all kinds of progress in the software industry back, because > instead of improving the ways software produces logs (to produce log > entries with sufficiently good keys that they could be considered to be > in some normal form, suitable for database-like access paradigms), or > configuration file formats (to perhaps arrive at a common language, even > a _rationalized_ XML would suffice), everyone knows that they do not have > to think about any of these things -- some Perl programmer with a very > narrow focus and little clue will always glue things together for you. > Common Lisp cannot compete in that market.
There have been a few efforts to create more sensible, unified configuration file formats (didn't NeXT have some kind of property list format for their applications?) but they mostly seem to have failed. Whether this is because of some inherent shortcoming in the idea, or because the systems administrators who realise it would put them out of a job resist the idea is another matter entirely.
Erik Naggum wrote: > > The reason I thought it's worth specifying is not so much to ensure the > > same interface from vendors (though this would be an obvious bonus), but > > to make sure that they provided regexps at all. It'd be nice to wave > > the CL spec at people and say, "See, it supports regexps, so tell me > > again what it is Perl can offer me over CL?" > What is the problem with using a widely available regular expression > package written in Common Lisp? (I am actually really curious, and do > not ask this rhetorically.)
The reason I don't like the lisp regexp packages available to me is that they are, IMO, poorly designed as natural extensions to the Lisp language. This isn't merely an aesthetic complaint; the fit is so poor that the regexp packages are extremely difficult to use in nontrivial ways.
Everyone who has used regexps (in Emacs, shells, Perl, or wherever) knows that nontrivial regexps are hard to write. This is because a regexp string contains a very high concentration of syntax characters, and because those syntax characters must usually be escaped, and because those escape characters so often also appear as literal characters of the regexp.
Recently I had to write a html screen scraper and used regexps to extract the fragments in which I was interested. It wasn't pretty. Here are two examples from that program:
These are nearly write-only code. They are hard to write, and even harder for a human to scan. (The #\newline business is relevant only to the extent that I needed to reconcile the differences between newline treatment in regexp expressions, in html, and in lisp source. Even without the newline crocks, the regexps are unreadable as Lisp code.) It seems so odd that sophisticated programmers can argue in favor of regexp while simultaneously participating in the LOOP and IF* religious syntax wars.
Most of this mess arises IMO because regexps are specified as strings. The logically first thing a regexp implementation must do is parse that string into some sort of semantic tree (which is later transformed to generate pseudocode for a regexp engine, or perhaps real Lisp code). That tree is (or could be) much clearer for humans both to write and to read. It is almost logically necessary for the implementation to have such a tree internally, but usually it is not documented, externally available, or designed in a clean enough syntax for external use.
So why do Lisp regexp implementations code regexps as strings?
I suspect the reason is that most programmers first encountered regexps in Unix land, where the common languages (shell, grep, etc.) had no viable alternatives. Indeed, one regexp implementer with whom I privately discussed this said something to the effect "But regexps _are_ strings." (Elisp had other alternatives, but I suspect the retention of strings was cultural.) Since a regexp is fundamentally a tree (often extended to allow variable unification, i.e., "whatever matches this node here must be the same as what matched that earlier node there") it is otherwise incomprehensible that the regexp binding in a language where trees were the first datatype would not use trees!
(Anyone who doesn't understand that a "regular expression" is essentially a representation of a Finite State Machine state transition diagram should consult a useful resource such as
but you might as well stop reading when you come to the section "Regular Expressions in Unix" which begins the descent into Hell.)
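To make the finite-state-machine remark concrete, here is a minimal sketch in Common Lisp of a hand-built transition table for the regexp ab*c. The state names, the table layout and the function names are all made up for this illustration; no regexp package is assumed to work this way internally.

```lisp
;; Hand-written DFA for the regexp ab*c (illustrative only).
;; Each entry is (STATE (CHAR . NEXT-STATE) ...).
(defparameter *transitions*
  '((:start  (#\a . :seen-a))
    (:seen-a (#\b . :seen-a) (#\c . :accept))
    (:accept)))

(defun next-state (state ch)
  "Return the state reached from STATE on character CH, or NIL if there is none."
  (cdr (assoc ch (cdr (assoc state *transitions*)))))

(defun dfa-matches-p (string)
  "True if the whole of STRING is accepted by the DFA above."
  (let ((state :start))
    (loop for ch across string
          while state                  ; stop as soon as we fall off the diagram
          do (setf state (next-state state ch)))
    (eq state :accept)))
```

With this table, (dfa-matches-p "abbc") and (dfa-matches-p "ac") are true, while (dfa-matches-p "ab") and (dfa-matches-p "abcx") are false.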
In detail, it would be no big deal also to provide a front end that parses string-form regexps into the tree (and maybe also the other direction). But IMO it is first and foremost that some tree expression be the primary form of regexp expression in Lisp.
I don't mean to slew this thread into a design of regexp, so I'll close by explaining what this has to do with Erik's question:
> What is the problem with using a widely available regular expression > package written in Common Lisp? (I am actually really curious, and do > not ask this rhetorically.)
Now, it is (to me) compelling that a standard Lisp regexp api should use some tree form expression rather than regexp strings. Others may disagree, and some appropriate standards body could deliberate the question and form a decision. However, the current way the community adds things like regexps and *ml tools is for one hacker just to code something and make it available. It is both the blessing and curse of Lisp that one person can so easily do so much. There is no guarantee that a group of people will make better choices than a single person, but there is greater chance that more issues will be considered, because each issue is more likely to occur to at least one member of a group than to a single individual. Indeed, the refinement that the standardization process can bring could prevent regexps from adding fodder for years more religious wars such as we have from time to time over LOOP and IF*.
Well, I agree it does not look nice. So, if it is allowed, I suggest you check out SCSH (Scheme Shell); please read the handbook (page 112). You might find that nicer. To me it seems to fit nicely into Scheme, so maybe this time the Common Lisper can learn from Scheme ;-)
> * Erik Naggum > > How much of a Common Lisp system do you expect to be written in Common > > Lisp?
> * Tim Moore > > From my experience, anywhere from "a good part" to "almost all."
> That is not my experience. The parts that provide services outside of > the standard are so much larger than the parts that implement Common Lisp > and in order to implement them, you need access to system-specific code, > which means you have to use functions that cannot be expressed in Common > Lisp. That is, if you recursively search for all called functions or > methods and you stop at any function or method defined in the standard, > there will be a relatively small number of functions that terminate this > search with _all_ standard Common Lisp functions or methods.
I concur that in the code for a CL implementation you will often bottom out at OS system or library calls; calls to functions that are written in C or assembler; or magic functions that are open-coded by the compiler, like the ever-confusing
(defun car (x) (car x))
However, I still think of this as Common Lisp, despite inevitable restrictions: generally control flow, declarations, macros, GC, etc. all work as expected at this level. This stands in contrast to other implementation strategies such as writing everything in C; writing s-expressions that are essentially assembly macros; or writing in a "syslisp" which is crippled for easy compilation or translation to C, missing major Lisp features like garbage collection or typeless variables. I do think that this system code makes instructive reading for someone learning Common Lisp, even if they "shouldn't try it at home."
Not to mention that large, interesting parts of a CL implementation may be mostly written in genuine Common Lisp, such as pretty printing or format; not to mention also that even if the system isn't written in CL but in one of the alternatives above, it's still educational to read the source.
> So why do Lisp regexp implementations code regexps as strings?
There was some discussion here a while back (a year or so?) on a regexp syntax for Lisp which was sexp based, not string-based. I think Will Deakin did some work on this. It relates somehow to scsh as well, which has at least one sexp-based syntax for regexps. I don't know if there's an actual implementation of anything like this for CL - there probably is...
> The reason I don't like the lisp regexp packages available to me is that > they are, IMO, poorly designed as natural extensions to the Lisp language. > This isn't merely an aesthetic complaint; the fit is so poor that the > regexp packages are extremely difficult to use in nontrivial ways.
> Everyone who has used regexps (in Emacs, shells, Perl, or wherever) knows > that nontrivial regexps are hard to write. This is because a regexp > string contains a very high concentration of syntax characters, and because > those syntax characters must usually be escaped, and because those escape > characters so often also appear as literal characters of the regexp.
> Recently I had to write a html screen scraper and used regexps to extract > the fragments in which I was interested. It wasn't pretty. Here are two > examples from that program:
> These are nearly write-only code. They are hard to write, and even > harder for a human to scan. (The #\newline business is relevant only > to the extent that I needed to reconcile the differences between > newline treatment in regexp expressions, in html, and in lisp source. > Even without the newline crocks, the regexps are unreadable as Lisp code.) > It seems so odd that sophisticated programmers can argue in favor of > regexp while simultaneously participating in the LOOP and IF* religious > syntax wars.
> Most of this mess arises IMO because regexps are specified as strings. > The logically first thing a regexp implementation must do is parse that > string into some sort of semantic tree (which is later transformed to > generate pseudocode for a regexp engine, or perhaps real Lisp code). > That tree is (or could be) much clearer for humans both to write and to > read. It is almost logically necessary for the implementation to have > such a tree internally, but usually it is not documented, externally > available, or designed in a clean enough syntax for external use.
> So why do Lisp regexp implementations code regexps as strings?
Dorai Sitaram's pregexp (Portable Regular Expressions for Scheme and Common Lisp) package allows you to write both strings and trees. Your examples become
This is just for illustration. The point is that you can forget about strings altogether. It's quite nice, if one can say this of regexps in *any* form :-).
mjo...@ipx.frottage.org (Mark Hulme-Jones) writes: > There have been a few efforts to create more sensible, unified > configuration file formats (didn't NeXT have some kind of property list > format for their applications?) but they mostly seem to have failed. > Whether this is because of some inherent shortcoming in the idea, or > because the systems administrators who realise it would put them out of > a job resist the idea is another matter entirely.
I don't think that NeXT's effort failed. They made a configuration API that you were supposed to use, rather than messing with the file directly, though they did, I believe, document the file format. GNUStep still uses it (WindowMaker is probably the only GNUStep app people have, though), and I believe Mac OS X does. I'm not sure about that, but if it does, that's a pretty good case against it being dead.
> > What is the problem with using a widely available regular expression > > package written in Common Lisp? (I am actually really curious, and do > > not ask this rhetorically.)
Are there any? When I've looked, I've found buggy implementations, and "Perl-compatible" implementations, etc., but nothing I'd want to use. Since I only use unix, I just wrap up the POSIX api.
> Now, it is (to me) compelling that a standard Lisp regexp api should use > some tree form expression rather than regexp strings. Others may disagree, > and some appropriate standards body could deliberate the question and form > a decision. However, the current way the community adds things like > regexps and *ml tools is for one hacker just to code something and make > it available. It is both the blessing and curse of Lisp that one person > can so easily do so much. There is no guarantee that a group of people > will make better choices than a single person, but there is greater chance > that more issues will be considered, because each issue is more likely > to occur to at least one member of a group than to a single individual. > Indeed, the refinement that the standardization process can bring could > prevent regexps from adding fodder for years more religious wars such as we have > from time to time over LOOP and IF*.
Lisp regexs should follow the POSIX standard. There are 1001 different, non-conforming regex implementations, and the extra effort that goes into learning their weird extensions is IMO not justified by what they add.
Now, that said, I would like to see regexs specified to follow the POSIX standard, including representation as a string -- however, I see no reason why we shouldn't also make a macro system that produces these strings from a nice, readable symbolic representation. I have a partially done implementation of this, and to the extent that it works, it's nice to use. Its REs are massively verbose, but then again, there's a lot going on, so they should be. If I don't want them wasting screen space, I'll assign them to a constant or global variable. The best thing, though, is that I can modify them several months later.
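As a rough illustration of the idea of producing a POSIX-style regexp string from a symbolic form, here is a minimal sketch in Common Lisp. It is not the partially done implementation mentioned above: the operator names (:seq, :or, :*, :+, :?, :any) and the functions ESCAPE-LITERAL and RENDER are invented for this example, it is a plain function rather than a macro system, and it covers only a small subset of extended regular expression syntax.

```lisp
(defun escape-literal (string)
  "Backslash-escape the POSIX ERE special characters occurring in STRING."
  (with-output-to-string (out)
    (loop for ch across string
          do (when (find ch ".[\\()*+?{|^$")
               (write-char #\\ out))
             (write-char ch out))))

(defun render (tree)
  "Render a hypothetical s-expression regexp TREE as an ERE-style string."
  (etypecase tree
    (string    (escape-literal tree))            ; strings match literally
    (character (escape-literal (string tree)))
    (cons
     (destructuring-bind (op &rest args) tree
       (ecase op
         (:seq (format nil "~{~A~}" (mapcar #'render args)))
         (:or  (format nil "(~{~A~^|~})" (mapcar #'render args)))
         (:*   (format nil "(~A)*" (render (cons :seq args))))
         (:+   (format nil "(~A)+" (render (cons :seq args))))
         (:?   (format nil "(~A)?" (render (cons :seq args))))
         (:any "."))))))
```

For example, (render '(:seq "index" (:? ".") (:or "htm" "html"))) produces the characters index(\.)?(htm|html). The symbolic form is longer than the string, but the literal dot never has to be escaped by hand.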
Tim Bradshaw wrote: > There was some discussion here a while back (a year or so?) on a > regexp syntax for Lisp which was sexp based, not string-based. I > think Will Deakin did some work on this. It relates somehow to scsh > as well, which has at least one sexp-based syntax for regexps.
You probably mean http://www.ai.mit.edu/~shivers/sre.txt. Here's a small quote from the introduction: This document describes the regular-expression system used in scsh. The system is composed of several pieces: - An s-expression notation for writing down general regular expressions. In most systems, regexps are encoded as string literals. In scsh, they are written using s-expressions.
> I don't know if there's an actual implementation of anything like > this for CL - there probably is...
Not that I know of (but I'd love to be proven wrong).
> These are nearly write-only code. They are hard to write, and even > harder for a human to scan.
Maybe some special syntax (a reader macro) for string literals where one wouldn't have to escape so many regexp-esque characters would be helpful? At least you should be able to get the same level of read- and writability as other regexp interfaces.
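A reader macro along those lines could look roughly like the following sketch. The choice of #? as the dispatch character, the function name READ-RAW-STRING and the variable *REGEXP-READTABLE* are assumptions made for this illustration (the standard reserves #? for user programs); the sketch also cannot represent a double quote inside the literal.

```lisp
(defun read-raw-string (stream subchar arg)
  "Read characters up to the next double quote, treating backslash as ordinary."
  (declare (ignore subchar arg))
  (let ((delimiter (read-char stream t nil t)))
    (unless (char= delimiter #\")
      (error "Expected a double quote after #?"))
    (with-output-to-string (out)
      (loop for ch = (read-char stream t nil t)
            until (char= ch #\")
            do (write-char ch out)))))

;; Install the syntax in a copy of the standard readtable, so the
;; global reader is left alone.
(defparameter *regexp-readtable* (copy-readtable nil))
(set-dispatch-macro-character #\# #\? #'read-raw-string *regexp-readtable*)

;; Reading the text  #?"\.html"  under that readtable gives the
;; six-character string \.html with no doubled backslash.
(defun demo-raw-string ()
  (let ((*readtable* *regexp-readtable*))
    (read-from-string "#?\"\\.html\"")))
```

To use the syntax in a source file, *READTABLE* would have to be bound to the modified readtable before the file is read, for instance inside an EVAL-WHEN form near the top of the file.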
> * In message <xcvelpex1we....@apocalypse.OCF.Berkeley.EDU> > * On the subject of "Re: On Lisp" > * Sent on 10 Sep 2001 12:48:49 -0700 > * Honorable t...@apocalypse.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> Are there any? When I've looked, I've found buggy implementations, > and "Perl-compatible" implementations, etc., but nothing I'd want to > use. Since I only use unix, I just wrap up the POSIX api.
> > These are nearly write-only code. They are hard to write, and even > > harder for a human to scan.
> Maybe some special syntax (a reader macro) for string literals where > one wouldn't have to escape so many regexp-esque characters would be > helpful? At least you should be able to get the same level of read- > and writability as other regexp interfaces.
Or just use another character as the escape/command. #\~ has some precedent in that role...
| Most of this mess arises IMO because regexps are specified as strings. | The logically first thing a regexp implementation must do is parse that | string into some sort of semantic tree (which is later transformed to | generate pseudocode for a regexp engine, or perhaps real Lisp code). | That tree is (or could be) much clearer for humans both to write and to | read. It is almost logically necessary for the implementation to have | such a tree internally, but usually it is not documented, externally | available, or designed in a clean enough syntax for external use. ... | In detail, it would be no big deal also to provide a front end that | parses string-form regexps into the tree (and maybe also the other | direction). But IMO it is first and foremost that some tree expression
Such a package isn't exactly a little deal either. Based on a design by Olin Shivers and his Scheme implementation of it, I wrote a package during 1999--2000 that does pretty much what you describe. Not counting my generic library of misc. macros and utility functions, it is ~165kB of code and that doesn't include the actual matching engine (I use PCRE via FFI because writing such an optimized engine would take two more years of my spare time) and few features that should be there.
There is s-expression surface syntax (it can subsume POSIX regexp strings too) which gets transformed into a tree of regexp objects which is the representation used internally (it can also be used from outside). During this transformation some simple optimizations are done. There is also a more complex optimizer. The interface includes a macro that takes an s-expression regexp, converts it into an internal format, runs the optimizer on it and compiles it (by first applying an internal representation -> POSIX regexp transformation and then compiling the result with PCRE) if the regexp is static (i.e. it doesn't include code that constructs parts of the regexp at run-time) or if not, transforms the partially optimized regexp into code that constructs and compiles the final regexp at run-time.
I thought I would test it personally for a year or two before releasing it and at the same time write the missing features but it turned out I haven't had much need for it lately so the missing features remain unimplemented and there are two known bugs that I haven't bothered to fix. It also doesn't handle anything else but characters, which seems to be of concern to some. It doesn't support Unicode --- something I would certainly like to add, although that may be rather hard at the moment as CMUCL, the CL implementation that I use, doesn't support Unicode. The syntax has some LOOPish problems (due to the package's Scheme heritage) that also need to be fixed.
Even just this remaining work feels quite a big deal to me :) Especially Unicode support seemed somewhat involved when I last took a look at it. Sadly there's not much I can do about it at the moment.
One of the things that have kept me away from finishing that code is that I'd really like to implement something along the lines of parser combinators. I've noticed that some people still get excited about the META parsing technique. It would be nice to show them that Common Lisp can do even better than that.
Simon András <asi...@math.bme.hu> wrote: >"Steven M. Haflich" <hafl...@pacbell.net> writes:
>> Recently I had to write a html screen scraper and used regexps to extract >> the fragments in which I was interested. It wasn't pretty. Here are two >> examples from that program:
>> ... >> Most of this mess arises IMO because regexps are specified as strings. >> The logically first thing a regexp implementation must do is parse that >> string into some sort of semantic tree (which is later transformed to >> generate pseudocode for a regexp engine, or perhaps real Lisp code). >> That tree is (or could be) much clearer for humans both to write and to >> read. It is almost logically necessary for the implementation to have >> such a tree internally, but usually it is not documented, externally >> available, or designed in a clean enough syntax for external use.
>> So why do Lisp regexp implementations code regexps as strings?
>Dorai Sitaram's pregexp (Portable Regular Expressions for Scheme and >Common Lisp) package allows you to write both strings and trees. Your >examples become
>This is just for illustration. The point is that you can forget about >strings altogether. It's quite nice, if one can say this of regexps in >*any* form :-).
Your expansions are not quite right, because he (Haflich) is using escaped characters as metacharacters, and of course representing them in a Lisp string needs another escape. Perl-type regexps use metacharacters without escape, and such characters need escaping (twice, in a Lisp string) only when they must be treated literally. I find this approach greatly cuts down on the "row of toothpicks" nature of a Lisp regexp string. I don't mind double-escaping the odd metacharacter-as-literal because it doesn't impede the reading of the "control words" in the regexp language.
In this case, Perl-compatibility actually makes for more readable regexps in Lisp. IMCO.
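The difference can be seen by writing the same small pattern both ways as Lisp string literals. This is only a sketch: the pattern is invented for the illustration, and the first form assumes an Emacs-style syntax in which grouping and alternation are spelled \( \| \).

```lisp
;; Escaped-metacharacter style: \( \| \) are the operators, so each one
;; needs a doubled backslash inside a Lisp string literal.
(defparameter *escaped-style* "\\(foo\\|bar\\)+")

;; Perl-style: ( | ) are operators as written, no backslashes needed.
(defparameter *perl-style* "(foo|bar)+")

;; Perl-style pays the double escape only for a literal parenthesis.
(defparameter *perl-literal-parens* "\\(foo\\)")

;; Print the regexps as the engine would see them.
(defun show-styles ()
  (format t "~A~%~A~%~A~%"
          *escaped-style* *perl-style* *perl-literal-parens*))
```

Calling (show-styles) prints \(foo\|bar\)+ then (foo|bar)+ then \(foo\), which is where the "row of toothpicks" in the first source form comes from.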
Hannu Koivisto wrote: > One of the things that have kept me away from finishing that code > is that I'd really like to implement something along the lines of > parser combinators. I've noticed that some people still get > excited about the META parsing technique. It would be nice to show > them that Common Lisp can do even better than that.
Depends on what you mean by "better". META is extremely simple and is by far easier to implement than regexps. I still think that regexps are a really bad parsing technique but for very simple things (like URI parsing) they are ok. But why do I need a 165k regex library with FFI bindings to a C-lib to parse simple things like URIs if you get essentially the same in less than 100 lines?
The problem nowadays is that people use regexps (which are nice for editors, shells and very simple grammars) for rather complex tasks and create incredibly bad and slow code. This is probably the price we had to pay for the millions of perl drones we supported...
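For a sense of how little code a simple case needs without any regexp library, here is a minimal hand-written sketch in Common Lisp. It is not the META technique itself, just plain scanning with SEARCH and POSITION; the function name SPLIT-URI is made up, and it handles only the scheme://authority/path shape, not the full URI grammar.

```lisp
(defun split-uri (uri)
  "Return scheme, authority and path of URI as three values,
or NIL if URI contains no \"://\" separator."
  (let ((sep (search "://" uri)))
    (when sep
      (let* ((rest-start (+ sep 3))
             ;; First slash after the authority, if any.
             (slash (position #\/ uri :start rest-start)))
        (values (subseq uri 0 sep)
                (subseq uri rest-start slash)
                (if slash (subseq uri slash) "/"))))))
```

For example, (split-uri "http://www.ai.mit.edu/~shivers/sre.txt") returns the three values "http", "www.ai.mit.edu" and "/~shivers/sre.txt".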