Larceny v0.963 fixes about 20 bugs in the recent
v0.962 release that were identified by (or with
the help of) the PLT Scheme R6RS test suite that
Matthew Flatt announced here one week ago today.
Larceny v0.963 still fails 45 of 8843 tests in
revision 10978 of the R6RS test program. These
failures are confined to 4 of the 24 standard
libraries tested, two of which have one failure
each. Of the 45 failures, 35 occur within
(rnrs io ports), and correspond to unimplemented
or deprecated features of that library. Most
other test failures involve negative zero or
branch cuts in inexact arithmetic.
We believe Larceny implements the most important
99.5% of the R6RS, but we will continue to make
improvements. Thank you for your patience and
support.
We must also thank Matthew Flatt for making his
excellent R6RS test program available to other
implementors of the R6RS.
William D Clinger
I know what unimplemented means, but what do you mean by 'deprecated'
here? They don't appear to be deprecated by the R6RS. Normally,
features seem to be deprecated by their creator - as in, Microsoft
deprecates some legacy Windows API.
sam th
Sorry I wasn't clear. In the announcement, "deprecated"
means "deprecated in Larceny". For an explanation of
that concept, and the most current list of things that
are deprecated in Larceny, see
https://trac.ccs.neu.edu/trac/larceny/wiki/DeprecatedFeatures
If you examine the list of R6RS features that have been
deprecated in Larceny, you will find that most of them
have implementation-dependent or partially unspecified
semantics.
> Normally,
> features seem to be deprecated by their creator - as in, Microsoft
> deprecates some legacy Windows API.
Normally, standards organizations establish a process
by which stakeholders can suggest that, in the light
of experience, some things should be added or
deprecated. Since the creators of the R6RS have not
yet established any such process, the lists being
accumulated by individual users and implementors are
the best we have right now.
Will
Thanks for the clarification. I do worry, however, about the
implications of some implementors of a standard declaring that
significant features of that standard are 'deprecated' without more
consensus from the community. I'm thinking particularly of identifier
syntax and the syntactic record layer. The other examples seem more
minor.
>
> > Normally,
> > features seem to be deprecated by their creator - as in, Microsoft
> > deprecates some legacy Windows API.
>
> Normally, standards organizations establish a process
> by which stakeholders can suggest that, in the light
> of experience, some things should be added or
> deprecated. Since the creators of the R6RS have not
> yet established any such process, the lists being
> accumulated by individual users and implementors are
> the best we have right now.
You clearly interact with a nicer set of standards organizations than
I'm used to. :)
sam th
An R6RS standard "consensus" means 60% or more.
> I'm thinking particularly of identifier syntax and the
> syntactic record layer.
Short of polling the broad R5RS/ERR5RS/R6RS community
served by Larceny (and this newsgroup), perhaps the best
we can do is to test a bunch of systems to see how many
of them even support those features.
Here's a quick check of the default interactive top levels
provided by eleven systems, plus two non-default top levels
for Larceny just to show our broad-mindedness:
System identifier-syntax R6RS define-record-type
====== ================= =======================
Bigloo no no
Chicken no no
Gambit no no
Ikarus yes yes
Kawa no no
Larceny (default) no no
Larceny (ERR5RS) no no
Larceny (ERR5RS+(rnrs)) yes yes
MIT Scheme no no
MzScheme (default) no no
Petite Chez yes no
scm no no
Ypsilon yes yes
That adds up to an R6RS standard consensus against those
two features.
Users often look to implementors for guidance on which
features of a standard are regarded as good programming
practice and which are regarded as unfortunate warts on
the standard, to be avoided if possible. Larceny's
lists of deprecated features offer precisely that kind
of advice.
For years, implementors have been advising users not to
use things like eval, call-with-current-continuation,
set-cdr!, and so forth unless there's a really good
reason for it, and have been suggesting programming
practices that allow users to avoid those standard
features of the R5RS. The need for that advice did not
disappear with the R6RS; indeed, the need became even
greater because the R6RS is so new, so large, and so
controversial.
Will
As a user trying to develop a portable system in R6RS Scheme,
I am grateful for any advice. But "deprecating" a feature
of software means more than advising not to use it, or
recommending something else. It expresses a warning to
programmers that this feature should be avoided, because
it will not be supported by the implementation in the future.
Will Larceny continue to comply with all of the R6RS standard in
the future, or are there plans to remove the "deprecated"
features?
Also, survey of support for R6RS features be
limited to implementations which aim to support the R6RS standard.
-Tom
Sorry, this should have read:
"Also, shouldn't a survey of Scheme implementation to assess their
support
for R6RS features be limited to implementations which aim to support
the
R6RS standard?"
Of course implementations which do not support R6RS at all do not
support
R6RS's identifier syntax or its syntax for defining record types.
This isn't
very surprising, or very informative.
-Tom
I think its not helpful to survey non-R6RS systems for the presence of
R6RS features. This survey would show that many other things should be
'deprecated', such as almost every procedure introduced in the R6RS,
which I wouldn't think is what you intended.
> Users often look to implementors for guidance on which
> features of a standard are regarded as good programming
> practice and which are regarded as unfortunate warts on
> the standard, to be avoided if possible. Larceny's
> lists of deprecated features offer precisely that kind
> of advice.
>
> For years, implementors have been advising users not to
> use things like eval, call-with-current-continuation,
> set-cdr!, and so forth unless there's a really good
> reason for it, and have been suggesting programming
> practices that allow users to avoid those standard
> features of the R5RS. The need for that advice did not
> disappear with the R6RS; indeed, the need became even
> greater because the R6RS is so new, so large, and so
> controversial.
I do not, of course, disagree that users need advice on this subject.
However, while I recommend that in most cases, `eval' is not the right
solution, I wouldn't describe it as 'deprecated' in the same way that
the `transcript-on' procedure might be, or in the way that the `call-
with-binary-input-file' in Larceny is apparently deprecated.
Deprecated has a particular specific meaning, involving future
maintenance, and I think it's useful to keep it.
sam th
Although our usage of the word in Larceny may not
conform to your expectations, our usage is consistent
with the usage described by the current Wikipedia
article devoted to the subject [1].
Will
> Will Larceny continue to comply with all of the R6RS standard in
> the future, or are there plans to remove the "deprecated"
> features?
The implementors of Larceny wish to support all relevant
standards for Scheme.
Regarding features of the R6RS that are deprecated in
Larceny, we have no plans to remove those features so
long as the R6RS remains the relevant standard for
R6RS-style programs. On the other hand, we share the
hope expressed by several R6RS editors that the R6RS
will soon be superseded by revisions that address its
problems. In the case of several R6RS features that
have been deprecated in Larceny, the simplest and best
solution would be to remove the misfeature.
Deprecation of an R6RS feature in Larceny is not meant
to signal our determination to remove the feature from
Larceny at the cost of compatibility with the relevant
standard. It is meant to warn our users not only that
we advocate the removal of that feature from the next
revision of the relevant standard, but also that we
think there is or will be a growing consensus within
the R6RS community that the feature should be removed
or its specification changed in the next major revision
of the R6RS.
> Also, shouldn't a survey of Scheme implementation to
> assess their support for R6RS features be limited to
> implementations which aim to support the R6RS standard?
Not when the question to be answered is whether Larceny's
designation of an R6RS feature as deprecated reflects a
consensus within the community served by Larceny.
The R6RS added two new kinds of program to the traditional
structure of Scheme programs, but did not address the
traditional program structure in any way. It also did
not address the semantics of an interactive top level,
which is critical for educational applications and
significant even for developers of R6RS-style programs
(as witnessed by the fact that most (all?) implementators
of the R6RS have chosen to provide an interactive top
level, and differences between the semantics of those
top levels have already created some confusion and
frustration among R6RS programmers).
The IEEE/ANSI/R5RS standards therefore remain the relevant
standards for the development and execution of libraries
and programs written in the traditional style. ERR5RS
is a recent development intended to help bridge the gap
between the R5RS and R6RS.
Those R5RS-style libraries and programs are important,
as can be seen by examining the libraries collected in
SLIB, as SRFIs, and by implementations such as Chicken,
MzScheme, and even Larceny. To cite just one example,
imagine what it would be like to develop a web app in
R6RS Scheme without relying on existing R5RS-style
libraries for sockets, SSAX, or databases.
Interoperability between R5RS-style and R6RS-style code
has therefore become a critical issue for the Scheme
community. It is to be expected that implementors who
support users relying on both R5RS-style and R6RS-style
libraries and programs will notice problems that might
not be observed within the more insular R6RS-only
implementations.
In Larceny, decisions concerning which features should
be deprecated take interoperability and portability
into account. Because Larceny is one of the leading
implementations of IEEE/ANSI/R5RS/ERR5RS/R6RS Scheme,
we have a broad range of users. Some of them are
developing or maintaining very large libraries or
programs that run in several different implementations
of Scheme. A few of those libraries and programs now
run under some IEEE/ANSI/R5RS implementations as well
as some R6RS implementations. It would be a mistake
for us to ignore portability issues that affect the
developers and maintainers of those programs.
That is why our surveys of current practice include
a broad range of implementations.
Finally, I hope you will pardon my observation that
the two implementations that provide the best support
for interoperation between R5RS-style and R6RS-style
libraries (PLT Scheme and Larceny) also appear to be
the most R6RS-compatible implementations at this time,
at least in terms of their performance on PLT Scheme's
R6RS test program. Those implementations had a head
start, of course, but that head start came about
because they took advantage of legacy code instead of
throwing it away.
Hence R6RS-style libraries are not the only ones that
matter, even if you think you care only about the R6RS.
Will
Thank you for your long and thoughtful response.
I hope you will not mind me saying that I think your
view of Scheme standards is unnecessarily complicated.
Of course this is only my opinion.
In my view, R5RS has been superseded by R6RS and is thus
no longer an active standard. (The IEEE and ANSI standards
are another matter, because they are from other standards bodies.)
Excuse me, but I always cringe when you refer to ERR5RS in the same
breath as R6RS and other Scheme standards. Whatever the technical
merits
of ERR5RS, which I do not doubt, it simply does not have the standing
or legitimacy of these other standards ... at least not yet.
I agree that it is important to be able use and reuse the
considerable body of existing R5RS Scheme code. But this goal can be
achieved
by porting R5RS code to R6RS. This is not difficult.
We already have quite a bit experience doing this as a community.
Most of the SRFIs have
been ported. As has all of the SSAX and SXML libraries you
mentioned.
(We contributed to this effort. Our ports are available in the SVN
repository
at http://carneades.berlios.de)
That said, I think its great that Larceny and PLT Scheme show how it
is
possible to support both the new standard as well as legacy Scheme
code in
one implementation. A few people may want to continue to use R5RS,
for example for educational
purposes. But most people will, I suspect, want to move their code to
R6RS
with time.
I'm glad you are working with the other implementors of R6RS to
overcome its
problems moving forward. For starters, I really hope that Larceny
will soon
implement the nice naming convention for implementation specific
libraries adopted
by both PLT Scheme and Ikarus, since our application depends on this
(non-standard) feature.
Cheers,
-Tom
Whether your view represents a consensus of the Scheme
community is an interesting and important question.
> For starters, I really hope that Larceny will soon
> implement the nice naming convention for implementation
> specific libraries adopted by both PLT Scheme and Ikarus,
> since our application depends on this (non-standard)
> feature.
Larceny will implement that naming convention for v0.97
[1]. In the meantime, I don't understand why you can't
use the more elegant technique described by the tip in
section 3.3.2 of Larceny's current user manual [2].
Will
[1] https://trac.ccs.neu.edu/trac/larceny/ticket/517
[2] http://larceny.ccs.neu.edu/doc/user-manual-alt.html#R6RSLibraryPathSection
I don't see the relevance of "consensus" in this context. There are
basically two kinds of standards, de facto standards and de jure
standards. Neither require consensus. De facto standards are
standards by virtue of their having achieved a "dominant position in
the market." [Wikipedia] De jure standards are standards by virtue of
the authority of the organization which created the standard. Which
kind of standard are R5RS and R6RS? I think the RnRS committees can
claim some legitimacy and authority. Thus, arguably the RnRS series
are de jure standards. If you accept this, then it follows that R6RS
supercedes R5RS and the current de jure standard from this
organization. Or do you want to argue that the R6RS is not the
legitimate successor of the R5RS committee?
If, on the other hand, we view the RnRS standards as only de facto
standards, then we have the empirical question about which standard is
"dominant"? And can they both be de facto standards at the same
time? It seems to early to answer this question, less than a year
after R6RS was ratified. A lot of good things have happened in this
short period, and R6RS implementations and libraries are coming along
very well. But it would seem premature to call them dominant.
Either way, I don't see how both R5RS and R6RS can both be considered
to be current standards.
On second though, perhaps there is one scenario, where R6RS is the de
jure standard but R5RS remains dominant, and thus becomes or remains
the de facto standard. But does anyone really expect this to
happen? The main reason, maybe the only reason, to be interested in
Scheme standards, is so as to be able to write and share portable
code. Otherwise just use the latest and greatest version of PLT
Scheme (not its R6RS mode), Gambit, or whatever suits the job. And
who would honestly prefer to limit themselves to R5RS for this
purpose, now that there are several good R6RS implementations, all of
them portable and Open Source?
>
> > For starters, I really hope that Larceny will soon
> > implement the nice naming convention for implementation
> > specific libraries adopted by both PLT Scheme and Ikarus,
> > since our application depends on this (non-standard)
> > feature.
>
> Larceny will implement that naming convention for v0.97
> [1]. In the meantime, I don't understand why you can't
> use the more elegant technique described by the tip in
> section 3.3.2 of Larceny's current user manual [2].
>
> Will
>
> [1]https://trac.ccs.neu.edu/trac/larceny/ticket/517
> [2]http://larceny.ccs.neu.edu/doc/user-manual-alt.html#R6RSLibraryPathSe...
Thanks for the tip, but this wouldn't this require us to replicate
much of the directory tree of libraries, just to include Larceny
specific versions of a few of them? And require our users to use the -
path parameter every time they want to use our library, instead of
being able to install in conveniently along with the Larceny
collection of libraries?
-Tom
May be a more important reason is to be able to *use* other people's
code? At this point in time, code depending only on the R5RS subset
is clearly more portable (as many more implementations support it).
Many of us are perfectly happy with R5RS and you will deprive us of
your superior (but R6RS dependent) code unless we move away from the
implementations we like and use (doing so has its own costs).
>But "deprecating" a feature
>of software means more than advising not to use it, or
>recommending something else. It expresses a warning to
>programmers that this feature should be avoided, because
>it will not be supported by the implementation in the future.
IME, most vendors of software that interfaces to foreign code
(compilers, libraries, operating systems, etc.) are pretty reasonable
about deprecation. It usually does not mean the feature will suddenly
disappear, but rather that no improvements are to be expected and the
help desk won't respond to questions or bug reports concerning
versions in which the feature is officially unsupported.
Developers who are really concerned about being orphaned just version
the whole tool chain - including the operating system(s) if necessary.
George
--
for email reply remove "/" from address
In principal we'd like to accommodate everyone. But there are limits.
Honestly, if it weren't for R6RS we probably wouldn't be writing our
application in Scheme at all. Even now, Java FX is beginning to look
pretty attractive
to us. (We are implementing a GUI using it.) But I'd like to continue
using Scheme for the backend so long as this makes sense. We are a
small team and a rich language with a large set of portable libraries
makes us more productive. R5RS offers neither.
All I can suggest to those bound for whatever reason to legacy Scheme
implementations is to lobby the implementors to upgrade their
implementation to comply with the only relevant standard, R6RS. Or
vote with your feet.
We are trying to do our part by complying with the standard rather
than, say, allowing ourselves to be seduced by the sexy new features
of the latest version of PLT Scheme, for example. We hope that people
interested in using our code will be willing to do their part as well.
Sorry.
-Tom
Thomas Gordon wrote:
> On second though, perhaps there is one scenario, where R6RS is the de
> jure standard but R5RS remains dominant, and thus becomes or remains
> the de facto standard. But does anyone really expect this to
> happen?
Yes. (Modulo your peculiar definition of "de jure", which
is often taken to mean having the force of law. Where I
live, the R6RS does not have the force of law and is not
likely to acquire it.)
Indeed, some people *hope* that will happen.
Personally, I *fear* it will happen. I think the best way
to prevent that outcome is for the process that led to the
R6RS to demonstrate its continuing viability by revising
the R6RS and producing an R7RS that addresses the problems
of the R6RS. Pretending those problems don't exist won't
cut it.
> Thanks for the tip, but this wouldn't this require us to replicate
> much of the directory tree of libraries, just to include Larceny
> specific versions of a few of them?
No. Larceny allows you to put all libraries of the form
(myprogram * ...) into a single file named myprogram.sls.
Hence you could put all of your Larceny-specific versions
of libraries into larceny/myprogram.sls, and you could put
the larceny directory wherever you want. You could even
place the larceny directory within the same directory that
contains all of your portable libraries.
You don't have to put all of your Larceny-specific code
into a single file. You can collapse the Larceny-specific
directory tree as much or as little as you like. Notice
how much power you get from simple but good design.
> And require our users to use the -path parameter every
> time they want to use our library, instead of
> being able to install in conveniently along with the Larceny
> collection of libraries?
No.
For one thing, I assume you know how to write a two-line
shell script or batch file.
If you expect your users to call larceny directly instead
of going through a shell script or batch file, then they
will already have to know how to specify the options on
Larceny's command line, which bear little resemblance to
the command-line options of Ikarus or PLT Scheme or
Ypsilon.
Nothing stops you from installing your code into one of
the directories that are searched by Larceny's standard
library search path. If you were to install your
Larceny-specific versions into a directory in that search
path that is searched before the directory that contains
your portable versions, then the --path option would be
unnecessary.
Finally, you could just edit the startup.sch file in
Larceny's root directory by adding paths to both the
Larceny-specific and portable versions of your code,
making sure the Larceny-specific path comes first.
Will
My definition wasn't peculiar, just correct :-) Seriously, I spoke
only of "authority" and "legitimacy", not legal rights.
> I think the best way
> to prevent that outcome is for the process that led to the
> R6RS to demonstrate its continuing viability by revising
> the R6RS and producing an R7RS that addresses the problems
> of the R6RS. Pretending those problems don't exist won't
> cut it.
Agreed. But pretending there are other currently relevant Scheme
standards than R6RS isn't the way to accelerate this process.
Wouldn't we all get to were we want to be quicker by supporting and
promoting R6RS at the same time we work to improve it? Much of your
rhetoric seems calculated to drive people away from R6RS, rather than
motivating them to cooperate to improve it.
Finally, thanks for the clarification about how to use Larceny's
mechanism for Larency specific libraries. I'll give a try but am
looking forward the next version of Larceny, which uses the same
mechanism for this as PLT Scheme and Ikarus.
-Tom
Sorry, but that doesn't work in v0.963. I got the design
confused with the implementation. It should work in v0.97.
Will
I was only talking about portability. If portability of Scheme code is
the main requirement, then R5RS is the way to go, not R6RS as you were
saying earlier. Now you say you also want a large set of libraries
(presumably not in R5RS) and that is fine but it is a different
argument.
That's an interesting argument.
The engineering world generally considers a standard body document -
what you're calling de jure standards - to live and be in force
forever ... the fact of technical or marketplace obsolescence does not
matter. When an existing standard is compatibly subsumed by a new
version the standard body will eventually obsolete the older version.
However incompatible versions of the same standard exist in parallel
forever.
So the question is really whether you think the RnRS committee to be a
standard body and it's dribblings to be standards. Given the
historical nature of Scheme as a research and experimental language,
the less formal nature of the RnRS process and the fact that the
committee explicitly calls its documents "reports" rather than
"standards", I'm more inclined to consider them de facto rather than
de jure and regard them as suggestions to implementors. Recall that
there is also an IEEE/ANSI standard for Scheme - it may be hopelessly
out of date, but it is there nonetheless.
If you do consider the RnRS committee to be a standard body and its
documents to be standards, then by conventional engineering practice
we have at least 2 incompatible standards living in parallel ... I'm
not sure how the, circa R3RS, IEEE standard compares to R5RS. For
vendors wanting to maximize market share, the only reasonable choice
is to implement all of the most popular ones.
Here's the Wikipedia definition (in case Richard Stallman is reading
this):
"software features that are superseded and should be avoided"
Have any of the R6RS features you list as deprecated been superseded?
Also, I note that by the standard you propose, the R6RS hashtable
library should also be deprecated.
System identifier-syntax R6RS hashtables
====== ================= =======================
Bigloo no no
Chicken no no
Gambit no no
Ikarus yes yes
Kawa no no
Larceny (default) no no
Larceny (ERR5RS) no no
Larceny (ERR5RS+(rnrs)) yes yes
MIT Scheme no no
MzScheme (default) no no
Petite Chez yes no
scm no no
Ypsilon yes yes
Is that your position?
sam th
I'm not sure whether or not the RnRS committees can be considered
authoritative, legitimate standards body. Maybe it can be, if the
Scheme community to a sufficient extent accepts the committee as such.
My argument was by cases. I tried to argue that R5RS and R6RS cannot
both be standards at the same time, no matter whether they are de jure
or de facto standards. But you've countered my argument successfully
by pointing out this is not uncommon for engineering standards, which
I didn't know.
I agree that it would be in the interest of *implementors* for them to
support as many standards as they can. But I doubt this is in the
interests of users, or at least not a majority of us. Our interest, I
should think, is in sharing libraries. And all these different
standards are getting in the way of us working together towards this
end.
In the end I don't think it matters much whether the RnRS reports are
de jure standards or not. From a practical point of view, all that
matters is whether not people adopt them and use them. Being a de
jure standard is only a very weak argument for actually using it.
There are lots of dead de jure standards.
It should be clear that I favor R6RS. The focus of R6RS proponents
should be on building good implementations, tools and libraries. And
everything else will take care of itself.
-Tom
You quoted only part of the following sentence:
In computer software standards and documentation,
the term deprecation is applied to software features
that are superseded and should be avoided.
That sentence does not rule out the possibility of
applying the word to other kinds of software, and
the second, third, and fourth bullet items of the
following paragraph give reasons why software may
be deprecated even if it has not been superseded
in your favorite sense of that word.
> Have any of the R6RS features you list as deprecated been superseded?
Yes. The R6RS syntactic layer would be a good example.
It was dead on arrival, but was superseded by ERR5RS
records within Larceny for a couple of months before
any implementations of the R6RS even came out.
Don't forget, by the way, that we are talking about
features that are deprecated in Larceny. I don't
have a problem with you using the R6RS library for
syntactic records, but no uses of that library will
be allowed into the Larceny code base beyond the
bare minimum needed to implement and to test the
feature for Larceny's R6RS modes. The technical
reason for that is well-documented and well-known:
the R6RS syntactic library uses a notion of record
type that is not fully compatible with the notion
used by the R6RS procedural record layer.
Most developers who understand that technical fact
will choose to use only one of the two incompatible
notions of record type while deprecating the other.
I have a responsibility to document that choice for
Larceny, and I have done so. I have also provided
Larceny's users with guidance concerning their own
choices. They are free to ignore my advice, and
their programs will continue to run in Larceny,
but their programs will not operate as seamlessly
with Larceny-specific libraries as if they had
followed my advice.
> Also, I note that by the standard you propose, the R6RS hashtable
> library should also be deprecated.
No. Most implementations of Scheme support some
hash tables that are reasonably similar in concept
to those of the R6RS.
Few implementations support identifier-syntax. Few
support a notion of record type that is similar to
that of the R6RS syntactic layer. Most support a
notion of record type that is similar to either the
procedural or the syntactic layer of ERR5RS records.
Will
> Most support a
> notion of record type that is similar to either the
> procedural or the syntactic layer of ERR5RS records.
The reason that's confusing is that the procedural and
syntactic layers of ERR5RS records both use exactly
the same notion of record type. That is the primary
reason (among many) that the ERR5RS record system is
superior to the R6RS system.
Fortunately, the ERR5RS system is every bit as compatible
with the R6RS record system as the R6RS record system
is with itself. Furthermore the ERR5RS record system
has been implemented as portable R6RS libraries.
Therefore absolutely no compatibility or portability
is lost by using the ERR5RS record system instead of
the R6RS systems.
Indeed, the ERR5RS record libraries are an excellent
example of how the R6RS library, macro, and record
systems can be used to construct useful and portable
libraries.
Will
> CommonLarceny.exe Larceny.fasl
source code will be compiled as it is loaded. When
the Larceny.fasl file is not specified on the command
line, the load procedure will interpret instead of
compiling. Either way, code typed at the interactive
top level will be interpreted. (That may change in
future releases.)
Common Larceny v0.964 incorporates the bug fixes and
most other improvements that have been made to native
and Petit Larceny as of v0.963, but still runs only
in Larceny's traditional R5RS mode. We expect to add
ERR5RS/R6RS modes to Common Larceny in v0.97.
Will
I really wish it were. There's no way to use R6RS hashtables as
a simple cache without needing to recompute the hash value on a
cache miss. And the implementation is clumsy:
(define hashtable-ref/cache!
(let ((miss (list 'miss)))
(lambda (tab key thunk)
(let ((x (hashtable-ref tab key miss)))
(if (eq? x miss)
(let ((res (thunk)))
(hashtable-set! tab key res) ; 2nd hashing
res)
x))))
Specifying `hashtable-ref/cache!' as part of the library of
course would allow for it to be optimized. Alternately, a
SRFI-69-style -update! would be usable if it were modified
to return the updated value.
Also, the fact that `hashtable-keys' and `hashtable-values'
return vectors instead of lists is just plain bizarre.
--
Alex
> There's no way to use R6RS hashtables as a simple cache
> without needing to recompute the hash value on a cache miss.
That depends on the implementation. An R6RS implementation may
cache one or more most recently used keys and their associated
positions/buckets in the table in order to speed up successive
operations on the same key. Additionally, computing the hash
value for eq-hashtables, symbol tables, etc., is usually a fast
operation that you need not worry about it (in Ikarus, it's two
shift instructions that convert a pointer to a fixnum, dropping
the lowest tag bits).
A couple of days ago, I completely rewrote Larceny's
implementation of SRFI 69 to use R6RS hashtables.
The motivations:
interoperability: SRFI 69 hash tables are
now the same as R6RS hashtables, which are
also the same as Larceny's deprecated old-style
hashtables.
performance: The Sassy assembler appeared
to be spending more than 20% of its time in
the hash-table-ref and hash-table-set!
operations of SRFI 69.
Using R6RS hashtables to implement SRFI 69 had no
significant impact on the performance of Sassy,
so concerns about the performance of R6RS hashtables
are misplaced.
We expect to see some significant improvement as
we tune Larceny's implementation of hashtables,
however, especially if we also rewrite Sassy to
use hash-table-ref/default and hash-table-update!.
To expand on what Aziz said, hashing on object
identity can be expected to be much faster with
R6RS hashtables than with SRFI 69, because R6RS
hashtables do not expose that hash function to
clients. To implement SRFI 69's default hash
function, I had to use an auxiliary hash-on-eq
hashtable that remembers the pseudo-random hash
values returned for pairs, strings, bytevectors,
vectors, and other mutable objects whose address
in memory may change. That auxiliary hashtable
will treat its keys as weak pointers (when I get
around to it), and my implementation avoids use
of that auxiliary hashtable for most immutable
objects, but it represents a definite performance
problem that was built into the design of SRFI 69.
That is why SRFI 69 is deprecated in Larceny.
In most (but not all) respects, R6RS hashtables
are better.
Will
I'm not suggesting SRFI-69, those tables have the same
performance issue.
The problem is that in the common idiom of using hashtables
as a cache you're forced to recompute the hash function
twice on a cache miss (once for the initial -ref, and then
again for the -set!). One could easily provide an API
where this wasn't necessary.
Obviously the issue is very minor for `eq?' hash tables
where hashing is fast, but not so for `equal?' or `string-ci=?'
or other tables with relatively expensive hash functions.
And in these cases the meta-caching Aziz suggests would
likely be a pessimization.
[And seriously, what's up with returning vectors for
hashtable-keys and hashtable-values?]
--
Alex
> [...] Obviously the issue is very minor for `eq?' hash
> tables where hashing is fast, but not so for `equal?' or
> `string-ci=? or other tables with relatively expensive
> hash functions. And in these cases the meta-caching
> Aziz suggests would likely be a pessimization.
For any expensive hash function, the implementation of hash
tables can cache the last key used (regardless of whether the
key was found in the table or not) and its computed hash value.
In your example of hashtable-ref/cache!, since they same key
is used twice in succession, the cached hash value that was
computed the first time (when you did hashtable-ref) will be
used the second time (when you do hashtable-set!). How is this
a "pessimization"?
Aziz,,,
Not in this case, but in general it can be a pessimization.
Because of the overhead for storing this in the first place,
and because the equality function may also be expensive. For
every case where you're not accessing the same key twice,
you're making an extra useless call to the equality function.
Either way, the standard gives no way to guarantee consistent
performance. By simply providing the suggested procedure,
or an equivalent, users can be guaranteed the hash function
will be called at most once per cache-ref operation.
--
Alex
You seem to be thinking of a cache that records
the key/value pair from the previous lookup. As
implemented just now in Larceny, the cache
records only the key and its hash.
Although this can be a very slight pessimization
when the hash function is lightning-fast and the
pattern of accessing the hashtable always leads
to a cache miss, there are no circumstances in
which caching the key/hash pair can be a major
pessimization.
I have confirmed that by benchmarking the pessimal
cases in Larceny.
(To prevent the cache from retaining a key that
is no longer in the table, you have to arrange
for the cache to be cleared when keys are removed
from the hashtable. I haven't implemented that
yet, but it's a very minor consideration because
it can be one of the occasional cleanups scheduled
by the same mechanism used to implement weak
pointers and finalization.)
> Because of the overhead for storing this in the first place,
> and because the equality function may also be expensive. For
> every case where you're not accessing the same key twice,
> you're making an extra useless call to the equality function.
As implemented in Larceny, the overhead of storing
consists of a call to cons followed by an assignment.
The only equality function used to compare for a
cache hit is eq?, which is very fast. If the current
key is not eq? to the cached key, Larceny goes
through the normal hash function and updates the
cache.
> Either way, the standard gives no way to guarantee consistent
> performance.
Untrue. A hash function that returns distinct
values for subsequent calls on an eq? object is
not a sensible hash function, nor does it appear
to be a legal hash function according to R6RS
Library section 13.1. That section explicitly
allows hashtables to cache the results of calling
the hash function, "so programs cannot rely on
the hash function being called for every lookup
or update." Therefore the caching I described
is safe on all sensible or legal hash functions.
The semantics you advocate would also involve
some similar legalese concerning when the hash
function is called. You're just arguing over
the wording of the fine print. For example:
> By simply providing the suggested procedure,
> or an equivalent, users can be guaranteed the hash function
> will be called at most once per cache-ref operation.
For single-threaded code, that guarantee is also
provided by the algorithm I described above. For
multi-threaded code, the algorithm I described is
probably faster than the one you are advocating,
because your algorithm would require some kind of
locking to guarantee atomicity of the coupled
operations. The algorithm I described does not.
(I admit that assignments may also require locking
on some systems, for reasons I'd just as soon not
try to explain in this newsgroup, but I think
assignments are likely to use lock-free protocols
on most systems.)
Will
No, I was assuming caching the key and its hash.
> Although this can be a very slight pessimization
> when the hash function is lightning-fast and the
> pattern of accessing the hashtable always leads
> to a cache miss, there are no circumstances in
> which caching the key/hash pair can be a major
> pessimization.
I may have misunderstood Aziz' proposal. Is the idea to
only ever compare keys with the cached key using `eq?',
regardless of the hashing and equality functions? In that
case as you say there are no significant pessimization
cases beyond the minor overhead.
> > Either way, the standard gives no way to guarantee consistent
> > performance.
> Untrue.
From the user's perspective, there is no guarantee,
because the suggested caching is not required by the
standard. The standard says the hash function "may
not be called on every lookup." It may still be called
twice for the -ref/cache! idiom. You could add -ref/cache!
to the API with the requirement that the hash function
is only computed once, and this can be guaranteed
across a wide variety of implementations even without
caching key/hash pairs. It's also just a useful
procedure.
Note your proposal can still make multiple calls to
the hash function in single-threaded code when the update
`thunk' accesses the hash table (which is reasonable if
it's a direct, non-mutating access).
And I don't see how the suggested procedure could
possibly require more locks, since it can be implemented
on top of the standard functions. It just allows an
implementation optimization that avoids hashing the key
twice. If anything, your proposal would require more
locking since you're mutating the table even on non-mutating
accesses.
--
Alex
I don't know exactly what Aziz was proposing or has
implemented, but that's all I have implemented or
intend to implement.
> From the user's perspective, there is no guarantee,
> because the suggested caching is not required by the
> standard.
True, but see below.
> And I don't see how the suggested procedure could
> possibly require more locks, since it can be implemented
> on top of the standard functions.
If it were implemented on top of the standard functions
of the R6RS API, there would be no guarantee that the
hash function is called only once.
> If anything, your proposal would require more
> locking since you're mutating the table even on non-mutating
> accesses.
I assume variable references and assignments (set!) are
atomic. Here's the current code [1]:
(let (...
(make-safe-hasher-caching
(let ((cache #f))
(lambda (hf)
(lambda (key)
(let ((keyhash cache))
(if (and keyhash (eq? key (car keyhash)))
(cdr keyhash)
(let ((h (hf key)))
(cond ((and (fixnum? h) (<= 0 h))
(set! cache (cons key h))
h)
((and (exact? h) (integer? h) (<= 0
h))
h)
(else
(assertion-violation
'hashtable
"illegal hash value" h))))))))))
Here are elapsed timings in milliseconds for four benchmarks that
range from the worst case to a medium case:
v0.963 current
pessimal 2752 3003 all cache misses, fixnums,
identity, =
caching1 3821 3929 update pattern, symbols, symbol-
hash, eq?
caching2 5084 5149 update pattern, symbols, equal-
hash, equal?
caching3 9705 7944 update pattern, strings, string-
hash, equal?
Each benchmark performs a million calls to hashtable-set! to
initialize the hashtable (which contains a million elements
for the pessimal benchmark, but only ten thousand for the
other benchmarks), and then performs ten million iterations
of the pattern, which is just a hashtable-ref for the pessimal
benchmark and is the following pattern for the other three:
(if (hashtable-contains? ht key)
(+ count (hashtable-ref ht key 0))
(begin (hashtable-set! ht key 0)
count))
Benchmarks like these convinced me there was more performance to
be lost by not caching than by caching. In particular, Larceny's
compiler seems to run very slightly faster with caching, and that
was (part of) the motivation for adding caching and for rewriting
Larceny's implementation of SRFI 69 to use R6RS hashtables. Now
any improvements to R6RS hashtables will also benefit SRFI 69.
Will
[1] https://trac.ccs.neu.edu/trac/larceny/browser/trunk/larceny_src/src/Lib/Common/hashtable.sch
Thanks for taking the time to benchmark this. The overhead
is actually more than I expected, presumably because you're
consing on each reference (which I see now is so that you can
`set!' the cache to a single value and avoid the need for
locks).
But to get back to my point, the basic idea of -ref/cache!
is to combine (or refactor) the code for -ref and -set! so
that any implementation strategy can guarantee the hash function
is only called once. Essentially you would have
...
(let ((h (hf key)))
(or (... ref code ...)
(begin
(lock! ht)
(... set! code ...))))
Thus the hash value has already been computed for both the
-ref and -set! cases, and no extra locks are needed. This
may be a little hairier in Larceny because of the gc handling,
but still works (though it isn't really needed because Larceny
uses the caching trick). Depending on the implementation
it may also be a win to combine the search for the hash entry
for both the -ref and -set!.
--
Alex
That sounds good until you notice that the hashtable lock
is being maintained across the call to an unknown thunk
that might never return. That's a very bad idea.
You can get around that by calling the thunk before you lock
the hashtable, but that precludes combining "the search for
the hash entry for both the -ref and -set!" because the
computations performed by the thunk or by concurrent threads
might have resized the hashtable or added or deleted the key.
That means the gain from -ref/cache! is limited to what you
would gain from caching the hash value as in Larceny, and
caching is more effective because it works even for code
that doesn't use your -ref/cache! procedure.
Finally, your proposal doesn't really "guarantee the hash
function is called only once", because the hash function
might be called many times when the hashtable is resized,
and the thunk might perform operations that force the
hashtable to be resized before the thunk returns. You
could get around that by storing the hash value with every
entry in the table, but that would cause every hashtable
to consume up to 50% more space.
Will
Sorry, I'm decsribing an abstract idea, not specific
Larceny code. I already said specifically:
"This may be a little hairier in Larceny because of the gc handling"
Even in Larceny you can separate the case of non-gc-sensitive hash
functions, guarantee only one call to the hash function in that case.
In the eq? hash function cases, you can check if the object has been
moved, but multiple calls to the hash function aren't expensive anyway
in that case anyway.
The point is that the proposed API addition can be made efficient
across a wider variety of implementations.
> Finally, your proposal doesn't really "guarantee the hash
> function is called only once", because the hash function
> might be called many times when the hashtable is resized,
Check and resize the table after the key is found missing
but before adding the key, and then add the new entry using
the same result of the hash function (again, gc-sensitive
keys are a special case).
--
Alex