third ballot results


Alex Shinn

Jul 10, 2011, 6:02:14 AM
to scheme-re...@googlegroups.com
The third ballot has been closed, and the results tallied and
available at

http://trac.sacrideo.us/wg/wiki/WG1Ballot3Results

A summary of the results follows.

In the re-opened #32 "user-defined types," Hsu's proposal tied with
SRFI-9, which is not enough to overthrow the default, so we'll stick
with SRFI-9.
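
For reference, a SRFI-9 record definition looks roughly like this (an
illustrative sketch; the names are made up for the example):

  (define-record-type point
    (make-point x y)           ; constructor
    point?                     ; predicate
    (x point-x)                ; immutable field
    (y point-y set-point-y!))  ; mutable field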

In #28 we voted to switch to the PortsShinn proposal for binary I/O.
There's still some ambiguity on whether the file-spec notation should
be removed, but since it didn't actually win before I will take it out
of the next draft, and resolve this with the ballot 4 ticket #226.
Ballot 4 will also include I/O tickets #222, #223 and #224.

In the re-opened #83 "auxiliary keywords" we voted to make the
keywords bound. `else` and `=>` will be added to the base module.
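
With these bound, the familiar auxiliary uses keep working as expected;
a small illustrative sketch:

  (cond ((assv 2 '((1 . a) (2 . b))) => cdr)  ; => b
        (else 'none))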

In #3 we voted weakly on ModuleFactoringMedernach. We'll include
those changes in this draft, but allow further revisions in the next
ballot. One thing that surprised me was how similar all the proposals
were. Given that, and assuming we take care to avoid inconsistencies,
it will likely make more sense to split this into several items as
members requested previously.

In #137 "current-seconds semantics" we voted this should return TAI
time as an inexact value. There seems to be general sentiment that we
should be loose in the specification.

In #147 we voted to allow literal file spec lists in include and
include-ci, but at the same time voted to remove file spec lists
elsewhere. I'll leave this out for now, and if we keep property lists
as a result of item #226 then they will apply consistently to include
and include-ci.

In #159 we voted for the Shinn proposal: the base program environment
is empty, scheme-report-environment returns (scheme base), and REPLs
use an implementation-defined superset thereof.
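
As a sketch of the intended usage (assuming the R6RS-style `environment`
procedure voted in #161):

  (eval '(+ 1 2) (environment '(scheme base)))  ; => 3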

In #216 we voted for a separate procedure, `write/simple`, to write a
value without reader labels. There were several requests to change
this to `write-simple`, so I will do so pending a new item for name
proposals.
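
An illustrative sketch of the intended difference (the label syntax shown
is just for illustration):

  (define ls (list 1 2 3))
  (set-cdr! (cddr ls) ls)   ; make the list circular
  (write ls)                ; e.g. #0=(1 2 3 . #0#), using reader labels
  (write-simple ls)         ; no labels; need not terminate on cyclic data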

In #162 we weakly voted to reintroduce the `lazy` syntax from SRFI-45.
This needs to be revisited.
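
A minimal sketch of the SRFI-45 distinction: `delay` wraps an expression
that returns a value, while `lazy` wraps one that returns a promise:

  (define p (delay (+ 1 2)))         ; promise of a value
  (define q (lazy (delay (+ 1 2))))  ; promise-returning expression
  (force q)                          ; => 3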

We also voted for changes in the following more minor items:

* #171 Duplicate identifiers in define-record-type
We voted this is an error.

* #112 REPL redefinitions
We voted redefinitions follow the common semantics.

* #132 Imports override previous imports?
We voted this is an error.

* #160 Interleaving of imports and code in a module
We voted for the Scheme48/Shinn semantics that the imports and code
are resolved separately.

* #158 mutating imports
We voted this is an error.

* #148 Allow include-ci at top level
We voted to add this as a complement to include.

* #150 cond-expand at top level
We voted yes.

* #181 Add `when` and `unless` to the base module
We voted yes.

* #149 blob ports
We voted to use John Cowan's proposal.

* #153 Renaming blob procedures
We voted for the new naming.

* #206 Provide read-syntax for blobs
We voted for SRFI-4.

* #154 Physical newline in a string equivalent to \n (that is, U+000A)
We voted yes, a newline is the same as \n.

* #208 Is || a valid identifier?
We voted yes (possibly need to clarify the text).

* #155 Make recursively defined code an explicit error
We voted to explicitly state this is an error.

* #156 Replace "an error is signalled" with "an implementation-dependent
object is raised as if by `raise`"
Most members seemed to be under the impression this was already
implied, but we voted to make an explicit note of it.

* #198 Make it an error for a procedure mapped by `map` and friends to
mutate the result list/string/vector
We voted yes.

* #199 Make it an error for a procedure mapped by `map` and friends to
return more than once
We voted yes.

* #164 Meaning of char-numeric?
We voted to make char-numeric? equivalent to the Unicode Numeric_Digit
property (general category value of Nd).

* #166 Add predicate and accessors for error objects
We voted to add error-object?, error-object-message and
error-object-irritants.

* #133 Provide read-line
We voted yes.

* #122 Make infinity, NaN, and -0.0 semantics consistent with IEEE 754
We are requiring them to be consistent with IEEE-754.

* #138 DivisionRiastradh domain
We voted integers.

* #217 DivisionRiastradh exactness preservation
We voted for the exactness-preserving-unless option.

* #140 Removing `quotient`, `remainder`, `modulo`
We voted to keep these.

* #151 Extend `finite?` and `nan?` to non-real values
We voted yes.

* #152 exact-integer-sqrt inconsistent with multiple values module
We voted to move this to the core.

* #207 Editorial: Polar complex numbers are inexact
We voted yes.

The following items all correspond to changes where we voted to
coincide with R6RS:

* #183 Escaped newline removes following whitespace?
We voted yes.

* #85 Blobs, bytevectors, byte-vectors, octet-vectors, or something else?
We voted to switch to the R6RS name `bytevector`.

* #215 initial value argument to make-blob
We voted yes.

* #118 Simple literals must be explicitly delimited.
We now require these to be delimited.

* #124 Nested quasiquote semantics
We voted for R6RS, which conflicts with the more specific #123, so I
am assuming the strict interpretation (i.e. the R6RS specification
minus the multiple-argument extension).

* #125 Allow procedures not to be locations
Yes, we need to add this to the eqv? specification.

* #126 Partly specify the mutability of the values of quasiquote structures
Yes, we need to add this to the quasiquote specification.

* #127 Specify the dynamic environment of the ''before'' and ''after''
procedures of dynamic-wind
Yes.

* #135 let-values and let*-values
Yes, these will be added tentatively to the base module.

* #172 Multiple returns from `map`
We voted for the R6RS semantics, that multiple returns do not
mutate the previous results.

* #178 Shadowing with internal definitions
We voted for the R6RS language.

* #161 module argument to eval
We voted to add the R6RS procedure `environment`.

* #139 `exit`
We voted to add this.

* #117 Real numbers have imaginary part #e0
We voted exact-only.

* #120 Define the semantics of the transcendental functions more fully
We voted to use the R6RS semantics.

* #121 The semantics of expt for zero bases has been refined
We voted to use the R6RS semantics.

* #195 Editorial: proposed rewording for `begin`
We voted yes, to adopt the R6RS terminology.

* #191 Include `close-port`?
We voted yes.

* #134 Provide flush-output-port
We voted yes.

* #184 Require `char=?`, `string=?` etc. to accept arbitrary numbers
of arguments?
We voted yes.

* #188 Clarify wording of `and` and `or` definitions
We voted yes to use the R6RS definition.

* #187 Clarify duplicate bindings in `let*`
We voted yes, to adopt the R6RS clarification here.

* #174 Safe uses of multiple values
We voted yes.

Finally, for the following items we voted against any change (you can
stop reading now):

* #45 Record-let syntax and semantics
We voted no.

* #119 Whether to treat # as a delimiter.
We voted that it is not a delimiter.

* #123 Extend unquote and unquote-splicing to multiple arguments
No.

* #131 Output procedures return value
We voted no, the return value stays unspecified.

* #141 What are the semantics of modules with respect to separate compilation?
We voted to leave this unspecified.

* #144 strip prefix on import
We voted no.

* #163 Allow modules at the REPL?
We voted no.

* #167 Add constructor for error objects
We voted no.

* #169 Add standard-*-port procedures
We voted no.

* #170 Add with-error-to-file procedure
We voted no.

* #173 Unifying `begins`
We voted no change from R5RS.

* #175 Control of significant digits or decimal places in `number->string`
We voted no.

* #176 Are string ports exclusively character ports?
We voted to leave this unspecified.

* #177 Distinguish file and string ports?
We voted no.

* #180 Make case and cond clauses into bodies
We voted no.

* #182 Add `while` and `until`
We voted no.

* #185 Add sixth "centered" division operator
We voted no.

* #200 Completing the blob procedures
We voted no.

* #205 Roll partial-blob-copy(!) into blob-copy(!)
We voted no.

--
Alex

Alex Shinn

Jul 10, 2011, 10:53:24 AM
to scheme-re...@googlegroups.com
We have inconsistent results in #172, #198 and #199.

First, #198 says "return value" but I'm assuming that's
a mistake and the intention was the R6RS semantics
that proc can't mutate the argument lists? At least one
other member assumed that in their rationale. That's how
I wrote it up. If it's wrong, we need a rationale for this
novel restriction.

#172 is another R6RS restriction which says that if
map returns multiple times, it must not mutate the
previous values. This just says you have to cons
up a new list - you can't use set-cdr! in the map
implementation. This is good because there is
nothing unusual about multiple returns in Scheme,
and it's desirable to support them as best as possible.
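
A sketch of what that means for an implementation (`my-map` is just an
illustrative name, not a proposal):

  ;; Safe under multiple returns: each result is a fresh pair, so
  ;; re-entering proc's continuation rebuilds the tail without
  ;; touching any list already handed back to the caller.
  (define (my-map proc ls)
    (if (null? ls)
        '()
        (cons (proc (car ls)) (my-map proc (cdr ls)))))

A destination-passing version that built the result with set-cdr!
would reuse pairs, which is exactly what #172 forbids.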

#199, however, makes the somewhat contradictory
restriction that it's not allowed for the procedures to
return multiple times. If map isn't allowed to use
mutation, then there's no reason for this restriction.

Moreover, #199 is a very drastic change and I
should have come down stronger on it and required
a rationale and use cases. It breaks existing R5RS
programs, including simple cases like

(map any-proc-using-amb ls)

In light of this, I've closed #199 as invalid despite
it winning the vote. Any member may re-open it,
but I'm going to require a strong rationale and more
discussion.

--
Alex

Emmanuel Medernach

Jul 10, 2011, 2:59:29 PM
to scheme-reports-wg1


On Jul 10, 12:02 pm, Alex Shinn <alexsh...@gmail.com> wrote:
> The third ballot has been closed, and the results tallied and
> available at
>
>  http://trac.sacrideo.us/wg/wiki/WG1Ballot3Results
>
> A summary of the results follows.
>

Nice summary; thanks a lot, Alex, for doing it!  And thanks also to all
voters, though I am a bit concerned that there are fewer voters.

> In the re-opened #32 "user-defined types," Hsu's proposal tied with
> SRFI-9, which is not enough to overthrow the default, so we'll stick
> with SRFI-9.
>

Ok, there was a Condorcet paradox on the last ballot. My feeling is
that there is no "one record system to bind them all", and as SRFI-9
is so widespread it is after all a good choice to put it in the
standard (as it is, "for better and for worse"). This is why I
suggested leaving the door open for others, with modules like (scheme
records srfi-9) or (scheme records Tiny-CLOS).

> In #3 we voted weakly on ModuleFactoringMedernach.  We'll include
> those changes in this draft, but allow further revisions in the next
> ballot.  One thing that surprised me was how similar all the proposals
> were.  Given that, and assuming we take care to avoid inconsistencies,
> it will likely make more sense to split this into several items as
> members requested previously.
>

Thanks. Yes, it seems we agree on most parts of the module factoring.
But I am sure not everyone approves of ModuleFactoringMedernach, so
splitting it is recommended.

> [...]
>   * #160 Interleaving of imports and code in a module
>     We voted for the Scheme48/Shinn semantics that the imports and code
>     are resolved separately.
>

What is the rationale here? Is it because of macros, or includes?

>   [...]
>   * #199 Make it an error for a procedure mapped by `map` and friends to
>       return more than once
>     We voted yes.
>

I am confused about this one. Why would it be an error at all?

>   [...]
> * #135 let-values and let*-values
>     Yes, these will be added tentatively to the base module.
>

What about define-values? I don't remember whether we voted on it or not.

> [...]
>   * #163 Allow modules at the REPL?
>     We voted no.
>

What is the rationale here? Why do we have to restrict
implementations from allowing it?

> [...]

Thanks again for your good work.

Best regards,
--
Emmanuel

John Cowan

Jul 10, 2011, 4:08:17 PM
to scheme-re...@googlegroups.com
Emmanuel Medernach scripsit:

> Ok, there was a Condorcet's Paradox on the last ballot. My feeling is
> that there is no "one record system to bind them all", and as SRFI-9
> is so widespread it is after all a good choice to put it in the
> standard (as it is "for better and for worse"). This is why I
> suggested to have an open door for others and modules like (scheme
> records srfi-9) or (scheme records Tiny-CLOS).

The first symbol "scheme" is reserved for modules defined by the standard,
so unless Tiny CLOS becomes part of the standard (which I don't expect),
its module would have to begin with some other symbol.

> What about define-values ? I don't remember that we voted it or not.

There was no ticket for it, so we didn't vote on it. If you want it,
file a ticket at http://trac.sacrideo.us/wg/newticket .

--
John Cowan http://www.ccil.org/~cowan co...@ccil.org
Be yourself. Especially do not feign a working knowledge of RDF where
no such knowledge exists. Neither be cynical about RELAX NG; for in
the face of all aridity and disenchantment in the world of markup,
James Clark is as perennial as the grass. --DeXiderata, Sean McGrath

John Cowan

Jul 10, 2011, 4:30:20 PM
to Alex Shinn, scheme-re...@googlegroups.com
Your draft specifies that with-*-file deals in binary ports, but that
can't be right -- it's R5RS-compatible, and should deal in character
ports. I've overridden this in commit 6ce649a32b43.

--
An observable characteristic is not necessarily John Cowan
a functional requirement. --John Hudson co...@ccil.org

Aaron W. Hsu

Jul 10, 2011, 6:04:35 PM
to scheme-re...@googlegroups.com
On Sunday, July 10, 2011 04:30:20 pm John Cowan wrote:
> Your draft specifies that with-*-file deals in binary ports, but that
> can't be right -- it's R5RS-compatible, and should deal in character
> ports. I've overridden this in commit 6ce649a32b43.

It seems like a natural and backwards compatible extension to expect them to
handle binary ports as well as character ports. That doesn't affect old code
that doesn't deal with binary ports, and code that must deal with arbitrary
ports must already change, even without the extension.

Aaron W. Hsu

--
Programming is just another word for the lost art of thinking.

Aaron W. Hsu

Jul 10, 2011, 6:08:11 PM
to scheme-re...@googlegroups.com
On Sunday, July 10, 2011 02:59:29 pm Emmanuel Medernach wrote:
> I am confused about this one. Why would it be an error at all ?

See Alex's emails about this issue specifically.

Alex Shinn

Jul 10, 2011, 6:25:59 PM
to John Cowan, scheme-re...@googlegroups.com
On Mon, Jul 11, 2011 at 5:30 AM, John Cowan <co...@mercury.ccil.org> wrote:
> Your draft specifies that with-*-file deals in binary ports, but that
> can't be right -- it's R5RS-compatible, and should deal in character
> ports.  I've overridden this in commit 6ce649a32b43.

Thanks - the binary ports changes were still in mid-edit,
and I was falling asleep.

--
Alex

John Cowan

Jul 10, 2011, 10:13:42 PM
to scheme-re...@googlegroups.com
Aaron W. Hsu scripsit:

> > Your draft specifies that with-*-file deals in binary ports, but that
> > can't be right -- it's R5RS-compatible, and should deal in character
> > ports. I've overridden this in commit 6ce649a32b43.
>
> It seems like a natural and backwards compatible extension to expect them to
> handle binary ports as well as character ports. That doesn't affect old code
> that doesn't deal with binary ports, and code that must deal with arbitrary
> ports must already change, even without the extension.

The `with-{input-from,output-to}-file` procedures open files themselves, though.
There is no indication of whether to open them as character ports or binary
ports, and backward compatibility requires that they be character ports.
If you want to bind the current input and output ports to binary files,
open the files yourself and use `parameterize`.
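
Something along these lines (a sketch only; the binary I/O procedure
names are assumed from the current draft and may still change):

  (let ((out (open-binary-output-file "dump.bin")))
    (parameterize ((current-output-port out))
      (write-u8 42))    ; one byte to the binary file
    (close-port out))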

--
The experiences of the past show John Cowan
that there has always been a discrepancy co...@ccil.org
between plans and performance. http://www.ccil.org/~cowan
--Emperor Hirohito, August 1945

Emmanuel Medernach

Jul 11, 2011, 2:30:40 AM
to scheme-reports-wg1


On 10 Jul, 22:08, John Cowan <co...@mercury.ccil.org> wrote:
> The first symbol "scheme" is reserved for modules defined by the standard,
> so unless Tiny CLOS becomes part of the standard (which I don't expect),
> then its module would have to begin with some other symbol.
>

Yes, sure, this was just an example. But SRFI-99 is on the WG2 items
list, and this naming scheme leaves the door open for others as well
(future standards?).

> > What about define-values ? I don't remember that we voted it or not.
>
> There was no ticket for it, so we didn't vote on it.  If you want it,
> file a ticket at http://trac.sacrideo.us/wg/newticket.
>

Ok, ticket filed: http://trac.sacrideo.us/wg/ticket/232

--
Emmanuel

Emmanuel Medernach

Jul 11, 2011, 2:33:21 AM
to scheme-reports-wg1


On 11 Jul, 00:08, "Aaron W. Hsu" <arcf...@sacrideo.us> wrote:
> On Sunday, July 10, 2011 02:59:29 pm Emmanuel Medernach wrote:
>
> > I am confused about this one. Why would it be an error at all ?
>
> See Alex's emails about this issue specifically.
>

Ok, so #199 is closed. Thanks for pointing me at it.

--
Emmanuel

Alaric Snell-Pym

Jul 11, 2011, 5:13:05 AM
to scheme-re...@googlegroups.com
On 07/10/11 11:02, Alex Shinn wrote:
> The third ballot has been closed, and the results tallied and
> available at
>
> http://trac.sacrideo.us/wg/wiki/WG1Ballot3Results

Hahah, I wrote "write/symbol" when I of course meant "write/simple" for
#216. My apologies! "write/simple" still won, so it's not a problem, but
I wonder how on Earth my fingers typed that...

#215 is interesting. What will the fill argument be? Another blob, which
is repeated as many times as needed to fill the blob? I hope nobody
suggests "a number, which is treated as a (signed or unsigned?) byte and
repeated", as I see the slide of blobs into byte vectors as a downhill
slide - SRFI-4 is there when we want structured packed arrays; making a
blob into a differently-worded u8vector is, I think, just untidy.

Gleckler/Hsu both (if I may paraphrase) say "It's a vector of bytes, so
let's call it a byte vector" - but it's NOT a vector of bytes. The
reason we got blobs in WG1, if I remember correctly, was as a natural
consequence of having binary ports. The byte happens to be the standard
granularity of measuring binary data, largely because it's fiddly to
deal with less than eight bits at a time on contemporary CPUs, but
binary data often involves structures much larger than a byte. I
honestly can't remember the last time I actually dealt with a string of
discrete bytes - even text these days is usually UTF-8, a variable-width
encoding.

Why give blobs a name that over-emphasises a rarely-used interpretation
of them, then give them a nearly-useless facility to fill them with some
arbitrarily repeated byte?

Now, I'd support a blob-fill! or blob initial fill argument that's a
blob in itself, as that can then be used to initialise an array of
floats to a chosen initial value and so on. But please please please
leave vectors-of-bytes to SRFI-4, which places them alongside various
other packed vectors, and even then is still only a tiny subset of the
stuff you might find in a blob...

ABS

--
Alaric Snell-Pym
http://www.snell-pym.org.uk/alaric/

John Cowan

Jul 11, 2011, 11:33:16 AM
to scheme-re...@googlegroups.com
Alaric Snell-Pym scripsit:

> #215 is interesting. What will the fill argument be? Another blob,
> which is repeated as many times as needed to fill the blob? I hope
> nobody suggests "a number, which is treated as a (signed or unsigned?)
> byte and repeated", as I see the slide of blobs into byte vectors as
> a downhill slide - SRFI-4 is there when we want structured packed
> arrays; making a blob into a differently-worded u8vector is, I think,
> just untidy.

Unfortunately, #215 combined with #85 means that the byte vector view
of things has beaten our obviously superior blob's-eye view. Such is
democracy. So the new draft talks of bytes everywhere, and defines
_byte_ as an exact integer in the range [0..255]. And the fill argument
is a byte.

See my (non-random) .sig below.

--
John Cowan <co...@ccil.org> http://www.ccil.org/~cowan
Sir, I quite agree with you, but what are we two against so many?
--George Bernard Shaw,
to a man booing at the opening of _Arms and the Man_

Arthur A. Gleckler

Jul 11, 2011, 1:18:06 PM
to scheme-re...@googlegroups.com
> #215 is interesting. What will the fill argument be? Another blob, which
> is repeated as many times as needed to fill the blob? I hope nobody
> suggests "a number, which is treated as a (signed or unsigned?) byte and
> repeated", as I see the slide of blobs into byte vectors as a downhill
> slide - SRFI-4 is there when we want structured packed arrays; making a
> blob into a differently-worded u8vector is, I think, just untidy.

I like the idea of accepting a byte vector as a fill argument.  We'll just have to specify what to do if the length of the new byte vector is not a multiple of the length of that argument.
 
> Gleckler/Hsu both (if I may paraphrase) say "It's a vector of bytes, so
> let's call it a byte vector" - but it's NOT a vector of bytes. The
> reason we got blobs in WG1, if I remember correctly, was as a natural
> consequence of having binary ports. The byte happens to be the standard
> granularity of measuring binary data, largely because it's fiddly to
> deal with less than eight bits at a time on contemporary CPUs, but
> binary data often involves structures much larger than a byte. I
> honestly can't remember the last time I actually dealt with a string of
> discrete bytes - even text these days is usually UTF-8, a variable-width
> encoding.
>
> Why give blobs a name that over-emphasises a rarely-used interpretation
> of them, then give them a nearly-useless facility to fill them with some
> arbitrarily repeated byte?

A rarely used interpretation?  Hardly.  I read binary files as bytes all the time.  Sure, I sometimes interpret them as other than sequences of bytes, but frequently a sequence of bytes is exactly the right thing.  And there's no reason that calling them byte vectors means that the fill argument must be a single byte.

Alaric Snell-Pym

Jul 11, 2011, 1:28:22 PM
to scheme-re...@googlegroups.com
On 07/11/11 18:18, Arthur A. Gleckler wrote:
>> Why give blobs a name that over-emphasises a rarely-used interpretation
>> of them, then give them a nearly-useless facility to fill them with some
>> arbitrarily repeated byte?
>
> A rarely used interpretation? Hardly. I read binary files as bytes all the
> time.

What file formats (or whatever they are) are those, out of interest?

> Sure, I sometimes interpret them as other than sequences of bytes,
> but frequently a sequence of bytes is exactly the right thing. And there's
> no reason that calling them byte vectors means that the fill argument must
> be a single byte.

Yeah, I'm just a bit edgy because I've not spotted any spirited argument
AGAINST avoiding aligning blobs tightly with bytes, while John and I
have argued to keep them apart; I just want to make sure people have
actually rejected the distinction, rather than just wondering what I'm
blathering about :-)

Arthur A. Gleckler

Jul 11, 2011, 1:40:04 PM
to scheme-re...@googlegroups.com
>> A rarely used interpretation?  Hardly.  I read binary files as bytes all the
>> time.
>
> What file formats (or whatever they are) are those, out of interest?

My web server reads binary files as uninterpreted streams of bytes in order to deliver images, etc. to the browser without corruption.

Streams of machine code lend themselves naturally to a byte-based interpretation.

I have an object database that packs its redo log tightly, and bytes are the natural unit for that.

I've written a special kind of port called a length-limited input port, whose goal is to ensure that the port is incapable of delivering more than a specified number of bytes.  This is used, for example, to ensure that a malicious client doesn't deliver more bytes than it claims it will, while allowing the code reading from the port to interpret the bytes however it likes, without having to count them.

I'm sure there are other examples.

I do agree that there should be ways to read and write units other than bytes.  But I don't see any problem with the name "byte vector" -- except for the lack of hyphenation as we've chosen to use it in identifiers, which makes the copy editor in me cringe.

Alaric Snell-Pym

Jul 11, 2011, 1:49:46 PM
to scheme-re...@googlegroups.com
On 07/11/11 18:40, Arthur A. Gleckler wrote:

>> What file formats (or whatever they are) are those, out of interest?
>>
>
> My web server reads binary files as uninterpreted streams of bytes in order
> to deliver images, etc. to the browser without corruption.

That's the kind of thing I'm aiming blobs at - uninterpreted binary
strings, *measured* in bytes but without necessarily being composed of
byte-sized chunks.

> Streams of machine code lend themselves naturally to a byte-based
> interpretation.

Yes, x86-style instruction streams are defined in terms of bytes, with
the odd 16-bit or 32-bit value thrown in at random alignments. The most
natural way of dealing with those parsing-type tasks is to get your blob
of data and put a binary blob port on it, and then use procedures to
read bytes and various larger types, with flags to specify endianness
and so on - or, at a higher level, a means of specifying the "wire
format" of record-like objects declaratively and having them read/write
themselves.

> I have an object database that packs its redo log tightly, and bytes are the
> natural unit for that.

That's also something I'd suggest binary ports for.

> I've written a special kind of port called a length-limited input port,
> whose goal is to ensure that the port is incapable of delivering more than a
> specified number of bytes. This is used, for example, to ensure that a
> malicious client doesn't deliver more bytes than it claims it will, while
> allowing the code reading from the port to interpret the bytes however it
> likes, without having to count them.

That's handy, but again it just *measures* blobs in bytes, rather than
treating them as a sequence of integers in the range 0..255.

>
> I do agree that there should be ways to read and write units other than
> bytes. But I don't see any problem with the name "byte vector" -- except
> for the lack of hyphenation as we've chosen to use it in identifiers, which
> makes the copy editor in me cringe.
>

The name doesn't kill me, but I worry that it will lead people to lump
more and more of the semantics of an SRFI-4 u8vector (namely, its
contents are integers in the range 0..255) into something people are
simply using to shuffle data from disk out of an HTTP socket with!

Arthur A. Gleckler

Jul 11, 2011, 1:52:54 PM
to scheme-re...@googlegroups.com
> The name doesn't kill me, but I worry that it will lead people to lump
> more and more of the semantics of an SRFI-4 u8vector (namely, its
> contents are integers in the range 0..255) into something people are
> simply using to shuffle data from disk out of an HTTP socket with!

I get the feeling that I don't understand exactly what you mean here, and that we may agree completely, after all.

Would you (or John) mind stating your position once again, as precisely as you can?

Aaron W. Hsu

Jul 11, 2011, 2:31:00 PM
to scheme-re...@googlegroups.com
On Monday, July 11, 2011 01:18:06 pm Arthur A. Gleckler wrote:
> > #215 is interesting. What will the fill argument be? Another blob, which
> > is repeated as many times as needed to fill the blob? I hope nobody
> > suggests "a number, which is treated as a (signed or unsigned?) byte and
> > repeated", as I see the slide of blobs into byte vectors as a downhill
> > slide - SRFI-4 is there when we want structured packed arrays; making a
> > blob into a differently-worded u8vector is, I think, just untidy.
>
> I like the idea of accepting a byte vector as a fill argument. We'll just
> have to specify what to do if the length of the byte vector is not a
> multiple of that argument.

I would also point out that while I definitely want to support a single-byte
fill argument (I'm thinking specifically of zeroed bytevectors, which
should *not* be the default), I also think that allowing a bytevector fill
argument is a very good idea. This maps very nicely to the concept of Reshape
in APL, where you can create an array of arbitrary shape from another array,
and the fill argument is just repeated to fill up the array. In the vector case
it's quite simple.
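
A rough sketch of that behaviour (make-bytevector/fill and the accessor
names here are illustrative only, and a non-empty fill is assumed):

  ;; Repeat the fill bytevector cyclically, in the spirit of Reshape.
  (define (make-bytevector/fill len fill)
    (let ((bv (make-bytevector len 0))
          (n  (bytevector-length fill)))
      (do ((i 0 (+ i 1)))
          ((= i len) bv)
        (bytevector-u8-set! bv i
                            (bytevector-u8-ref fill (modulo i n))))))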

Aaron W. Hsu

Jul 11, 2011, 2:52:54 PM
to scheme-re...@googlegroups.com
On Monday, July 11, 2011 01:28:22 pm Alaric Snell-Pym wrote:
> Yeah, I'm just a bit edgy because I've not spotted any spirited argument
> AGAINST avoiding aligning blobs tightly with bytes, while John and I
> have argued to keep them apart; I just want to make sure people have
> actually rejected the distinction, rather than just wondering what I'm
> blathering about :-)

I have, to a degree, explicitly rejected the idea of keeping blobs apart from
bytes, because they *are* aligned with bytes. Whether or not we interpret
bytevectors as vectors of anything else, the smallest, most general unit of
which bytevectors/blobs are composed is always the byte. We might even treat
them as bit fields, but underlying that there will be bytes.

Blobs tells me nothing about access patterns and gives no hints as to the
interface of the system. Whether we want to grab 4 bytes at a time or
something else, blobs are still vector like objects that have data in them and
are accessed in a vector like fashion.

I do not disagree that we should recognize that bytes may not be the unit
in which most people natively use bytevectors; I often use IEEE floating point
vectors. But I still think that the name bytevector reflects what we are
working with more helpfully than the name blob does.

To draw an analogy, we still talk about strings as a sequence or vector of
characters, even though we rarely think about them at the character level.
Others disagree with me about strings inherently being a vector-like object,
and I grant that, but if you go into any computability class that discusses
the complexity of languages with strings, they talk about them as vectors of
characters. Even though they have grammars that treat strings as streams of
tokens, or words, or whatever, we still understand that the most primitive,
general sense of a string is a character vector. There might be arguments for
changing this, but in the case of blobs, I don't see a similar movement. It's
still a vector of bytes. It is important to interpret those bytes in
meaningful ways, and our API should be capable of doing so, but it's still just
a sequence of bytes, so we should give it a name that reflects this underlying
structure, and let abstractions be built on top of that.

In summary, I tend to think that much of this is just bikeshedding. I don't
think you would disagree that the underlying structure is a sequence of bytes,
but you don't want to make that apparent in the name; I, on the other hand, do
want to make that apparent in the name, not because I think it ties us to
bytes, but because I think it makes programming with them nicer and makes
the interface more naturally predictable.

Aaron W. Hsu

Jul 11, 2011, 6:16:46 PM
to scheme-re...@googlegroups.com
On Sunday, July 10, 2011 10:13:42 pm John Cowan wrote:
> The `with-{input-from,output-to}-file` procedures open files themselves,
> though.

Whoops, sorry, right, I misread that and didn't see the file part.

John Cowan

Jul 12, 2011, 2:07:05 AM
to scheme-re...@googlegroups.com
Arthur A. Gleckler scripsit:

> I get the feeling that I don't understand exactly what you mean here,
> and that we may agree completely, after all.
>
> Would you (or John) mind stating your position once again, as
> precisely as you can?

I'll try.

The idea is that a blob is just what its name says: a Binary Large
OBject. (This is probably a retronym, but it's a good one.) A blob
can be understood in a large variety of ways. It can be a SRFI-4
homogeneous vector. It can be the binary representation of a string in
a uniform encoding like ISO 8859-1 or a variable-length encoding like
UTF-8 or ISO 2022. It can be an ASN.1 object encoded in BER or DER or
PER. It can be a GIF or JPEG.

To use the term "byte vector" suggests that one of these is
*fundamentally* privileged over the rest. We blob-ists rather see
the measurement of blobs in 8-bit units, and the u8 access to them, as
implementation conveniences rather than definitions. Given that it's
hard to deal with blobs with sizes that aren't a multiple of 8 bits,
or to access them in less than 8-bit units or in ways that cross the
natural 8-bit boundaries, providing u8-ref and u8-set! is sensible, and
leaving the rest out is practical because they can (though probably
should not) be defined in terms of them. (It's much better not to write
f64-native-ref/set! directly in terms of eight calls to u8-ref/set! and
a lot of bit twiddling, but with a C-level cast.)
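
For instance, a wider accessor can be layered on the u8 primitives like
this (the names are illustrative; a real implementation would do it
natively):

  ;; Big-endian 16-bit read built from two u8 reads.
  (define (blob-u16be-ref blob k)
    (+ (* 256 (blob-u8-ref blob k))
       (blob-u8-ref blob (+ k 1))))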

Aaron Hsu scripsit:

> I don't think you would disagree that the underlying structure is a
> sequence of bytes, but you don't want to make that apparent in the
> name.

I do disagree that the *underlying* structure is a sequence of bytes.
That may be the *implementation* structure, as the implementation
structure of a bignum is a sequence of 32-bit or 64-bit bigits. It's true
that I think it's best to provide only sequence-of-byte operators in WG1
for the sake of simplicity, but see BlobAPI for my full proposal (and
that only handles a few of the possibilities above).

--
John Cowan co...@ccil.org http://ccil.org/~cowan
Assent may be registered by a signature, a handshake, or a click of a computer
mouse transmitted across the invisible ether of the Internet. Formality
is not a requisite; any sign, symbol or action, or even willful inaction,
as long as it is unequivocally referable to the promise, may create a contract.
--Specht v. Netscape

Alaric Snell-Pym

Jul 12, 2011, 5:56:59 AM
to scheme-re...@googlegroups.com

I'll give it a try!

A blob really represents what's read from and written to POSIX file
descriptors, I guess - some bits, measured in bytes, that could have any
structure. Or a region of memory, which is also measured in bytes on nearly
every computer system I've actually seen a diagram of.

Now, it is indeed *measured* in bytes - which is why my objection to the
name "bytevector" is only mild.

What I object to is thinking that therefore it should be treated as a
vector of exact integers in the range 0..255. Such a representation sort
of makes sense for, say, ASCII text, but starts to fall apart if it's
UTF-8 text, or multi-byte integers, or floats, or the output of a
deflate compressor, and so on.

So my "dream" is for a blob (or bytevector, if you wish) to represent
a region of memory, counted in bytes, and to leave it to SRFI-4 to allow
access to regions of memory as signed or unsigned integers of various
widths and so on. If the features of u8vector slip into blobs, then it's
privileging u8vector, when other things may be more appropriate.

And not all blobs are vector-like structures, anyway - they might be
viewable as a vector of bytes, but then so might a cons cell or a
procedure - they're all going to be stored as a sequence of bytes in
memory somewhere! Many blobs are really best defined by a binary
grammar, generally composed of various fixed-length structures
concatenated together with some kind of parsing state that lets the
reader know how to interpret the next structure to emerge.

So I see binary data in Scheme as having blobs or bytevectors as an
underlying abstraction for "some binary data"; then blob-backed binary
ports as a way to read heterogeneous mixtures of strings, other blobs
(sub-blobs?), and binary words in various signednesses and endiannesses;
and SRFI-4 backed onto blobs as a way to access them as homogeneous
vectors; and so on.

I/O would be in terms of blobs, as I/O routines just deal with opaque
bytes - they don't care what higher-level structure and representation
sits on top.

I guess my core point is that since we need some tooling for parsing
different data formats on top of an underlying representation of a
region of memory, it'd be neat and tidy to treat u8s the same as
everything else and keep them separate, and therefore keep
blobs/bytevectors as an abstract region-of-memory-counted-in-bytes. Sort
of like "void *" rather than "char *" in C!

Alaric Snell-Pym

Jul 12, 2011, 5:59:06 AM
to scheme-re...@googlegroups.com
On 07/11/11 19:52, Aaron W. Hsu wrote:

> Blobs tells me nothing about access patterns and gives no hints as to the
> interface of the system.

That's the point! Other mechanisms (SRFI-4, binary ports) give you
access to the contents of the blob.

> Whether we want to grab 4 bytes at a time or
> something else, blobs are still vector like objects that have data in them and
> are accessed in a vector like fashion.

Not necessarily; they may be heterogeneous (most file formats have some
kind of complex structure that basically conforms to a BNF-type grammar).

John Cowan

Jul 12, 2011, 11:02:38 AM
to scheme-re...@googlegroups.com
Alaric Snell-Pym scripsit:

[most of posting, which I agree with, snipped]

> I guess my core point is that since we need some tooling for parsing
> different data formats on top of an underlying representation
> of a region of memory, it'd be neat and tidy to treat u8s the
> same as everything else and keep them separate, and therefore keep
> blobs/bytevectors as an abstract region-of-memory-counted-in-bytes. Sort
> of like "void *" rather than "char *" in C!

Blobs can't, however, be entirely abstract: there has to be some accessor
and some mutator that is primitive *for implementation*, without being
therefore conceptually privileged. In practice, that has to be u8.

I hope my posting and Alaric's demonstrate that this issue is not pure
bikeshedding.

--
John Cowan <co...@ccil.org> http://www.ccil.org/~cowan
Raffiniert ist der Herrgott, aber boshaft ist er nicht.
--Albert Einstein

Arthur A. Gleckler

Jul 15, 2011, 1:01:45 AM
to scheme-re...@googlegroups.com
On Mon, Jul 11, 2011 at 11:07 PM, John Cowan <co...@mercury.ccil.org> wrote:
 
> I'll try.

Thanks.
 
> To use the term "byte vector" suggests that one of these is
> *fundamentally* privileged over the rest.  We blob-ists rather see
> the measurement of blobs in 8-bit units, and the u8 access to them, as
> implementation conveniences rather than definitions.  Given that it's
> hard to deal with blobs with sizes that aren't a multiple of 8 bits,
> or to access them in less than 8-bit units or in ways that cross the
> natural 8-bit boundaries, providing u8-ref and u8-set! is sensible, and
> leaving the rest out is practical because they can (though probably
> should not) be defined in terms of them.  (It's much better not to write
> f64-native-ref/set! directly in terms of eight calls to u8-ref/set! and
> a lot of bit twiddling, but with a C-level cast.)

I understand what you're saying now, but I'm just not worried about the problem of suggesting "...that one of these is *fundamentally* privileged over the rest."  If WG2 provides the full set of non-byte-oriented accessors and mutators, no one will have any illusion that bytes are more privileged.  I can't see how it will have any practical effect on implementation or usage.

On Tue, Jul 12, 2011 at 2:56 AM, Alaric Snell-Pym <ala...@snell-pym.org.uk> wrote:
 
> What I object to is thinking that therefore it should be treated as a
> vector of exact integers in the range 0..255. Such a representation sort
> of makes sense for, say, ASCII text, but starts to fall apart if it's
> UTF-8 text, or multi-byte integers, or floats, or the output of a
> deflate compressor, and so on.

But unless we have a specific "byte" type, we'll certainly want operations that allow reading and writing in units of integers in the range 0-255.  It's just that those won't be the only units for which there are operations.  The name "byte vector" certainly doesn't prevent there being operations based on other units.  To me, it doesn't even imply that.  It just implies that one can _address_ whatever unit one is reading or writing in units of bytes, so I can, for example, read the IEEE single-precision floating-point number at byte offset 8,798,345, which is not a multiple of 4, the size in bytes of such a value.
 
> So my "dream" is to see a blob (or bytevector, if you wish) to represent
> a region of memory, counted in bytes, but to leave it to SRFI-4 to allow
> access to regions of memory as signed or unsigned integers of various
> widths and so on. If the features of u8vector slip into blobs, then it's
> privileging u8vector, when other things may be more appropriate.

I want more than just homogeneous vectors.  Making byte operations available doesn't prevent anyone from imposing an interpretation of the byte vector or blob as homogeneous 32-bit integers or Unicode values or anything else.  It's just a baseline, minimal interface.
  
> I guess my core point is that since we need some tooling for parsing
> different data formats on top of an underlying representation of a
> region of memory, it'd be neat and tidy to treat u8s the same as
> everything else and keep them separate, and therefore keep
> blobs/bytevectors as an abstract region-of-memory-counted-in-bytes. Sort
> of like "void *" rather than "char *" in C!

You can think of them as void *, but think of the byte-based operations as performing an implicit char * (or byte *) cast on them, just as floating point-based operations would have analogous, implicit casts.

I must not have noticed the "WG1" section at the bottom of John's proposal when I voted on this item.  I thought I was voting to include the whole shebang in WG1, not just the byte operations.  So if this were somehow to come up for a vote again, I would argue for adding another option that didn't limit what operations were included in WG1.  After all, WG1 is supposed to support embedded systems, and that kind of operation is especially important for such systems.  That would also address the concerns about bytes being somehow privileged -- except for the name, which I just don't think is going to cause problems.

John Cowan

Jul 15, 2011, 2:43:06 AM
to scheme-re...@googlegroups.com
Arthur A. Gleckler scripsit:

> I must not have noticed the "WG1" section at the bottom of John's proposal
> when I voted on this item. I thought I was voting to include the whole
> shebang in WG1, not just the byte operations.

You mentioned that. There are just so many procedures. 99 of them in all,
almost as many as R5RS has.

--
Sound change operates regularly to produce irregularities;
analogy operates irregularly to produce regularities.
--E.H. Sturtevant, ca. 1945, probably at Yale

Arthur A. Gleckler

Jul 15, 2011, 4:53:35 PM
to scheme-re...@googlegroups.com
> You mentioned that.  There are just so many procedures.  99 of them in all,
> almost as many as R5RS has.

Sure, but in a module, that would be perfectly reasonable.