The last name is Text not String so we don't accidentally create a
fmt.Stringer out of a Scanner.
To scan the input, use the Next method as the loop condition, the
Bytes or Line methods as the "getters", and Close at the end.
On 14 February 2013 13:36, Rob Pike <r...@golang.org> wrote:
The last name is Text not String so we don't accidentally create a
fmt.Stringer out of a Scanner.
Why not let it be a Stringer?
s := bufio.NewScanner(r)
for s.Next() {
	fmt.Println(s)
}
// etc
I would be more concerned if the String/Text method had some kind of side effect, apart from an allocation. Since you must always advance with Next, I don't see the problem with Scanner being a Stringer.
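To make the trade-off concrete, here is a minimal sketch of what happens when a scanner-like type has a String method. The lineScanner type and its methods are hypothetical stand-ins written for illustration; they are not the proposed bufio API.

```go
package main

import "fmt"

// lineScanner is a hypothetical stand-in for the proposed Scanner.
type lineScanner struct {
	lines []string
	pos   int
}

// Next advances to the next line and reports whether one is available.
func (s *lineScanner) Next() bool {
	if s.pos >= len(s.lines) {
		return false
	}
	s.pos++
	return true
}

// String returns the current line. Having this method makes
// *lineScanner satisfy fmt.Stringer.
func (s *lineScanner) String() string {
	if s.pos == 0 {
		return ""
	}
	return s.lines[s.pos-1]
}

// render formats the scanner the way fmt.Println would, minus the newline.
func render(s *lineScanner) string {
	return fmt.Sprint(s)
}

func main() {
	s := &lineScanner{lines: []string{"alpha", "beta"}}
	for s.Next() {
		fmt.Println(s) // fmt calls s.String() implicitly
	}
}
```

With a String method, any fmt call that receives the scanner prints the current token rather than the scanner's fields, which is the convenience being argued for here and the accident the Text name is meant to avoid.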
I'd like a possibility of continuing on from an overlong line in some way.
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
We add a new type, called Scanner, to capture the new functionality.
Its constructor takes an io.ReadCloser, for reasons that will become
clear. (The caller can promote a Reader to a ReadCloser using
ioutil.NopCloser.) If the argument is not already a bufio.Reader, one
is created to wrap the argument.
func (s *Scanner) Text() string
The last name is Text not String so we don't accidentally create a
fmt.Stringer out of a Scanner.
Some contradictory opinions: Steve McCoy's proposed func seems close, but would be much faster if it returned a slice (otherwise the func would continually have to rescan the same input until it was given a whole token). The stdlib side could use address checking to determine where the token resides (and panic if the input and output slices have no overlap). I suggest:
func (data []byte, prevState int) (token []byte, state int)
Here, the func should scan data for at most one token. state is user defined, except for zero, which means the token return value, if not nil, represents a complete token. This allows the function to continue processing a token without nesting the previous data. If the returned state is negative, it indicates an invalid token (the contents of which should be represented by the returned token value); the absolute value of the negative state will be stuffed into an error together with the full token (accumulated over however many consecutive calls returned a state > 0); the caller of Close can type assert the error and use its state to look up a useful error message.
Though for simple cases, what's wrong with the func type that bytes.FieldsFunc takes, or something similar, like `func (byte) bool`? This is less flexible and less efficient, yet sufficient for many cases, and an adaptor that wraps this into the de facto func type would be useful.
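A rough sketch of such an adaptor follows. It is written against the bufio.SplitFunc signature that later shipped in Go 1.1, not the state-based func type suggested above, and the names splitOn and fields are hypothetical helpers invented for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitOn is a hypothetical adaptor: it turns a byte predicate into a
// bufio.SplitFunc that yields runs of non-separator bytes.
func splitOn(isSep func(byte) bool) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		// Skip leading separators.
		start := 0
		for start < len(data) && isSep(data[start]) {
			start++
		}
		// Look for the separator that ends the token.
		for i := start; i < len(data); i++ {
			if isSep(data[i]) {
				return i + 1, data[start:i], nil
			}
		}
		// At EOF, whatever is left is the final token.
		if atEOF && start < len(data) {
			return len(data), data[start:], nil
		}
		// No complete token yet: drop the skipped separators and ask for more data.
		return start, nil, nil
	}
}

// fields collects all tokens from input, joined with "|" for display.
func fields(input string, isSep func(byte) bool) string {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(splitOn(isSep))
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return strings.Join(out, "|")
}

func main() {
	isComma := func(b byte) bool { return b == ',' }
	fmt.Println(fields("a,b,,c", isComma)) // prints "a|b|c"
}
```

Like bytes.FieldsFunc, this sketch drops empty fields between adjacent separators; an adaptor that preserved them would return an empty, non-nil token instead of skipping.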
@rog: who says you can't call Close twice? I believe os.File takes steps to ensure that it's idempotent (so that it doesn't close a reopened fd), while being safe with the other ReadCloser use cases I've seen.
@rog: the token return val would only contain the first token. The next call to that func would pass the remaining buffer (starting on the first byte after the token's last byte), similar to the logic used with Read or Write in a loop, except instead of n, addresses would be used by bufio to determine the offset.
The EOF argument is true only at EOF, giving the function a chance to
terminate the last token.
The incoming data is a slice of unconsumed data. Each call to
SplitFunc occurs at the previous location, plus the returned 'advance'
value from the previous call. Thus by returning advance==0, SplitFunc
can ask the Scanner to accumulate data until there is a full token to
return. If the required storage becomes too large while accumulating,
the Scanner will terminate with a line-too-long error. Once a token is
delivered, SplitFunc would typically return advance=len(token) plus
perhaps len(separator).
The token returned by SplitFunc is the next token to deliver to the
client; there is no requirement that it correspond to any actual input
data.
For instance, it might be upper-cased or lower-cased or
something completely arbitrary. A nil token signals to return nothing
to the client yet.
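As an illustration of these rules, here is a small sketch of a split function whose tokens do not match the input bytes. It uses the bufio.SplitFunc signature that later shipped in Go 1.1 (data, atEOF) rather than the draft signature under discussion; upperLines and scanUpper are hypothetical names chosen for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// upperLines is a hypothetical split function: it splits on '\n' and
// delivers each line upper-cased, so the token is not a slice of the input.
func upperLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// Full line available: advance past the token plus its separator.
		return i + 1, bytes.ToUpper(data[:i]), nil
	}
	if atEOF && len(data) > 0 {
		// Final line without a trailing newline.
		return len(data), bytes.ToUpper(data), nil
	}
	// No complete token yet: advance 0 asks the Scanner to accumulate more data.
	return 0, nil, nil
}

// scanUpper runs upperLines over input and joins the tokens with "|".
func scanUpper(input string) string {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(upperLines)
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return strings.Join(out, "|")
}

func main() {
	fmt.Println(scanUpper("one\ntwo\nthree")) // prints "ONE|TWO|THREE"
}
```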
We set up a custom splitter with an option method:
func (s *Scanner) Split(SplitFunc) *Scanner // default: split on line breaks of the form `\r?\n`
For example, if we provided a rune splitter in the package, you'd scan
runes like this:
s := bufio.NewScanner(os.Stdin).Split(bufio.SplitRune)
for s.Next() {
fmt.Printf("rune: %s\n", s.Bytes())
}
if err := s.Close(); err != nil {
log.Fatal(err)
}
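For comparison, a runnable sketch of the same loop against the API that eventually shipped in Go 1.1, where the methods are Scan and Err, Split does not return the Scanner, and the rune splitter is bufio.ScanRunes. The helper name runes and the string input are assumptions made so the example is self-contained.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"strings"
)

// runes scans input one rune at a time and returns each rune as a string.
func runes(input string) ([]string, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(bufio.ScanRunes) // Split returns nothing in the shipped API
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return out, s.Err()
}

func main() {
	rs, err := runes("héllo")
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range rs {
		fmt.Printf("rune: %s\n", r)
	}
}
```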
Comments welcome.
[...]
Maybe rename Close to Stop?
I hate the chaining API. It's a cheap trick just to avoid writing one line of code, which to me is a very un-Go-like goal. MaxLength does not return a new Scanner, so it shouldn't return *Scanner; it's an unnecessary complication of the interface. I haven't found anything else in the standard library that does that, and I don't think starting now is a good idea.
Also, this may have been assumed, but it wasn't explicitly stated - you should export the default SplitFunc for splitting on '\r?\n'.
On Thursday, February 14, 2013 4:28:56 PM UTC-5, Rob Pike wrote:
I left the chaining design in place. It's cheap and helpful for
initialization but is not necessary: one may always use a separate
line for the option setup if desired. This is a trivial decision to
reverse, of course, and it's not set in stone yet.
On Thursday, February 14, 2013 5:59:45 PM UTC-8, Nate Finch wrote:
I haven't found anything else in the standard library that does that,
The text/template and html/template packages use the chaining design (see Funcs and Delims methods on Template type).
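A short sketch of that chaining style in text/template, where Delims and Funcs each return the *Template so setup can be written as one expression. The template name, the "upper" function name, and the render helper are arbitrary choices for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// render builds a template with chained option calls and executes it on name.
func render(name string) (string, error) {
	t, err := template.New("greeting").
		Delims("<<", ">>").
		Funcs(template.FuncMap{"upper": strings.ToUpper}).
		Parse("hello, << upper . >>")
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, name); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render("gopher")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints "hello, GOPHER"
}
```

The same setup can be written as three separate statements on one *Template value, which is the non-chained form argued for earlier in the thread.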
Updated proposal.
I'm leaning towards Scan and Stop as the methods at this point. I'm
close to blowing the whistle on the bikeshedding session.
I don't like Err or Error because they suggest something about the
error interface.
The point is, in fact, to shut down the scanner and
terminate the scan; the error is a side effect, not the driver, and
calling it Err for instance will encourage lazy users to skip that
stage. That's partly why I liked Close, but now think that Close
indicates closure of the underlying resource. Stop indeed makes sense.
When you're done, you Stop. "When you're done, you Err" doesn't sound
right.
I'm going to write some code.
Hi,
I just wanted to thank everyone for this nice and usable solution. I'm using it to write converting readers (stripping =\r?\n, converting unheard-of semi-base64 quoted-printable encodings in mails), and it is easy, simple and performant (at least memory-wise, compared to ReadFull + bytes.Replace).
Thanks again!
Tamás Gulácsi