Am I missing anything obvious? Will I just have to roll my own
predicate?
--
Matthew X. Economou <xeno...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
Max told his friend that he'd just as soon not go hiking in the hills.
Said he, "I'm an anti-climb Max." [So is that punchline.]
What are you going to do with the result?
--
Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
Erik> What are you going to do with the result?
I'm writing a library function that parses an IP address embedded in a
string. I'm using PARSE-INTEGER as a model for the function's
behavior. In addition to being able to operate on sub-strings and
ignoring junk, PARSE-INTEGER ignores leading and trailing whitespace,
and I'd like to do the same, using the same definition of whitespace
as the hosting Lisp implementation if possible.
Is this a reasonable thing to do?
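
For reference, the PARSE-INTEGER behavior described above can be seen in a few standard calls. This is only an illustration with made-up input strings; the second return value is the index at which parsing stopped.

```lisp
;; Leading and trailing whitespace is skipped.
(parse-integer "  42  ")                          ; => 42, 6

;; :START and :END bound the substring that is examined.
(parse-integer "port 8080 open" :start 5 :end 9)  ; => 8080, 9

;; With :JUNK-ALLOWED T, parsing stops at the first non-digit
;; instead of signalling an error.
(parse-integer "192.168" :junk-allowed t)         ; => 192, 3
```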
--
Matthew X. Economou <xeno...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
"If it's not on fire, it's a software problem." --Carrie Fish
> Am I missing anything obvious? Will I just have to roll my own
> predicate?
I've tried to find this as well, but with no luck. I use the
following to mean white-space, though:
(#\Tab #\Newline #\Linefeed #\Page #\Return #\Space)
I guess there might be cases where you want some of these not to count
as whitespace.
(I got these from the table in section 2.1.4 of the HyperSpec. Those
are the characters listed as having whitespace-syntax type.)
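
A hand-rolled predicate over that list might look like the sketch below. The names *WHITESPACE-CHARACTERS* and MY-WHITESPACE-P are invented for this example; they are not standard, and the character names other than #\Space and #\Newline are only semi-standard, so an implementation is not strictly required to support them.

```lisp
(defparameter *whitespace-characters*
  '(#\Tab #\Newline #\Linefeed #\Page #\Return #\Space)
  "Characters treated as whitespace by MY-WHITESPACE-P (hypothetical name).")

(defun my-whitespace-p (char)
  "Return T if CHAR is in *WHITESPACE-CHARACTERS*, NIL otherwise."
  (and (member char *whitespace-characters* :test #'char=) t))

;; The same list also works as a character bag for STRING-TRIM.
(string-trim *whitespace-characters* "  10.0.0.1 ")   ; => "10.0.0.1"
(position-if-not #'my-whitespace-p "  abc")           ; => 2
```

Keeping the list in a variable makes it easy to drop characters that should not count as whitespace in a given context.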
--
Johannes Grødem <OpenPGP: 5055654C>
Since an IP address may be several different things, I think the function
should be separated into two parts: one that searches for an IP address
(however defined: IPv4, IPv6, abbreviated or full), and several functions
that accept whatever passes for IP addresses and return the appropriate
address structure. I have found that I need CIDR coding with both /n and
/mask, but in other cases, /port is used. Sometimes, even .port is used
(which does not work with abbreviated IP addresses), although I consider
the smartest choice to be :port with IPv4 and /port with IPv6. When you
make this separation of functionality, there should be no need to know
what the whitespace characters are. Actually processing everything that
people do with IP addresses is fascinatingly complex. Many losers have
no concern for parsability of the output from their programs. *sigh*
Surprisingly often, wanting to know if you look at a whitespace character
means that you have chosen a less-than-ideal approach to the solution.
If you parse using a stream, `peek-char´ has a skip-whitespace option.
Johannes> (I got these from the table in section 2.1.4 of the
Johannes> HyperSpec. These are the characters listed as having
Johannes> whitespace-syntax type.)
This gave me an idea. Since I'm consciously trying to mimic the
behavior of PARSE-INTEGER, especially its ability to parse substrings
via the START and END arguments, I have to manually track my position
within the string. It would be a lot easier to treat the string as a
string stream via WITH-INPUT-FROM-STRING, as I get both substrings and
bounds checking for free with streams.
The other nice thing this gives me is PEEK-CHAR, which with a peek
type of T, peeks ahead to the first non-whitespace character in the
stream.
I definitely need this behavior at the start of parsing, and I think I
can make it work to end parsing.
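
A minimal sketch of that idea follows. FIRST-NON-WHITESPACE is a hypothetical helper name, not part of any library. Note that PEEK-CHAR with a peek type of T skips whatever the current readtable treats as whitespace, and that it consumes the skipped characters from the stream.

```lisp
(defun first-non-whitespace (string &key (start 0) end)
  "Return the first non-whitespace character of STRING between START
and END, or NIL if only whitespace (or nothing) is found there."
  (with-input-from-string (in string :start start :end end)
    ;; Peek type T skips whitespace; the two NILs make end of file
    ;; return NIL instead of signalling an error.
    (peek-char t in nil nil)))

(first-non-whitespace "   10.0.0.1")            ; => #\1
(first-non-whitespace "ab   cd" :start 2)        ; => #\c
(first-non-whitespace "ab   cd" :start 2 :end 5) ; => NIL
```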
Thanks for the help! I'll be sure to post the code when I'm done.
Erik> Since an IP address may be several different things, I think
Erik> the function should be separated into two parts: one that
Erik> searches for an IP address (however defined: IPv4, IPv6,
Erik> abbreviated or full), and several functions that accept
Erik> whatever passes for IP addresses and return the appropriate
Erik> address structure.
I think I'm on the right track. The code I'm writing now (the
PARSE-ADDRESS function) handles only IPv4 dotted-quads. I thought it
would be a lower-level function suitable for use in a reader macro (or
other user-input routine), just as PARSE-INTEGER seems to be used by
READ.
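
To make the discussion concrete, here is a rough sketch of a stream-based dotted-quad parser. It is not the PARSE-ADDRESS code mentioned above; READ-OCTET and PARSE-DOTTED-QUAD are invented names, the result is simply a list of four integers, and malformed input signals a plain error rather than mimicking :JUNK-ALLOWED.

```lisp
(defun read-octet (stream)
  "Read a run of decimal digits from STREAM and return it as an
integer in the range 0-255, or signal an error."
  (let ((value 0) (digits 0))
    (loop for char = (peek-char nil stream nil nil)
          for weight = (and char (digit-char-p char))
          while weight
          do (read-char stream)
             (setf value (+ (* value 10) weight))
             (incf digits))
    (unless (and (plusp digits) (<= value 255))
      (error "Malformed octet in IPv4 address."))
    value))

(defun parse-dotted-quad (string &key (start 0) end)
  "Parse an IPv4 dotted quad from STRING between START and END,
ignoring surrounding whitespace. Return a list of four octets."
  (with-input-from-string (in string :start start :end end)
    (peek-char t in nil nil)                  ; skip leading whitespace
    (let ((octets (list (read-octet in))))
      (loop repeat 3
            do (unless (eql (read-char in nil nil) #\.)
                 (error "Expected a dot in IPv4 address."))
               (push (read-octet in) octets))
      ;; Only whitespace may follow the last octet.
      (unless (null (peek-char t in nil nil))
        (error "Junk after IPv4 address."))
      (nreverse octets))))

(parse-dotted-quad "  192.168.0.1  ")                   ; => (192 168 0 1)
(parse-dotted-quad "host 10.0.0.1 up" :start 5 :end 13) ; => (10 0 0 1)
```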
Erik> Actually processing everything that people do with IP
Erik> addresses is fascinatingly complex.
I didn't realize how complicated it could be until I took a look at
the source code to the IP address parsing routines in several
different operating systems and resolver libraries.
Erik> Surprisingly often, wanting to know if you look at a
Erik> whitespace character means that you have chosen a
Erik> less-than-ideal approach to the solution. If you parse
Erik> using a stream, `peek-char´ has a skip-whitespace option.
I was processing the input string character by character, instead of
converting it to a string-stream.
--
Matthew X. Economou <xeno...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
`parse-integer´ is not used by `read´.
| I didn't realize how complicated it could be until I took a look at the
| source code to the IP address parsing routines in several different
| operating systems and resolver libraries.
No kidding. People do so many horrible things you could cry.
| I was processing the input string character by character, instead of
| converting it to a string-stream.
Ideally, a string-stream should be better from all perspectives, but it
is often much more expensive than it needs to be.
String streams are COOL and should be used for almost everything.
Real Lisp Programmers use string streams instead of lists. If your
implementor makes string streams expensive, complain vigorously.
(half serious)
--tim
Considered by whom? The thing is, it depends. In the standard,
there's the whitespace syntax type, and then there's whitespace(1),
which is independent of the readtable (and there's no standard way to
determine if a character is either of those). But if you're parsing
some non-CL syntax, that's the wrong place to look; you should look at
the definition of that syntax. And then you should spare a moment to
think about possible extension to character sets other than ASCII: Do
you want, e.g., U+00A0 No-Break Space or U+3000 Ideographic Space?
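
One way to keep that decision explicit is to define the set yourself, as in the sketch below. The names are hypothetical, and the code assumes that the implementation's CHAR-CODE values coincide with Unicode code points, which is common in current implementations but is not guaranteed by the standard.

```lisp
(defparameter *extra-space-codes* '(#xA0 #x3000)
  "Assumed code points for U+00A0 No-Break Space and U+3000 Ideographic Space.")

(defun extended-whitespace-p (char &optional (extra-codes *extra-space-codes*))
  "Return T if CHAR is ASCII whitespace or its code is in EXTRA-CODES."
  (if (or (member char '(#\Space #\Tab #\Newline #\Linefeed #\Page #\Return))
          (member (char-code char) extra-codes))
      t
      nil))

;; Pass NIL as the second argument to accept ASCII whitespace only.
(extended-whitespace-p #\Space)   ; => T
(extended-whitespace-p #\a)       ; => NIL
```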
--
Pekka P. Pirinen
In cyberspace, everybody can hear you scream. - Gary Lewandowski