Is there an implementation-independent way to determine if a character is considered whitespace? I'm looking for the equivalent of the isspace() function in the standard C library, but the permuted symbol index in the CLHS only lists predicates for alphabetic, digit, and graphics characters. There also doesn't seem to be an implementation-independent way to query the reader for this information.
Am I missing anything obvious? Will I just have to roll my own predicate?
-- Matthew X. Economou <xenop...@irtnog.org> - Unsafe at any clock speed! I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com) Max told his friend that he'd just as soon not go hiking in the hills. Said he, "I'm an anti-climb Max." [So is that punchline.]
>>>>> "Erik" == Erik Naggum <e...@naggum.no> writes:
Erik> What are you going to do with the result?
I'm writing a library function that parses an IP address embedded in a string. I'm using PARSE-INTEGER as a model for the function's behavior. In addition to being able to operate on sub-strings and ignoring junk, PARSE-INTEGER ignores leading and trailing whitespace, and I'd like to do the same, using the same definition of whitespace as the hosting Lisp implementation if possible.
Is this a reasonable thing to do?
-- Matthew X. Economou <xenop...@irtnog.org> - Unsafe at any clock speed! I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com) "If it's not on fire, it's a software problem." --Carrie Fish
* Matthew X. Economou
| I'm writing a library function that parses an IP address embedded in a string.
Since an IP address may be several different things, I think the function should be separated into two parts: one that searches for an IP address (however defined: IPv4, IPv6, abbreviated or full), and several functions that accept whatever passes for IP addresses and return the appropriate address structure. I have found that I need CIDR coding with both /n and /mask, but in other cases, /port is used. Sometimes, even .port is used (which does not work with abbreviated IP addresses), although I consider the smartest choice to be :port with IPv4 and /port with IPv6. When you make this separation of functionality, there should be no need to know what the whitespace characters are. Actually processing everything that people do with IP addresses is fascinatingly complex. Many losers have no concern for parsability of the output from their programs. *sigh*
Surprisingly often, wanting to know if you look at a whitespace character means that you have chosen a less-than-ideal approach to the solution. If you parse using a stream, `peek-char´ has a skip-whitespace option.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
>>>>> "Johannes" == Johannes Grødem <joh...@ifi.uio.no> writes:
Johannes> (I got these from the table in section 2.1.4 of the HyperSpec. These are the characters listed as having whitespace-syntax type.)
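For reference, a roll-your-own predicate along these lines might look like the sketch below. It assumes the set of characters that the table in section 2.1.4 lists with whitespace syntax type in standard syntax; the names *STANDARD-WHITESPACE* and STANDARD-WHITESPACE-P are made up for illustration and are not part of the standard. It does not consult the current readtable, so it will not notice characters whose syntax type has been changed.

```lisp
;; Hypothetical helper, not part of the standard.  The list is the set of
;; characters that CLHS 2.1.4 gives whitespace[2] syntax type in standard
;; syntax.  Only #\Space and #\Newline are standard character names; the
;; others are semi-standard, so an implementation is not strictly required
;; to recognize them.
(defparameter *standard-whitespace*
  '(#\Tab #\Newline #\Linefeed #\Page #\Return #\Space #\Rubout))

(defun standard-whitespace-p (char)
  "True if CHAR has whitespace[2] syntax type in standard syntax."
  (and (member char *standard-whitespace* :test #'char=) t))
```

Note that #\Backspace is deliberately absent: the same table lists it as a constituent.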
This gave me an idea. Since I'm consciously trying to mimic the behavior of PARSE-INTEGER, especially its ability to parse substrings via the START and END arguments, I have to manually track my position within the string. It would be a lot easier to treat the string as a string stream via WITH-INPUT-FROM-STRING, as I get both substrings and bounds checking for free with streams.
The other nice thing this gives me is PEEK-CHAR, which with a peek type of T, peeks ahead to the first non-whitespace character in the stream.
I definitely need this behavior at the start of parsing, and I think I can make it work to end parsing.
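A minimal sketch of that idea, assuming a made-up helper named FIRST-NON-WHITESPACE (not the code promised in this post): it wraps the substring in a string stream, lets PEEK-CHAR with a peek type of T skip whitespace as defined by the current readtable, and uses the :INDEX option of WITH-INPUT-FROM-STRING to report where the skipping stopped.

```lisp
;; Hypothetical helper for illustration only.
(defun first-non-whitespace (string &key (start 0) end)
  "Return the first non-whitespace character of STRING between START and
END, or NIL if there is none, and the index where skipping stopped."
  (let ((index start))
    (values
     (with-input-from-string (in string :index index :start start :end end)
       ;; Peek type T skips whitespace[2] characters; the NIL NIL
       ;; arguments make PEEK-CHAR return NIL at end of input instead
       ;; of signaling an error.
       (peek-char t in nil nil))
     ;; INDEX is updated when WITH-INPUT-FROM-STRING exits normally, and
     ;; VALUES evaluates its arguments left to right, so this sees the
     ;; updated position.
     index)))
```

Called on "   192.168.0.1" this should return #\1 and 3; on a string containing only spaces it should return NIL and the end index.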
Thanks for the help! I'll be sure to post the code when I'm done.
-- Matthew X. Economou <xenop...@irtnog.org>
>>>>> "Erik" == Erik Naggum <e...@naggum.no> writes:
Erik> Since an IP address may be several different things, I think the function should be separated into two parts: one that searches for an IP address (however defined: IPv4, IPv6, abbreviated or full), and several functions that accept whatever passes for IP addresses and return the appropriate address structure.
I think I'm on the right track. The code I'm writing now (the PARSE-ADDRESS function) handles only IPv4 dotted-quads. I thought it would be a lower-level function suitable for use in a reader macro (or other user-input routine), just as PARSE-INTEGER seems to be used by READ.
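To make the shape of such a lower-level function concrete, here is a sketch under stated assumptions. It is not the PARSE-ADDRESS code mentioned above: the name PARSE-DOTTED-QUAD, the choice to return the address as a 32-bit integer, and the decision to return NIL rather than signal an error are all illustrative. Like PARSE-INTEGER it takes START and END and returns a second value giving the index where parsing stopped. It does no whitespace skipping of its own, and it reads each octet as decimal even when it has leading zeros.

```lisp
;; Hypothetical sketch; handles only IPv4 dotted quads such as "192.168.0.1".
(defun parse-dotted-quad (string &key (start 0) end)
  "Parse four dot-separated decimal octets from STRING between START and END.
Returns the address as an integer and the index just past the last octet,
or NIL and the index at which parsing failed."
  (let ((end (or end (length string)))
        (pos start)
        (address 0))
    (dotimes (i 4 (values address pos))
      ;; Every octet after the first must be preceded by a dot.
      (when (plusp i)
        (unless (and (< pos end) (char= (char string pos) #\.))
          (return (values nil pos)))
        (incf pos))
      ;; Find the end of the run of digits ourselves, so PARSE-INTEGER
      ;; never sees signs, whitespace, or junk.
      (let ((stop (or (position-if-not #'digit-char-p string
                                       :start pos :end end)
                      end)))
        (unless (> stop pos)            ; no digits at all
          (return (values nil pos)))
        (let ((octet (parse-integer string :start pos :end stop)))
          (unless (<= octet 255)
            (return (values nil pos)))
          (setf address (+ (* address 256) octet)
                pos stop))))))
```

For example, (parse-dotted-quad "192.168.0.1") should return 3232235777 and 11, while "10.0.0.256" and "1.2.3" should both yield NIL as the first value.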
Erik> Actually processing everything that people do with IP addresses is fascinatingly complex.
I didn't realize how complicated it could be until I took a look at the source code to the IP address parsing routines in several different operating systems and resolver libraries.
Erik> Surprisingly often, wanting to know if you look at a whitespace character means that you have chosen a less-than-ideal approach to the solution. If you parse using a stream, `peek-char´ has a skip-whitespace option.
I was processing the input string character by character, instead of converting it to a string-stream.
-- Matthew X. Economou <xenop...@irtnog.org>
* Matthew X. Economou
| I thought it would be a lower-level function suitable for use in a reader macro (or other user-input routine), just as PARSE-INTEGER seems to be used by READ.
`parse-integer´ is not used by `read´.
| I didn't realize how complicated it could be until I took a look at the source code to the IP address parsing routines in several different operating systems and resolver libraries.
No kidding. People do so many horrible things you could cry.
| I was processing the input string character by character, instead of converting it to a string-stream.
Ideally, a string-stream should be better from all perspectives, but is often much more expensive than need be.
-- Erik Naggum, Oslo, Norway
String streams are COOL and should be used for almost everything. Real Lisp Programmers use string streams instead of lists. If your implementor makes string streams expensive, complain vigorously.
> Is there an implementation-independent way to determine if a character is considered whitespace? I'm looking for the equivalent of the isspace() function in the standard C library,
Considered by whom? The thing is, it depends. In the standard, there's the whitespace syntax type, and then there's whitespace(1), which is independent of the readtable (and there's no standard way to determine if a character is either of those). But if you're parsing some non-CL syntax, that's the wrong place to look; you should look at the definition of that syntax. And then you should spare a moment to think about possible extension to character sets other than ASCII: Do you want, e.g., U+00A0 No-Break Space or U+3000 Ideographic Space?
-- Pekka P. Pirinen
In cyberspace, everybody can hear you scream. - Gary Lewandowski