Determining whitespace
Matthew X. Economou  
Newsgroups: comp.lang.lisp
From: "Matthew X. Economou" <xenophon+use...@irtnog.org>
Date: 12 Oct 2002 11:22:36 -0400
Local: Sat, Oct 12 2002 11:22 am
Subject: Determining whitespace
Is there an implementation-independent way to determine if a character
is considered whitespace?  I'm looking for the equivalent of the
isspace() function in the standard C library, but the permuted symbol
index in the CLHS only lists predicates for alphabetic, digit, and
graphics characters.  There also doesn't seem to be an implementation-
independent way to query the reader for this information.

Am I missing anything obvious?  Will I just have to roll my own
predicate?

--
Matthew X. Economou <xenop...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
Max told his friend that he'd just as soon not go hiking in the hills.
Said he, "I'm an anti-climb Max."  [So is that punchline.]


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 12 Oct 2002 17:22:43 +0000
Local: Sat, Oct 12 2002 1:22 pm
Subject: Re: Determining whitespace
* Matthew X. Economou
| Is there an implementation-independent way to determine if a character is
| considered whitespace?

  What are you going to do with the result?

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
Matthew X. Economou  
Newsgroups: comp.lang.lisp
From: "Matthew X. Economou" <xenophon+use...@irtnog.org>
Date: 12 Oct 2002 15:25:49 -0400
Local: Sat, Oct 12 2002 3:25 pm
Subject: Re: Determining whitespace

>>>>> "Erik" == Erik Naggum <e...@naggum.no> writes:

    Erik> What are you going to do with the result?

I'm writing a library function that parses an IP address embedded in a
string.  I'm using PARSE-INTEGER as a model for the function's
behavior.  In addition to being able to operate on sub-strings and
ignoring junk, PARSE-INTEGER ignores leading and trailing whitespace,
and I'd like to do the same, using the same definition of whitespace
as the hosting Lisp implementation if possible.

Is this a reasonable thing to do?
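
For reference, a short sketch of the PARSE-INTEGER behavior described above, using only the standard :START, :END and :JUNK-ALLOWED keywords.  The sample strings and the helper name PARSE-INTEGER-VALUES are made up for illustration.

```lisp
(defun parse-integer-values (string &rest keys)
  "Hypothetical helper: collect both return values of PARSE-INTEGER in a list."
  (multiple-value-list (apply #'parse-integer string keys)))

;; Leading and trailing whitespace is skipped.
(parse-integer-values "  42  ")

;; :START and :END restrict parsing to a substring.
(parse-integer-values "abc 192 def" :start 3 :end 8)

;; :JUNK-ALLOWED T stops at the first non-digit instead of signalling an error.
(parse-integer-values "192.168" :junk-allowed t)
```

The second return value is the index at which parsing stopped, which is what lets a caller continue from there.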

--
Matthew X. Economou <xenop...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
"If it's not on fire, it's a software problem." --Carrie Fish


 
Johannes Grødem  
Newsgroups: comp.lang.lisp
From: "Johannes Grødem" <joh...@ifi.uio.no>
Date: Sat, 12 Oct 2002 23:38:38 +0200
Local: Sat, Oct 12 2002 5:38 pm
Subject: Re: Determining whitespace
* "Matthew X. Economou" <xenophon+use...@irtnog.org>:

> Am I missing anything obvious?  Will I just have to roll my own
> predicate?

I've tried to find this as well, but with no luck.  I use the
following to mean white-space, though:

(#\Tab #\Newline #\Linefeed #\Page #\Return #\Space)

I guess there might be cases where you want some of these not to count
as whitespace.

(I got these from the table in section 2.1.4 of the HyperSpec.  Those
are the characters listed as having whitespace-syntax type.)
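
A predicate over that list might look like the sketch below.  The names *WHITESPACE-CHARACTERS* and WHITESPACE-CHAR-P are invented here.  Note that only #\Space and #\Newline are character names the standard requires; the others are semi-standard, so this assumes an implementation that supports them.

```lisp
(defparameter *whitespace-characters*
  '(#\Tab #\Newline #\Linefeed #\Page #\Return #\Space)
  "Characters treated as whitespace (hypothetical name; the list is from the post above).")

(defun whitespace-char-p (char)
  "Return T if CHAR is one of *WHITESPACE-CHARACTERS*, else NIL."
  (and (member char *whitespace-characters* :test #'char=) t))

;; The same list also works as the character bag for STRING-TRIM.
(string-trim *whitespace-characters* "  10.0.0.1  ")
```

This tests membership in a fixed list only; it does not consult the current readtable.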

--
Johannes Grødem <OpenPGP: 5055654C>


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 12 Oct 2002 22:32:11 +0000
Local: Sat, Oct 12 2002 6:32 pm
Subject: Re: Determining whitespace
* Matthew X. Economou
| I'm writing a library function that parses an IP address embedded in a
| string.

  Since an IP address may be several different things, I think the function
  should be separated into two parts: one that searches for an IP address
  (however defined: IPv4, IPv6, abbreviated or full), and several functions
  that accept whatever passes for IP addresses and return the appropriate
  address structure.  I have found that I need CIDR coding with both /n and
  /mask, but in other cases, /port is used.  Sometimes, even .port is used
  (which does not work with abbreviated IP addresses), although I consider
  the smartest choice to be :port with IPv4 and /port with IPv6.  When you
  make this separation of functionality, there should be no need to know
  what the whitespace characters are.  Actually processing everything that
  people do with IP addresses is fascinatingly complex.  Many losers have
  no concern for parsability of the output from their programs.  *sigh*

  Surprisingly often, wanting to know if you look at a whitespace character
  means that you have chosen a less-than-ideal approach to the solution.
  If you parse using a stream, `peek-char´ has a skip-whitespace option.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
Matthew X. Economou  
Newsgroups: comp.lang.lisp
From: "Matthew X. Economou" <xenophon+use...@irtnog.org>
Date: 12 Oct 2002 18:14:18 -0400
Local: Sat, Oct 12 2002 6:14 pm
Subject: Re: Determining whitespace

>>>>> "Johannes" == Johannes Grødem <joh...@ifi.uio.no> writes:

    Johannes> (I got these from the table in section 2.1.4 of the
    Johannes> HyperSpec.  These are the characters listed as having
    Johannes> whitespace-syntax type.)

This gave me an idea.  Since I'm consciously trying to mimic the
behavior of PARSE-INTEGER, especially its ability to parse substrings
via the START and END arguments, I have to manually track my position
within the string.  It would be a lot easier to treat the string as a
string stream via WITH-INPUT-FROM-STRING, as I get both substrings and
bounds checking for free with streams.

The other nice thing this gives me is PEEK-CHAR, which with a peek
type of T, peeks ahead to the first non-whitespace character in the
stream.

I definitely need this behavior at the start of parsing, and I think I
can make it work to end parsing.
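
A minimal sketch of that idea, assuming a hypothetical helper READ-UNSIGNED-INTEGER (not the code promised in this post): WITH-INPUT-FROM-STRING supplies the substring bounds, PEEK-CHAR with a peek type of T skips leading whitespace according to the current readtable, and the :INDEX place reports how far the stream advanced.

```lisp
(defun read-unsigned-integer (string &key (start 0) end)
  "Skip leading whitespace in STRING between START and END, then read decimal
digits.  Returns the integer (or NIL if no digit was found) and the value that
WITH-INPUT-FROM-STRING stored in its :INDEX place."
  (let ((value nil)
        (index start))
    (with-input-from-string (in string :index index :start start :end end)
      ;; Peek type T discards whitespace and returns the next other character,
      ;; or NIL at end of input because EOF-ERROR-P is NIL.
      (when (peek-char t in nil nil)
        (loop for char = (peek-char nil in nil nil)
              for digit = (and char (digit-char-p char))
              while digit
              do (read-char in)
                 (setf value (+ (* 10 (or value 0)) digit)))))
    (values value index)))

(read-unsigned-integer "  192.168")
(read-unsigned-integer "ip: 10.0.0.1" :start 3)
```

Trailing whitespace can be handled the same way, with one more PEEK-CHAR call with peek type T once the digits have been consumed.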

Thanks for the help!  I'll be sure to post the code when I'm done.

--
Matthew X. Economou <xenop...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
Max told his friend that he'd just as soon not go hiking in the hills.
Said he, "I'm an anti-climb Max."  [So is that punchline.]


 
Matthew X. Economou  
Newsgroups: comp.lang.lisp
From: "Matthew X. Economou" <xenophon+use...@irtnog.org>
Date: 13 Oct 2002 12:59:25 -0400
Local: Sun, Oct 13 2002 12:59 pm
Subject: Re: Determining whitespace

>>>>> "Erik" == Erik Naggum <e...@naggum.no> writes:

    Erik> Since an IP address may be several different things, I think
    Erik> the function should be separated into two parts: one that
    Erik> searches for an IP address (however defined: IPv4, IPv6,
    Erik> abbreviated or full), and several functions that accept
    Erik> whatever passes for IP addresses and return the appropriate
    Erik> address structure.

I think I'm on the right track.  The code I'm writing now (the
PARSE-ADDRESS function) handles only IPv4 dotted-quads.  I thought it
would be a lower-level function suitable for use in a reader macro (or
other user-input routine), just as PARSE-INTEGER seems to be used by
READ.
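
For illustration only, a dotted-quad parser in this spirit could be sketched as follows.  PARSE-DOTTED-QUAD is a hypothetical name and is not the PARSE-ADDRESS code mentioned above; it works on string indices, handles only four decimal octets, and leaves whitespace skipping and junk handling to the caller.

```lisp
(defun parse-dotted-quad (string &key (start 0) (end (length string)))
  "Parse an IPv4 dotted quad beginning exactly at START.
Returns a list of four octets and the index just past the address,
or NIL if the text at START is not a dotted quad."
  (let ((octets '())
        (pos start))
    (dotimes (i 4 (values (nreverse octets) pos))
      ;; Find the end of the run of decimal digits starting at POS.
      (let ((digits-end (or (position-if-not #'digit-char-p string
                                             :start pos :end end)
                            end)))
        (when (= digits-end pos)          ; no digit where an octet should be
          (return nil))
        (let ((octet (parse-integer string :start pos :end digits-end)))
          (when (> octet 255)
            (return nil))
          (push octet octets)
          (setf pos digits-end)))
      ;; Each of the first three octets must be followed by a dot.
      (when (< i 3)
        (unless (and (< pos end) (char= (char string pos) #\.))
          (return nil))
        (incf pos)))))

(parse-dotted-quad "192.168.0.1")
(parse-dotted-quad "host 10.0.0.1:80" :start 5)
```

Returning the end index as a second value mirrors PARSE-INTEGER, so a caller can go on to parse a trailing :port or /n suffix from that position.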

    Erik> Actually processing everything that people do with IP
    Erik> addresses is fascinatingly complex.

I didn't realize how complicated it could be until I took a look at
the source code to the IP address parsing routines in several
different operating systems and resolver libraries.

    Erik> Surprisingly often, wanting to know if you look at a
    Erik> whitespace character means that you have chosen a
    Erik> less-than-ideal approach to the solution.  If you parse
    Erik> using a stream, `peek-char´ has a skip-whitespace option.

I was processing the input string character by character, instead of
converting it to a string-stream.

--
Matthew X. Economou <xenop...@irtnog.org> - Unsafe at any clock speed!
I'm proud of my Northern Tibetian heritage! (http://www.subgenius.com)
Max told his friend that he'd just as soon not go hiking in the hills.
Said he, "I'm an anti-climb Max."  [So is that punchline.]


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 13 Oct 2002 23:28:21 +0000
Local: Sun, Oct 13 2002 7:28 pm
Subject: Re: Determining whitespace
* Matthew X. Economou
| I thought it would be a lower-level function suitable for use in a reader
| macro (or other user-input routine), just as PARSE-INTEGER seems to be
| used by READ.

  `parse-integer´ is not used by `read´.

| I didn't realize how complicated it could be until I took a look at the
| source code to the IP address parsing routines in several different
| operating systems and resolver libraries.

  No kidding.  People do so many horrible things you could cry.

| I was processing the input string character by character, instead of
| converting it to a string-stream.

  Ideally, a string-stream should be better from all perspectives, but is
  often much more expensive than need be.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
Tim Bradshaw  
Newsgroups: comp.lang.lisp
From: Tim Bradshaw <t...@cley.com>
Date: 14 Oct 2002 00:42:04 +0100
Local: Sun, Oct 13 2002 7:42 pm
Subject: Re: Determining whitespace
* Matthew X Economou wrote:

String streams are COOL and should be used for almost everything.
Real Lisp Programmers use string streams instead of lists. If your
implementor makes string streams expensive, complain vigorously.

(half serious)

--tim


 
Pekka P. Pirinen  
Newsgroups: comp.lang.lisp
From: Pekka.P.Piri...@globalgraphics.com (Pekka P. Pirinen)
Date: 14 Oct 2002 18:18:00 +0100
Subject: Re: Determining whitespace
"Matthew X. Economou" <xenophon+use...@irtnog.org> writes:

> Is there an implementation-independent way to determine if a character
> is considered whitespace?  I'm looking for the equivalent of the
> isspace() function in the standard C library,

Considered by whom?  The thing is, it depends.  In the standard,
there's the whitespace syntax type, and then there's whitespace(1),
which is independent of the readtable (and there's no standard way to
determine if a character is either of those).  But if you're parsing
some non-CL syntax, that's the wrong place to look; you should look at
the definition of that syntax.  And then you should spare a moment to
think about possible extension to character sets other than ASCII: Do
you want, e.g., U+00A0 No-Break Space or U+3000 Ideographic Space?
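
One way to keep that decision with the syntax being parsed is to build the predicate from an explicit character set.  The sketch below uses invented names throughout and assumes that character codes coincide with Unicode code points, which the standard does not guarantee; characters the implementation cannot represent are simply left out.

```lisp
(defun char-from-code (code)
  "Hypothetical helper: the character with code CODE, or NIL if this
implementation has no such character."
  (and (< code char-code-limit) (code-char code)))

(defun make-whitespace-predicate (characters)
  "Return a one-argument predicate that is true for members of CHARACTERS."
  (lambda (char)
    (and (find char characters :test #'char=) t)))

(defparameter *my-syntax-whitespace*
  ;; Assumption: codes #xA0 and #x3000 denote No-Break Space and
  ;; Ideographic Space, as they do in Unicode-based implementations.
  (remove nil (list #\Space #\Newline
                    (char-from-code #xA0)
                    (char-from-code #x3000)))
  "Whitespace for some hypothetical non-CL syntax.")

(defparameter *my-syntax-whitespace-p*
  (make-whitespace-predicate *my-syntax-whitespace*))

(funcall *my-syntax-whitespace-p* #\Space)
```

Each syntax then carries its own set, and widening it later does not touch the parsing code.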
--
Pekka P. Pirinen
In cyberspace, everybody can hear you scream.  - Gary Lewandowski

 