How to remove control characters from a string?

ja...@cadence.com

Jun 1, 1998, 3:00:00 AM

I'm trying to remove control characters from a string in Tcl, and I seem
to be having difficulty accomplishing my task. Here's the command I thought
would remove all the control characters from a string, except for tabs and
newlines:

% set test "abc\008123\n"
ab123

% regsub -all {[\000-\010]|[\013-\037]|[\177]} $test {} test
3
puts "<$test>"
<abc
>

So evidently you cannot use octal codes in a [] set. What's the best way to
accomplish this? Remember, I want to remove all control characters *except*
for Control-I and Control-J... There doesn't seem to be a Tcl equivalent to
the Perl/Unix "tr" function?

By the way, I need a solution that works in Tcl 7.5 -- I'm coding this in an
application which has Tcl embedded in it, so I have to use the supplied
version rather than 8.0, etc.

Thanks in advance,

jase

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/ Now offering spam-free web-based newsreading

Bryan Oakley

Jun 2, 1998, 3:00:00 AM
to ja...@cadence.com

ja...@cadence.com wrote:
>
> I'm trying to remove control characters from a string in Tcl, and I seem
> to be having difficulty accomplishing my task. Here's the command I thought
> would remove all the control characters from a string, except for tabs and
> newlines:
>
> % set test "abc\008123\n"
> ab123
>
> % regsub -all {[\000-\010]|[\013-\037]|[\177]} $test {} test
> 3
> puts "<$test>"
> <abc
> >
>
> So evidently you cannot use octal codes in a [] set. What's the best way to
> accomplish this? Remember, I want to remove all control characters *except*
> for Control-I and Control-J... There doesn't seem to be a Tcl equivalent to
> the Perl/Unix "tr" function?

What you say is correct -- regsub doesn't know about octal codes.
However, tcl string processing does, so simply use quotes instead of
curly braces, like this (untested, but I'm just _sure_ it works ;-)

regsub -all "\[\000-\010\013-\037\177\]" $test {} test
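
The difference between the two quoting styles can be seen without regsub at all. This is only a small sketch; the variable names braced and quoted are made up for illustration:

```tcl
# Braces suppress backslash substitution, so this is four literal
# characters: a backslash followed by 0, 1, 0.
set braced {\010}

# Double quotes let the Tcl parser substitute the octal escape, so this
# is a single backspace character.
set quoted "\010"

puts [string length $braced]   ;# 4
puts [string length $quoted]   ;# 1
```

Whatever ends up in the string is what regsub receives as its pattern.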


--
Bryan Oakley
ChannelPoint, Inc.

Victor Wagner

Jun 2, 1998, 3:00:00 AM

ja...@cadence.com wrote:
: I'm trying to remove control characters from a string in Tcl, and I seem
: to be having difficulty accomplishing my task. Here's the command I thought
: would remove all the control characters from a string, except for tabs and
: % regsub -all {[\000-\010]|[\013-\037]|[\177]} $test {} test


: 3
: puts "<$test>"
: <abc
: >

: So evidently you cannot use octal codes in a [] set. What's the best way to

You can use a double-quoted string instead of curly braces and let Tcl
substitute the octal codes BEFORE the regexp command sees them. You'll have to
escape the brackets. Or use subst -nocommands:

regsub -all [subst -nocommands {[\000-\010\013-\037\177]}] $test {} test

Hmm. Only two brackets? Maybe

regsub -all "\[\000-\010\013-\037\177\]" $test {} test

would be shorter.
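
As a quick check that the two spellings build the same pattern, here is a minimal sketch. The names viaSubst and viaQuotes are hypothetical, and the range starts at \001 rather than \000 on the assumption that NUL should be kept out of the pattern:

```tcl
# subst -nocommands performs backslash substitution but leaves the
# square brackets alone, so nothing needs escaping inside the braces.
set viaSubst [subst -nocommands {[\001-\010\013-\037\177]}]

# In double quotes the brackets must be escaped so they are not taken
# as command substitution.
set viaQuotes "\[\001-\010\013-\037\177\]"

# string compare returns 0 when the two strings are identical.
puts [string compare $viaSubst $viaQuotes]   ;# 0
```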

: accomplish this? Remember, I want to remove all control characters *except*


: for Control-I and Control-J... There doesn't seem to be a Tcl equivalent to
: the Perl/Unix "tr" function?

: By the way, I need a solution that works in Tcl 7.5 -- I'm coding this in an


: application which has Tcl embedded in it, so I have to use the supplied
: version rather than 8.0, etc.

Of course, this WOULDN'T work in Tcl 7.5. There is just NO WAY to store
a NUL character in a string prior to 8.0. If you really expect one to occur,
you should take care when reading the data and throw the NULs away
as you read.
Typically this was done with the following code:


set char [read $f 1]
if {[string length $char]} {
    append string $char
}

Reading a NUL character produces a string of length 0 in this `read'
command.
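
Wrapped in a loop, the fragment above might look like the sketch below. The proc name readSkippingEmpty is hypothetical, and the NUL-dropping effect relies on the pre-8.0 behaviour described above; on Tcl 8.0 and later a NUL comes back as an ordinary one-character string and is kept.

```tcl
# Hypothetical helper: read everything from an open channel one
# character at a time, appending only non-empty reads.
# Assumption (from the post above): before Tcl 8.0, reading a NUL yields
# an empty string, so the length check drops it.
proc readSkippingEmpty {f} {
    set result ""
    while {![eof $f]} {
        set char [read $f 1]
        if {[string length $char]} {
            append result $char
        }
    }
    return $result
}
```

It would be called with a channel returned by open, for example set text [readSkippingEmpty $f].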

If you put \000 in your regexp just in case, replace it with \001 and
things will work in Tcl 7.5 as long as the string you read does not contain
a NUL. If it does, the NUL will be treated as the end of the string and all
data after it will be lost.

--
--------------------------------------------------------
I have tin news and pine mail...
Victor Wagner @ home = vitus @ orc . ru

Paul Duffin

Jun 3, 1998, 3:00:00 AM

Bryan Oakley wrote:
>
> ja...@cadence.com wrote:
> >
> > I'm trying to remove control characters from a string in Tcl, and I seem
> > to be having difficulty accomplishing my task. Here's the command I thought
> > would remove all the control characters from a string, except for tabs and
> > newlines:
> >
> > % set test "abc\008123\n"
> > ab123
> >
> > % regsub -all {[\000-\010]|[\013-\037]|[\177]} $test {} test
> > 3
> > puts "<$test>"
> > <abc
> > >
> >
> > So evidently you cannot use octal codes in a [] set. What's the best way to
> > accomplish this? Remember, I want to remove all control characters *except*
> > for Control-I and Control-J... There doesn't seem to be a Tcl equivalent to
> > the Perl/Unix "tr" function?
>
> What you say is correct -- regsub doesn't know about octal codes.
> However, tcl string processing does, so simply use quotes instead of
> curly braces, like this (untested, but I'm just _sure_ it works ;-)
>
> regsub -all "\[\000-\010\013-\037\177\]" $test {} test
>
> --
> Bryan Oakley
> ChannelPoint, Inc.

regsub does not seem to be able to parse a regular expression with a
NUL character.

regsub -all {[\000-\010]|[\013-\037]|[\177]} $test {} test

fails with "couldn't compile regular expression pattern: unmatched []"

regsub -all {[\001-\010]|[\013-\037]|[\177]} $test {} test

works.
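
Putting the \001 lower bound together with the double-quoted pattern suggested earlier in the thread gives something like the following sketch. The sample string and the variable names count and clean are made up for illustration, and the data is assumed to contain no NUL bytes:

```tcl
# Sample input: a backspace (\010) and a DEL (\177) to strip, plus a
# tab and a newline that must survive.
set test "abc\010123\tx\177y\n"

# The pattern covers \001-\010, \013-\037 and \177, which leaves tab
# (\011) and newline (\012) untouched. regsub returns the number of
# substitutions and stores the result in clean.
set count [regsub -all "\[\001-\010\013-\037\177\]" $test {} clean]

puts $count       ;# 2
puts "<$clean>"
```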

--
Paul Duffin
DT/6000 Development Email: pdu...@hursley.ibm.com
IBM UK Laboratories Ltd., Hursley Park nr. Winchester
Internal: 7-246880 International: +44 1962-816880

WANGNICK Sebastian

Jun 4, 1998, 3:00:00 AM
to Paul Duffin

Regsub also stops its processing of the *source* string when
encountering a NUL character, without complaining (this is tcl8.0p2).
--
Sebastian Wangnick <sebastian...@eurocontrol.bex>
Office: Eurocontrol Maastricht UAC, Horsterweg 11, NL-6191RX Beek,
Tel: +31-433661370, Fax: ~300
Delete 'x' from domain to reply
