
regexp to clean data


alexg

Jun 10, 2009, 12:02:54 PM
Hi all,
I'm trying to clean data using Tcl regular expressions. I used to have a
C program that accepts the rules and cleans the data.

For example:

string "DEAF SLIGHT"

1. If the rule below finds "DEAF" AND ("SLIGHT" OR "MILD") in the string, it
transforms the string to "UHOH SLIGHT":
/\(.*\)\(DEAF\)\(.*\)\(SLIGHT\|MILD\)\(.*\)/UHOH \1\3\4\5/g

2. The next rule adds dashes to the front of the string: "--UHOH SLIGHT"
/\(.*\)/--\1/g

3. The next rule adds the Answer Class to the front of the string:
"AC IMPAIRMENT--UHOH SLIGHT"

/\(.*\)--\(.*\)\(IMPAIR\|INDISTINCT\|UHOH\|SOUNDSLIKETROUBLE\|ABOUTWHATYOUDEXPECT\|KINDACLOSE\|REALLYCLOSE\)\(.*\)/AC IMPAIRMENT--\2\3\4/g

And so on. How hard would it be to do the same using Tcl regular
expressions?

Thank you.

Alex

Ralf Fassel

Jun 10, 2009, 12:21:32 PM
* alexg <gen...@sbcglobal.net>

| 1. If the rule below finds "DEAF" AND ("SLIGHT" OR "MILD") in the string, it
| transforms the string to "UHOH SLIGHT":
| /\(.*\)\(DEAF\)\(.*\)\(SLIGHT\|MILD\)\(.*\)/UHOH \1\3\4\5/g
--<snip-snip>--

| And so on. How hard would it be to do the same using Tcl regular
| expressions?

Grouping is done in Tcl using plain (), not \(\), and alternatives are
done using plain |, not \|.

Other than that, the syntax should be very similar, and the task rather
straightforward.
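
If the existing rules are stored in the old \(...\) notation, the two differences above could be translated mechanically. A minimal sketch, assuming the rules only use grouping and alternation; breToAre is a made-up helper name, not part of Tcl, and it does not deal with other differences between the two syntaxes (for example, + ? { } are ordinary characters in POSIX basic regular expressions but special in Tcl):

```tcl
# Hypothetical helper: convert \( \) \| to ( ) | and escape bare ( ) |,
# which are literal characters in the old notation.
proc breToAre {bre} {
    # string map scans left to right and never rescans its own output,
    # so "\(" becomes "(" without then being escaped again.
    # The first pair passes an escaped backslash through unchanged.
    return [string map [list \
        {\\} {\\} \
        {\(} {(}  {\)} {)}  {\|} {|} \
        {(}  {\(} {)}  {\)} {|}  {\|} \
    ] $bre]
}

puts [breToAre {\(.*\)\(DEAF\)\(.*\)\(SLIGHT\|MILD\)\(.*\)}]
# prints (.*)(DEAF)(.*)(SLIGHT|MILD)(.*)
```

The replacement part of a rule (\1\3\4\5) needs no conversion, since regsub uses the same \N back-references.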

set string "DEAF SLIGHT"
regsub -all {(.*)(DEAF)(.*)(SLIGHT|MILD)(.*)} $string {UHOH \1\3\4\5} result
set result
=> UHOH  SLIGHT

Check the regsub command:
http://wiki.tcl.tk/987
http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/regsub.htm
and the regexp page for the syntax description:
http://wiki.tcl.tk/986
http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/regexp.htm
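
To run a whole list of rules in sequence, as the original C program apparently did, the rules could be kept in a Tcl list of pattern/replacement pairs and applied in order. This is only a sketch under assumptions: applyRules and the rules variable are made-up names, and rule 2 is written anchored as ^(.*)$ to rule out an additional empty match at the end of the input when -all is used.

```tcl
# Hypothetical helper: apply {pattern replacement} pairs in order.
proc applyRules {rules text} {
    foreach {pattern replacement} $rules {
        # -all mirrors the /g flag of the original rules;
        # the result is stored back into text for the next rule.
        regsub -all -- $pattern $text $replacement text
    }
    return $text
}

# The three rules from the question, translated to Tcl syntax.
set rules {
    {(.*)(DEAF)(.*)(SLIGHT|MILD)(.*)}
    {UHOH \1\3\4\5}

    {^(.*)$}
    {--\1}

    {(.*)--(.*)(IMPAIR|INDISTINCT|UHOH|SOUNDSLIKETROUBLE|ABOUTWHATYOUDEXPECT|KINDACLOSE|REALLYCLOSE)(.*)}
    {AC IMPAIRMENT--\2\3\4}
}

puts [applyRules $rules "DEAF SLIGHT"]
# prints AC IMPAIRMENT--UHOH  SLIGHT
```

Note that \1\3\4\5 keeps the blank that followed DEAF, so the result has two blanks after UHOH. A string that matches none of the patterns is returned unchanged.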

HTH
R'

alexg

Jun 10, 2009, 1:25:00 PM
On Jun 10, 11:21 am, Ralf Fassel <ralf...@gmx.de> wrote:
> * alexg <gend...@sbcglobal.net>

Thank you, this works!!!

Bruce Hartweg

Jun 10, 2009, 1:26:00 PM
Ralf Fassel wrote:

Additionally see:
http://wiki.tcl.tk/1495
http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/re_syntax.htm

The regsub/regexp pages explain the commands and options for those commands
without going into details of the RE syntax. There are links to get there, but
the re_syntax page has the full syntax.
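
One feature described on the re_syntax page that may help with long rules like the Answer Class one: a pattern that starts with (?x) uses expanded syntax, in which white space and # comments inside the pattern are ignored. A small sketch, with answerClass as a made-up variable name and the input taken from the example in the question:

```tcl
# (?x) must be the very first thing in the pattern.
set answerClass {(?x)
    (.*) --        # anything, then the dashes added by rule 2
    (.*)
    (IMPAIR|INDISTINCT|UHOH|SOUNDSLIKETROUBLE|
     ABOUTWHATYOUDEXPECT|KINDACLOSE|REALLYCLOSE)
    (.*)
}

regsub -all -- $answerClass "--UHOH SLIGHT" {AC IMPAIRMENT--\2\3\4} result
puts $result
# prints AC IMPAIRMENT--UHOH SLIGHT
```

Only the pattern is affected; blanks in the replacement string are kept as written.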

bruce

Ralf Fassel

Jun 11, 2009, 4:03:45 AM
* Bruce Hartweg <Bruce-D...@example.com>

| Ralf Fassel wrote:
| > and the regexp page for the syntax description:
| > http://wiki.tcl.tk/986
| > http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/regexp.htm
|
| Additionally see:
| http://wiki.tcl.tk/1495
| http://www.purl.org/tcl/home/man/tcl8.5/TclCmd/re_syntax.htm
|
| The regsub/regexp pages explain the commands and options for those
| commands without going into details of the RE syntax. There are links
| to get there, but the re_syntax page has the full syntax.

Thanks for correcting this. Whenever I need Tcl docs I use some
ancient emacs-info pages which are hopelessly out of date, but work most
of the time. Switching from the One True Editor to the browser and back
is just too much work :-)

R'
NB Just yesterday I had a startup error in emacs.
Took some time to recognize that
(package require 'w3)
looks familiar to me but not to emacs :-)

rocket777

Jun 11, 2009, 2:06:38 PM
While NOT a free utility, i.e. it's shareware -

I use RegexBuddy when I need to create/test a complex regular
expression. The author uses r.e.'s heavily in many of his other
products, and this utility is afaik the most comprehensive r.e.
development tool in the universe (I am just a happy customer, not
affiliated at all with the programmer).

Just being able to see a human-readable display of a complex regular
expression is worth the price alone (you can click on any part
of it and it will display an English explanation of the
sub-expression).

He has great documentation as well, and I believe he is in the process
of writing a book devoted to r.e.'s. His documentation mentions 15 or
20 languages that use r.e.'s, and Tcl is among them.


http://www.regexbuddy.com/

