bash : offset to a substring in a string

Allodoxaphobia

Jul 25, 2009, 9:59:03 PM

I'm passing "Case couLD be AnyThing" strings into a subroutine:
e.g.:
"the quick brown fox jumped over the lazy dogs back"
"The Quick Brown Fox Jumped Over The Lazy Dogs Back"
"tHe quIck brOwn fOx jUmped oVer the lAzy dOgs bAck"
"thE lazY browN foX jumpeD oveR thE quicK dogS bacK"
etc.,usw.

I wish to detect and highlight the word "lazy" by either
- replacing it with itself enclosed in ANSI escape sequences.
-or-
- simply inserting the ANSI escape sequences fore and aft of the
word "lazy".

I currently detect that the word "lazy" is in the string by making an
UPPERCASE copy of the string and using case/esac to see if LAZY is
there.

Other than using a `do` loop to then run through the UPPERCASE string --
looking for the target substring -- to determine the position in the
original string of the target substring, is there a bash-way to
determine its position? (Only the first occurrence is important.)

I do not know what case construct the WORD/Word/worD/wORd/.... is in
the original string -- just that it's there.
Knowing the position in the UPPERCASE copy would give me the position
in the original string -- and I believe I can tackle it from there.

Of course, the input strings can be of any length and content.

I studied "Table B-5. String Operations" at
http://en.tldp.org/LDP/abs/html/refcards.html#AEN21811
until my eyes nearly bled. Something like I'm seeking should
be there: Determining the offset to a substring in a string.

Maybe it's a Forest And The Trees problem for me....

Bill Marcum

Jul 26, 2009, 12:07:18 AM

On 2009-07-26, Allodoxaphobia <bit-b...@config.com> wrote:
> I'm passing "Case couLD be AnyThing" strings into a subroutine:
> e.g.:
> "the quick brown fox jumped over the lazy dogs back"
> "The Quick Brown Fox Jumped Over The Lazy Dogs Back"
> "tHe quIck brOwn fOx jUmped oVer the lAzy dOgs bAck"
> "thE lazY browN foX jumpeD oveR thE quicK dogS bacK"
> etc.,usw.
>
> I wish to detect and highlight the word "lazy" by either
> - replacing it with itself enclosed in ANSI escape sequences.
> -or-
> - simply inserting the ANSI escape sequences fore and aft of the
> word "lazy".
>

If you have GNU grep, grep -i --color.
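
A minimal sketch of that suggestion, assuming GNU grep is installed; the wrapper name grep_highlight is made up for illustration. Note that grep is a filter, so lines that do not contain the word are dropped rather than passed through unchanged.

```shell
# grep_highlight WORD: copy stdin to stdout, highlighting WORD in any case.
# Assumes GNU grep. --color=always emits the escape sequences even when
# stdout is a pipe or a file rather than a terminal.
grep_highlight() {
    grep -i --color=always -- "$1"
}

printf '%s\n' "The quick brown fox jumped over the laZy dogs back" |
    grep_highlight lazy
```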

Marcel Bruinsma

Jul 26, 2009, 1:36:51 AM

Allodoxaphobia wrote:

> I'm passing "Case couLD be AnyThing" strings into a subroutine:
> e.g.:
> "the quick brown fox jumped over the lazy dogs back"
> "The Quick Brown Fox Jumped Over The Lazy Dogs Back"
> "tHe quIck brOwn fOx jUmped oVer the lAzy dOgs bAck"
> "thE lazY browN foX jumpeD oveR thE quicK dogS bacK"
> etc.,usw.
>
> I wish to detect and highlight the word "lazy" by either
> - replacing it with itself enclosed in ANSI escape sequences.
> -or-
> - simply inserting the ANSI escape sequences fore and aft of the
> word "lazy".

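p3=${line#*[lL][aA][zZ][yY]}            # text after the first match
if [ "$p3" != "$line" ]; then           # something was stripped: match found
    p1=${line%%[lL][aA][zZ][yY]*}       # text before the first match
    p2=${line#"$p1"}                    # quoted so p1 is taken literally
    p2=${p2%"$p3"}                      # the match itself, in its original case
    echo "|$p1|$p2|$p3|"
    # $line="${p1}${HLON}${p2}${HLOFF}${p3}"
fi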

--
printf -v email $(echo \ 155 141 162 143 145 154 142 162 165 151 \
156 163 155 141 100 171 141 150 157 157 056 143 157 155|tr \ \\\\)
# Live every life as if it were your last! #

Marcel Bruinsma

Jul 26, 2009, 1:39:47 AM

Marcel Bruinsma wrote:

> Allodoxaphobia wrote:
>
>> I'm passing "Case couLD be AnyThing" strings into a subroutine:
>> e.g.:
>> "the quick brown fox jumped over the lazy dogs back"
>> "The Quick Brown Fox Jumped Over The Lazy Dogs Back"
>> "tHe quIck brOwn fOx jUmped oVer the lAzy dOgs bAck"
>> "thE lazY browN foX jumpeD oveR thE quicK dogS bacK"
>> etc.,usw.
>>
>> I wish to detect and highlight the word "lazy" by either
>> - replacing it with itself enclosed in ANSI escape sequences.
>> -or-
>> - simply inserting the ANSI escape sequences fore and aft of the
>> word "lazy".
>
> p3=${line#*[lL][aA][zZ][yY]}
> if [ "$p3" != "$line" ]; then
> p1=${line%%[lL][aA][zZ][yY]*}
> p2=${line#${p1}}
> p2=${p2%${p3}}
> echo "|$p1|$p2|$p3|"
> # $line="${p1}${HLON}${p2}${HLOFF}${p3}"

# line="${p1}${HLON}${p2}${HLOFF}${p3}"

> fi
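
Wrapped up as a function, the same idea might look like the sketch below. It is a variation rather than the code posted above: it takes the four matched characters with a `????` pattern, and the colour values in HLON and HLOFF are assumptions (bold on green, then reset). It uses only parameter expansion, so it should not need any recent bash feature.

```shell
# Assumed highlight sequences: bold on a green background, then reset.
HLON=$(printf '\033[1;42m')
HLOFF=$(printf '\033[0m')

# highlight_lazy LINE: print LINE with the first "lazy" (any case) wrapped
# in HLON/HLOFF. Prints LINE unchanged and returns 1 if the word is absent.
highlight_lazy() {
    line=$1
    p1=${line%%[lL][aA][zZ][yY]*}     # everything before the first match
    if [ "$p1" = "$line" ]; then      # nothing was stripped: no match
        printf '%s\n' "$line"
        return 1
    fi
    rest=${line#"$p1"}                # the match plus everything after it
    p3=${rest#????}                   # drop the four matched characters
    p2=${rest%"$p3"}                  # the match in its original case
    printf '%s\n' "${p1}${HLON}${p2}${HLOFF}${p3}"
}
```

Calling `highlight_lazy "thE lazY browN foX"` prints the line with only `lazY` highlighted.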

Allodoxaphobia

Jul 26, 2009, 12:47:38 PM

Now THAT is one clever solution!

$ export GREP_OPTIONS='--color=auto' GREP_COLOR='1;42' && \
echo "The quick brown fox jumped over the laZy dogs back" | grep -i lazy

It chops and it dices.
It grinds and it shuffles.
It shuffles and it sorts.
It (linux) just keeps amazing me!

Thanks!

(I still think bash deserves to have a "determine the offset to a
substring in a string" function. :-)

makyo

Jul 26, 2009, 1:30:39 PM

On Jul 26, 11:47 am, Allodoxaphobia <bit-buc...@config.com> wrote:
> On Sun, 26 Jul 2009 04:07:18 +0000 (UTC), Bill Marcum wrote:

Hi.

Using special bash array:
#!/usr/bin/env bash

# @(#) s1 Demonstrate use of variable BASH_REMATCH.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1)
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

HLON="|"
HLOFF="|"
P="lazy"
shopt -s nocasematch
echo
echo " Results:"
t1=$( cat $FILE )
while IFS="\n" read t1
do
  if [[ $t1 =~ (.+)($P)(.+) ]]
  then
    echo "${BASH_REMATCH[1]}${HLON}${BASH_REMATCH[2]}${HLOFF}${BASH_REMATCH[3]}"
  else
    echo "NO match for :$t1:"
  fi
done <$FILE

exit 0

produces:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0
GNU bash 3.2.39

Data file data1:
A simple line.
Who is the lazy person.
A quick brown fox jumps over the lAzY dog.
The LAZY dogs don't jump.

cheers, makyo

makyo

Jul 27, 2009, 5:47:36 AM

Hi.

A better version, correcting the IFS assignment, and changes to the
pattern to match strings at the ends of the lines:

#!/usr/bin/env bash

# @(#) s2 Demonstrate use of variable BASH_REMATCH.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1)
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

HLON="|"
HLOFF="|"
P="lazy"
shopt -s nocasematch
echo
echo " Results:"
t1=$( cat $FILE )

while IFS=$'\n' read t1
do
  if [[ $t1 =~ (.*)($P)(.*) ]]
  then
    echo "${BASH_REMATCH[1]}${HLON}${BASH_REMATCH[2]}${HLOFF}${BASH_REMATCH[3]}"
  else
    echo "NO match for :$t1:"
  fi
done <$FILE

exit 0

Producing:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0
GNU bash 3.2.39

Data file data1:
A simple line.

A quick brown fox jumps over the lAzY dog.

LAZY dogs don't jump.

Deliberate, not lazy

Results:
NO match for :A simple line.:

A quick brown fox jumps over the |lAzY| dog.

|LAZY| dogs don't jump.

Deliberate, not |lazy|
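
One caveat with the pattern in the script: because the leading `(.*)` is greedy, a line that contains the word more than once should get its last occurrence highlighted, while the original question asked for the first. A minimal sketch of one way around that, assuming a bash that has `=~` and `nocasematch` (such as the 3.2.39 used here) and a target word with no regex metacharacters in it. The function name highlight_first and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
shopt -s nocasematch

# highlight_first TEXT WORD ON OFF: print TEXT with the first occurrence of
# WORD (any case) wrapped in ON/OFF. Prints TEXT unchanged and returns 1 if
# WORD is absent. Assumes WORD contains no regex metacharacters.
highlight_first() {
    local text=$1 word=$2 on=$3 off=$4
    local re="($word)(.*)"            # no leading (.*): match starts leftmost
    if [[ $text =~ $re ]]; then
        # BASH_REMATCH[0] runs from the first match to the end of TEXT,
        # so stripping it as a literal suffix leaves the part before it.
        local before=${text%"${BASH_REMATCH[0]}"}
        printf '%s\n' "${before}${on}${BASH_REMATCH[1]}${off}${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$text"
        return 1
    fi
}
```

The length of `before` is also the zero-based offset of the match, which is what the original question was after.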

cheers, makyo

Chris F.A. Johnson

Jul 30, 2009, 5:01:48 PM

On 2009-07-26, Allodoxaphobia wrote:
> I'm passing "Case couLD be AnyThing" strings into a subroutine:
> e.g.:
> "the quick brown fox jumped over the lazy dogs back"
> "The Quick Brown Fox Jumped Over The Lazy Dogs Back"
> "tHe quIck brOwn fOx jUmped oVer the lAzy dOgs bAck"
> "thE lazY browN foX jumpeD oveR thE quicK dogS bacK"
> etc.,usw.
>
> I wish to detect and highlight the word "lazy" by either
> - replacing it with itself enclosed in ANSI escape sequences.
> -or-
> - simply inserting the ANSI escape sequences fore and aft of the
> word "lazy".
>
> I currently detect that the word "lazy" is in the string by making an
> UPPERCASE copy of the string and using case/esac to see if LAZY is
> there.
>
> Other than using a `do` loop to then run through the UPPERCASE string --
> looking for the target substring -- to determine the position in the
> original string of the target substring, is there a bash-way to
> determine its position? (Only the first occurrence is important.)
>
> I do not know what case construct the WORD/Word/worD/wORd/.... is in
> the original string -- just that it's there.
> Knowing the position in the UPPERCASE copy would give me the position
> in the original string -- and I believe I can tackle it from there.

_index() #@ Store position of $2 in $1 in $_INDEX
{
    case $1 in
        "") _INDEX=0; return 1 ;;
        *"$2"*) ## extract up to beginning of the matching portion
            idx=${1%%"$2"*}
            ## the starting position is one more than the length
            _INDEX=$(( ${#idx} + 1 )) ;;
        *) _INDEX=0; return 1 ;;
    esac
}

In bash 4.0, you can convert the string to uppercase with:

${string^^}
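
Putting the two pieces together as the question proposed (search an uppercase copy, use the position in the original) could look like the sketch below. It is an illustration, not part of _index: the name index_nocase is made up, and it uppercases with `tr` instead of `${string^^}` so that it does not depend on bash 4.0.

```shell
# index_nocase STRING WORD: set _INDEX to the 1-based position of the first
# case-insensitive occurrence of WORD in STRING. Sets _INDEX=0 and returns 1
# if WORD does not occur. Variables are global to stay portable.
index_nocase() {
    upper_s=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    upper_w=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case $upper_s in
        *"$upper_w"*)
            before=${upper_s%%"$upper_w"*}   # text ahead of the first match
            _INDEX=$(( ${#before} + 1 )) ;;
        *)  _INDEX=0; return 1 ;;
    esac
}
```

After `index_nocase "$line" lazy` succeeds, bash substring expansion such as `${line:_INDEX-1:4}` picks the word out of the original string in its original case.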

> Of course, the input strings can be of any length and content.
>
> I studied "Table B-5. String Operations" at
> http://en.tldp.org/LDP/abs/html/refcards.html#AEN21811
> until my eyes nearly bled. Something like I'm seeking should
> be there: Determining the offset to a substring in a string.
>
> Maybe it's a Forest And The Trees problem for me....
>
> Thank you for any help/pointers/change-of-approach.
> Jonesy

--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell/>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Allodoxaphobia

Jul 31, 2009, 3:55:25 PM

On Mon, 27 Jul 2009 02:47:36 -0700 (PDT), makyo wrote:

<- trim to makyo's last followup ->

makyo,

You spent WAY TOO MUCH time and effort working on my problem.
I wish to thank you for your efforts!

I have incorporated some of your tactics into my solution. I did not
mention in the OP that the target string would be variable, as well.
An array is loaded with anywhere from 0 to "n" "string2"s to look for
and highlight in the stream of (unpredictable) "string1"s coming from
a `lynx --dump` execution. (I get wordy enough as it is, and my OP was
already overly 'descriptive'.)

Too, my script needs to run on a variety of linux and FreeBSD systems
(over which I have no control and very little influence) -- with bash
versions prior to 3.2 ( even a 2.05b.0(1) ).
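
For a variable target word on those older shells, one option is to generate the `[lL][aA][zZ][yY]` style of pattern from the word itself. The sketch below is only an illustration: the name nocase_glob is made up, and it assumes the target consists of letters and digits, since characters such as `]`, `!` or `-` would need escaping inside the brackets.

```shell
# nocase_glob WORD: print a glob that matches WORD in any mix of case,
# e.g. "lazy" becomes "[lL][aA][zZ][yY]". Assumes WORD is alphanumeric.
# Variables are global; no bash 3.x features are used.
nocase_glob() {
    word=$1 glob=
    while [ -n "$word" ]; do
        rest=${word#?}                 # everything after the first character
        ch=${word%"$rest"}             # the first character itself
        lo=$(printf '%s' "$ch" | tr '[:upper:]' '[:lower:]')
        up=$(printf '%s' "$ch" | tr '[:lower:]' '[:upper:]')
        glob="${glob}[${lo}${up}]"
        word=$rest
    done
    printf '%s\n' "$glob"
}
```

The result is used unquoted inside the expansion so that it stays a pattern, for example `pat=$(nocase_glob "$target")` followed by `p1=${line%%$pat*}`.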

But, like I said, I did apply some of your coding techniques to this,
and some of your work prompted me to go back and review some recent
coding -- where I thought to myself: "There's got to be a better way!"

Again, thanks!

Allodoxaphobia

Jul 31, 2009, 4:08:35 PM

Chris,

Thanks for the followup post! Like I told makyo in the other
sub-thread, this script/program needs to run under bash versions prior
to 3.2. There's been plenty in the program already that had to be coded
The Hard Way because of the need to avoid bash 3.2'isms. This
`index` thingy was just the most cumbersome of the lot.

But, your subroutine looks to be a 'keeper' for use in future bash 3.2+
projects.

Again, thanks!

Chris F.A. Johnson

Jul 31, 2009, 7:33:37 PM

The _index function will work in bash2.