
Find/Replace only multiple spaces (2 or more) in text file


Lawrence Knowlton

Dec 28, 2012, 11:39:08 AM
Hello All,
I have a text file with multiple columns of data that vary in width and need to replace the variable spaces (which are always 2 or more) with a single delimiting character. I want to do this in a batch to make the maintenance easier. There are 3 rows at the top that need deletion. Any help would be greatly appreciated.
Thank you!

Larry

Auric__

Dec 28, 2012, 1:22:41 PM
If a VBscript is acceptable, I believe this does what you want. Test on a
*copy* of your data first.

-----begin munge-data.vbs-----
'Note:
'If any file has less than 3 lines this script will error out on that file.

'If this script won't be run from the directory containing the text files,
'change this to that directory's FULL PATH:
Const WORKINGDIR = "."

'Set your desired delimiter here:
Const DELIMITER = ","

'If there are already files named *.tmp in this directory, change this:
Const TMPEXT = "tmp"

Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Const CreateFile = -1
Const DontCreateFile = 0
Const Unicode = -1
Const ASCII = 0

Set FSO = CreateObject("Scripting.FileSystemObject")

Set working = FSO.GetFolder(WORKINGDIR)

For Each file In working.Files
  If ".txt" = LCase(Right(file, 4)) Then
    Set fileIn = FSO.OpenTextFile(file, ForReading, DontCreateFile, ASCII)
    Set fileOut = FSO.OpenTextFile(Left(file, Len(file) - Len(TMPEXT)) & _
      TMPEXT, ForWriting, CreateFile, ASCII)

    'Ignore first 3 lines:
    thisLine = fileIn.ReadLine
    thisLine = fileIn.ReadLine
    thisLine = fileIn.ReadLine

    Do Until fileIn.AtEndOfStream
      thisLine = RTrim(fileIn.ReadLine)
      tmp = 0
      Do
        'Replace *all* runs of 2 or more spaces with the delimiter.
        'The search string is TWO spaces, so single spaces are left alone:
        tmp = InStr(tmp + 1, thisLine, "  ")
        If tmp < 1 Then Exit Do
        'Find the first non-space character after the run:
        For L0 = tmp + 2 To Len(thisLine)
          If Mid(thisLine, L0, 1) <> " " Then
            thisLine = Left(thisLine, tmp - 1) & DELIMITER & _
              Mid(thisLine, L0)
            Exit For
          End If
        Next
      Loop
      fileOut.WriteLine thisLine
    Loop

    fileIn.Close
    fileOut.Close
    file.Delete
  End If
Next

'Rename the temporary files back to *.txt:
For Each file In working.Files
  If ("." & TMPEXT) = LCase(Right(file, Len(TMPEXT) + 1)) Then _
    file.Move Left(file, Len(file) - Len(TMPEXT)) & "txt"
Next
-----end munge-data.vbs-----

--
Mercy is a shield used by the weak.

billious

Dec 28, 2012, 1:49:06 PM
An example would have been good. I gather you wish to replace any
sequence of 2 or more space characters by some delimiter, but leave
single spaces alone.

Batch has an enormous sensitivity to certain characters. Within some
limits, the following batch works:

  @ECHO OFF
  SETLOCAL
  ::
  :: delete first 3 lines
  :: and replace each occurrence of 2 or more spaces
  :: by a delimiter
  ::
  DEL outfile.txt 2>nul /F /Q
  SET delim=*
  FOR /f "skip=3delims=" %%i IN (infile.txt) DO (
  SET line=%%i
  (SET newline=)
  SET count=0
  CALL :change
  )
  GOTO :eof

  :CHANGE
  SET c1=%line:~0,1%
  SET line=%line:~1%
  IF "%c1%"==" " (SET /a count+=1) ELSE (
  IF %count%==0 (SET newline=%newline%%c1%) ELSE (
  IF %count%==1 (SET newline=%newline% %c1%) ELSE (
  SET newline=%newline%%delim%%c1%)
  SET count=0
  )
  )
  IF DEFINED line GOTO CHANGE
  ::
  :: You may want to preserve trailing spaces
  :: or convert them...
  ::
  IF %count%==0 GOTO print
  IF %count%==1 SET newline=%newline% &GOTO print
  SET newline=%newline%%delim%
  :PRINT
  >>outfile.txt ECHO %newline%
  GOTO :eof

As usual, all lines indented 2 spaces; lines not indented 2 spaces have
been wrapped from the previous line.
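
[Editor's illustration, not part of the original reply.] The :CHANGE routine walks the line one character at a time and counts consecutive spaces. The same idea can be sketched in Python as below; the function name `change` and the sample strings are hypothetical, and this is only a reading of the batch logic, not a replacement for it.

```python
def change(line, delim="*"):
    """Character-by-character pass: keep single spaces, delimit runs of 2+."""
    out = []
    count = 0  # spaces seen since the last non-space character
    for ch in line:
        if ch == " ":
            count += 1
            continue
        if count == 1:
            out.append(" ")    # a lone space belongs to the data
        elif count > 1:
            out.append(delim)  # a run of 2+ spaces is a column separator
        out.append(ch)
        count = 0
    # Trailing spaces are handled the same way as in the batch version.
    if count == 1:
        out.append(" ")
    elif count > 1:
        out.append(delim)
    return "".join(out)


if __name__ == "__main__":
    print(change("red apple   12  x"))  # prints: red apple*12*x
```

Unlike the batch version, this sketch has no trouble with quotes, parentheses or redirection characters, because the text is never re-parsed as a command line.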


The limits I found were that the input file should not contain
double-quotes, close-parentheses or redirectors ( <, > or | ) but seemed
happy with !, %, (, +, =, :, ; and comma.

The delimiter can't be a pipe - I tested no others of the not-happy set
but expect that those characters should be not-usable here either.


I believe it would be better to use SED [or (g)awk] - Google will reveal.

A suitable SED line using GNU SED is

sed s/\x20\x20\x20*/*/g <infile.txt >outfile.txt

[substitute-for space-space-any-number-of-spaces with ASTERISK everywhere]
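
[Editor's illustration, not part of the original reply.] For readers without sed to hand, the same substitution can be sketched in Python. The name `replace_runs` and the sample line are made up for the example; the pattern " {2,}" is assumed to be equivalent to the sed pattern above (two spaces followed by zero or more spaces).

```python
import re

# " {2,}" matches two or more consecutive spaces, the same strings that
# \x20\x20\x20* matches in the sed line.
RUN_OF_SPACES = re.compile(r" {2,}")


def replace_runs(line, delim="*"):
    """Replace every run of 2+ spaces with delim; single spaces are kept."""
    return RUN_OF_SPACES.sub(delim, line)


if __name__ == "__main__":
    print(replace_runs("New York   12    red apple"))  # prints: New York*12*red apple
```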

I'm sure that you could also remove the first 3 lines using one of these
TPPs, but

for /f "skip=3delims=" <infile.txt >outfile.txt

will strip these off.

Frank Westlake

Dec 28, 2012, 3:31:21 PM
On 2012-12-28 08:39, Lawrence Knowlton wrote:
> I have a text file with multiple columns of data that vary in width
> and need to replace the variable spaces (which are always 2 or more)
> with a single delimiting character. I want to do this in a batch to
> make the maintenance easier. There are 3 rows at the top that need
> deletion.

I assume that there are a set number of columns. In my demo below I use
five columns -- change it by altering the "tokens=1-5" parameter and by
adding or removing "%%5" type variables. I also assume that since spaces
delimit the columns then spaces do not exist in the values.

Set "in=your input file"
Set "out=your output file"
TYPE NUL:>"%out%"
For /F "usebackq skip=3 tokens=1-5 delims= " %%1 in ("%in%") Do (
(Echo;%%1 %%2 %%3 %%4 %%5)>>"%out%"
)

Frank

Lawrence Knowlton

Dec 28, 2012, 6:12:47 PM
Clarifications:
There are single spaces in the column data, there are 7 columns and the lines end with regular CR/LF.

Lawrence Knowlton

Dec 28, 2012, 6:17:57 PM
Thank you for the effort, but unfortunately, I really want to keep it to a batch.

Lawrence Knowlton

Dec 28, 2012, 6:20:06 PM
Wow, looks awesome, thank you! Now all I need to do is figure out what all your statements are doing ;)

Frank Westlake

Dec 28, 2012, 6:30:39 PM
On 2012-12-28 15:12, Lawrence Knowlton wrote:
> Clarifications:
> There are single spaces in the column data, there are 7 columns and the lines end with regular CR/LF.

If there are spaces in the data and spaces as delimiters then how do you
distinguish the two? Are the data spaces quoted or escaped, and by which
quotes or escapes?

Frank


foxidrive

Dec 28, 2012, 8:25:02 PM
On 29/12/2012 5:49 AM, billious wrote:

> I believe it would be better to use SED [or (g)awk] - Google will reveal.

It sure is. Use an appropriate tool for the job.


@echo off
sed -n 4,$p "file.txt" |sed "s/  */ /g" >"file2.txt"
pause




> I'm sure that you could also remove the first 3 lines using one of these TPPs, but
>
> for /f "skip=3delims=" <infile.txt >outfile.txt
>
> will strip these off.

Will it? :) Have you been sipping the eggnog? ;)



--
foxi

foxidrive

Dec 28, 2012, 8:41:11 PM
On 29/12/2012 12:25 PM, foxidrive wrote:
> On 29/12/2012 5:49 AM, billious wrote:
>
>> I believe it would be better to use SED [or (g)awk] - Google will reveal.
>
> It sure is. Use an appropriate tool for the job.
>
>
> @echo off
> sed -n 4,$p "file.txt" |sed "s/  */ /g" >"file2.txt"
> pause
>

This is probably easier to read. Both use GnuSED.


@echo off
sed -e 1,3d -e "s/  */ /g" "file.txt" >"file2.txt"



--
foxi

billious

Dec 28, 2012, 10:03:44 PM
You're right of course. Sadly not a surfeit of eggnog, more a deficit of
egg nishner, hence 2:30 am next to a fan. It's a mite warm on the West
coast - and Summer's only just started...

I'm more used to HHSED (which sadly doesn't work under W7) and use the
\xhh syntax to avoid having to quote strings. Always takes a while to
get it just so...

Meanwhile, "s/  */ /g" should be "s/  */delimiter-string/g" - but that
might be stating the obvious....

foxidrive

Dec 28, 2012, 10:12:23 PM
On 29/12/2012 10:17 AM, Lawrence Knowlton wrote:
>> > -----end munge-data.vbs-----
>> >
> Thank you for the effort, but unfortunately, I really want to keep it to a batch.

You can wrap the script in a batch file.


Something to keep in mind is that in several cases batch files alone will corrupt your data, as batch has
several flaws related to blank lines and poison characters and handling of certain text.

If you cannot guarantee that your data is free of the things that batch has issues with, then you are
better off to use a VB script or SED.


--
foxi

foxidrive

Dec 28, 2012, 10:16:12 PM
On 29/12/2012 2:03 PM, billious wrote:

> You're right of course. Sadly not a surfeit of eggnog, more a deficit of
> egg nishner, hence 2:30 am next to a fan. It's a mite warm on the West
> coast - and Summer's only just started...

I've noted your several days of 40+ Celsius, and here on the SE coast we generally get the weather
from the west. I hope we don't get them though!


> I'm more used to HHSED (which sadly doesn't work under W7) and use the
> \xhh syntax to avoid having to quote strings. Always takes a while to
> get it just so...

GnuSED also has the \xhh syntax. It works in Windows 8.

> Meanwhile, "s/  */ /g" should be "s/  */delimiter-string/g" - but that
> might be stating the obvious....

I'm not sure what you mean billious. My script replaces 2 or more spaces with a single space.

--
foxi

Lawrence Knowlton

Dec 28, 2012, 10:37:46 PM
Hi Frank,
Single spaces are only in the data, the columns are always separated by 2 or more spaces. Thanks!

Larry

billious

Dec 28, 2012, 10:55:54 PM
On 29/12/2012 11:16, foxidrive wrote:
> On 29/12/2012 2:03 PM, billious wrote:
>
>> You're right of course. Sadly not a surfeit of eggnog, more a deficit of
>> egg nishner, hence 2:30 am next to a fan. It's a mite warm on the West
>> coast - and Summer's only just started...
>
> I've noted your several days of 40+ Celsius, and here on the SE coast we generally get the weather
> from the west. I hope we don't get them though!
>

Caused by a typical Summer pattern - Easterlies over the sandpit from an
anticyclone in the Bight. Your sandpit is a mite wetter and cooler....

>
>> I'm more used to HHSED (which sadly doesn't work under W7) and use the
>> \xhh syntax to avoid having to quote strings. Always takes a while to
>> get it just so...
>
> GnuSED also has the \xhh syntax. It works in Windows 8.
>
>> Meanwhile, "s/  */ /g" should be "s/  */delimiter-string/g" - but that
>> might be stating the obvious....
>
> I'm not sure what you mean billious. My script replaces 2 or more spaces with a single space.
>

True, but in this case, OP says the column-separator is multiple-spaces
and a column data may contain single spaces, so replacing
multiple-spaces with a single isn't achieving the objective since
there's then no indication of where the columns are.
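
[Editor's illustration, not part of the original reply.] The point can be seen on a made-up three-column row. The sample data and variable names below are hypothetical; the sketch only shows that collapsing to one space discards the column boundaries while a distinct delimiter keeps them.

```python
import re

# Three columns separated by runs of spaces; two values contain a single space.
row = "Red Delicious   Granny Smith    42"

collapsed = re.sub(r" {2,}", " ", row)  # runs of spaces become one space
delimited = re.sub(r" {2,}", "|", row)  # runs of spaces become a pipe

# Splitting the collapsed row cannot tell data spaces from separators.
print(collapsed.split(" "))  # prints: ['Red', 'Delicious', 'Granny', 'Smith', '42']
# Splitting on the delimiter recovers the three original columns.
print(delimited.split("|"))  # prints: ['Red Delicious', 'Granny Smith', '42']
```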



foxidrive

Dec 29, 2012, 2:06:24 AM
On 29/12/2012 2:55 PM, billious wrote:

>>> Meanwhile, "s/  */ /g" should be "s/  */delimiter-string/g" - but that
>>> might be stating the obvious....
>>
>> I'm not sure what you mean billious. My script replaces 2 or more spaces with a single space.
>>
>
> True, but in this case, OP says the column-separator is multiple-spaces
> and a column data may contain single spaces, so replacing
> multiple-spaces with a single isn't achieving the objective since
> there's then no indication of where the columns are.

Ahh yes, thanks. From the original post:

:need to replace the variable spaces (which are always 2 or more) with a single delimiting character.

So the batch below should replace two spaces or more with a pipe character |

Note that the previous sed commands replaced every run of spaces with a space, including a single space.
Now the left hand side of the s/// has three spaces in it, to match two spaces in a row, and then zero
or more extra spaces.


@echo off
sed -e 1,3d -e "s/   */|/g" "file.txt" >"file3.txt"
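
[Editor's illustration, not part of the original reply.] The two sed commands (delete lines 1 to 3, then substitute) can be mirrored in a few lines of Python. The function name `convert_lines` and the sample rows are hypothetical; this is a sketch of the same two steps under the assumption that the input is already split into lines.

```python
import re
from itertools import islice


def convert_lines(lines, delim="|", skip=3):
    """Drop the first `skip` lines, then replace runs of 2+ spaces with delim."""
    for line in islice(lines, skip, None):  # same effect as sed's 1,3d
        yield re.sub(r" {2,}", delim, line)  # same effect as the s command with g


if __name__ == "__main__":
    sample = [
        "Report title",
        "Generated today",
        "",
        "Red Delicious   12    crate A",
        "Granny Smith     7    crate B",
    ]
    for converted in convert_lines(sample):
        print(converted)
```

Because the function accepts any iterable of lines, an open file object could be passed in place of the list.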




--
foxi

Frank Westlake

Dec 29, 2012, 7:05:02 AM
On 2012-12-28 19:37, Lawrence Knowlton wrote:
> Single spaces are only in the data, the columns are always separated by 2 or more spaces. Thanks!

Change the delimiter to your choice and try this script. If the
delimiter exists also in the values then you still have a problem.

:: BEGIN SCRIPT :::::::::::::::::::::::::::::::::::::::::::::::::
@Echo OFF
SetLocal EnableExtensions EnableDelayedExpansion
Set "delimiter=;"
Set "in=your input file"
Set "out=your output file"
TYPE NUL: >"%out%"
For /F "usebackq skip=3 delims=" %%a in ("%in%") Do (
Set "line=%%a"
CALL :sub
(Echo;!line!)>>"%out%"
)
TYPE "%out%"
Goto :EOF

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:sub
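REM Shrink every run of 3 spaces to 2 until no run of 3 remains, then
REM replace each remaining pair of spaces with the delimiter.
Set "line=!line:   =  !"
Set line|FindStr /l /C:"   " >NUL: && Goto :sub
Set "line=!line:  =%delimiter%!"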
Goto :EOF
:: END SCRIPT ::::::::::::::::::::::::::::::::::::::::::::::::::::
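
[Editor's illustration, not part of the original reply.] The :sub routine appears to work in two steps: keep shrinking long runs of spaces until only pairs remain, then swap each pair for the delimiter. Assuming that reading (three spaces replaced by two, repeated while any three-space run is left), the logic can be sketched in Python; the function name `collapse_then_delimit` is hypothetical.

```python
def collapse_then_delimit(line, delim=";"):
    """Reduce every run of 2+ spaces to exactly two, then delimit the pairs."""
    # Step 1: repeat the 3-to-2 replacement until no run of three spaces is
    # left (the FindStr test plus "Goto :sub" in the batch script).
    while "   " in line:
        line = line.replace("   ", "  ")
    # Step 2: every separator is now exactly two spaces wide.
    return line.replace("  ", delim)


if __name__ == "__main__":
    print(collapse_then_delimit("a b     c  d"))  # prints: a b;c;d
```

Single spaces never match either replacement, which is why values containing one space pass through unchanged.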


Frank Westlake

Dec 29, 2012, 9:21:01 AM
On 2012-12-29 04:05, Frank Westlake wrote:
> On 2012-12-28 19:37, Lawrence Knowlton wrote:
>> Single spaces are only in the data, the columns are always separated
>> by 2 or more spaces. Thanks!

If you want the option to overwrite the input file with the altered data
then try this script. You can change the value of the variable
"overwrite" to something other than "yes" to append, or you can provide
a new name for "out" to overwrite a secondary file.


:: BEGIN SCRIPT :::::::::::::::::::::::::::::::::::::::::::::::::::::
@Echo OFF
SetLocal EnableExtensions EnableDelayedExpansion
Set "delimiter=;"
Set "in=your input file"
Set "out=%in%"
Set "overwrite=yes"
For /F "usebackq skip=3 delims=" %%a in (`TYPE "%in%"`) Do (
If "!overwrite!" EQU "yes" (TYPE NUL:>"%out%" & Set "overwrite=")
Set "line=%%a"
CALL :sub
(Echo;!line!)>>"%out%"
)
TYPE "%out%"
Goto :EOF
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:sub
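REM Shrink every run of 3 spaces to 2 until no run of 3 remains, then
REM replace each remaining pair of spaces with the delimiter.
Set "line=!line:   =  !"
Set line|FindStr /l /C:"   " >NUL: && Goto :sub
Set "line=!line:  =%delimiter%!"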
Goto :EOF
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: END SCRIPT :::::::::::::::::::::::::::::::::::::::::::::::::::::::

Frank

Frank Westlake

Dec 29, 2012, 10:04:16 AM
All the preceding will lose the character '!' because of delayed
expansion. The following two scripts are the version which writes to a
new file and the version which optionally overwrites, both modified to
preserve all characters.

:: BEGIN SCRIPT :::::::::::::::::::::::::::::::::::::::::::::::::
@Echo OFF
SetLocal EnableExtensions DisableDelayedExpansion
Set "delimiter=;"
Set "in=your input file"
Set "out=your output file"
TYPE NUL: >"%out%"
For /F "usebackq skip=3 delims=" %%a in ("%in%") Do (
Set "line=%%a"
CALL :sub
)
TYPE "%out%"
Goto :EOF

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:sub
SetLocal EnableExtensions EnableDelayedExpansion
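REM Shrink every run of 3 spaces to 2 until no run of 3 remains, then
REM replace each remaining pair of spaces with the delimiter.
Set "line=!line:   =  !"
Set line|FindStr /l /C:"   " >NUL: && Goto :sub
Set "line=!line:  =%delimiter%!"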
(Echo;!line!)>>"%out%"
Goto :EOF
:: END SCRIPT ::::::::::::::::::::::::::::::::::::::::::::::::::::


:: BEGIN SCRIPT :::::::::::::::::::::::::::::::::::::::::::::::::::::
@Echo OFF
SetLocal EnableExtensions DisableDelayedExpansion
Set "delimiter=;"
Set "in=your input file"
Set "out=%in%"
REM Undefine "overwrite" to append to the output file.
REM Set "overwrite="
Set "overwrite=yes"
For /F "usebackq skip=3 delims=" %%a in (`TYPE "%in%"`) Do (
If DEFINED overwrite (TYPE NUL:>"%out%" & Set "overwrite=")
Set "line=%%a"
CALL :sub
)
TYPE "%out%"
Goto :EOF
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:sub
SetLocal EnableDelayedExpansion
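REM Shrink every run of 3 spaces to 2 until no run of 3 remains, then
REM replace each remaining pair of spaces with the delimiter.
Set "line=!line:   =  !"
Set line|FindStr /l /C:"   " >NUL: && Goto :sub
Set "line=!line:  =%delimiter%!"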
(Echo;!line!)>>"%out%"
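Goto :EOF
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: END SCRIPT :::::::::::::::::::::::::::::::::::::::::::::::::::::::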

Lawrence Knowlton

Dec 30, 2012, 9:08:39 AM
Hi Frank,
Wow! Thanks for all the options Frank! I'd love to know where you learned the syntax for all this.

Larry

foxidrive

Dec 30, 2012, 9:11:02 AM
On 31/12/2012 1:03 AM, Lawrence Knowlton wrote:
>> > @echo off
>> > sed -e 1,3d -e "s/   */|/g" "file.txt" >"file3.txt"
>> >
>> > --
>> > foxi
> Hi foxi,
> What all are the -e's for? Otherwise I think that'd be perfect!
> Thanks!

They allow two or more commands in the same sed command. Use GnuSED.

--
foxi

Lawrence Knowlton

Dec 30, 2012, 9:16:16 AM
Thanks foxi, after re-examining your code, I see what you mean. So, unxutils won't do it eh? Drat!

Larry

Frank Westlake

Dec 30, 2012, 9:39:32 AM
On 2012-12-30 06:08, Lawrence Knowlton wrote:
> I'd love to know where you learned the syntax for all this.

From this group. Back before Windows XP we used to explore the
abilities of CMD. After XP came out the alt.msdos.batch people came to
this group and turned it into a free script writing forum using mostly
COMMAND.COM procedures. Gradually it has been transitioning back into an
educational forum for CMD.EXE so stick around and read the scripts.

Since this is Usenet and not e-mail it is not helpful to repeat the
contents of everything in the message you are replying to. Usenet
maintains a history which is generally available through your news
agent. Just repeat enough of the relevant portions to remind people of
what you are commenting on.

Frank

foxidrive

Dec 30, 2012, 9:41:00 AM
On 31/12/2012 1:16 AM, Lawrence Knowlton wrote:
>> >
>> > They allow two or more commands in the same sed command. Use GnuSED.
>> >
> Thanks foxi, after re-examining your code, I see what you mean. So, unxutils won't do it eh? Drat!

This should work if your sed handles the same command structure.

sed 1,3d file.txt | sed "s/   */|/g" >"file3.txt"



--
foxi

Lawrence Knowlton

Dec 31, 2012, 1:32:45 PM
Hi Foxi,
I figured that that's what would have to happen w/o the -e switch, though the unxutil's sed does have the -e switch. It just doesn't go into the s/ / switch.
Thanks!

Larry

foxidrive

Dec 31, 2012, 8:38:01 PM
On 1/01/2013 5:32 AM, Lawrence Knowlton wrote:

>>> Thanks foxi, after re-examining your code, I see what you mean. So, unxutils won't do it eh? Drat!
>>
>> This should work if your sed handles the same command structure.
>>
>> sed 1,3d file.txt | sed "s/   */|/g" >"file3.txt"
>>
>
> Hi Foxi,
> I figured that that's what would have to happen w/o the -e switch, though the unxutil's sed does have the -e switch. It just doesn't go into the s/ / switch.
> Thanks!
>

The version of UnxUtils that I have installed works with it, Larry.

And it's GnuSed

c:\Files\Util\UnxUtils>sed -V
GNU sed version 3.02



What error do you get?



--
foxi