Text Filter using bash, sed, and \r not working


William Dockery

Nov 13, 2017, 9:09:13 AM
to TextWrangler Talk
Hello, I am relatively new to TextWrangler Text Filters, but I have an ongoing need to replace ~ with ~ plus a linefeed in files (i.e., to add a linefeed after each ~). I have decided to use a Text Filter built on bash and sed. (If this is the wrong way to begin, please let me know.)

Could someone explain why the following is not working?

#!/usr/bin/env bash
sed -E 's/~/~\r/g';

Interestingly, the above filter produces the same output as:

#!/usr/bin/env bash
sed -E 's/~/~r/g';

Both of the above filters simply put an r after the ~.

And:

#!/usr/bin/env bash
sed -E 's/~/~\\r/g';

produces:

~\r    

There are no successful linefeeds in any of the above scenarios.


Thanks for pointers--

William

Mac OS 10.12.6
TextWrangler 5.5.2
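
A likely explanation for the three results above, offered as a sketch rather than a confirmed diagnosis: the sed that ships with macOS (BSD sed) does not treat \r or \n in the replacement text as escape sequences, so \r collapses to a plain r and \\r becomes a literal backslash followed by r. The portable way to put a line break into a sed replacement is a backslash followed by an actual newline. Note also that a line break in a Unix filter is a linefeed (\n); as far as I know, the \r used in TextWrangler's Find dialog is the application's own notation for a line break. The function name add_breaks is made up for illustration.

```shell
#!/usr/bin/env bash
# Build a variable holding exactly one newline. Command substitution strips
# trailing newlines, so a sentinel "x" is appended and then removed again.
nl=$(printf '\nx'); nl=${nl%x}

# Hypothetical helper: insert a line break after every "~" read from stdin.
add_breaks() {
  # Inside double quotes "\\" becomes one backslash, so sed receives
  #   s/~/~\<newline>/g
  # which is the portable spelling of "replace ~ with ~ plus a line break".
  sed "s/~/~\\${nl}/g"
}
```

To use this as a Text Filter, end the script with a bare add_breaks call so that it reads the document from standard input. For a quick check in Terminal, printf 'A~B~C' | add_breaks should print A~, B~ and C on separate lines.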

Thomas Fischer

Nov 13, 2017, 10:55:44 AM
to textwr...@googlegroups.com
Hello William,

First, I would usually use TW itself to do the job:
replace ~ by ~\r
using the "Find…" command (⌘F). You can also use the
"Multi-File Search…" command (shift-⌘F) if you want to make this change in many files.

And if I need to attach a keyboard shortcut to something I have to do very often I use AppleScript (this is another menu in TextWrangler):

tell application "TextWrangler"
	replace "~" using "~\\r" searching in text 1 of window 1 options {starting at top:true}
end tell

Hope this helps.
Cheers
Thomas



William Dockery

Nov 14, 2017, 5:35:48 PM
to TextWrangler Talk
Thanks for the reply.

By way of background, I am implementing this find/replace function as part of my ongoing daily editing of text files for submission to a billing clearinghouse that requires files in a format called HIPAA 5010. It's a difficult format to work with. Each file is a single, very long line punctuated by separators (~), which mark the ends of special strings within the line called "loops" in 5010 parlance. After any given ~, the next character is a normal alphanumeric character. There are no spaces in the file. My ultimate goal is to be able to toggle the file from a single-line file (the necessary format for submission to the clearinghouse, but hard to read) to a multi-line file (to scan it for errors) and back again.

I do sometimes use TW "Find...Replace All" to do this job, using a saved "find ~" and "replace with ~\r", but I must admit that I like text filters because they entail fewer keystrokes (i.e., you don't need to select a saved routine or hit return).

So I futzed around a bit and discovered that if I create a TW Text Filter as originally specified but type an actual return instead of putting \r in my sed expression, the filter works 99% of the way that I want it to. (I am calling this syntax Attempt #2.) Typing a return seems rather inelegant compared to other sed implementations, which accept \r, but 99% is admittedly a high number. The missing 1% is that the Text Filter finds the last ~ in the file (which is followed by EOF) and, not surprisingly, adds a return after it. This is undesirable: if that return persists after I toggle the file back to a single line, and I forget to delete it manually before submitting the file to the clearinghouse, the file will be rejected for syntactical reasons.

So, I changed the regex within the sed to the following, expecting that this regex would cause the final ~ (followed by EOF) to be a non-match and therefore not be altered.

Attempt #3:

#!/usr/bin/env bash
sed -E 's/~(.)/~\
\1/g';

However, inexplicably, this does not work; a final return is still added, indicating, I suppose, that TW considers EOF to be a character and therefore considers ~EOF to be a match.

I read in a regex online discussion group that \z is EOF (I'm not sure of this), so I tried the following, but it fails in the same way:

Attempt #4:

#!/usr/bin/env bash
sed -E 's/~([^\z])/~\
\1/g';

Interestingly, if I use TW Find/Replace (using grep), with find= ~(.) and replace= ~\r\1, I get the desired effect:  no return after the last ~.

I just wish I could do that with the sed Text Filter.

If anyone knows how I can skip ~EOF with my sed filter, I would appreciate your thoughts.

Thanks--

William
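
One plausible reading of Attempts #3 and #4, not confirmed anywhere in this thread: sed works one line at a time and never shows the line terminator to the regex, so ~(.) should not match a ~ at the very end of the line. The extra return more likely comes from sed itself, which ends its last output line with a newline (BSD sed is generally reported to do this even when the input had none). Under that assumption, a minimal sketch of the toggle is to let the filter strip trailing line breaks after sed has run. The function names split_segments and join_segments are made up for illustration.

```shell
#!/usr/bin/env bash
# A variable holding exactly one newline (sentinel "x" protects it from
# being stripped by the command substitution).
nl=$(printf '\nx'); nl=${nl%x}

# Single line -> one segment per line, with no line break after the final "~".
split_segments() {
  # $(...) drops every trailing newline from sed's output, including the one
  # inserted after the last "~" and any that sed appends on its own.
  # printf '%s' then writes the text back without adding a newline.
  printf '%s' "$(sed "s/~/~\\${nl}/g")"
}

# Multi-line -> single line: delete every LF and CR character.
join_segments() {
  tr -d '\n\r'
}
```

Each function would go in its own Text Filter file, followed by a bare call to the function so that it reads the document from standard input. Because join_segments deletes every line break, a stray return left at the end of the file should also disappear when toggling back to the single-line form.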