Thanks for the reply.
By way of background, I am implementing this find/replace function as part of my daily editing of text files for submission to a billing clearinghouse that requires files in a format called HIPAA 5010. It's a difficult format to work with. Each file is a single long line punctuated by separators (~), which mark the ends of special strings within the line called "loops" in 5010 parlance. After any given ~, the next character is a normal alphanumeric character. There are no spaces in the file. My ultimate goal is to be able to toggle the file from a single-line file (the necessary format for submission to the clearinghouse, but hard to read) to a multi-line file (to scan it for errors) and back again.
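For the return trip (multi-line back to single-line), a minimal sketch is to delete every return and newline, assuming the only line breaks in the file are the ones added for readability. The function name to_single_line and the sample data are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse a multi-line copy back to one line by
# deleting every carriage return and newline, including a stray final one.
to_single_line() {
  tr -d '\r\n'
}

# Made-up sample: three strings, one per line, with a trailing newline.
multi='AAA~
BBB~
CCC~
'
single=$(printf '%s' "$multi" | to_single_line)
printf '%s\n' "$single"
```

Because tr removes all line breaks, a leftover return after the last ~ would be removed along with the others when toggling back.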
I do sometimes use TW "Find...Replace All" to do this job, using a saved "find ~" and "replace with ~\r", but I must admit that I like text filters because they entail fewer keystrokes (i.e., you don't need to select a saved routine or hit return).
So I futzed around a bit and discovered that if I create a TW Text Filter as originally specified and actually type a return instead of putting \r in my sed expression, the filter works 99% of the way that I want it to. (I am calling this syntax Attempt #2.) Typing a return seems rather inelegant compared to other sed implementations that accept \r, but 99% is admittedly a high number. The missing 1% is that the Text Filter finds the last ~ in the file (which is followed by EOF) and, not surprisingly, adds a return after it. This is undesirable: if that return persists after I toggle the file back to a single-line file, and I forget to delete it manually before submitting the file to the clearinghouse, the file will be rejected for syntax errors.
So, I changed the regex within the sed to the following, expecting that this regex would cause the final ~ (followed by EOF) to be a non-match and therefore not be altered.
Attempt #3:
#!/usr/bin/env bash
# Match ~ only when another character follows it, and put a literal return between the two.
sed -E 's/~(.)/~\
\1/g'
However, inexplicably, this does not work; a final return is still added, indicating, I suppose, that TW considers EOF to be a character and therefore considers ~EOF to be a match.
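One way to check where the final return comes from, outside TW, is to run the same substitution in a shell on a made-up string with no trailing newline and look only at what the regex itself inserted. Command substitution strips trailing newlines, so this sketch separates the substitution from anything appended at the end of the output. The sample string and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample: three strings, each ending in ~, no trailing newline.
sample='AAA~BBB~CCC~'

# Same substitution as Attempt #3. $(...) drops trailing newlines,
# so what remains shows only what the regex itself inserted.
split=$(printf '%s' "$sample" | sed -E 's/~(.)/~\
\1/g')

# Count the newlines the substitution inserted.
inserted=$(printf '%s' "$split" | tr -cd '\n' | wc -c | tr -d ' ')

# Last character of the split text; "~" means nothing follows the final ~.
last_char=$(printf '%s' "$split" | tail -c 1)

printf 'newlines inserted: %s, last character: %s\n' "$inserted" "$last_char"
```

If this reports 2 newlines and a last character of ~, the regex is skipping the final ~ as intended, and the extra return would have to come from somewhere else: for example, sed terminating its last output line with a newline (some sed implementations do this even when the input had none), or a line break already present at the end of the document.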
I read in an online regex discussion group that \z matches EOF (I'm not sure of this), so I tried the following, but it fails in the same way:
Attempt #4:
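#!/usr/bin/env bash
# Intent: require the character after ~ to be anything other than EOF.
# Note: inside [...], POSIX regex reads \z as a literal backslash and a literal z, not as an EOF anchor.
sed -E 's/~([^\z])/~\
\1/g'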
Interestingly, if I use TW Find/Replace (using grep), with find= ~(.) and replace= ~\r\1, I get the desired effect: no return after the last ~.
I just wish I could do that with the sed Text Filter.
If anyone knows how I can skip ~EOF with my sed filter, I would appreciate your thoughts.
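One possible workaround, as a sketch I have not verified inside TW, is to let the shell strip whatever trailing newline comes out of sed: command substitution $(...) removes trailing newlines, and printf '%s' prints the result without adding one. The function name split_after_tildes and the sample input are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter body: add a return after every ~ that is followed
# by another character, then drop any trailing newline before output.
split_after_tildes() {
  # $(...) strips trailing newlines from sed's output.
  out=$(sed -E 's/~(.)/~\
\1/g')
  # printf '%s' writes the text back without appending a newline.
  printf '%s' "$out"
}

# Made-up single-line input with no trailing newline.
printf '%s' 'AAA~BBB~CCC~' | split_after_tildes
```

In an actual Text Filter, the two lines inside the function would be the whole script, reading the document from standard input as the sed attempts above do.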
Thanks--
William