Delete empty last line


gl

May 14, 2014, 4:24:24 AM
to text...@googlegroups.com
Hi there, 

I sometimes get files with an empty last line. Can someone show me a way to get rid of that line?

Yours,
Gil

Mark Munz

May 14, 2014, 11:01:13 AM
to text...@googlegroups.com
Not quite sure what you mean by "empty last line".

There are two variations of this that I can think of.

I'm assuming both cases are at the end of the text.

We'll need a custom cleaner with a regular expression action to accomplish this.
In both cases, we want to anchor the match to the end of the text, so we'll go with \Z, which anchors to the end of the input (text).

1) blank line (as in just a return)

- Find and Replace Text
| type: regex
| options: 
| find: \n+\Z
| replace: 

This actually will remove one or more \n from the end of the text. The replace is empty, which effectively deletes the matched text.
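
For anyone who wants to try the same pattern outside TextSoap, here is a minimal sketch using Python's `re` module. The function name `strip_trailing_newlines` is made up for illustration, and this is not how TextSoap itself is implemented.

```python
import re

def strip_trailing_newlines(text: str) -> str:
    # \n+ matches one or more newlines; \Z anchors the match to the end of the string.
    # The empty replacement deletes whatever was matched.
    return re.sub(r"\n+\Z", "", text)

print(repr(strip_trailing_newlines("a,b\nc,d\n\n")))
```

Text that does not end in a newline is returned unchanged, because the pattern finds nothing to match.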

2) line with just spaces and return

- Find and Replace Text
| type: regex
| options: 
| find: \x{20}+\n\Z
| replace: 

This one is a little different. It matches one or more \x{20} (which is hex for a space) followed by a return and end of input. Again, the replace is empty to delete the matched text.
This pattern deliberately leaves any other returns in the text untouched.
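
A rough Python equivalent, as a sketch only: Python's `re` module does not accept the `\x{20}` form, so the sketch assumes `\x20` as the stand-in for a space. The function name `strip_trailing_space_line` is hypothetical.

```python
import re

def strip_trailing_space_line(text: str) -> str:
    # \x20+ is one or more spaces, then a single newline, then the end of the string.
    return re.sub(r"\x20+\n\Z", "", text)

print(repr(strip_trailing_space_line("a,b\n   \n")))
```

A final line that is completely empty (no spaces) is not touched by this pattern, since `\x20+` needs at least one space.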

Another option is similar, but will remove any whitespace (returns, spaces, tabs) from the end of the document.

- Find and Replace Text
| type: regex
| options: 
| find: \s+\Z
| replace: 

\s is the special regex sequence that specifies a whitespace character. The + indicates one or more of the preceding character. The \Z anchors this match to the end of the text. Finally, the blank replace means we'll delete this match.
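
The same idea as a small Python sketch, for testing outside TextSoap. The function name `strip_trailing_whitespace` is an assumption for illustration only.

```python
import re

def strip_trailing_whitespace(text: str) -> str:
    # \s+ matches any run of whitespace (spaces, tabs, returns);
    # \Z requires that run to sit at the very end of the string.
    return re.sub(r"\s+\Z", "", text)

print(repr(strip_trailing_whitespace("a,b\nc,d \t\n\n")))
```

Whitespace inside the text is left alone; only the run at the very end is removed. In plain Python, `text.rstrip()` gives much the same result for typical input.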

You can find more information about the character sequences in regex by selecting Help > Regex Reference in TextSoap.
The regular expressions feature is very powerful, but does have a learning curve.



--
Mark Munz
unmarked software
http://www.unmarked.com/

Thomas Falk

May 14, 2014, 11:20:42 AM
to text...@googlegroups.com
Hi Markus,

1st option seems to work. The file is a .csv which comes from someone who edits it in Excel. Redmond's Finest always adds a blank line at the end of the .csv which caused me some pain in the postprocessing.

Thanks for your help. \Z wasn't on my radar :-)

Yours,
Gil

Mark Munz

May 14, 2014, 12:25:53 PM
to text...@googlegroups.com
It's always difficult to offer super generic advice on regex because it really depends on the data you're working with.
\A and \Z are more definitive anchors than ^ and $, whose meaning can change depending on the options set.

These days, I'm putting forth more examples that use ^ and $ mostly when working in multiline mode (^ matches beginning of line, $ matches end of line) and sticking with \A and \Z for anchoring on the entire input text. The hope is to avoid the ambiguity in how most regex engines currently define ^ and $ (where the meaning changes based on the options set). :)
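
To see the difference in practice, here is a small sketch using Python's `re` module rather than TextSoap. The variable names are made up for illustration, and `re.MULTILINE` is Python's name for the multiline option.

```python
import re

text = "one\ntwo\n"

# Without the multiline option, ^ matches only at the start of the text.
caret_default = re.findall(r"^\w+", text)
# With the multiline option, ^ matches at the start of every line.
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# \A always means the start of the whole text, whatever options are set.
start_anchor = re.findall(r"\A\w+", text, re.MULTILINE)

print(caret_default)    # ['one']
print(caret_multiline)  # ['one', 'two']
print(start_anchor)     # ['one']
```

One caveat when moving patterns between engines: Python's \Z matches only at the very end of the string, while some other engines (PCRE, for example) also let \Z match just before a final newline.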