pattern for substitution including linefeed and carriage return

Erhy

Dec 10, 2017, 11:08:36 AM
to vim_use
Hello!
I have a CSV file in which some items span several lines,
and I want to delete them. These items are also enclosed in text markers ("),
e.g.
30.11.2017;"Name";"Legend
for name
is not found";"New York";

How can I remove items such as

"Legend
for name
is not found"

using wildcards that target only the linefeeds inside such items?

Thank you, Erhy

Gary Johnson

Dec 10, 2017, 12:53:17 PM
to vim_use
A pattern that will match that string is

"Legend\nfor name\nis not found"

\n matches the end-of-line marker in the Vim buffer. Vim converts
a file's end-of-line markers to its internal end-of-line markers
when it reads a file, and writes the appropriate end-of-line markers
to a file when it writes a file, according to the setting of
'fileformat', so you don't need to concern yourself with whether the
file uses LF or CR-LF at the ends of lines.

To delete all lines containing that string, you could use:

:g/"Legend\nfor name\nis not found"/.,+2d

The :g/<pattern>/d command deletes only the line at which the
pattern matches, which would delete only the line containing
'"Legend'. The .,+2 is needed to delete all three lines.

See

:help /\n
:help 'fileformat'
:help /\_.
:help :range

HTH,
Gary

C.v.St.

Dec 10, 2017, 2:01:55 PM
to vim...@googlegroups.com

On 12/10/2017 at 05:08 PM, Erhy wrote:
> ... This items have also textmarkers " e.g.
> 30.11.2017;"Name";"Legend
> for name
> is not found";"New York";
So you want to delete the contents of fields where
the last thing on the line is not the ", followed
by line(s) with no " at all, up to the line which
contains, but does not begin with, a "?
(This misses some cases and will only mostly work,
as it depends on having a single ;" on the first
and a single "; on the last line of the change.
So lines with broken pairs of ", or with an extra newline
without ", or lines with a newline in the first or the
last field, would break or be ignored.)

The simple case (a field spanning exactly three lines, since
[^"] does not match a newline in Vim) might be done with:

:s/;"[^"]*\n\([^"]*\n\)[^"]*";/;"";/

Or do you want to remove only the newlines and keep
the text? I believe that would be more complicated,
because of the newlines inside the \(...\) pair.

Stucki

Tim Chase

Dec 10, 2017, 4:44:06 PM
to vim...@googlegroups.com
On 2017-12-10 09:52, Gary Johnson wrote:
> A pattern that will match that string is
>
> "Legend\nfor name\nis not found"
>
> :help /\_.

The "\_" convention holds for things other than "." to add the "and
include newline" connotation, so you can change your spaces to

Legend\_s\+for\_s\+name\_s\+is\_s\+not\_s\+found

which would allow a new-line as part of any of the whitespace in your
sentence.

-tim


Erhy

Dec 10, 2017, 4:50:32 PM
to vim_use
Stucki, thanks for your answer!

In my CSV file these multiline fields have
different numbers of lines, at least two,
and contain various text.

Is it also possible to delete such items at once?

Thank you all

Erhy

C.v.St.

Dec 10, 2017, 6:14:44 PM
to vim...@googlegroups.com


On 12/10/2017 at 10:50 PM, Erhy wrote:

> In my CSV file this multiline fields have
> different number of lines, least two
> and contain various text.
>
> Is it also possible to delete such items at once?

>> :s/;"[^"]*\n\([^"]*\n\)[^"]*";/;"";/

As usual, :%s..... applies the substitution
to all lines of the file (the complete syntax
of ranges is described in ":help cmdline-ranges").
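
For fields that span a varying number of lines, the same idea can also be tried outside Vim. What follows is only a sketch in Python, not something posted in this thread: the sample text and the name blank_multiline_fields are made up for illustration. Like the pattern above, it assumes that a multi-line field starts with ;" and ends with "; and contains no embedded double quotes.

```python
import re

# Hypothetical sample modeled on the example in this thread.
sample = (
    '30.11.2017;"Name";"Legend\n'
    'for name\n'
    'is not found";"New York";\n'
    '01.12.2017;"Other";"two\n'
    'lines";"Vienna";\n'
)

# ;" followed by text up to the end of the line, a newline, then any text
# without a double quote (newlines allowed), closed by ";
# In Python, [^"] also matches a newline, so the first part excludes it
# explicitly to make sure the field really continues on the next line.
MULTILINE_FIELD = re.compile(r';"[^"\n]*\n[^"]*";')


def blank_multiline_fields(text):
    """Replace every quoted field that contains a newline with an empty field."""
    return MULTILINE_FIELD.sub(';"";', text)


print(blank_multiline_fields(sample))
```

Quoted fields that stay on one line are left alone, because the pattern requires at least one newline before the closing quote.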

Stucki

Eike Rathke

Dec 11, 2017, 11:38:43 AM
to vim_use
Hi Erhy,

On Sunday, 2017-12-10 08:08:36 -0800, Erhy wrote:

> I got an CSV file in which some items have more lines
> and I want to delete them. This items have also textmarkers "
> e.g.
> 30.11.2017;"Name";"Legend
> for name
> is not found";"New York";

As a side note, you are aware that such multi-line field content is
valid in CSV if quoted? If for some reason the processing software
isn't capable of coping with multi-line content, I'd rather suggest
replacing only the embedded newlines with spaces, so the actual field
content is preserved instead of stripped.
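
A rough sketch of that alternative, using Python's standard csv module, might look like the following. It is an illustration only: the function name flatten_newlines is made up, and the semicolon delimiter is assumed from the sample line quoted above.

```python
import csv
import io


def flatten_newlines(text, delimiter=";"):
    """Replace newlines inside CSV fields with spaces, keeping the field text."""
    # newline="" stops StringIO from translating line endings, so the csv
    # module sees the embedded newlines exactly as they are in the data.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [field.replace("\r\n", " ").replace("\n", " ") for field in row]
        )
    return out.getvalue()


line = '30.11.2017;"Name";"Legend\nfor name\nis not found";"New York";\n'
print(flatten_newlines(line))
```

Note that the writer's default quoting only quotes fields that need it, so quotes around plain fields such as "Name" are dropped in the output.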

Eike

--
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
Care about Free Software, support the FSFE https://fsfe.org/support/?erack
Use LibreOffice! https://www.libreoffice.org/

Erhy

Dec 11, 2017, 5:07:24 PM
to vim_use
On Monday, December 11, 2017 at 17:38:43 UTC+1, Eike Rathke wrote:
> Hi Erhy,

>
> As a side note, you are aware that such multi-line field content is
> valid in CSV if enquoted?

It seems odd to me too, but the file comes from a banking institution.
There are only linefeeds in the file, and it is an art to find the next logical line.
The items which span several lines have
"
to mark them as text items.

Erhy

C.v.St.

Dec 13, 2017, 9:04:09 AM
to vim...@googlegroups.com


On 12/11/2017 at 11:06 PM, Erhy wrote:
...
>> valid in CSV if enquoted?
>
> It's also odd for me. But the file comes from a banking institution.

Well, then it's not 'odd' but typical. You get something like:
-------------------------------- kind of symbolically:
first some text like a title or explanations, then the column (field) names
field;names;separated;by semicolon;and;explaining the following lines
data1;data2;data3;data4 blanks OK;OK! OK;more data up to EOL
DATA1;DATA2;;;;" and like an address with quoted multiple lines
Who
Where
City
etc.etc.etc."
or;even;"a first field
with newline";"a
second
field
with
newlines";still in line 3 of data;end of field 6 of logical line 3
---------------------------------------------------------------------
As long as you have e.g. 5 separators (here ';', so 6 fields of data)
and all cases of 'newline in a field' are enclosed in double quotes,
this is correct for so-called 'Comma Separated Values'.
(Banking exports mostly use a semicolon as the delimiter,
because ',' is often used in fields.)

See more at https://tools.ietf.org/html/rfc4180

BUT parsing such input with Vim (i.e. with a regexp) seems to me to be
overly complex (or even unworkable?). The number of fields and newlines
is unlimited. So the structure is a lot easier to work with using
'real' CSV libraries (e.g. in Perl or Python) or spreadsheet programs.
(Or even the simple 'csvtool' on the command line in Linux.)
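
As a minimal sketch of the library route, here is how Python's standard csv module could empty every field that contains a newline. The function name blank_multiline_fields_csv and the sample line are illustrative assumptions, not part of any real bank export, and the semicolon delimiter is taken from the example above.

```python
import csv
import io


def blank_multiline_fields_csv(text, delimiter=";"):
    """Empty every CSV field that contains an embedded newline."""
    # The csv reader understands quoted fields, so one logical record is
    # returned as one row even when it spans several physical lines.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow(["" if "\n" in field else field for field in row])
    return out.getvalue()


record = (
    'or;even;"a first field\nwith newline";'
    '"a\nsecond\nfield";still in line 3;end\n'
)
print(blank_multiline_fields_csv(record))
```

Because the parser tracks the quotes itself, the number of fields and the number of lines per field do not need to be known in advance.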

Stucki

Erhy

Dec 13, 2017, 10:37:37 AM
to vim_use
Thank you, Stucki,
but I don't have Linux.

With my old MS Word 2003 I was able to delete the items with linefeeds.

Erhy
