Using regex to match quotation parts across tab-spaced text


Christoph Ruehlemann

May 23, 2018, 1:14:23 PM

I have story transcripts such as this one with quote marks around coherent pieces of direct speech:

> story
1 Mim:\ty’ know how teachers go “E::wawawawawu” (’n tha’ sor’ o’ th’)
2                           \tso I say, I sa’, when you meet anybody
3                            \tand they say to him “how's teaching?”
4                                                    \t“Re::wawawawu”
5                                                \tI say “J’s STOp it
6                                  Ter:\tHheh heh heh [heh    he  he]
7                 Mim:\t\t     [you’re really] actually enjoying it”
8                                        \t[(so I say “don't do it”)]

Using regex, I want to extract all instances of direct speech enclosed in left quote marks “ and right quote marks ”.
This regex does extract all direct speech that's within a single line:

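pattern <- "“[^”]*”"                     # opening quote, any run of non-closing-quote characters, closing quote
matches <- gregexpr(pattern, story$V1)   # positions of all matches, one list entry per line
quotes <- regmatches(story$V1, matches)  # extract the matched substrings
quotes <- unlist(quotes)                 # flatten the per-line list into one vector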

[1] "“E::wawawawawu”"   "“how's teaching?”" "“Re::wawawawu”"    "“don't do it”"

It fails, however, to match the direct speech spread across lines 5-7 (“J’s STOp it ... [you’re really] actually enjoying it”).
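A quick diagnostic makes the failure visible. The sketch below assumes story is a one-column data frame with one transcript line per element, as if read in line by line. The data is a hypothetical reconstruction of the printout above: tabs are kept, curly apostrophes are simplified to straight ones, and the curly double quotes are written as \u201c/\u201d escapes so the code does not depend on the file encoding. The names opens, closes, open_only and close_only are illustrative. The code flags lines that contain an opening quote but no closing one, and vice versa:

```r
# Hypothetical reconstruction of the transcript, one line per element
# (as if read in with sep = "\n"); curly apostrophes simplified to ' .
story <- data.frame(
  V1 = c(
    "Mim:\ty' know how teachers go \u201cE::wawawawawu\u201d ('n tha' sor' o' th')",
    "\tso I say, I sa', when you meet anybody",
    "\tand they say to him \u201chow's teaching?\u201d",
    "\t\u201cRe::wawawawu\u201d",
    "\tI say \u201cJ's STOp it",
    "Ter:\tHheh heh heh [heh    he  he]",
    "Mim:\t\t     [you're really] actually enjoying it\u201d",
    "\t[(so I say \u201cdon't do it\u201d)]"
  ),
  stringsAsFactors = FALSE
)

# Does each line contain an opening and/or a closing curly quote?
opens  <- grepl("\u201c", story$V1, fixed = TRUE)
closes <- grepl("\u201d", story$V1, fixed = TRUE)

# Lines where a quote starts but does not end there, and vice versa
open_only  <- which(opens & !closes)
close_only <- which(closes & !opens)
open_only   # 5
close_only  # 7
```

The quote opens in element 5 and closes in element 7, so no single element holds a complete “...” pair, and a pattern applied element by element cannot match it.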

I've tried to include \t as an optional element, thus:

pattern <- "“((\\t{1,})?)[^”]*”"

But that finds the same matches as the regex above.
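A small check with a made-up one-line string suggests why the optional tab makes no difference. The negated class [^”] already matches every character except the closing quote, tabs included, so the extra group ((\\t{1,})?) never changes what is matched. The curly quotes are again written as \u201c/\u201d escapes, and x, m_plain and m_tab are illustrative names:

```r
# The pattern from the question, and the variant with an optional run of tabs
pattern_plain <- "\u201c[^\u201d]*\u201d"
pattern_tab   <- "\u201c((\\t{1,})?)[^\u201d]*\u201d"

# A made-up single string with tabs inside the quoted stretch
x <- "I say \u201cJ's STOp it\t\t[you're really] actually enjoying it\u201d"

# First match of each pattern
m_plain <- regmatches(x, regexpr(pattern_plain, x))
m_tab   <- regmatches(x, regexpr(pattern_tab, x))

identical(m_plain, m_tab)           # TRUE: the optional tab group adds nothing
grepl("\t", m_plain, fixed = TRUE)  # TRUE: the negated class already matched the tabs
```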

Can anybody help?

Thanks in advance!

Stefan Th. Gries

May 23, 2018, 1:42:42 PM
to CorpLing with R
If you read the story in line by line (sep="\n"), the regex will of
course not go across vector elements to find the opening double
quote in line 5 (“J’s STOp it), everything in line 6, and then the end
of the direct speech in line 7 (Mim:\t\t [you’re really]
actually enjoying it”). In other words, the tab has nothing to do with
it (the negated class [^”] already matches tabs); the problem is that
regexes won't search across vector elements. One way to address this
is to paste the story together into one string.
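A minimal sketch of that paste-together approach. It again assumes a hypothetical reconstruction of story (one line per element, tabs kept, apostrophes simplified, curly quotes written as \u201c/\u201d escapes); story_text is an illustrative name, and joining the lines with a single space is one choice of separator rather than something specified in the thread:

```r
# Hypothetical reconstruction of the transcript, one line per element
story <- data.frame(
  V1 = c(
    "Mim:\ty' know how teachers go \u201cE::wawawawawu\u201d ('n tha' sor' o' th')",
    "\tso I say, I sa', when you meet anybody",
    "\tand they say to him \u201chow's teaching?\u201d",
    "\t\u201cRe::wawawawu\u201d",
    "\tI say \u201cJ's STOp it",
    "Ter:\tHheh heh heh [heh    he  he]",
    "Mim:\t\t     [you're really] actually enjoying it\u201d",
    "\t[(so I say \u201cdon't do it\u201d)]"
  ),
  stringsAsFactors = FALSE
)

# Same pattern as in the question: opening quote, any run of
# non-closing-quote characters (tabs included), closing quote
pattern <- "\u201c[^\u201d]*\u201d"

# Collapse all lines into one string so a quote can span former line breaks
story_text <- paste(story$V1, collapse = " ")

# gregexpr() on a single string returns a one-element list; take its contents
quotes <- regmatches(story_text, gregexpr(pattern, story_text))[[1]]
length(quotes)  # 5
quotes[4]
```

The fourth match now runs from “J's STOp it to enjoying it”. As described above, it also includes everything in between, i.e. Ter's overlapping laughter from line 6 and the Mim: label from line 7, so removing such material would need a separate clean-up step.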

Bob Green

May 23, 2018, 5:24:38 PM

Does anyone know of sources of autobiographical texts by famous or
well-known people in a format suitable for text analysis?

Any assistance is appreciated,


Christoph Ruehlemann

May 24, 2018, 3:42:55 AM
Thanks, that worked!
