Hi,
I have story transcripts such as this one, in which quote marks enclose coherent pieces of direct speech:
> story
V1
1 Mim:\ty’ know how teachers go “E::wawawawawu” (’n tha’ sor’ o’ th’)
2 \tso I say, I sa’, when you meet anybody
3 \tand they say to him “how's teaching?”
4 \t“Re::wawawawu”
5 \tI say “J’s STOp it
6 Ter:\tHheh heh heh [heh he he]
7 Mim:\t\t [you’re really] actually enjoying it”
8 \t[(so I say “don't do it”)]
Using regex, I want to extract all instances of direct speech enclosed in left quote marks “ and right quote marks ”.
This regex does extract all direct speech that's within a single line:
pattern <- "“[^”]*”" # left quote mark, any run of characters other than a right quote mark, right quote mark
matches <- gregexpr(pattern, story$V1)
quotes <- regmatches(story$V1, matches)
quotes <- unlist(quotes)
quotes
[1] "“E::wawawawawu”" "“how's teaching?”" "“Re::wawawawu”" "“don't do it”"
However, it fails to match the direct speech that spans lines 5-7 (“J’s STOp it ... [you’re really] actually enjoying it”).
I've tried to include \t as an optional element, thus:
pattern <- "“((\\t{1,})?)[^”]*”"
But that finds the same matches as the regex above.
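A possible reason, as far as I can tell: gregexpr() searches each element of story$V1 as a separate string, so no pattern, with or without \t, can match a quote that opens in one element and closes in another. Below is a minimal sketch of one workaround I could imagine, namely collapsing the lines into a single string before matching. The three-line data frame is a hypothetical reconstruction of lines 5-7 only, and the names per_line, whole and multi are made up for illustration.

```r
# Hypothetical reconstruction of transcript lines 5-7 (not the full data).
# \u201c and \u201d are the left and right double quote marks,
# \u2019 is the apostrophe used in the transcript.
story <- data.frame(
  V1 = c("5 \tI say \u201cJ\u2019s STOp it",
         "6 Ter:\tHheh heh heh [heh he he]",
         "7 Mim:\t\t [you\u2019re really] actually enjoying it\u201d"),
  stringsAsFactors = FALSE
)

pattern <- "\u201c[^\u201d]*\u201d"

# Element-wise matching: each line is searched on its own, and no single
# line here contains both an opening and a closing quote mark.
per_line <- unlist(regmatches(story$V1, gregexpr(pattern, story$V1)))

# Assumed workaround: join all lines with newlines, then match once.
# The negated class [^...] also matches the newline, so the match can
# run across the former line boundaries.
whole <- paste(story$V1, collapse = "\n")
multi <- unlist(regmatches(whole, gregexpr(pattern, whole)))
```

In this sketch per_line is empty, while multi holds the one quote that spans the three lines. Note that the match also swallows everything in between, including the line numbers, the speaker labels and Ter's overlapping laughter in line 6, so it would still need cleaning up afterwards.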
Can anybody help?
Thanks in advance!
Best
Chris