I'm trying to use the following substitution,

lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])
I intend this to match any string "\begin{document}" that doesn't end
in a line ending. If there's no line ending, then, I want to place
two carriage returns between the string and the non-line end
character.
However, this places carriage returns even when the string is followed
directly after with a line ending. Can someone explain to me why this
match is not behaving as I intend it to, especially the ([^$])?
Also, how can I write a regex that matches what I wish to match, as
described above?
Many thanks,
John
> I'm trying to use the following substitution,
>
> lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])
>
> I intend this to match any string "\begin{document}" that doesn't end
> in a line ending. If there's no line ending, then, I want to place
> two carriage returns between the string and the non-line end
> character.
>
> However, this places carriage returns even when the string is followed
> directly after with a line ending. Can someone explain to me why this
> match is not behaving as I intend it to, especially the ([^$])?
[^$] matches: not a $ character
You might want [^\n]
--
John Bokma j3b
Blog: http://johnbokma.com/ Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/
> I'm trying to use the following substitution,
>
> lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])
>
> I intend this to match any string "\begin{document}" that doesn't end
> in a line ending. If there's no line ending, then, I want to place
> two carriage returns between the string and the non-line end
> character.
>
> However, this places carriage returns even when the string is followed
> directly after with a line ending. Can someone explain to me why this
> match is not behaving as I intend it to, especially the ([^$])?
Quoting http://docs.python.org/library/re.html:
"""
Special characters are not active inside sets. For example, [akm$] will
match any of the characters 'a', 'k', 'm', or '$';
"""
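To make the quoted rule concrete, here is a small sketch of what [^$] actually accepts. The two sample strings are made up for illustration; only the pattern comes from the original post.

```python
import re

# The pattern from the original post.
pattern = re.compile(r'(\\begin{document})([^$])')

# Inside [...] the '$' is a literal dollar sign, so [^$] accepts any
# character except '$'. That includes the newline at the end of a line.
followed_by_newline = pattern.search('\\begin{document}\n')
followed_by_dollar = pattern.search('\\begin{document}$x$')

print(followed_by_newline is not None)  # True: the newline was matched
print(followed_by_dollar is not None)   # False: only '$' is rejected
```

This is why the substitution fires even when \begin{document} sits alone on its line.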
>
> Also, how can I write a regex that matches what I wish to match, as
> described above?
I think you want a "negative lookahead assertion", (?!...):
>>> print(re.compile("(xxx)(?!$)", re.MULTILINE).sub(r"\1**", "aaa bbb xxx\naaa xxx bbb\nxxx"))
aaa bbb xxx
aaa xxx** bbb
xxx
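Applied to the \begin{document} case, the same lookahead idea might look like the sketch below. The helper name split_after_begin is hypothetical, and it assumes each line comes from readlines(), so it ends in a single "\n". Without re.MULTILINE, $ still matches just before a trailing newline, which is what makes this work line by line.

```python
import re

def split_after_begin(line):
    """Insert two newlines after \\begin{document} unless the line ends there.

    Hypothetical helper: (?!$) is a negative lookahead, so nothing after
    the marker is consumed and no second group is needed.
    """
    return re.sub(r'(\\begin{document})(?!$)', r'\1\n\n', line)

alone = split_after_begin('\\begin{document}\n')
with_text = split_after_begin('\\begin{document}Extra stuff\n')

print(repr(alone))      # unchanged
print(repr(with_text))  # two newlines inserted before "Extra stuff"
```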
Thank you, John.
I thought that when you use "r" before the regex, $ matches an end of
line. But, in any case, if I use "[^\n]" as you suggest I get the
same result.
Here's a script that illustrates the problem. Any help would be
appreciated!:
#BEGIN SCRIPT
import re
outlist = []
myfile = "raw.tex"
fin = open(myfile, "r")
lineList = fin.readlines()
fin.close()
for i in range(0,len(lineList)):
    lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n\2',lineList[i])
    outlist.append(lineList[i])
fou = open(myfile, "w")
for i in range(len(outlist)):
    fou.write(outlist[i])
fou.close()
#END SCRIPT
And the file raw.tex:
%BEGIN TeX FILE
\begin{document}
This line should remain right after the above line in the output, but doesn't
\begin{document}Extra stuff here should appear below the begin line and does in the output.
%END TeX FILE
r before a string has nothing to do with regexes. It signals a raw
string: backslash escape sequences won't be interpreted.
>>> print 'a\tb'
a b
>>> print r'a\tb'
a\tb
We use raw strings for regexes because otherwise you'd have to
remember to double up all your backslashes, and to double up your
doubled-up backslashes when you really want a literal backslash.
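As a rough illustration of the doubling, the sketch below compares the raw and non-raw spellings of the same pattern. The variable names are only for this example.

```python
import re

# Both literals produce the same 7-character pattern: \\begin
# (two backslashes, which the regex engine reads as one literal backslash).
raw_pattern = r'\\begin'
plain_pattern = '\\\\begin'

print(raw_pattern == plain_pattern)  # True
print(len(raw_pattern))              # 7

# The text being searched contains a single backslash.
text = '\\begin{document}'
found = re.search(raw_pattern, text)
print(found is not None)             # True
```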
Works for me. Do you have a space after the \begin{document} or
something? Because that would get moved. You might want to check for
non-whitespace characters in the regex instead of just non-newlines.
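A quick sketch of the difference, using a made-up line with a trailing space. Whether the original raw.tex contained any such invisible character is not known from the thread; this only shows how the two patterns would react if it did.

```python
import re

# Assumed sample: a trailing space before the newline.
line = '\\begin{document} \n'

# [^\n] accepts the space, so the substitution fires and moves it.
not_newline = re.sub(r'(\\begin{document})([^\n])', r'\1\n\n\2', line)

# \S rejects every whitespace character (space, tab, carriage return,
# newline, ...), so the line is left alone.
non_space = re.sub(r'(\\begin{document})(\S)', r'\1\n\n\2', line)

print(repr(not_newline))
print(repr(non_space))
```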
Matching the non-whitespace works, but I'm troubled I can't match a
non-end-of-line. No, there was no space after the string.
Thank you for your help, Ben
Here's the important tidbit:
re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
From the docs:
'.'
(Dot.) In the default mode, this matches any character except a newline.
If the DOTALL flag has been specified, this matches any character
including a newline.
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding
RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will
not match just ‘a’.
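Putting the two quoted rules together: since '.' refuses a newline and '+' needs at least one character, (.+) can only match when something other than the line ending follows the marker. A minimal sketch, with a hypothetical helper name and sample lines modelled on raw.tex:

```python
import re

def add_blank_line(line):
    # Hypothetical wrapper around the substitution shown above.
    return re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)

alone = add_blank_line('\\begin{document}\n')
with_text = add_blank_line('\\begin{document}Extra stuff here\n')

print(repr(alone))      # unchanged: nothing for .+ to match
print(repr(with_text))  # marker, blank line, then the extra text
```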
And here's the entire program, a bit more pythonically:
8<---------------------------------------------------------------
import re
outlist = []
myfile = "raw.tex"
fin = open(myfile, "r")
lineList = fin.readlines()
fin.close()
for line in lineList:
    line = re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
    outlist.append(line)
fou = open(myfile, "w")
for line in outlist:
    fou.write(line)
fou.close()
8<---------------------------------------------------------------
Hope this helps!
~Ethan~
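One further variation on the program above, offered only as a sketch: with blocks close the files automatically, which avoids forgetting the parentheses on close(). The function name rewrite_tex and the temporary sample file are assumptions for this example, not part of the thread.

```python
import os
import re
import tempfile

def rewrite_tex(path):
    """Rewrite the file in place, line by line (hypothetical helper)."""
    with open(path, "r") as fin:
        lines = fin.readlines()
    with open(path, "w") as fou:
        for line in lines:
            fou.write(re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line))

# Try it on a throwaway copy rather than on a real raw.tex.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "raw.tex")
with open(sample_path, "w") as f:
    f.write("\\begin{document}\nkept\n\\begin{document}Extra\n")

rewrite_tex(sample_path)

with open(sample_path) as f:
    result = f.read()
print(repr(result))
```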