>> Could an external script that detects empty lines followed by a line
>> containing text help?
>
> That wouldn't be enough: sections, for example, can be preceded by
> an empty line, but they shouldn't be marked as needing to be
> numbered.
>
>> It would obviously have the drawback of requiring a two-pass
>> compilation, but it seems likely to have fewer side effects.
>
> IMHO, an external script won't manage to parse a .tex source any
> better than TeX itself, will it?
That's not entirely clear, because we are not talking about paragraphs
in the LaTeX sense, but in the sense of the «instructions given to a
doctoral student in law»; for all we know, the difference is large
enough that TeX is not the right tool.
In any case, I wrote this:
====================== CODE ============================
#! /usr/bin/python
# -*- coding: utf8 -*-
# (c) Laurent Claessens
# This code is released under the WTFPL - Do What The Fuck You Want To
# Public License
# http://sam.zoy.org/wtfpl/
#
# DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
# Version 2, December 2004
#
# Copyright (C) 2004 Sam Hocevar <s...@hocevar.net>
#
# Everyone is permitted to copy and distribute verbatim or modified
# copies of this license document, and changing it is allowed as long
# as the name is changed.
#
# DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
# TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
#
# 0. You just DO WHAT THE FUCK YOU WANT TO.
#
from __future__ import unicode_literals
import codecs
import sys
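
class FakePrint(object):
    """
    Stand in for sys.stdout and write everything as UTF-8 encoded bytes.
    """
    def __init__(self):
        self.record_stdout = sys.stdout
        sys.stdout = self

    def write(self, text):
        octet = text.encode("utf8")
        # Python 3 exposes the byte stream as `buffer`; Python 2 accepts
        # bytes on sys.stdout directly.
        target = getattr(self.record_stdout, "buffer", self.record_stdout)
        target.write(octet)

    def flush(self):
        self.record_stdout.flush()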

# The list `non_text_macros` is the list of things that can be found at the
# beginning of a line without being the beginning of a paragraph.
non_text_macros = [r"\begin{" + x + "}" for x in ["enumerate", "itemize"]]
non_text_macros.append("\\end{")
non_text_macros.append("\\input{")
non_text_macros.append("\\bibitem")
non_text_macros.append("\\makeatletter")
non_text_macros.append("\\subitem")
non_text_macros.append("\\indexspace")
non_text_macros.append("\\tableofcontents")
# The list `exclude_environments` is the list of environments not to be
# taken into account.
exclude_environments = ["equation", "equation*", "split", "aligned", "align",
                        "subequations", "pspicture"]
# The list `add_line` is the list of things that mark that the next text
# will be a new paragraph.
# We will add an empty line after them. They are typically sectioning.
add_line = ["\\" + x for x in
            ["part", "chapter", "section", "subsection", "numcases"]]
add_line.append("\\item")
add_line.append("\\begin{")
# This is a list of things that indicate that we are not beginning
# a paragraph.
non_text_line = non_text_macros + add_line
found_macros = []
withe_list = [" ", "\t", "\n"]  # the second is a TAB

def is_empty(line):
    """
    Return True if `line` is empty, that is if it
    contains only TAB and spaces.
    """
    for c in line:
        if c not in withe_list:
            return False
    return True


def is_text_line(line):
    """
    Return True if the line is considered as a text line.
    """
    if is_empty(line):
        return False
    for special in non_text_line:
        if line.startswith(special):
            return False
    if line[0] == "\\":
        # Record the macro, i.e. everything up to the first whitespace.
        found_macros.append(line.split()[0])
    return True


def find_firt_non_withe(line):
    """
    Return the index of the first non-white character of `line`,
    or None if there is none.
    """
    for i, c in enumerate(line):
        if c not in withe_list:
            return i
    return None


def remove_first_withe(line):
    """
    Return the line without the leading white characters.
    A line made only of white characters is returned unchanged.
    """
    return line[find_firt_non_withe(line):]


def is_to_be_added_a_line(line):
    """
    Return True if a blank line has to be added after `line`.
    """
    for env in exclude_environments:
        if line.startswith("\\begin{" + env + "}"):
            return False
    for sec in add_line:
        if line.startswith(sec):
            return True
    return False

def preparation(code):
    """
    Remove spaces at the beginning of lines.
    Add a blank line after sectioning.
    Remove the comments.
    NOTE: code is a _list_ of strings. The returned value is also a list.
    """
    # Step 1: strip leading white characters.
    new_code = []
    for line in code:
        new_code.append(remove_first_withe(line))
    code = new_code
    # Step 2: add a blank line after sectioning commands, except inside
    # the excluded environments.
    new_code = []
    nested_exclude = 0
    for line in code:
        for env in exclude_environments:
            if line.startswith("\\begin{" + env + "}"):
                nested_exclude = nested_exclude + 1
            if line.startswith("\\end{" + env + "}"):
                nested_exclude = nested_exclude - 1
        new_code.append(line)
        if is_to_be_added_a_line(line) and nested_exclude == 0:
            new_code.append("\n")
    code = new_code
    # Step 3: drop the lines that are entirely comments.
    new_code = []
    for line in code:
        if not line.startswith("%"):
            new_code.append(line)
    return new_code

def hack(code, command):
    """
    Add `command` before each new paragraph in `code`.
    `code` is a list of strings while the returned value is a string.
    """
    new_code = []
    nested_exclude = 0
    # If the file has no \begin{document} (e.g. a file meant to be \input),
    # everything is considered to be inside the document.
    in_document = True
    for line in code:
        if line.startswith(r"\begin{document}"):
            in_document = False
    for i, line in enumerate(code):
        if line.startswith(r"\begin{document}"):
            in_document = True
        for env in exclude_environments:
            if line.startswith("\\begin{" + env + "}"):
                nested_exclude = nested_exclude + 1
            if line.startswith("\\end{" + env + "}"):
                nested_exclude = nested_exclude - 1
        new_code.append(line)
        if in_document and nested_exclude == 0:
            if is_empty(line):
                try:
                    if is_text_line(code[i + 1]):
                        new_code.append(command)
                except IndexError:
                    # This should only happen at the end of the file.
                    pass
    return "".join(new_code)

FakePrint()
with codecs.open(sys.argv[1], encoding="utf8", mode="r") as source_file:
    code = list(source_file)
ready_code = preparation(code)
print(hack(ready_code, "\\youpie\n"))
print("For your information, the following macros were found at the "
      "beginning of text lines")
for m in found_macros:
    print(m)
================ END OF CODE ========================
The program takes the name of a tex file as its argument and prints (to
the screen) the code with a \youpie at the beginning of each paragraph.
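
To make the intended output concrete, here is a minimal, self-contained sketch of the core rule only (an empty line followed by a text line gets the marker). It is a simplification for illustration, not the script itself: `mark_paragraphs` and `NON_TEXT_PREFIXES` are names made up for this example, and the prefix list is deliberately tiny.

```python
# Illustrative sketch only: a stripped-down version of the rule
# "empty line followed by a text line => insert the marker".
# NON_TEXT_PREFIXES and mark_paragraphs are hypothetical names.

NON_TEXT_PREFIXES = ("\\section", "\\begin{", "\\end{", "\\item", "%")


def mark_paragraphs(lines, command="\\youpie\n"):
    """Return the joined text with `command` inserted before each paragraph."""
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if line.strip():
            continue  # only an empty line can announce a paragraph
        if i + 1 == len(lines):
            continue  # nothing follows the last line
        following = lines[i + 1].lstrip()
        if following.strip() and not following.startswith(NON_TEXT_PREFIXES):
            out.append(command)
    return "".join(out)


sample = [
    "\\section{Intro}\n",
    "\n",
    "First paragraph.\n",
    "\n",
    "\\begin{itemize}\n",
    "\\item one\n",
    "\\end{itemize}\n",
    "\n",
    "Second paragraph.\n",
]
print(mark_paragraphs(sample))
```

In this sample the marker should appear before "First paragraph." and "Second paragraph." but not before the itemize environment.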
I tested it on a 114-page math course, and I believe it works.
It is certainly not optimized, and the lists of environments to skip or
to process certainly still need completing.
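
To help complete those lists, one possible aid is to survey which environments a given source actually opens. The sketch below is an assumption about how that could be done, not part of the script: `list_environments` is a hypothetical helper, and the regular expression only sees `\begin{...}` written literally in the source (comments are not stripped first).

```python
import re
from collections import Counter

# Matches \begin{name} and captures `name`.
BEGIN_RE = re.compile(r"\\begin\{([^}]+)\}")


def list_environments(text):
    """Count each environment name opened with \\begin{...} in `text`."""
    return Counter(BEGIN_RE.findall(text))


sample = (
    "\\begin{document}\n"
    "\\begin{equation}\n"
    "a=b\n"
    "\\end{equation}\n"
    "\\begin{itemize}\n"
    "\\item x\n"
    "\\end{itemize}\n"
    "\\begin{equation}\n"
    "c=d\n"
    "\\end{equation}\n"
    "\\end{document}\n"
)
for name, count in list_environments(sample).most_common():
    print(name, count)
```

Any name reported here that appears in none of the script's lists is a candidate to add to one of them.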
I also think it will choke miserably on sources that are not in utf8.
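
For sources that are not UTF-8, one option would be to try a few encodings in turn. This is only a sketch under assumptions: `read_tex_lines` is a hypothetical helper, and the candidate list (UTF-8, then Latin-1) is a guess at what such sources are likely to use. Latin-1 accepts any byte sequence, so it has to come last and may silently mis-decode other encodings.

```python
import os
import tempfile


def read_tex_lines(path, encodings=("utf-8", "latin-1")):
    """Return the lines of `path`, decoded with the first encoding that works."""
    last_error = ValueError("no encoding given")
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as source_file:
                return list(source_file)
        except UnicodeDecodeError as error:
            last_error = error  # remember it and try the next candidate
    raise last_error


# Demo on a temporary file that is valid Latin-1 but not valid UTF-8.
with tempfile.NamedTemporaryFile(suffix=".tex", delete=False) as tmp:
    tmp.write("Th\xe8se en droit\n".encode("latin-1"))
lines = read_tex_lines(tmp.name)
os.remove(tmp.name)
print(ascii(lines))
```

The returned list has the same shape as the `code` list the script builds, so it could in principle be passed to `preparation` unchanged.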
There are surely plenty of edge cases left unhandled: for example a \par
or explicit \\ in the middle of a line.
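
One of those edge cases, an explicit `\par` in the middle of a line, could perhaps be handled by a pre-processing pass that breaks the line there, so that the text after it starts on its own line after an empty one. This is a sketch of that idea only: `split_on_par` is a hypothetical name, and the regular expression ignores complications such as `\par` inside comments or verbatim material. Explicit `\\` is not treated here.

```python
import re

# \par not followed by a letter, so \parindent, \parskip, ... are left alone.
PAR_RE = re.compile(r"\\par(?![A-Za-z])[ \t]*")


def split_on_par(line):
    """Split `line` at each explicit \\par; return a list of lines."""
    pieces = PAR_RE.split(line)
    if len(pieces) == 1:
        return [line]  # no \par: keep the line untouched
    result = []
    for piece in pieces[:-1]:
        if piece.strip():
            result.append(piece.rstrip() + "\n")
        result.append("\n")  # the empty line standing in for \par
    if pieces[-1].strip():
        result.append(pieces[-1])
    return result


print(split_on_par("First.\\par Second.\n"))
print(split_on_par("\\parindent=0pt\n"))
```

Applied to every line before the paragraph detection, this would let the existing "empty line followed by text" rule see the second paragraph.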
Have a good week
Laurent
PS: I would welcome the source code of a law thesis, to see what it
looks like and refine my script.