Hello,
I am currently writing a package that aims to insert ligatures
automatically into every word of a .tex file (one that uses this
package) that needs one.
Is there a way to parse all the words of the .tex file and replace
them easily, e.g. if there is "caetera" in the .tex file, replace it
with "c\ae tera"?
If so, how?
Thanks.
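One way to do this outside TeX is a small preprocessing filter over the .tex source. The sketch below is in Perl (the language used later in this thread); the word list %ligature_words and the sub ligate_words are hypothetical, the list has to be compiled by hand, and because the filter works on raw source it knows nothing about TeX markup. Capitalised forms such as "Caetera" would need their own entries.

```perl
use strict;
use warnings;

# Hypothetical word list: plain spelling => spelling with LaTeX's \ae / \oe macros.
# The space after \ae ends the control word; TeX does not typeset it.
my %ligature_words = (
    'caetera'   => 'c\ae tera',
    'praeterea' => 'pr\ae terea',
    'coelum'    => 'c\oe lum',
);

# Replace every whole word that appears in the list; all other words are unchanged.
# This works on raw source text, so command names and verbatim text are not protected.
sub ligate_words {
    my ($text) = @_;
    $text =~ s/\b([A-Za-z]+)\b/exists $ligature_words{$1} ? $ligature_words{$1} : $1/ge;
    return $text;
}

print ligate_words("Et caetera, praeterea nihil."), "\n";
# prints: Et c\ae tera, pr\ae terea nihil.
```

To filter a whole file, the sub could be wrapped in a loop such as `while (<>) { print ligate_words($_) }`.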
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
If you are not afraid of going through an intermediate stage, you can
typeset your file with your wordlist and process the resulting dvi
file with dvitype, which will output the ligatures it finds as single
character entities.
HTH,
Oliver.
--
Dr. Oliver Corff e-mail: co...@zedat.fu-berlin.de
xesearch claims:
The package finds strings (e.g. (parts of) words or phrases) and
manipulates them (apply any macro), thus turning each word or
phrase into a possible command.
(that's from the catalogue entry, which i derived from the author's
ctan upload notice.)
it would seem to meet your requirement; requires xetex. i've never
used it, so can't be sure...
--
Robin Fairbairns, Cambridge
Rather than using macros, I'd do this either with ligature commands in a
virtual font, or by using an Omega Translation Process --- see my TUG
2003 presentation for an example:
http://www.tug.org/TUGboat/Articles/tb24-2/tb77adams.pdf
or with something like an OTP if you can't use those --- XeTeX has built-in
support for the SIL TECkit, which should be adaptable for this sort
of thing:
http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=TECkit
William
Okay. It seems pretty specific, but I do not want to use XeTeX, as my
package will not be used only by XeTeX users. :-(
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
rf...@cl.cam.ac.uk (Robin Fairbairns) writes:
> xesearch claims:
>
> The package finds strings (e.g. (parts of) words or phrases) and
> manipulates them (apply any macro), thus turning each word or
> phrase into a possible command.
>
> (that's from the catalogue entry, which i derived from the author's
> ctan upload notice.)
>
> it would seem to meet your requirement; requires xetex. i've never
> used it, so can't be sure...
Thanks Robin. It is a very good idea, but, again, I cannot use XeTeX,
as not everyone who uses my package will necessarily use XeTeX. :-(
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
Oliver Corff <co...@samoa.zedat.fu-berlin.de> writes:
> If you are not afraid of going through an intermediate stage you can
> typeset your file with your wordlist and process the resulting dvi
> file with dvitype which will output the ligatures it found as single
> character entities.
Thanks.
What do you mean, precisely? I am not afraid of going through
intermediate stages. Typeset my .sty with my wordlist and process the
resulting dvi?! I do not understand. I think we have misunderstood
each other about what I want to do: my aim is to write a package that
makes these ligatures automatic, as many people forget which words
need ligatures.
Sorry if I did not understand what you meant. I hope we shall find a
nice solution.
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
Well, dvitype is a utility for processing dvi files. When you run LaTeX
on a .tex document, it generates a dvi (device-independent) file, from
which a printer driver or PostScript converter does its work.
The dvi file contains the textual information as well as the geometry
information (i.e. where to put everything on the page).
dvitype lets you extract both the text and the position information;
you can choose various levels of verbosity, and text is given either
as [text] or as character codes; both can easily be captured and
post-processed. Ligatures, which are in fact individual character
entities, have their own codes, distinct from those of the letters
(anything outside the range n = 64+{1..26, 32+{1..26}}, i.e. 65-90 and
97-122, is not a Latin letter; it is a digit, a punctuation mark, a
ligature or some other special character).
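A quick check of that letter range, as a minimal Perl sketch (the sub name is_latin_letter is hypothetical). Note that in the Computer Modern (OT1) text fonts the same test also rejects codes 26 and 27, which hold æ and œ, the very linguistic ligatures this thread is about.

```perl
use strict;
use warnings;

# Oliver's rule: Latin letters are 64 + 1..26 (A-Z, codes 65-90)
# and 64 + 32 + 1..26 (a-z, codes 97-122); everything else is not a letter.
sub is_latin_letter {
    my ($n) = @_;
    return (($n >= 65 && $n <= 90) || ($n >= 97 && $n <= 122));
}

# 11..15 are the f-ligatures and 26/27 are ae/oe in the OT1 layout; 44 is a comma.
for my $n (65, 97, 122, 11, 15, 26, 27, 44) {
    printf "%3d: %s\n", $n, is_latin_letter($n) ? 'letter' : 'not a letter';
}
```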
And since you do not opt for XeTeX but want it as simple, plain and
traditional as possible, the route via dvi and dvitype is quite
feasible.
Post-processing requires only a few lines of Perl to reconstruct the
original text (albeit without any markup), and you can build lists in
which your chosen ligatures appear properly.
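A minimal sketch of that post-processing step in Perl. It assumes the dvitype listing shows each typeset character as setchar followed by its code (the name of the DVI opcode) and that the font uses the Computer Modern (OT1) layout, where ff, fi, fl, ffi and ffl occupy octal 013 to 017. Other printable codes are simply taken as ASCII, which only roughly holds for OT1. The sub chars_from_dvitype and the sample lines are hypothetical, and inter-word spaces (which dvitype reports as movements) are not reconstructed.

```perl
use strict;
use warnings;

# OT1 (Computer Modern) slots of the f-ligatures: octal 013..017 = decimal 11..15.
my %ligature_name = (11 => 'ff', 12 => 'fi', 13 => 'fl', 14 => 'ffi', 15 => 'ffl');

# Assumes each typeset character appears in the listing as "setchar<code>".
# Ligatures become {\name}, printable codes are taken as ASCII,
# and anything else is shown as {?code}.
sub chars_from_dvitype {
    my (@lines) = @_;
    my $out = '';
    for my $line (@lines) {
        while ($line =~ /\bsetchar(\d+)/g) {
            my $code = $1;
            $out .= exists $ligature_name{$code}  ? "{\\$ligature_name{$code}}"
                  : ($code >= 32 && $code <= 126) ? chr($code)
                  :                                 "{?$code}";
        }
    }
    return $out;
}

# Hypothetical listing lines for the word "affluent".
my @sample = ('101: setchar97',  '102: setchar15',  '103: setchar117',
              '104: setchar101', '105: setchar110', '106: setchar116');
print chars_from_dvitype(@sample), "\n";    # a{\ffl}uent
```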
Oliver.
: Hello,
: I am currently writing a package which aims at making ligatures in an
: automatic way, for each word of a .tex file (which uses this package),
: if it needs to be ligatured.
: Is there a way to parse all the words of the .tex file and replace
: them easily, i.e. if there is
: "caetera" in the .tex file, replacing it by
: "c\ae tera"?
Actually, the effect you want takes place whenever a TeX document is
typeset with a font that has ligature information. You do not have to
do anything yourself.
The normal ligatures are fi, ff, fl, ffi and ffl; these are inserted
automatically.
Whether ae is treated as a ligature depends on the font, more precisely
on its ligature table, which is part of the font's metric information
(traditionally found in the .tfm file of a TeX font).
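Since the .tfm file is binary, the standard tftopl utility can be used to turn it into a readable property list (PL), whose LIGTABLE section is assumed here to contain entries such as (LABEL C f) followed by (LIG C i O 14). The Perl sketch below (the subs pl_code and pl_ligatures are hypothetical) collects only the plain LIG instructions into a nested hash; the sample lines are typed by hand, and the kern value in them is illustrative only.

```perl
use strict;
use warnings;

# A character in PL notation: "C x" (the character itself) or "O nnn" (octal code).
my $PL_CHAR = qr/(?:C\s+\S|O\s+[0-7]+)/;

sub pl_code {
    my ($spec) = @_;
    my ($kind, $value) = split ' ', $spec;
    return $kind eq 'O' ? oct($value) : ord($value);
}

# Collect the plain (LIG a b) instructions into
# { left code => { right code => ligature code } }. Every LABEL seen since
# the last (STOP) applies to the instructions that follow it; KRN, SKIP and
# the /LIG, LIG/ ... variants are ignored in this sketch.
sub pl_ligatures {
    my (@lines) = @_;
    my (%lig, @labels);
    for my $line (@lines) {
        if ($line =~ /\(LABEL\s+($PL_CHAR)\)/) {
            push @labels, pl_code($1);
        } elsif ($line =~ /\(LIG\s+($PL_CHAR)\s+($PL_CHAR)\)/) {
            my ($right, $result) = (pl_code($1), pl_code($2));
            $lig{$_}{$right} = $result for @labels;
        } elsif ($line =~ /\(STOP\)/) {
            @labels = ();
        }
    }
    return \%lig;
}

# Hand-typed sample lines in PL shape for the "f" entry.
my $lig = pl_ligatures('(LIGTABLE', '   (LABEL C f)', '   (LIG C i O 14)',
                       '   (LIG C f O 13)', '   (LIG C l O 15)',
                       '   (KRN O 47 R 0.077779)', '   (STOP)');
for my $right (sort { $a <=> $b } keys %{ $lig->{ ord('f') } }) {
    printf "f + %s -> oct %03o\n", chr($right), $lig->{ ord('f') }{$right};
}
```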
If, however, you have to write {\ae} to get this letter, then,
strictly speaking, you are not using TeX's ligature mechanism but
typing the following sequence:
[c][\ae][t][e][r][a]
The typed word a f f l u e n t is converted by TeX to
[a][ffl][u][e][n][t]
but only if the font has the ligature ffl.
So instead of going through TeX/LaTeX, you can access the ligature
tables in the fonts directly and write a small Perl script that puts
brackets around those groups of characters for which it finds ligature
information in the font file.
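A minimal sketch of such a bracketing script in Perl. The ligature list is hard-coded to the Computer Modern f-ligatures (in practice it would be read from the font), and the sub bracket_units is hypothetical. It uses greedy longest match, which gives the same result as TeX for these five ligatures but is not TeX's actual lig/kern program.

```perl
use strict;
use warnings;

# Ligatures the font provides; here the Computer Modern f-ligatures.
my @ligatures = qw(ff fi fl ffi ffl);

# One alternation with the longest candidates first, so "ffl" wins over "ff".
my $lig_re = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @ligatures;

# Split a word into the units TeX would set (a ligature where one exists,
# otherwise a single character) and wrap each unit in brackets.
sub bracket_units {
    my ($word) = @_;
    my @units = $word =~ /($lig_re|.)/gs;
    return join '', map { "[$_]" } @units;
}

print bracket_units('affluent'), "\n";    # [a][ffl][u][e][n][t]
```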
Oliver.
Have a look at the ligature table of the basic font Computer Modern
Roman: in lines 10 and 12 of romlig.mf you'll find:
ligtable "f": "i"=:oct"014", "f"=:oct"013", "l"=:oct"015",
ligtable oct"013": "i"=:oct"016", "l"=:oct"017",
which reads as follows:
an "f" followed by an "i" results in character oct 014,
an "f" followed by an "f" results in character oct 013,
an "f" followed by an "l" results in character oct 015,
an "ff" (see above, oct 013) followed by an "i" results in character oct 016,
an "ff" (see above, oct 013) followed by an "l" results in character oct 017.
You have the five ligatures
oct 013: ff
oct 014: fi
oct 015: fl
oct 016: ffi
oct 017: ffl
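The same reading can be done mechanically. This Perl sketch (the sub ligature_names is hypothetical) parses ligtable lines of exactly the form quoted above, follows the cascade through oct"013", and prints the same five-entry table; anything else in romlig.mf, such as kern instructions, is not handled.

```perl
use strict;
use warnings;

# A character is written either as a quoted letter ("f") or as an octal code (oct"013").
my $CHAR = qr/(?:"([^"])"|oct"([0-7]+)")/;

# Turn ligtable lines into a hash mapping each ligature's code (decimal)
# to the letters it stands for. Only the plain "=:" operator is understood.
sub ligature_names {
    my (@lines) = @_;
    my %name;
    for my $line (@lines) {
        # The label: the character that starts this ligature program.
        $line =~ /^\s*ligtable\s+$CHAR\s*:/ or next;
        my $left = defined $1 ? $1 : $name{ oct($2) };
        next unless defined $left;    # label is a code not seen yet
        # Each  right =: result  pair on the line.
        while ($line =~ /$CHAR\s*=:\s*oct"([0-7]+)"/g) {
            my $right = defined $1 ? $1 : $name{ oct($2) };
            $name{ oct($3) } = $left . $right if defined $right;
        }
    }
    return %name;
}

my %ligs = ligature_names(
    'ligtable "f": "i"=:oct"014", "f"=:oct"013", "l"=:oct"015",',
    'ligtable oct"013": "i"=:oct"016", "l"=:oct"017",',
);
printf "oct %03o: %s\n", $_, $ligs{$_} for sort { $a <=> $b } keys %ligs;
```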
In Perl, you can write a one-liner:
#!/usr/bin/perl -p
# Longest alternatives first, so that "ffl" is matched before "ff" or "fl".
s/(ffl|ffi|fl|fi|ff)/{\\$1}/g;
and it will mark your ligatures like this:
a{\ffl}uent
(which you cannot typeset directly: there is no control sequence \ffl
unless you define it, and \fi is a TeX primitive that must not be
redefined).
Thanks Oliver.
Oliver Corff <co...@samoa.zedat.fu-berlin.de> writes:
> Well, dvitype is a utility for processing dvi files. When you run LaTeX
> on a .tex document it will generate a dvi file (device-independent, that
> is) from where a printer driver or postscript converter will do its
> work.
>
> The dvi file contains the textual information as well as the geometry
> information (i.e. where to put everything on the page).
Nothing new to me so far.
> dvitype lets you extract both the text and the position information,
> you can choose various levels of verbosity, and text is either given
> as [text] or in the form of character code positions; both can be easily
> captured and post-processed. Ligatures --- which in fact are individual
> character entities --- have their own numbers, different from letters
> (anything that is not in the range of n=64+{1..26,32+{1..26}} is not
> a Latin letter, it can only be a number or punctuation mark or ligature.
>
> And since you do not opt for XeTeX but want it as simple and plain and
> traditional as possible, the route via dvi and dvitype is quite
> feasible.
dvitype is actually a very good idea, as you suggest.
> Post-processing requires only a few lines of perl in order to
> reconstruct the original text (be it without any mark-up, though)
> and you can build lists where your chosen ligatures appear properly.
Thanks. Very clear.
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
Oliver Corff <co...@samoa.zedat.fu-berlin.de> writes:
> Actually the effect you desire takes place whenever a document in TeX
> is typeset and the font has ligature information. You do not have to do
> anything yourself.
>
> Normal ligatures are fi, ff, fl, ffl, ffi, these will be inserted
> automatically.
I should have specified that I am only looking for *linguistic*
ligatures; the other ones are set automatically by LaTeX when a
suitable font is used.
> Whether ae is treated as a ligature depends on the font, more precisely
> on the ligature table which is part of the meta-information (and
> traditionally this information is found in the .tfm file of a TeX
> font).
For sure.
> If, however, you have to write {\ae} in order to generate the ligature
> of this letter, then, strictly speaking you are not using the ligature
> mechanism of TeX but writing the following sequence:
> [c][\ae][t][e][r][a]
>
> The typed word a f f l u e n t is converted by TeX to
> [a][ffl][u][e][n][t]
> but only if the font has the ligature ffl.
>
> So instead of going through TeX/LaTeX you can directly access the
> ligature tables in the fonts and write a small perl script that
> puts brackets around those groups of characters for which it finds
> ligature information in the font file.
It does not seem to be difficult. Thanks.
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
Oliver Corff <co...@samoa.zedat.fu-berlin.de> writes:
> Have a look at the ligature table of the basic font Computer Modern
> Roman: in lines 10 and 12 of romlig.mf you'll find:
>
> ligtable "f": "i"=:oct"014", "f"=:oct"013", "l"=:oct"015",
> ligtable oct"013": "i"=:oct"016", "l"=:oct"017",
>
> which reads as follows:
> an "f" followed by an "i" results in character oct 014,
> an "f" followed by an "f" results in character oct 013,
> an "f" followed by an "l" results in character oct 015,
> an "ff" (see above, oct 013) followed by an "i" results in character oct 016,
> an "ff" (see above, oct 013) followed by an "l" results in character oct 017.
>
> You have the five ligatures
> oct 013: ff
> oct 014: fi
> oct 015: fl
> oct 016: ffi
> oct 017: ffl
>
> In Perl, you write a one-liner:
>
> #!/usr/bin/perl -p
> s/(ffl|ffi|fl|fi|ff)/{\\$1}/g;
>
> and it will show your ligatures like
>
> a{\ffl}uent
>
> (which you cannot directly typeset as there is no such control sequence
> \ffl as long as you do not define it).
Okay. Thanks for this precise information.
--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/