/Troels
You can use the lineno package to give the number of typeset
lines and count the number of characters using wc.
/Troels
The memoir manual (CTAN macros/latex/contrib/memoir/memman.pdf) includes
a table relating line length, characters per line and length (in points) of
the typeset lowercase alphabet. The memoir class includes two macros that,
given a font specification, will calculate the length of lines containing 45
and 65 characters.
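
A minimal sketch of how those two memoir macros might be used to size the type block. It assumes the names documented in memman (\setlxvchars, \setxlvchars and the lengths \lxvchars, \xlvchars); the 11pt size, the golden-ratio height and the equal margins are only illustrative choices, not part of the original suggestion.

```latex
\documentclass[11pt]{memoir}
% Measure the document font: \lxvchars is the width of a line holding
% about 65 characters, \xlvchars of one holding about 45.
\setlxvchars[\normalfont]
\setxlvchars[\normalfont]
% Use the 65-character width as the text width; the height is
% 1.618 times the width (illustrative assumption).
\settypeblocksize{*}{\lxvchars}{1.618}
\setlrmargins{*}{*}{1}   % equal spine and fore-edge margins
\setulmargins{*}{*}{1}   % equal upper and lower margins
\checkandfixthelayout
\begin{document}
65 characters: \the\lxvchars; 45 characters: \the\xlvchars.
\end{document}
```

Passing a different font specification, e.g. \setlxvchars[\small\sffamily], should give the corresponding widths for that font.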
Peter W.
> Do you think it is possible to get LaTeX to do the calculation
> automatically?
That depends on what you mean by 'get LaTeX to do the calculation'.
wc is an external script. You can make a shell escape from LaTeX,
but I am not sure it is useful. What are you trying to do?
Currently I am (semi-manually) counting the number of characters per 10
lines and dividing by 10. This is a tedious procedure.
/Troels
Roman
Bringhurst provides a method to find the proper text width for any
font. A table gives the relationship between the width of the
lowercase alphabet, the number of characters in a line and the text
width. Plotting his data reveals a linear relationship between the
parameters.
The following is a curve fit of the Bringhurst table, and it gives a
very accurate representation of his data.
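\newcommand*{\GetTextWidth}[3][\normalfont]{{#1% select the font inside a group
  \settowidth{#2}{abcdefghijklmnopqrstuvwxyz}% lowercase alphabet width
  \setlength{#2}{0.03209#2}% slope of the fit
  \addtolength{#2}{0.43753pt}% offset of the fit: now the width per character
  \setlength{#2}{#3#2}% times the number of characters
  \global#2=#2}}% export the result from the group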
Usage:
\newlength\txtwdth
\GetTextWidth{\txtwdth}{65} % e.g. for 65 characters
Note that the Bringhurst table is based on the classic way in which
English books were set. It takes inter-word spacing, etc. into
account (I think). If you compare that to a direct measurement of a
very long line of English text (look at the algorithm of Rowland
McDonnell in rmpage.sty), you would notice that it differs in the
given width per character. I prefer to go with the Bringhurst table
for English.
Memoir also uses the Bringhurst table, but the curve fit was on a
subset of the data, so it differs slightly, but not too seriously, from
the one above.
Danie Els
If I read your code correctly, you have the equation
txtwdth = nChar*(aWdth*0.03209 + 0.43753pt),
where txtwdth, nChar, and aWdth denote the text width, the number of
characters per line and the alphabet width, respectively.
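
As a rough check of that equation, here is a worked example. It uses the coefficients 0.03209 and 0.43753pt from the \GetTextWidth macro and assumes an alphabet width of 130pt, chosen only because it matches a row of the Bringhurst table reproduced in the script later in the thread.

```latex
\[
  \mathit{txtwdth}
    = 65 \times (0.03209 \times 130\,\mathrm{pt} + 0.43753\,\mathrm{pt})
    = 65 \times 4.60923\,\mathrm{pt}
    \approx 299.6\,\mathrm{pt}
    \approx 25\ \mathrm{picas}
\]
```

The 130 row of the table lists 62 characters at 24 picas and 67 at 26 picas, so about 65 characters at 25 picas looks consistent.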
> Note that the Bringhurst table is based on the classic way in which
> English books were set. It takes inter-word spacing, etc. into
> account (I think). If you compare that to a direct measurement of a
> very long line of English text (look at the algorithm of Rowland
> McDonnell in rmpage.sty), you would notice that it differs in the
> given width per character. I prefer to go with the Bringhurst table
> for English.
Mainly I write in English, but I am a native Danish speaker, so I might
eventually have to write something in Danish. As the letter frequencies
are slightly different in Danish and English, I might need to change
different constants in the above equation. However, I have no idea
whether the optimal text width actually depends on the language. I think
that the above equation can still be used as a guideline.
/Troels
> Martin Heller wrote:
>> Troels Pedersen wrote:
>>
>>> Do you think it is possible to get LaTeX to do the calculation
>>> automatically?
>>
>>
>> That depends on what you mean by 'get LaTeX to do the calculation'. wc
>> is an external script. You can make a shell escape from LaTeX, but I
>> am not sure it is useful. What are you trying to do?
> I am trying to find out how many characters I have per line in order
> to compare readability.
Is characters-per-line a better measure than words-per-line?
> In various books I have read some "optimum"
> line-lengths. Usually these figures are given in characters per line.
That would only make sense in typewriter-type. I suspect a lot of this
has historical roots in the methods of casting-off typescript when
preparing for publication.
> In
> LaTeX the line length is set in cm or pt. The line length should
> therefore be adjusted to the chosen font and the leading (which in my
> case has to be chosen a bit larger than standard to allow for some
> vector notations).
>
> Currently I am (semi-manually) counting the number of characters per
> 10 lines and dividing by 10. This is a tedious procedure.
Traditional casting-off was a manual process, averaging the character-
count of the typescript and dividing it by the average character-count
of a size of type to a given measure. Monotype's _Copyfitting Tables_
and Ludlow's cards gave tables and factors for different faces which
speeded it up. Maybe we should do a set for CM and the "35".
But I digress: others have some good suggestions for doing what you
want.
///Peter
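> Is characters-per-line a better measure than words-per-line?
I do not know when we are speaking about readability. Thinking on my
feet I would say that the readability is governed by two measures
i) visual separation of the lines,
ii) ensure that the reader's eye is able to follow one line and then
jump down to the start of the next line.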
I have no idea if these are governed by the number of words per line or
the number of characters per line. Probably there are some people that
have investigated this in further detail.
>>In various books I have read some "optimum"
>>line-lengths. Usually these figures are given in characters per line.
>
>
> That would only make sense in typewriter-type. I suspect a lot of this
> has historical roots in the methods of casting-off typescript when
> preparing for publication.
I disagree. If a small font is used, you would probably like the text
width to be smaller than if a larger font is used. I should have written
"average number of characters per line" in the above.
> Traditional casting-off was a manual process, averaging the character-
> count of the typescript and dividing it by the average character-count
> of a size of type to a given measure. Monotype's _Copyfitting Tables_
> and Ludlow's cards gave tables and factors for different faces which
> speeded it up. Maybe we should do a set for CM and the "35".
I think that it would be nice to have tables or guidelines of that kind.
(Since I am only a stupid engineer I am not quite sure what a Ludlow
card is. I assume that "35" is some special font. I seldom use CM
because I find it hard to read on the computer monitor or in a photocopy.)
/Troels
w = lc alphabet width, in points
L = desired width of the line, in picas
c = number of characters in the line
c = (L/10)*[3200/(0.857w + 11.42857)]
Not precisely the same as Bringhurst's figures, but the figures are
merely descriptive (hopefully) and not prescriptive.
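
A worked example of the formula above, as a sanity check only. The values w = 130 and L = 30 picas are an assumption made for illustration; they are picked because the Bringhurst table reproduced in the script later in the thread has an entry for them.

```latex
\[
  c = \frac{L}{10}\cdot\frac{3200}{0.857\,w + 11.42857}
    = \frac{30}{10}\cdot\frac{3200}{0.857 \times 130 + 11.42857}
    = \frac{9600}{122.83857}
    \approx 78.2
\]
```

The table entry for an alphabet width of 130 at 30 picas is 78 characters, so the formula agrees with it to within rounding for this case.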
>> Is characters-per-line a better measure than words-per-line?
> I do not know when we are speaking about readability. Thinking on my
> feet I would say that the readability is governed by two measures
>
> i) visual separation of the lines,
> ii) ensure that the reader's eye is able to follow one line and then
> jump down to the start of the next line.
Basically, yes, although there are at least a dozen other things which
affect it.
> I have no idea if these are governed by the number of words per line
> or the number of characters per line. Probably there are some people
> that have investigated this in further detail.
There is a copious literature on the subject :-) My point was that
counting the letters is unlikely to provide a meaningful measure on its
own.
> I think that it would be nice to have tables or guidelines of that
> kind.
The best explanation of how to do this kind of calculation is in Fyffe's
_Basic Copyfitting_ (Studio Vista, London, 1969) which seems to be at
http://www.amazon.co.uk/exec/obidos/ASIN/0289797055/qid=1134080240/sr=1-1/ref=sr_1_16_1/203-6981017-5341559
It shouldn't be hard to create them from .pl files, but you'd need to
sample a large quantity of common-language text set in each font, and
derive the factor number (the width of the average lowercase character
as a decimal of a pica em), which is a non-trivial task.
> (Since I am only a stupid engineer I am not quite sure what a Ludlow
> card is. I assume that "35" is some special font. I seldom use CM
> because I find it hard to read on the computer monitor or in a
> photocopy.)
Engineers are not stupid! They rule the world!
Ludlow is a manufacturer of casting machines which produce a single
line-slug of hot metal from a hand-set line of letter-moulds, originally
used in casting headlines, still in use for a lot of hand-set job work.
The "35" are the 35 fonts that come built into PostScript printers, and
which are usually available in all typesetting systems, including LaTeX.
But I think Peter Wilson's suggestion is the way to go.
///Peter
Danie Els (dnjels at sun dot ac dot za)
ps. Beware of line wraps in the code
%% bfit.m %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear all
global D W T;
%% T - text width column heads in table
%% textwidth in pica x 12 to get big points (1/72 inch)
%%
T= [10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40]*12;
%% Table data (38 rows x 17 columns)
%% column 1 - lc alphabet width in big points
%% columns 2-17 - characters per text width
%%
Data=[...
80 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160
85 38 45 53 60 68 76 83 91 98 106 113 121 128 136 144 151
90 36 43 50 57 64 72 79 86 93 100 107 115 122 129 136 143
95 34 41 48 55 62 69 75 82 89 96 103 110 117 123 130 137
100 33 40 46 53 59 66 73 79 86 92 99 106 112 119 125 132
105 32 38 44 51 57 63 70 76 82 89 95 101 108 114 120 127
110 30 37 43 49 55 61 67 73 79 85 92 98 104 110 116 122
115 29 35 41 47 53 59 64 70 76 82 88 94 100 105 111 117
120 28 34 39 45 50 56 62 67 73 78 84 90 95 101 106 112
125 27 32 38 43 48 54 59 65 70 75 81 86 91 97 102 108
130 26 31 36 41 47 52 57 62 67 73 78 83 88 93 98 104
135 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
140 24 29 34 39 44 48 53 58 63 68 73 77 82 87 92 97
145 23 28 33 37 42 47 51 56 61 66 70 75 80 84 89 94
150 23 28 32 37 41 46 51 55 60 64 69 74 78 83 87 92
155 22 27 31 36 40 45 49 54 58 63 67 72 76 81 85 90
160 22 26 30 35 39 43 48 52 56 61 65 69 74 78 82 87
165 21 25 30 34 38 42 46 51 55 59 63 68 72 76 80 84
170 21 25 29 33 37 41 45 49 53 57 62 66 70 74 78 82
175 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80
180 20 23 27 31 35 39 43 47 51 55 59 62 66 70 74 78
185 19 23 27 30 34 38 42 46 49 53 57 61 65 68 72 76
190 19 22 26 30 33 37 41 44 48 52 56 59 63 67 70 74
195 18 22 25 29 32 36 40 43 47 50 54 58 61 65 68 72
200 18 21 25 28 32 35 39 42 46 49 53 56 60 63 67 70
210 17 20 23 27 30 33 37 40 43 47 50 53 57 60 63 67
220 16 19 22 25 29 32 35 38 41 45 48 51 54 57 60 64
230 15 18 21 24 27 30 33 36 40 43 46 49 52 55 58 61
240 15 17 20 23 26 29 32 35 38 41 44 46 49 52 55 58
250 14 17 20 22 25 28 31 34 36 39 42 45 48 50 53 56
260 14 16 19 22 24 27 30 32 35 38 41 43 46 49 51 54
270 13 16 18 21 23 26 29 31 34 36 39 42 44 47 49 52
280 13 15 18 20 23 25 28 30 33 35 38 40 43 45 48 50
290 12 15 17 20 22 24 27 29 32 34 37 39 41 44 46 49
300 12 14 17 19 21 24 26 28 31 33 35 38 40 42 45 47
320 11 13 16 18 20 22 25 27 29 31 34 36 38 40 43 45
340 10 13 15 17 19 21 23 25 27 29 32 34 36 38 40 42
360 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40];
W=Data(:,1); %% Alphabet width
D=Data(:,2:end); %% Characters per width
[N,M]=size(D);
%%% Method 1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Fit straight line through every data row.
%%%
%% Show the linear relation between
%% text width [T] and data row [D(i,:)].
%%
figure(1)
clf reset
hold on
P=[];
for i=1:N
X=D(i,:);
p=polyfit(X,T,1);
P=[P; p];
plot(T,X,'r.')
plot(polyval(p,X),X,'b')
end
xlabel('Text width [big points]');
ylabel('Number of characters');
%% Fit a straight line through the slopes of
%% the previous lines. Assume that all lines
%% go through the origin.
%%
pw=polyfit(W,P(:,1),1);
%% The number of characters [NC] is then in
%% terms of the text width [T] given by:
%% NC = T/(pw(1)*W(i)+pw(2))
%%Plot the fitted line to inspect the fit.
%%
figure(2)
clf reset
plot(W,P(:,1),'r.',W,polyval(pw,W),'b')
xlabel('Alphabet width [big points]');
ylabel('Slope');
hold on
%% Calculate the error between the fit and
%% the given table
%%
Dx=zeros(size(D));
for i=1:N
Dx(i,:)=T/(pw(1)*W(i)+pw(2));
end
Dx=(D-Dx);
Dx2=Dx.*Dx;
pw;
e=sum(sum(Dx2));
disp(['Method 1']);
disp(['err=' num2str(e) ...
', pw(1)=' num2str(pw(1)) ...
', pw(2)=' num2str(pw(2))]);
%%% Method 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Fit a function in a least squares fashion on
%%% the data points. Use the previous value as the
%%% starting point.
options = optimset('fminsearch');
options = optimset(options, 'TolX',0.01);
[pw,e,flag]=fminsearch('minfunc',pw,options);
if (flag==0)
disp('Did not converge');
end
disp(['Method 2']);
disp(['err=' num2str(e) ...
', pw(1)=' num2str(pw(1)) ...
', pw(2)=' num2str(pw(2))]);
%% Write a tex file to show the difference between
%% the curve fit and the original Bringhurst table.
%% Note: Dx still holds the Method 1 residuals here.
%%
fh=fopen('bringfit.tex','w');
fprintf(fh,'\\newcount\\tcnta \n\n');
fprintf(fh,'\\newcommand\\wC[1]{%% \n');
fprintf(fh,' \\tcnta=#1\\relax \n');
fprintf(fh,' \\ifnum\\tcnta<0 \n');
fprintf(fh,' \\multiply\\tcnta -1\\relax \n');
fprintf(fh,' ${-}$\\the\\tcnta \n');
fprintf(fh,' \\else \\ifnum\\tcnta>0 \n');
fprintf(fh,' #1 \n');
fprintf(fh,' \\else \n');
fprintf(fh,' ${\\cdot}$ \n');
fprintf(fh,' \\fi\\fi} \n\n');
fprintf(fh,'\\begin{tabular}{@{}r|@{\\qquad}' );
fprintf(fh,'*{15}{r@{\\hspace{2mm}}}r@{}} \n');
fprintf(fh,'& %d',T/12);
fprintf(fh,'\\\\ \n');
fprintf(fh,'\\hline \n');
for i=1:N
fprintf(fh,'%d ',W(i));
for j=1:M
fprintf(fh,'& \\wC{%d}',round(Dx(i,j)));
end
fprintf(fh,'\\\\\n');
end
fprintf(fh,'\\end{tabular} \n');
fclose(fh);
%% minfunc.m %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function e=minfunc(P)
global D W T;
[N,M]=size(D);
Dx=zeros(size(D));
for i=1:N
Dx(i,:)=T/(P(1)*W(i)+P(2));
end
Dx=D-Dx;
Dx2=Dx.*Dx;
e=sum(sum(Dx2));
This, and all other approaches I've seen in this thread, provide a
calculation that is a function of alphabet width (AW). It would seem
that the width of a space ought to be taken into account. Is there any
data that matches both AW and space?
Here's a possible formula, where AW is the alphabet width, SS is the
space skip (in TeX \fontdimen2\font), NC is the number of characters
desired and LL is the resulting line length:
LL = (0.03143*AW + 0.1547*SS)* NC
This is based on the observation that the formulas in memman.pdf have
an added constant that is suspiciously close to 10 spaces (in cmr10)
for NC=65 and to 7 spaces for NC=45.
Now 10/65 and 7/45 are approximately .1538 and .1556, which I've
averaged to get the coefficient of SS above. If I use Danie's numbers
instead of memman's, one would have something like
LL = (0.03209*AW + 0.1317*SS)*NC
which could be thought of as assuming a slightly lower proportion of
spaces to letters.
One would expect AW and SS to be closely related, but there can be reasons
to adjust the size of spaces. Also, for some fonts, the TFM files are
often adjusted from one version to another, with changes in \fontdimen2
possible. I believe the TeX support files for Times adjusted the space
width at least once in its history.
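
A minimal sketch of how the first formula above could be coded, assuming the coefficients 0.03143 and 0.1547 from this post. The macro name \SpaceAwareLineLength and the length \linelen are hypothetical names introduced only for this illustration.

```latex
\documentclass{article}
\newlength\linelen
% Hypothetical helper: \SpaceAwareLineLength[<font>]{<length>}{<NC>}
% computes LL = (0.03143*AW + 0.1547*SS)*NC for the given font.
\newcommand*{\SpaceAwareLineLength}[3][\normalfont]{%
  \begingroup
    #1% select the font to be measured
    \settowidth{\dimen0}{abcdefghijklmnopqrstuvwxyz}% AW
    \dimen2=\fontdimen2\font     % SS, the interword space
    \dimen0=0.03143\dimen0       % 0.03143*AW
    \dimen2=0.1547\dimen2        % 0.1547*SS
    \advance\dimen0 by \dimen2   % width allowed per character
    \multiply\dimen0 by #3\relax % times NC
    \global#2=\dimen0            % export the result from the group
  \endgroup}
\begin{document}
\SpaceAwareLineLength{\linelen}{65}
LL for 65 characters: \the\linelen
\end{document}
```

Because SS is read from \fontdimen2 of the selected font, a TFM revision that changes the space width changes the result, which is the dependence the post argues for.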
Dan
If:
w = lc alphabet width, points
L = desired width of line, **points**
c = number of characters in the line
then
c=L*35/(1.125*w+15)
or, given a desired character count in a line, find the length of that
line:
L = c*(1.125*w+15)/35
I use points for L because TeX uses points in its dimension registers.
This accords very closely with Bringhurst's chart, and is very easy to
implement in TeX.
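
One possible TeX implementation of the second formula, offered as a sketch rather than as the poster's own code. The macro name \SetLineLength and the length \linelen are hypothetical; the constants 1.125, 15 and 35 are taken from the formula above.

```latex
\documentclass{article}
\newlength\linelen
% Hypothetical helper: \SetLineLength[<font>]{<length>}{<c>}
% computes L = c*(1.125*w + 15pt)/35 for the given font.
\newcommand*{\SetLineLength}[3][\normalfont]{%
  \begingroup
    #1% select the font to be measured
    \settowidth{\dimen0}{abcdefghijklmnopqrstuvwxyz}% w
    \dimen0=1.125\dimen0         % 1.125*w
    \advance\dimen0 by 15pt      % 1.125*w + 15
    \divide\dimen0 by 35         % divide first to stay clear of overflow
    \multiply\dimen0 by #3\relax % times c
    \global#2=\dimen0            % export the result from the group
  \endgroup}
\begin{document}
\SetLineLength{\linelen}{65}
Line length for 65 characters: \the\linelen
\end{document}
```

By hand, w = 130 and c = 78 give L = 78*(146.25 + 15)/35, or about 359.4pt, close to the 30 picas at which the table reproduced earlier in the thread lists 78 characters for that alphabet width.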
SGM