
How do I calculate the average number of characters per line in a document?
Answer 1
This is not perfect, but in a Linux terminal you can run:
$ pdftotext -layout test.pdf
$ wc -l -w -m test.txt
The output will look like this:
96 986 6673 test.txt
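So the total number of characters (6673) divided by the number of lines (96) is the average you want, about 69.5. Spaces are counted too, but since there are 986 words you can roughly work out the figure without spaces.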
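The arithmetic above can be sketched in LaTeX itself. This is only an illustration: it hard-codes the counts from the `wc` output shown, and it assumes roughly one whitespace character per word, which is a simplification of mine rather than something `wc` reports.

```latex
\documentclass{article}
\usepackage{xfp} % provides \fpeval; LaTeX kernels from 2022-06 on have it built in
\begin{document}
% Counts reported by wc for test.txt: 96 lines, 986 words, 6673 characters
Characters per line, spaces included: \fpeval{round(6673/96, 1)}

% Assumption: about one whitespace character per word
Characters per line, spaces excluded: \fpeval{round((6673 - 986)/96, 1)}
\end{document}
```

With these numbers the two results are 69.5 and 59.2 characters per line.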
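For Windows users, I think there are identical or similar programs (though I have not tested them), such as wc, PDF tools (including pdftotext), and free PDF-to-text converters.
One problem with this approach is that you cannot distinguish normal text from headings, numbers, and so on.
Without external programs, there are some ways to make a rough prediction. Some are illustrated by this self-explanatory (not minimal) TeX example: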
\documentclass{article}
% \usepackage[chars=60, lines=30, hyphen=true, noindent]{stdpage}
\usepackage{lineno}
\usepackage{canoniclayout}
\usepackage{amssymb}
\usepackage{hyperref}
\usepackage{calc}
\usepackage{lipsum}
\usepackage{xcolor}
\newlength{\oneem}
\setlength{\oneem}{1em}
\newlength{\ispace}
\settowidth{\ispace}{i}
\newlength{\mspace}
\settowidth{\mspace}{m}
\newlength{\alphabet}
\settowidth{\alphabet}{abcdefghijklmnopqrstuvwxyz}
\usepackage{geometry}
\geometry{textwidth=2.5\alphabet}
%\geometry{textwidth=26ex}
\newlength{\avgchar}
\setlength{\avgchar}{\textwidth/65}
\pagestyle{empty}
\setlength{\parskip}{\bigskipamount}
\begin{document}
\section*{How to estimate characters per line with \LaTeX}
\subsection*{A dirty way: Fix {\tt textwidth} to $n$ alphabets or $n$ em units}
As you can see in the preamble, the width of this text
is fixed to 2.5 times the length of the 26-character
alphabet (\the\alphabet) at the default font size of
\the\oneem, resulting in \the\textwidth{} per line.
Therefore there should be $26 \times 2.5 = 65$ characters per
line (a good value according to Bringhurst's rule), where each
character has a width of \the\avgchar{} on average. Note that this
is roughly one half of the font size, i.e., each character is
$\thickapprox\frac{1}{2}$\,em wide, so fixing
{\tt \textbackslash{textwidth}} in em units also allows an
easy calculation of the number of characters.
But this prediction is exact only if you plan to
write $n$ complete alphabets. Only then do 130
characters (5 alphabets) really fill two lines \dots
\begin{linenumbers}
\noindent \textcolor{blue}{abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmn
opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz}
\end{linenumbers}
\dots{} 260 characters fill 4 lines, and so on.
Unfortunately, for any other text this rule is less accurate because
the proportion of each character changes. For example, the first
10 paragraphs of \emph{Lorem Ipsum} (only the first is shown below)
in this format produce 96 lines with 986 words and 6673
characters. That is about 69 characters per line (counting the
spaces), not 65.
\begin{linenumbers}
\textcolor{blue}{\lipsum[1]}
\end{linenumbers}
Not too bad a prediction after all, taking into account that
the thin $i$ (\the\ispace) appears 503 times while the thick
$m$ (\the\mspace) appears only 218 times, and moreover, there
are variable spaces and punctuation marks.
Of course, you can also fix any predetermined
{\tt \textbackslash{textwidth}} in pt, cm, etc., look up the
average character width of your preferred font (see
\url{http://tex.stackexchange.com/questions/60277/average-width-of-popular-tex-fonts}) and then simply do some math.
\subsection*{A quicker way: the {\tt canoniclayout} package}
\begin{itemize}
\item Put {\tt \textbackslash{}usepackage\{canoniclayout\}} in the preamble
\item And {\tt \textbackslash{currentfontletters}} in the body of the document:
\framebox{\begin{minipage}[t]{1\columnwidth}%
\currentfontletters
\end{minipage}}
\item Or {\tt \textbackslash{charactersperpage}}:
\framebox{\begin{minipage}[t]{1\columnwidth}%
\charactersperpage
\end{minipage}}
\end{itemize}
Note that this package only estimates the number of
characters and lines that the page layout can hold; it does not
count the number of compiled lines.\footnote{%
For this you can use the {\tt lineno} package, as above in the \emph{Lorem Ipsum} output.}
\subsection*{A strange way: the {\tt stdpage} package}
This produces a layout in a monospaced font with 30
lines of 60 characters and about 1440 characters per page (the German
“Normseite”) by default. The number of characters and lines
can be adjusted:
{\tt \textbackslash{}usepackage{[}chars=65, lines=30, noindent{]}\{stdpage\}}
You probably do not want to print the final version in this
format, but it can be temporarily useful for comparing the lengths
of text in each line, as well as the number of pages, with the
proportional-font version, and so obtain some
information about the character density of your text.
\subsection*{Other unexplored ways}
I saw that in ConTeXt (which I have never used) there is an
\textbackslash{averagecharwidth} command. Please see
\url{http://tex.stackexchange.com/questions/68105/macro-for-the-average-width-of-a-character}
\end{document}
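The "do some math" step from the first subsection can also be left to LaTeX. The following is a minimal sketch, not part of the example above: it assumes that `\fpeval` (from `xfp`) accepts length registers and converts them to points, and it treats the mean width of the 26 lowercase letters as the average character width, which ignores spaces, punctuation, and real letter frequencies.

```latex
\documentclass{article}
\usepackage{xfp} % provides \fpeval; LaTeX kernels from 2022-06 on have it built in
\newlength{\alphabet}
\begin{document}
% Measure the lowercase alphabet in the current font
\settowidth{\alphabet}{abcdefghijklmnopqrstuvwxyz}
% Average letter width = \alphabet / 26, so
% estimated characters per line = \textwidth / (\alphabet / 26)
Estimated characters per line:
\fpeval{round(26 * \textwidth / \alphabet)}
\end{document}
```

Because the estimate uses whatever `\textwidth` and font are active, it can be dropped into a document with a different layout to get a quick figure before any text is written.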
But a more practical approach is TeXcount, because it gives you finer control over what is counted. Let me explain with this file:
% CAUTION !!!
% 1) Need --enable-write18 or --shell-escape
% 2) This file MUST be saved
% as "borra.tex" before the compilation
% in your working directory
% 3) This code will write wordcount.tex
% and charcount.tex in /tmp of your disk.
% (Windows users must change this path)
% 4) Do not compile if you are unsure
% of what you are doing.
\documentclass{article}
\usepackage{lineno} % for line numbers
\usepackage{moreverb} % for verbatim output
% Only for format purposes
\usepackage{geometry}
\geometry{verbose,tmargin=2cm,bmargin=2cm,lmargin=6cm,rmargin=3cm}
\usepackage{graphicx}
\setlength{\parskip}{\bigskipamount}
\setlength{\parindent}{1em}
% Count of words
\immediate\write18{texcount -inc -incbib
-sum borra.tex > /tmp/wordcount.tex}
\newcommand\wordcount{
\verbatiminput{/tmp/wordcount.tex}}
% Count of characters
\immediate\write18{texcount -char -freq
borra.tex > /tmp/charcount.tex}
\newcommand\charcount{
\verbatiminput{/tmp/charcount.tex}}
% Only two example lengths
\newlength{\ispace}
\settowidth{\ispace}{i}
\newlength{\mspace}
\settowidth{\mspace}{m}
\begin{document}
% Note that the next line is NOT a comment
%TC:ignore
{\bf Note}: A comparison of the source and the compiled
version of this file should be
self-explanatory. See the {\tt lineno}
and {\tt\TeX count} documentation
if you need more information.
\noindent\resizebox{\textwidth}{!}{\bf How to determine characters per line with \LaTeX}
With a few \LaTeX{} commands we can see in the
compiled document the number of lines (with the
{\tt lineno} package) and count the number of
words and characters for the whole document
or some parts of it with the aid of the {\tt\TeX
count} program (included in \TeX{} Live).
The rest is child's play:
We can determine that in the example (see
below), the text of the example section
(without heading, float, or subsection)
has 7-1=6 lines (see left margin) with
70 words (see page 2) and 350 characters
(see page 3), so that there is an average of
70/6 $\approx$ 11.7 words and 350/6 $\approx$ 58.3
characters per line.
Moreover, as the frequency and width of each
character can be determined (see page 3),
it is also easy to obtain the average width of
these characters in a long reference text to
make predictions of characters per line in
texts of the same language/style
that have not yet been written.
In the whole example there are 454
characters: 25 $i$ with widths of
\the\ispace{}, 16 $m$ with widths of
\the\mspace{}, \dots{} and so on.
Therefore the average will be
\[\frac{(25 \times 2.77)+(16 \times 8.33)+ \dots}{454}\]
and the text width (\the\textwidth{} in
the example) divided by this average will
give the prediction of characters per line.
That's all.
\dotfill Start of the example text \dotfill
%TC:endignore
\linenumbers
\section{Section: text example with a float}
Words and characters of this example file are
automatically counted from the source file
when compiled (therefore generated text as
\textbackslash{}lipsum[1-10] is {\bf not}
counted). The results are showed at the end
of the compiled version.
Counts are made in headers, caption floats
and normal text for the whole file. Subcounts
for structured parts (sections, subsections,
etc.) are also made. Number of headers,
floats and math chunks are also counted.
\begin{figure}[h]
\centering
\framebox{This is only a example float}
\caption{This is a example caption}
\end{figure}
\subsection*{Subsection: Little text
with math chunks}
In line math: $\pi +2 = 2+\pi$ \\
Display math: \[\pi +2 = 2+\pi\]
\nolinenumbers
%TC:ignore
\dotfill End of the example text \dotfill
\newpage
\subsubsection*{Counts of words}
\wordcount
\newpage
\subsubsection*{Counts of characters
and frequencies}
\charcount
%TC:endignore
\end{document}
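The averaging step described in the example file can be written out in general form. The symbols here are notation of mine, not TeXcount's: $n_c$ is the frequency of character $c$ as listed by `texcount -char -freq`, $w_c$ is its width as measured with `\settowidth`, $\bar{w}$ is the resulting average width, $W$ is the text width, and $N$ is the predicted number of characters per line.

```latex
\[
  \bar{w} \;=\; \frac{\sum_{c} n_c\, w_c}{\sum_{c} n_c}
          \;=\; \frac{(25 \times 2.77\,\mathrm{pt}) + (16 \times 8.33\,\mathrm{pt}) + \dots}{454},
  \qquad
  N \;\approx\; \frac{W}{\bar{w}}
\]
```

The sum runs over every distinct character in the frequency table, so the prediction is only as good as the reference text is representative of what you intend to write.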