Average number of characters per line

How do I calculate the average number of characters per line in my document?

Answer 1

It's not perfect, but in a Linux terminal you can do the following:

$ pdftotext -layout test.pdf
$ wc -l -w -m  test.txt

The output looks like this:

96  986 6673 test.txt

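So the total number of characters (6673) divided by the number of lines (96), about 69.5, is the average you want (this also counts spaces, but given that there are 986 words you can roughly estimate the figure without spaces).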
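
As a rough sketch of the arithmetic above, the following Python function computes both averages directly from the extracted text. The name `average_chars_per_line` is an illustrative choice, not part of any tool mentioned here. Unlike `wc -m`, it does not count the newline characters, so its first result comes out roughly one character per line lower than the `wc` figure.

```python
def average_chars_per_line(text: str) -> tuple[float, float]:
    """Return (average with spaces, average without whitespace) per line.

    Lines are delimited by "\n" only, to mirror what `wc -l` counts.
    Blank lines count as lines, exactly as they do for `wc -l`.
    """
    lines = text.split("\n")
    # A trailing newline leaves one empty element that is not a real line.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return 0.0, 0.0
    with_spaces = sum(len(line) for line in lines)
    # "".join(line.split()) removes every run of whitespace in the line.
    without_spaces = sum(len("".join(line.split())) for line in lines)
    return with_spaces / len(lines), without_spaces / len(lines)
```

For the file produced above, it could be called as `average_chars_per_line(open("test.txt", encoding="utf-8").read())`.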

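For Windows users, I believe there are equivalent or similar programs (though I have not tested any of them): wc, xpdf (which includes pdftotext), and Free PDF to Text Converter.

One problem with this approach is that it cannot distinguish normal text from headers, figures, and so on.

Without external programs, there are a few ways to make a rough prediction. Some of them are self-explained in the following TeX (not minimal) example: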

        \documentclass{article}
        % \usepackage[chars=60, lines=30, hyphen=true, noindent]{stdpage}
        \usepackage{lineno} 
        \usepackage{canoniclayout}
        \usepackage{amssymb}
        \usepackage{hyperref}
        \usepackage{calc}
        \usepackage{lipsum}
        \usepackage{xcolor}
        \newlength{\oneem}
        \setlength{\oneem}{1em}
        \newlength{\ispace}
        \settowidth{\ispace}{i}
        \newlength{\mspace}
        \settowidth{\mspace}{m}
        \newlength{\alphabet}
        \settowidth{\alphabet}
         {abcdefghijklmnopqrstuvwxyz}
        \usepackage{geometry} 
        \geometry{textwidth=2.5\alphabet}
        %\geometry{textwidth=26ex}
        \newlength{\avgchar}
        \setlength{\avgchar}
         {\textwidth/65}

        \pagestyle{empty}
        \setlength{\parskip}
         {\bigskipamount}

        \begin{document}

        \section*{How to estimate characters per line with \LaTeX}

        \subsection*{A dirty way: Fix {\tt textwidth} to $n$ alphabets or $n$ em units}

        As you can see in the preamble, the text width of this text
     is fixed to 2.5 times the length of the alphabet with 26 
     characters (\the\alphabet) with a default font size of 
     \the\oneem, resulting in \the\textwidth per line.

        Therefore there should be $26 \times 2.5 = 65$ characters per line (a 
     good value according to the Bringhurst rule), where each
     character has a width of \the\avgchar~on average (note that this is
     roughly one half of the font size, i.e., each character is 
     $\thickapprox\frac{1}{2}$\,em wide, so fixing 
     {\tt \textbackslash{textwidth}} in em units also allows an 
     easy calculation of the number of characters). 

        But this prediction is only really reliable if you plan to 
     write $n$ complete alphabets. Only then do 130 
     characters (5 alphabets) really fill two lines \dots



           \begin{linenumbers}
            \noindent \textcolor{blue}{abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz}
            \end{linenumbers}

        \dots 260 characters fill 4 lines, and so on. 

        Unfortunately, for any other text this rule is less accurate because 
     the proportion of each character changes. For example, the first 
     10 paragraphs of \emph{Lorem Ipsum} (only the first is shown below) 
     with this format produce 96 lines with 986 words and 6673 
     characters. That is about 69.5 characters per line (counting the 
     spaces), not 65. 

        \begin{linenumbers}
            \textcolor{blue}{\lipsum[1]}
        \end{linenumbers}


        Not too bad a prediction after all, taking into account that 
     the thin $i$ (\the\ispace) appears 503 times while the thick 
     $m$ (\the\mspace) appears only 218 times, and moreover, there 
     are variable spaces and punctuation marks.

        Of course, you can also fix any predetermined 
    {\tt \textbackslash{textwidth}} in pt, cm, etc., calculate the 
    width of your preferred font (
    \url{http://tex.stackexchange.com/questions/60277/average-width-of-popular-tex-fonts}) and then simply do some math.

        \subsection*{A quicker way: the {\tt canoniclayout} package}

        \begin{itemize}
        \item Put {\tt \textbackslash{}usepackage\{canoniclayout\}} in the preamble 

        \item And {\tt \textbackslash{currentfontletters}} in the body of the document:

        \framebox{\begin{minipage}[t]{1\columnwidth}%
        \currentfontletters
        \end{minipage}}

        \item Or {\tt \textbackslash{charactersperpage}}:

        \framebox{\begin{minipage}[t]{1\columnwidth}%
        \charactersperpage
        \end{minipage}}

        \end{itemize}

        Note that this package estimates the number of
     characters and lines that the page layout could hold; it does not 
     count the number of compiled lines\footnote{
     For this you can use the {\tt lineno} package, as above in the \emph{Lorem Ipsum} output.}. 

        \subsection*{A strange way: the {\tt stdpage} package}

        This produces a layout with a monospaced font, with 30 
     lines of 60 characters and about 1440 characters (the German 
     ``Normseite'') by default. The number of characters and lines
     can be adjusted: 

        {\tt \textbackslash{}usepackage{[}chars=65, lines=30, noindent{]}\{stdpage\}}

        You probably don't want to print the final version in this 
     format, but it could be temporarily useful to compare the lengths 
     of text in each line, as well as the number of pages, with the 
     version in the proportional font, and so obtain some 
     information about the character density of your text.

        \subsection*{Other unexplored ways}

        I saw that in ConTeXt (which I have never used) there is an  
    \textbackslash{averagecharwidth} command. Please see 
    \url{http://tex.stackexchange.com/questions/68105/macro-for-the-average-width-of-a-character}


        \end{document}
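
The alphabet rule used in the example above boils down to simple arithmetic, sketched here in Python. The function names and the alphabet width of 127.6pt are assumptions made purely for illustration; the real value is whatever `\the\alphabet` prints for your font. The measured figures (6673 characters, 96 lines) are the ones quoted in the example.

```python
def predicted_chars_per_line(textwidth_pt: float, alphabet_width_pt: float) -> float:
    """Alphabet rule: 26 letters fit in one alphabet width."""
    if alphabet_width_pt <= 0:
        raise ValueError("alphabet width must be positive")
    return 26 * textwidth_pt / alphabet_width_pt


def measured_chars_per_line(total_chars: int, total_lines: int) -> float:
    """Average taken from an actual count of characters and lines."""
    if total_lines <= 0:
        raise ValueError("need at least one line")
    return total_chars / total_lines


ALPHABET_PT = 127.6  # hypothetical width of a-z; read yours from \the\alphabet

# The text width is fixed to 2.5 alphabets, as in the preamble of the example.
predicted = predicted_chars_per_line(2.5 * ALPHABET_PT, ALPHABET_PT)
measured = measured_chars_per_line(6673, 96)
print(f"predicted {predicted:.1f}, measured {measured:.1f}")
# prints: predicted 65.0, measured 69.5
```

The gap between the two numbers is the effect described in the example: real text uses narrow letters and spaces more often than a plain run of alphabets does.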

However, a more practical approach is TeXcount, because it gives you more control over what is counted. I will illustrate it with this file:

% CAUTION !!!
% 1) Need --enable-write18 or --shell-escape 
% 2) This file MUST be saved 
%    as "borra.tex" before the compilation
%    in your working directory
% 3) This code will write wordcount.tex
%    and charcount.tex in /tmp of your disk.
%    (Windows users must change this path)
% 4) Do not compile if you are unsure
%    of what you are doing.

\documentclass{article}
\usepackage{lineno} % for line numbers 
\usepackage{moreverb} % for verbatim output

% Only for format purposes
\usepackage{geometry}
\geometry{verbose,tmargin=2cm,bmargin=2cm,lmargin=6cm,rmargin=3cm}
\usepackage{graphicx} 
\setlength{\parskip}{\bigskipamount}
\setlength{\parindent}{1em}

% Count of words

\immediate\write18{texcount -inc -incbib 
-sum borra.tex > /tmp/wordcount.tex}
\newcommand\wordcount{
\verbatiminput{/tmp/wordcount.tex}}

% Count of characters

\immediate\write18{texcount -char -freq
 borra.tex > /tmp/charcount.tex}
\newcommand\charcount{
\verbatiminput{/tmp/charcount.tex}}


% Only two example lengths 
\newlength{\ispace}
\settowidth{\ispace}{i}
\newlength{\mspace}
\settowidth{\mspace}{m}

\begin{document}

% Note that the next line is NOT a comment
%TC:ignore

{\bf Note}: A comparison of the source and compiled 
versions of this file should be self-explanatory. 
See the {\tt lineno} 
and {\tt\TeX count} documentation 
if you need more information.

\noindent\resizebox{\textwidth}{!}{\bf How to determine characters per line with \LaTeX}

With a few \LaTeX{} commands we can see in the 
compiled document the number of lines (with the 
package {\tt lineno}) and count the number of 
words and characters for the whole document 
or some parts of it with the aid of the {\tt\TeX 
count} program (included in \TeX{}  Live). 
The rest is child's play:  

We can determine that in the example (see 
below) the text of the example section 
(without heading, float, or subsection) 
spans 7-1=6 lines (see left margin) with 
70 words (see page 2) and 350 characters 
(see  page 3), so that there is an average of 
70/6  = 11.7 words and 350/6 = 58.3 
characters per line.

Moreover, as the frequency and width of each 
character can be determined (see page 3), 
it is also easy to obtain the average width of 
these characters in a long reference text to 
make predictions of characters per line in 
texts of the same language/style 
that have not yet been written. 

In the whole example there are 454 
characters: 25 $i$ with widths of 
\the\ispace{}, 16 $m$ with widths of 
\the\mspace{} \dots and so on. 
Therefore the average will be 
\[\frac{(25 \times 2.77)+(16 \times 8.33)+ \dots}{454}\] 
And the text width (\the\textwidth{} in 
the example) divided by this average will 
give the prediction of characters per line. 
That's all.


\dotfill Start of the example text \dotfill

%TC:endignore

\linenumbers

\section{Section: text example with a float}

Words and characters of this example file are 
automatically counted from the source file 
when compiled (therefore generated text as 
\textbackslash{}lipsum[1-10] is {\bf not} 
counted). The results are showed at the end 
of the compiled version.
Counts are made in headers, caption floats 
and normal text for the whole file. Subcounts 
for structured parts (sections, subsections, 
etc.) are also made. Number of headers, 
floats and math chunks are also counted. 

\begin{figure}[h]
\centering
\framebox{This is only a example float} 
\caption{This is a example caption}
\end{figure}

\subsection*{Subsection: Little text 
with math chunks}

In line math: $\pi +2 = 2+\pi$ \\   
Display math: \[\pi +2 = 2+\pi\] 

\nolinenumbers

%TC:ignore  
\dotfill End of the example text \dotfill 

\newpage

\subsubsection*{Counts of words} 
\wordcount

\newpage

\subsubsection*{Counts of characters 
and frequencies}
\charcount
%TC:endignore   

\end{document}
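
The weighted average described in the file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, the text width of 345pt is a hypothetical placeholder for whatever `\the\textwidth` prints, and the table holds just the two letters quoted in the example (25 i at 2.77pt, 16 m at 8.33pt), because the remaining terms of the sum are not given. With a full `texcount -char -freq` table you would fill in every character.

```python
def average_char_width(freq: dict[str, int], width_pt: dict[str, float]) -> float:
    """Frequency-weighted mean character width in points."""
    total = sum(freq.values())
    if total == 0:
        raise ValueError("no characters counted")
    return sum(count * width_pt[ch] for ch, count in freq.items()) / total


def predicted_chars_per_line(textwidth_pt: float, avg_width_pt: float) -> float:
    """Text width divided by the average character width."""
    return textwidth_pt / avg_width_pt


# Partial table: only the two letters quoted in the example file.
freq = {"i": 25, "m": 16}
width_pt = {"i": 2.77, "m": 8.33}

TEXTWIDTH_PT = 345.0  # hypothetical; use the value of \the\textwidth

avg = average_char_width(freq, width_pt)
print(f"average width {avg:.2f}pt")
print(f"predicted characters per line {predicted_chars_per_line(TEXTWIDTH_PT, avg):.1f}")
```

Because the table is incomplete, the numbers it prints only demonstrate the calculation; they are not a prediction for the example document.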
