How can I avoid line breaks that leave a short word at the edge of a line?

Question

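There are two goals here:

Do not break the line after a short word that immediately follows punctuation.
Do not break the line before a short word that immediately precedes punctuation.

Both are subject to the usual constraints of good line breaking.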

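One simple solution is to declare punctuation a particularly good place to break a line, by giving it a negative penalty of sufficiently large magnitude. TeX then weighs breaking at punctuation against its other line-breaking considerations (badness, demerits, other penalties). This makes the unwanted breaks less likely, but it does not guarantee that none of them occur.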

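Here is a before and after.

As you can see:

In the first paragraph, the ", it" at the end of the third line has moved to the next line after the change.
In the second paragraph, the "el." at the start of the fourth line and the "at," at the start of the sixth line have moved to the previous line after the change.
The third paragraph is included to show that this trick is not a guarantee: the "it." at the start of the fourth line stays where it is, because there is no way to fit it on the previous line.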

This was achieved with:

\catcode`.=\active \def.{\char`.\penalty -200\relax}
\catcode`,=\active \def,{\char`,\penalty -200\relax}

in the following document:

\documentclass{article}
\begin{document}
\frenchspacing % Makes it easier
\hsize=20em
\parskip=10pt

% First, three paragraphs with the default settings
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut blandit placerat justo, sed dictum sem. Donec erat elit, tincidunt non, it vel, tincidunt vehicula velit. Etiam pharetra ante at porta elementum. In nulla purus, faucibus non accumsan non, consequat eget.

Natis nulla blandit luctus tellus, sit amet posuere lacus maxius quis. In sit amet mattis est, a vehiula velit. Nam interum solicitudin el. In faucibus vulputate purus nec consectelur crass metus ipsum, blandit iln ullamcorpert at, portitor vita dolor. Duis sed mauris i inset inculis malesuada. Quisque laoret eu dui eget sage melittis corpum verborum.

Volutpat libero ac auctor. Donec semper, as id ultrices rhoncus, lectus nulla consequat nisi, ac sagitis risus lectus vel felis. Ut gravida it. Nam malesuada ante turpis eget. Ipsum factum verbum verdit.

\pagebreak

% Now the same text, with the meanings of . and , changed.
\catcode`.=\active \def.{\char`.\penalty -200\relax}
\catcode`,=\active \def,{\char`,\penalty -200\relax}

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut blandit placerat justo, sed dictum sem. Donec erat elit, tincidunt non, it vel, tincidunt vehicula velit. Etiam pharetra ante at porta elementum. In nulla purus, faucibus non accumsan non, consequat eget.

Natis nulla blandit luctus tellus, sit amet posuere lacus maxius quis. In sit amet mattis est, a vehiula velit. Nam interum solicitudin el. In faucibus vulputate purus nec consectelur crass metus ipsum, blandit iln ullamcorpert at, portitor vita dolor. Duis sed mauris i inset inculis malesuada. Quisque laoret eu dui eget sage melittis corpum verborum.

Volutpat libero ac auctor. Donec semper, as id ultrices rhoncus, lectus nulla consequat nisi, ac sagitis risus lectus vel felis. Ut gravida it. Nam malesuada ante turpis eget. Ipsum factum verbum verdit.

% Change it back
\catcode`.=12 \catcode`,=12
\pagebreak

% Same text again, to show that nothing's permanently changed.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut blandit placerat justo, sed dictum sem. Donec erat elit, tincidunt non, it vel, tincidunt vehicula velit. Etiam pharetra ante at porta elementum. In nulla purus, faucibus non accumsan non, consequat eget.

Natis nulla blandit luctus tellus, sit amet posuere lacus maxius quis. In sit amet mattis est, a vehiula velit. Nam interum solicitudin el. In faucibus vulputate purus nec consectelur crass metus ipsum, blandit iln ullamcorpert at, portitor vita dolor. Duis sed mauris i inset inculis malesuada. Quisque laoret eu dui eget sage melittis corpum verborum.

Volutpat libero ac auctor. Donec semper, as id ultrices rhoncus, lectus nulla consequat nisi, ac sagitis risus lectus vel felis. Ut gravida it. Nam malesuada ante turpis eget. Ipsum factum verbum verdit.

\end{document}

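Notes:

I would not be surprised if changing the meaning of . and , like this breaks something. (In fact, I was surprised that nothing went wrong in this example, until I realised that catcode changes do not apply to tokens that have already been read.)
The penalty is adjustable. I used -200 as an example, but any value from -1 to -9999 has some effect. (In this example, the threshold at which all of the changes above take effect seems to be -175, although one of them already happens at -100.) A penalty of -10000 or less forces a line break, which is not what we want.
You can do the same for more punctuation characters (?!:;), or use different penalties for different punctuation characters.
Things get a bit trickier with larger spaces after punctuation (the default, \nonfrenchspacing). It may well be doable, but coming up with these examples is a lot of work, so I have not pursued it. Left as an exercise :-)
With LuaTeX you can also change the line-breaking process itself. That makes it possible to guarantee that no short word ends up at the edge of a line (if that is what you want).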
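The notes on adjustable and per-character penalties can be sketched as follows. This is a minimal sketch, not part of the original example: the register names \periodpenalty and \commapenalty and the environment name breakatpunct are made up for illustration, and the values are only starting points. The active definitions are made once inside a group, and the catcode change is confined to an environment, so that dimensions such as 1.5em elsewhere in the document are not affected.

```latex
\documentclass{article}

% Hypothetical register names; the values are starting points, not recommendations.
\newcount\periodpenalty \periodpenalty=-200
\newcount\commapenalty  \commapenalty=-100

% Define the active versions once, inside a group, so that the catcode
% change does not leak into the rest of the preamble.
\begingroup
  \catcode`\.=\active
  \catcode`\,=\active
  \gdef.{\char`\.\penalty\periodpenalty}
  \gdef,{\char`\,\penalty\commapenalty}
\endgroup

% Hypothetical environment: punctuation is active only between \begin and \end.
\newenvironment{breakatpunct}
  {\frenchspacing \catcode`\.=\active \catcode`\,=\active}
  {\par}

\begin{document}
\hsize=20em
\begin{breakatpunct}
Donec erat elit, tincidunt non, it vel, tincidunt vehicula velit. Etiam pharetra ante at porta elementum.
\end{breakatpunct}

Outside the environment, . and , are ordinary characters again.
\end{document}
```

Because the penalties live in count registers, they can be changed between paragraphs without redefining the active characters. Inside the environment, anything that relies on a literal . or , (decimal dimensions, for instance) is still likely to break.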

Edit: I could not resist implementing the "guaranteed" solution in LuaTeX. This version should work with both \frenchspacing and \nonfrenchspacing. It detects certain node sequences and inserts an infinite penalty (10000) to prevent a break:

(punct, space, short_word, space) -> (punct, space, short_word, penalty, space)

and

(space, short_word, punct) -> (penalty, space, short_word, punct)
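For comparison, the same two node patterns can be produced by hand with ties. This is only an illustrative sketch, assuming LaTeX's standard definition of ~ (an infinite penalty followed by an interword space); it is not what the Lua code does internally, but it shows where the penalty ends up in each case.

```latex
\documentclass{article}
\begin{document}
\frenchspacing
\hsize=20em

% (punct, space, short_word, penalty, space):
% tie the short word "it" to the word that follows it.
Donec erat elit, tincidunt non, it~vel, tincidunt vehicula velit.

% (penalty, space, short_word, punct):
% tie the short word "it" to the word that precedes it.
Nam malesuada ante turpis eget. Ut gravida~it. Ipsum factum verbum verdit.
\end{document}
```

Under \nonfrenchspacing the space produced by ~ ignores the space factor, so a tie placed directly after punctuation would not be exactly equivalent to the automatic version.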

Applied to the same three test paragraphs as before, this produces the following result:

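Note the overfull box in the last paragraph. The constraints are very strict, but that is what we asked for. (In any case, overfull boxes are less likely with wider and longer paragraphs, and they can be fixed by the usual means, such as rewriting the text or adding \emergencystretch.)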
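A minimal sketch of the \emergencystretch fix, applied only to the problematic paragraph. The value 2em is an assumption to be tuned, and whether it removes the overfull box depends on the text. The file needs strict.lua (given later in this answer) and must be compiled with LuaLaTeX.

```latex
\documentclass{article}
\directlua{dofile("strict.lua")}
\begin{document}
\frenchspacing
\hsize=20em

% The group keeps the change local. \par must come before the closing brace,
% because TeX reads \emergencystretch when the paragraph ends.
{\emergencystretch=2em % assumed value: increase it until the overfull box goes away
Volutpat libero ac auctor. Donec semper, as id ultrices rhoncus, lectus nulla consequat nisi, ac sagitis risus lectus vel felis. Ut gravida it. Nam malesuada ante turpis eget. Ipsum factum verbum verdit.\par}
\end{document}
```

The price is looser interword spacing in that paragraph, which is usually less objectionable than text sticking out into the margin.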

The code that produced the above (and the idea itself) may well contain bugs that could crash the LuaTeX run, but here it is:

\documentclass{article}
\directlua{dofile("strict.lua")}
\begin{document}
\frenchspacing % Keeping same example as before
\hsize=20em
\parskip=10pt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut blandit placerat justo, sed dictum sem. Donec erat elit, tincidunt non, it vel, tincidunt vehicula velit. Etiam pharetra ante at porta elementum. In nulla purus, faucibus non accumsan non, consequat eget.

Natis nulla blandit luctus tellus, sit amet posuere lacus maxius quis. In sit amet mattis est, a vehiula velit. Nam interum solicitudin el. In faucibus vulputate purus nec consectelur crass metus ipsum, blandit iln ullamcorpert at, portitor vita dolor. Duis sed mauris i inset inculis malesuada. Quisque laoret eu dui eget sage melittis corpum verborum.

Volutpat libero ac auctor. Donec semper, as id ultrices rhoncus, lectus nulla consequat nisi, ac sagitis risus lectus vel felis. Ut gravida it. Nam malesuada ante turpis eget. Ipsum factum verbum verdit.
\end{document}

where strict.lua is:

function is_punct(n)
   if node.type(n.id) ~= 'glyph' then return false end
   if n.char > 127 then return false end
   local c = string.char(n.char)
   if c == '.' or c =='?' or c == '!' or c == ':' or c == ';' or c == ',' then
      return true
   end
   return false
end

function no_punct_short_word_eol(head)
   -- Prevents having a line that ends like "<punctuation><space><short_word>"
   -- How we do this:
   --   (1) detect such short words (punct, space, short_word, space)
   --   (2) insert a penalty of 10000 between the short_word and the following space.
   -- More concretely:
   --   * A punctuation is one of .?!:;, which are the ones affected by \frenchspacing
   --   * A space is any glue node.
   --   * A short_word is a sequence of only glyph and kern nodes.
   -- So we maintain a state machine: default -> seen_punct -> seen_space -> seen_word
   -- where in the last state we maintain length. If we're in seen_word state and we see
   -- a glue, and length is less than threshold, insert a penalty before the glue.
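   -- Keep all working variables local so the two filters cannot interfere.
   local state = 'default'
   local root = head
   local length = 0   -- number of glyphs in the current candidate word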
   while head do
      if state == 'default' then
         if is_punct(head) then
            state = 'seen_punct'
         end
      elseif state == 'seen_punct' then
         if node.type(head.id) == 'glue' then
            state = 'seen_space'
         else
            state = 'default'
         end
      elseif state == 'seen_space' then
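         -- Test for punctuation first: punctuation marks are glyphs too, so the
         -- generic glyph test would otherwise make this branch unreachable.
         if is_punct(head) then
            state = 'seen_punct'
         elseif node.type(head.id) == 'glyph' then
            state = 'seen_word'
            length = 1
         else
            state = 'default'
         end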
      elseif state == 'seen_word' then
         if node.type(head.id) == 'glue' and length <= 2 then
            -- Moment of truth
            local penalty = node.new('penalty')
            penalty.penalty = 10000
            root = node.insert_before(root, head, penalty)
            -- 'head' (the glue) is still valid: the penalty is only linked in before it.
            state = 'default'
         elseif node.type(head.id) == 'glyph' or node.type(head.id) == 'kern' then
            if node.type(head.id) == 'glyph' then length = length + 1 end
         else
            state = 'default'
         end
      else
         assert(false, string.format('Impossible state %s', state))
      end
      head = head.next
   end
   return root
end
luatexbase.add_to_callback('pre_linebreak_filter', no_punct_short_word_eol, 'Prevent short words after punctuation at end of sentence')

function no_bol_short_word_punct(head)
   -- Prevents having a line that starts like "<short_word><punctuation>"
   -- How we do this:
   --   (1) detect such short words (space, short_word, punct)
   --   (2) insert a penalty of 10000 between the space and the following short_word.
   -- More concretely:
   --   * A punctuation is one of .?!:;, which are the ones affected by \frenchspacing
   --   * A space is any glue node.
   --   * A short_word is a sequence of only glyph and kern nodes.
   -- So we maintain a state machine: default -> seen_space -> seen_word
   -- where in the last state we maintain length. If we're in seen_word state and we see
   -- a punct, and length is less than threshold, insert a penalty before the glue.
   -- Note that for this to work, we need to maintain a pointer to where we saw the glue.
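   -- Keep all working variables local so the two filters cannot interfere.
   local state = 'default'
   local root = head
   local length = 0   -- number of glyphs in the current candidate word
   local before_space = nil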
   while head do
      if state == 'default' then
         if node.type(head.id) == 'glue' then
            state = 'seen_space'
            before_space = head.prev
         end
      elseif state == 'seen_space' then
         if node.type(head.id) == 'glyph' then
            state = 'seen_word'
            length = 1
         else
            state = 'default'
         end
      elseif state == 'seen_word' then
         if is_punct(head) and length <= 2 then
            -- Moment of truth
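            -- before_space is nil when the glue had no predecessor; in that
            -- case there is no node to insert after, so nothing is inserted.
            if before_space then
               local penalty = node.new('penalty')
               penalty.penalty = 10000
               root = node.insert_after(root, before_space, penalty)
            end
            -- 'head' is still valid: the insertion only relinks earlier nodes.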
            state = 'default'
         elseif node.type(head.id) == 'glyph' or node.type(head.id) == 'kern' then
            if node.type(head.id) == 'glyph' then length = length + 1 end
         elseif node.type(head.id) == 'glue' then
            state = 'seen_space'
            before_space = head.prev
         else
            state = 'default'
         end
      else
         assert(false, string.format('Impossible state %s', state))
      end
      head = head.next
   end
   return root
end
luatexbase.add_to_callback('pre_linebreak_filter', no_bol_short_word_punct, 'Prevent short words at beginning of sentence before punctuation')

Answer 1