What are the expansion and usage rules for special classes in pdftex `\pdfmatch`?

Question 1

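The regex code in the pdftex source has some optional support for character classes but no locale support, so it cannot be used reliably with non-ASCII characters: any UTF-8 input is treated as a sequence of bytes rather than as Unicode characters.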

\pdfmatch {[+-]?([0-9]*[.])?[0-9]+} {-4.06}
\immediate\write500{1: \pdflastmatch0, \pdflastmatch1}

% [:digit:] is [:digt]  checking for those literal characters
\pdfmatch {[+-]?([:digit:]*[.])?[:digit:]+} {-dddg:ii.ggg}
\immediate\write500{2: \pdflastmatch0, \pdflastmatch1}

% [[:digit:]] is digit class
\pdfmatch {[+-]?([[:digit:]]*[.])?[[:digit:]]+} {-4.06}
\immediate\write500{3: \pdflastmatch0, \pdflastmatch1}

% full expansion happens for both arguments before regex processing
\def\aaa{[0-9]*[.]}
\def\bbb{[+-]?(\aaa)?}
\def\ccc{\bbb[0-9]+}

\def\DDD{4}
\def\EEE{06}
\def\FFF{-\DDD.\EEE}

\pdfmatch {\ccc} {\FFF}
\immediate\write500{4: \pdflastmatch0, \pdflastmatch1}

\chardef\DOLLAR=`$

\pdfmatch {\DOLLAR} {\$}
\immediate\write500{5: \pdflastmatch0}

\pdfmatch {\DOLLAR} {.D.*R}
\immediate\write500{6: \pdflastmatch0}


\pdfmatch {\DOLLAR} {.*}
\immediate\write500{7: \pdflastmatch0}


\pdfmatch {abc\DOLLAR} {a.*}
\immediate\write500{8: \pdflastmatch0}


\end

produces

1: 0->-4.06, 1->4.
2: 0->-dddg:ii.ggg, 1->dddg:ii.
3: 0->-4.06, 1->4.
4: 0->-4.06, 1->4.
5: -1->
6: -1->
7: -1->
8: -1->

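Test 2 shows that [:digit:] is not a character class but just the character set : d i g t

Test 3 shows that [[:digit:]] is the character class (thanks @egreg).

Test 4 shows that both the string and the regex are fully expanded before regex processing begins.

Tests 5-8, which use the non-expandable \chardef token \DOLLAR, show that nothing matches if the expansion does not consist purely of character tokens.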
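The pattern from tests 1-4 can be wrapped in a small conditional helper. What follows is only a minimal sketch for plain pdftex, not part of the tests above: the macro name \checkdecimal is hypothetical, the anchors ^ and $ are added here so that the whole string has to match, and it assumes that \pdfmatch expands to 1 on a successful match.

```latex
% Sketch only: \checkdecimal is a hypothetical helper name.
% ^ and $ anchor the pattern so the entire argument must be a decimal number.
\def\checkdecimal#1{%
  \ifnum\pdfmatch{^[+-]?([0-9]*[.])?[0-9]+$}{#1}=1
    \immediate\write16{#1: decimal number}%
  \else
    \immediate\write16{#1: not a decimal number}%
  \fi}

\checkdecimal{-4.06}  % the whole string fits the pattern
\checkdecimal{4.0.6}  % the second dot prevents an anchored match

\bye
```

Because both arguments are fully expanded, the argument of \checkdecimal may also be a macro such as \FFF from test 4.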

Answer


Question 2

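The manual (rev. 905), page 45, gives the syntax of \pdfmatch:

\pdfmatch [ icase ] [ subcount ⟨integer⟩ ] ⟨general text⟩ ⟨general text⟩ (expandable)

Since both arguments are ⟨general text⟩, their contents are expanded in the same way as in \message.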
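The optional keywords in that syntax line can be tried out as follows. This is a hedged sketch for plain pdftex, not an excerpt from the manual: it assumes that icase makes matching case-insensitive, that subcount sets how many match entries are kept for \pdflastmatch, and that \pdfmatch yields 0 when nothing matches.

```latex
% Default behaviour: lowercase pattern against uppercase text.
\count255=\pdfmatch{abc}{xABCx}
\immediate\write16{default: \the\count255; \pdflastmatch0}

% Same pattern and text, now with the icase keyword.
\count255=\pdfmatch icase {abc}{xABCx}
\immediate\write16{icase: \the\count255; \pdflastmatch0}

% subcount (assumed meaning: size of the table of stored matches);
% entry 0 is the whole match, entry 1 the first parenthesised group.
\count255=\pdfmatch subcount 2 {(a)(b)}{ab}
\immediate\write16{subcount: \pdflastmatch0, \pdflastmatch1}

\bye
```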

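Therefore, if you need an escaped character in the regular expression, such as \+ to match a literal +, you have to write \noexpand\+ or [+] (see the example below).

The character classes [:alpha:], [:digit:] and [:alnum:] are supported (with double brackets, of course).

To match the end of the string, just use $; the start of the string is ^.

What character set? Recall that pdftex is 8-bit, so it cannot support UTF-8 (although in some cases it can work under pdflatex).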

\documentclass{article}

\count255=\pdfmatch{[[:digit:]]x}{1x2y}
\message{^^J1: \the\count255; \pdflastmatch0}

\count255=\pdfmatch{[[:digit:]][[:alpha:]]}{12y}
\message{^^J2: \the\count255; \pdflastmatch0}

\count255=\pdfmatch{[[:alnum:]]*\noexpand\+}{a2c+d3f+}
\message{^^J3: \the\count255; \pdflastmatch0}

\count255=\pdfmatch{[[:alnum:]]*[+]$}{a2c+d3f+}
\message{^^J4: \the\count255; \pdflastmatch0}

\count255=\pdfmatch{^[[:alnum:]]*\noexpand\+}{a2c+d3f+}
\message{^^J5: \the\count255; \pdflastmatch0}

\count255=\pdfmatch{à}{aàa}
\message{^^J6: \the\count255; \pdflastmatch0}

\stop

The console prints

1: 1; 0->1x
2: 1; 1->2y
3: 1; 0->a2c+
4: 1; 4->d3f+
5: 1; 0->a2c+
6: 1; 1->à
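The 8-bit point can be probed directly. This is a minimal sketch under stated assumptions: the file is saved as UTF-8 and compiled with plain pdftex (not pdflatex), so that à reaches \pdfmatch as two raw bytes. If matching is byte-based as described, the two-dot pattern should be the one that fits.

```latex
% Assumption: this file is UTF-8 encoded and run with plain pdftex.
% Each dot in the pattern stands for one unit as seen by the regex engine.
\count255=\pdfmatch{^.$}{à}
\immediate\write16{one unit: \the\count255}

\count255=\pdfmatch{^..$}{à}
\immediate\write16{two units: \the\count255}

\bye
```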

If you use

\pdfmatch{\unexpanded{<regex>}}{<text>}

then ⟨regex⟩ can be written in standard POSIX syntax. For instance, example 3 above can become

\count255=\pdfmatch{\unexpanded{[[:alnum:]]*\+}}{a2c+d3f+}
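As a further illustration of the same idea, here is a small sketch (not taken from the manual) that uses \unexpanded to pass an escaped dot. One caveat, as far as I can tell: unexpanded control sequences are still written out the way \message would print them, so a control word made of letters gets a trailing space, while control symbols such as \. or \+ do not.

```latex
\documentclass{article}

% \. stays unexpanded and reaches the regex engine as backslash-dot,
% that is, a literal dot.
\count255=\pdfmatch{\unexpanded{^[0-9]+\.[0-9]+$}}{3.14}
\message{^^JA: \the\count255; \pdflastmatch0}  % should match

\count255=\pdfmatch{\unexpanded{^[0-9]+\.[0-9]+$}}{3x14}
\message{^^JB: \the\count255; \pdflastmatch0}  % should not match

\stop
```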

Answer