Texindy sorting of Icelandic

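I use the command texindy -L icelandic -M lang/icelandic/utf8 dict_main.idx to build a list of photo names, authors and licences. But the sorting is not correct (for example, the Icelandic alphabet should end with the letters in this order: þ æ ö).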

MWE:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[]{makeidx}
\usepackage[icelandic, czech]{babel}
\makeindex
\begin{document}
Hello
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Þistill - ŠARŽÍK František, COPYRIGHT/PD}
\index{Önd - Karney, Lee, PD}
\index{Æðarkóngur - Whitehouse, Laura L., PD}
\index{Avókadó - Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur - Arnstein Rønning, CC BY-SA 3.0}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á - hvalur.org, CC Unported Licence}
\index{Álft - Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka - Jack Spellingbacon from Scotland, CC BY-SA 3.0}

\printindex
\end{document}

Run it with these commands:

  pdflatex test.tex
  texlua utftexindy.lua -L icelandic test.idx
  pdflatex test.tex

Details of the code are in this answer.

Answer 1

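I tried building the index with lualatex. I switched off several packages, babel in particular, because it has some hard-coded character declarations used as shorthands and it does not handle UTF-8 encoded characters well. You may need these packages for typesetting, but in some cases we can leave them out just to generate the index.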

If we check the make-rules alphabet definitions (written in Perl), we find this line in the file /alphabets/icelandic/utf8.pl.in:

['A', ['a','A'],['á','Á']@u{,['ǫ́','Ǫ́']}],

As far as I understand it, the @u{} part makes the letters equal at some stage of the sorting. It also appears in several other lines:

['E', ['e','E']@u{,['ę','Ę']},['ë','Ë'],['é','É']],
['Æ', ['æ','Æ']@u{,['ǽ','Ǽ'],['ę́','Ę́'],['ǿ','Ǿ']},[ 'œ','Œ'],['ä','Ä']],
['Ö', ['ö','Ö'],['ø','Ø']@u{,[' ε','ε']}],

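We might expect the same behaviour here, so A and Á, as well as Æ and Ǽ, are equal at some stage. If that is not the desired sorting, it may be worth having it fixed by the xindy community. I am not sure about this, but it is a common pattern in the code: in alphabets/general/utf8.pl.in, diacritics are often ignored for sorting purposes.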
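To make the effect concrete, here is a small Python sketch (not xindy code) of a two-pass comparison. It assumes that á and é are folded into a and e in the first pass and only told apart in the second; the alphabet string, the FOLD table and both key functions are hypothetical names used for illustration only.

```python
import unicodedata

# The 32 letters of the Icelandic alphabet, in alphabetical order.
ICELANDIC = "aábdðeéfghiíjklmnoóprstuúvxyýþæö"

# Assumption: letters folded together in the first pass (toy subset).
FOLD = {"á": "a", "é": "e"}


def _letters(word):
    """Lowercase the word and keep only letters of the Icelandic alphabet."""
    word = unicodedata.normalize("NFC", word).lower()
    return [ch for ch in word if ch in ICELANDIC]


def strict_key(word):
    """Every letter, including á, is a letter of its own."""
    return [ICELANDIC.index(ch) for ch in _letters(word)]


def merged_key(word):
    """First pass ignores the accent; the second pass breaks ties."""
    letters = _letters(word)
    primary = [ICELANDIC.index(FOLD.get(ch, ch)) for ch in letters]
    secondary = [ICELANDIC.index(ch) for ch in letters]
    return (primary, secondary)


words = ["Asni", "Auðnutittlingur", "Avókadó", "Á", "Álft", "Álka"]
print(sorted(words, key=strict_key))
print(sorted(words, key=merged_key))
```

With the strict key, all Á words come after all A words, which is the order the question asks for. With the merged key, Álft and Álka are interleaved with the A words and end up before Asni.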

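I believe there is a small bug or typo. Letter groups in an index usually use capital letters, but here we have:

I believe this is wrong: ['ð', ['ð','Ð']],
This should be the correct form: ['Ð', ['ð','Ð']],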
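The following Python sketch is a toy model of why this matters. It assumes that the first element of a row is what gets printed as the letter-group heading, which is my reading of the table and not something I have verified in the xindy sources; heading_for and the two row constants are hypothetical names.

```python
# Two toy rows in the shape of the quoted table: [heading, [lower, upper], ...].
ROW_AS_SHIPPED = ["ð", ["ð", "Ð"]]
ROW_PROPOSED = ["Ð", ["ð", "Ð"]]


def heading_for(entry, rows):
    """Return the heading of the first row that lists the entry's first letter."""
    for heading, *pairs in rows:
        if any(entry[0] in pair for pair in pairs):
            return heading
    return None


entry = "Ð - another fake index entry"
print(heading_for(entry, [ROW_AS_SHIPPED]))  # heading taken from the shipped row
print(heading_for(entry, [ROW_PROPOSED]))    # heading taken from the proposed row
```

Under that assumption, the shipped row yields a lowercase heading while every other group is headed by a capital letter.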

We can also see in the following example that the heading is a lowercase eth instead of an uppercase one. I enclose the TeX code and a preview of page 2.

lualatex mal-icelandic.tex
xindy -M texindy -L icelandic -C utf8 mal-icelandic.idx
lualatex mal-icelandic.tex

%! lualatex mal-icelandic.tex
%! xindy -M texindy -L icelandic -C utf8 mal-icelandic.idx
%! lualatex mal-icelandic.tex
% or with two changes: +xltxtra and -luatextra, we run xelatex
\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}
%\usepackage[icelandic,czech]{babel}
\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\begin{document}
The first paragraph of text\ldots
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Þistill - ŠARŽÍK František, COPYRIGHT/PD}
\index{Önd - Karney, Lee, PD}
\index{Æðarkóngur - Whitehouse, Laura L., PD}
\index{Avókadó - Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur - Arnstein Rønning, CC BY-SA 3.0}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á - hvalur.org, CC Unported Licence}
\index{Álft - Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka - Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Å - a fake index entry}
% a bug? ['ð',  ['ð','Ð']],
\index{Ð - another fake index entry}
\index{E - a testing index entry}
\index{Ǽ a fake}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}



Edit 1: Improved general version (Icelandic sorting plus West European style, with many letters carrying diacritics)

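Please download these two files to your working directory (I cannot post the first file directly here because it contains some special characters that TeX.SX does not display):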

wget http://striz7.fame.utb.cz/tex-sx/is/icelandicmal.xdy
wget http://striz7.fame.utb.cz/tex-sx/is/icelandicmal-test.xdy

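I created a new set of sorting rules for Icelandic and mixed it with the general West European sorting rules. As a result you will find the letter groups C, Q, W, Z and Å even though they are not part of the Icelandic alphabet. Many letters with diacritics have been added, so the sorting of words from Czech, Slovak, Polish, German and probably more languages is taken into account (see the general sorting in Xindy).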
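As a rough illustration of what such a mixed rule set does, the Python sketch below sorts a few names using the rows of the alphabet table printed by typesetme.tex (dotless ı omitted). It assumes that letters sharing a row compare equal in the first pass and are ordered by their position in the row in the second; this is a model of the idea, not a transcription of icelandicmal.xdy, and ROWS, PRIMARY, SECONDARY and sort_key are hypothetical names.

```python
import unicodedata

# One string per row of the alphabet table; the row order is the letter order.
ROWS = [
    "a", "áàăâãäą", "b", "cčćĉç", "dď", "ðđ", "e", "éèěêëę", "f", "gĝğ",
    "hĥ", "i", "íìîï", "jĵ", "k", "lĺľł", "m", "nńňñ", "o", "óőò", "p", "q",
    "rŕř", "sśšŝş", "tť", "u", "úùŭůûüű", "v", "w", "x", "y", "ýÿ", "zźżž",
    "þ", "æǽœ", "öøǿôõ", "å",
]

# First pass: which row a letter is in. Second pass: where it sits in its row.
PRIMARY = {ch: i for i, row in enumerate(ROWS) for ch in row}
SECONDARY = {ch: j for row in ROWS for j, ch in enumerate(row)}


def sort_key(entry):
    """Two-level key; characters outside the table are ignored."""
    text = unicodedata.normalize("NFC", entry).lower()
    letters = [ch for ch in text if ch in PRIMARY]
    return ([PRIMARY[ch] for ch in letters], [SECONDARY[ch] for ch in letters])


names = ["Zicha", "Šaržík", "Rønning", "Önd", "Þari", "Æðarkóngur", "Å", "Bukovský"]
print(sorted(names, key=sort_key))
```

In this model Šaržík files under S and Rønning under R, while Þ, Æ, Ö and the extra group Å follow Z in that order.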

To get the list of letters (letter groups, letter order), I use:

lualatex typesetme.tex

I run these lines to get the index:

lualatex mal-icelandicmal.tex
xindy -M texindy -M icelandicmal-test -M mal-style mal-icelandicmal.idx
lualatex mal-icelandicmal.tex

Here are the list of letters (code, preview) and the index sample (code, preview). Please test whether this meets your needs.

%! lualatex typesetme.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
\usepackage{luatextra}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
   }{\end{tabular}}
\addtolength{\voffset}{-1in}
\addtolength{\textheight}{1in}

\begin{document}
\section{Icelandicmal}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
á\,Á & à\,À & ă\,Ă & â\,Â & ã\,Ã & ä\,Ä & ą\,Ą\\
b\,B\\
c\,C & č\,Č & ć\,Ć & ĉ\,Ĉ & ç\,Ç\\
d\,D & ď\,Ď\\
ð\,Ð & đ\,Đ\\
e\,E\\
é\,É & è\,È & ě\,Ě & ê\,Ê & ë\,Ë & ę\,Ę\\
f\,F\\
g\,G & ĝ\,Ĝ & ğ\,Ğ\\
h\,H & ĥ\,Ĥ & ı\,I\\
i\,I\\
í\,Í & ì\,Ì & î\,Î & ï\,Ï\\
j\,J & ĵ\,Ĵ\\
k\,K\\
l\,L & ĺ\,Ĺ & ľ\,Ľ & ł\,Ł\\
m\,M\\
n\,N & ń\,Ń & ň\,Ň & ñ\,Ñ\\
o\,O\\
ó\,Ó & ő\,Ő & ò\,Ò\\
p\,P\\
q\,Q\\
r\,R & ŕ\,Ŕ & ř\,Ř\\
s\,S & ś\,Ś & š\,Š & ŝ\,Ŝ & ş\,Ş\\
t\,T & ť\,Ť\\
u\,U\\
ú\,Ú & ù\,Ù & ŭ\,Ŭ & ů\,Ů & û\,Û & ü\,Ü & ű\,Ű\\
v\,V\\
w\,W\\
x\,X\\
y\,Y\\
ý\,Ý & ÿ\,Ÿ\\
z\,Z & ź\,Ź & ż\,Ż & ž\,Ž\\
þ\,Þ\\
æ\,Æ & ǽ\,Ǽ & œ\,Œ\\
ö\,Ö & ø\,Ø & ǿ\,Ǿ & ô\,Ô & õ\,Õ\\
å\,Å
\end{alphabet}
\subsection{Ligatures}
\begin{flushleft}
`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}

Preview of the set of letters

%! lualatex mal-icelandicmal.tex
%! xindy -M texindy -M icelandicmal-test -M mal-style mal-icelandicmal.idx 
%! lualatex mal-icelandicmal.tex
% or with two changes: +xltxtra and -luatextra, we run xelatex
\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}
%\usepackage[icelandic,czech]{babel}
\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\usepackage{filecontents}
\def\mygroup#1{\textbf{#1}}
\begin{filecontents*}{mal-style.xdy}
;; mal-style.xdy
(markup-letter-group :open-head "~n\mygroup{" :close-head "}")
\end{filecontents*}

\begin{document}
The first paragraph of text\ldots
\index{Þari -- Franz Eugen Kohler, Public Domain}
\index{Þistill -- Šaržík František, COPYRIGHT/PD}
\index{Önd -- Karney, Lee, PD}
\index{Æðarkóngur -- Whitehouse, Laura L., PD}
\index{Avókadó -- Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur -- Arnstein Rønning, CC BY-SA 3.0}
\index{Asni -- Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á -- hvalur.org, CC Unported Licence}
\index{Álft -- Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka -- Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Å -- a fake index entry}
\index{Ð -- another fake index entry}
\index{E -- a testing index entry}
\index{Ǽ a fake}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}



Edit 2: Minimal version (Icelandic sorting only, with its 32 letters)

Please download two new files:

wget http://striz7.fame.utb.cz/tex-sx/is-min/icelandicmalmin.xdy  
wget http://striz7.fame.utb.cz/tex-sx/is-min/icelandicmalmin-test.xdy  

We run the following four lines:

pdflatex mal-icelandicmalmin.tex  
texlua iec2utf.lua <mal-icelandicmalmin.idx >mal-temp.idx  
xindy -M texindy -M icelandicmalmin-test -M mal-style -o mal-icelandicmalmin.ind mal-temp.idx  
pdflatex mal-icelandicmalmin.tex  
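The second of these lines exists because xindy needs real UTF-8 letters in its input. The Python sketch below shows the kind of rewriting involved, assuming the .idx file wraps non-ASCII letters as \IeC {...} around a LaTeX macro. It is not the code of iec2utf.lua; the LICR table covers only a handful of letters, and iec_to_utf8 is a hypothetical name.

```python
import re

# Assumption: a few LaTeX macros and the UTF-8 letters they stand for.
LICR = {
    r"\TH": "Þ", r"\th": "þ",
    r"\DH": "Ð", r"\dh": "ð",
    r"\AE": "Æ", r"\ae": "æ",
    r"\'A": "Á", r"\'a": "á",
    r"\'o": "ó",
    r'\"O': "Ö", r'\"o': "ö",
}

# Matches \IeC {...} where the braces hold a single macro without nested braces.
IEC = re.compile(r"\\IeC\s*\{([^{}]*)\}")


def iec_to_utf8(line):
    """Replace known \\IeC {...} wrappers; leave unknown ones untouched."""
    def replace(match):
        macro = match.group(1).strip()
        return LICR.get(macro, match.group(0))
    return IEC.sub(replace, line)


print(iec_to_utf8(r"\indexentry{\IeC {\TH }ari -- Franz Eugen Kohler}{1}"))
print(iec_to_utf8(r"\indexentry{\IeC {\"O}nd -- Karney, Lee, PD}{1}"))
```

Wrappers whose macro is not in the table are passed through unchanged, so a partial table never corrupts an entry.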

The iec2utf.lua library written by michal.h21 works well. If you want to see the letters that are included, run:

pdflatex typesetme.tex

I enclose two new files tested with pdflatex, plus previews of the Xindy-style list of letters and of the Icelandic index sample.

File typesetme.tex

%! pdflatex typesetme.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
%\usepackage{luatextra}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
   }{\end{tabular}}
\addtolength{\voffset}{-1in}
\addtolength{\textheight}{1in}

\begin{document}
\section{Icelandicmalmin}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
á\,Á\\
b\,B\\
d\,D\\
ð\,Ð\\
e\,E\\
é\,É\\
f\,F\\
g\,G\\
h\,H\\
i\,I\\
í\,Í\\
j\,J\\
k\,K\\
l\,L\\
m\,M\\
n\,N\\
o\,O\\
ó\,Ó\\
p\,P\\
r\,R\\
s\,S\\
t\,T\\
u\,U\\
ú\,Ú\\
v\,V\\
x\,X\\
y\,Y\\
ý\,Ý\\
þ\,Þ\\
æ\,Æ\\
ö\,Ö
\end{alphabet}
%\subsection{Ligatures}
%\begin{flushleft}
%`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
%\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}

File mal-icelandicmalmin.tex

%! pdflatex mal-icelandicmalmin.tex
%! texlua iec2utf.lua <mal-icelandicmalmin.idx >mal-temp.idx
%! xindy -M texindy -M icelandicmalmin-test -M mal-style -o mal-icelandicmalmin.ind mal-temp.idx
%! pdflatex mal-icelandicmalmin.tex
%
% iec2utf.lua <--- https://github.com/michal-h21/iec2utf
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[icelandic,czech,english]{babel}
%\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\usepackage{filecontents}
\def\mygroup#1{\textbf{#1}}
\begin{filecontents*}{mal-style.xdy}
;; mal-style.xdy
(markup-letter-group :open-head "~n\mygroup{" :close-head "}")
\end{filecontents*}

\begin{document}
The first paragraph of text\ldots
\index{Þari -- Franz Eugen Kohler, Public Domain}
\index{Þistill -- Šaržík František, COPYRIGHT/PD}
\index{Önd -- Karney, Lee, PD}
\index{Æðarkóngur -- Whitehouse, Laura L., PD}
\index{Avókadó -- Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur -- Arnstein Rønning, CC BY-SA 3.0}
\index{Asni -- Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á -- hvalur.org, CC Unported Licence}
\index{Álft -- Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka -- Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Ð -- another fake index entry}
\index{E -- a testing index entry}
\index{É -- a testing index entry}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}
