How do I find the position of each word in a jumbled sentence?

2024-6-16

sorting superscripts datatool database

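I have an initial sentence:

The quick brown fox jumps over the lazy dog.

I have a new sentence (it is always a jumble of the initial sentence):

The lazy dog jumps over the quick brown fox.

For each word of the initial sentence, I want to add a superscript giving that word's position in the jumbled sentence. Can someone guide me on how to achieve this?

Any novel approach (including new packages) is appreciated. Thanks in advance. In the following MWE, I obviously do not get what I really want.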

\documentclass[12pt]{memoir}
\usepackage{listofitems}
\usepackage{amsmath}

\newcommand{\wordsI}
{   1. The,
    2. quick,
    3. brown,       
    4. fox      
    +
    5. jumps,
    6. over,
    7. the,
    8. lazy,
    9. dog
}

\newcommand{\wordsII}
{   The
    lazy
    dog         
    jumps
    over
    the
    quick
    brown
    fox
}

% Tokenize the words in order to display them
\newcommand{\tokenize}[1]
{%
    \setsepchar{+/,/./}
    \readlist*\textarray{#1}
    \foreachitem\groupoflines\in\textarray
    {
        \setsepchar{,}
        \readlist*\linearray{\groupoflines} 
        \foreachitem\line\in\linearray
        {  
            \setsepchar{.}
            \readlist*\wordarray{\line}
            $ \text{\wordarray[2]} ^ {\wordarray[1]} $
        }%
        \newline
    }
}

\begin{document}
\noindent
Actual sentence:
\newline
% The splitting of the sentence in 2 lines is intentional
\tokenize{\wordsI}

\noindent
Jumbled sentence:
\textbf{\wordsII}   
\end{document}

In this example, I would get the result I need if I used the following definition:

\newcommand{\wordsI}
{   1. The,
    7. quick,
    8. brown,       
    9. fox      
    +
    4. jumps,
    5. over,
    6. the,
    2. lazy,
    3. dog
}

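However, I don't want to make these changes manually. I am looking for a way to make it "dynamic", based on the jumbled sentence.

Edit: I would like to achieve this even in a case like the following:

Initial sentence:

the quick brown fox jumps over the lazy dog.

Jumbled sentence:

the lazy dog jumps over the quick brown fox.

In this case, I need to add some kind of "tag" to the words of the initial sentence so that the jumbled sentence is unambiguous.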

\newcommand{\wordsI}
{   1. the,
    2. quick,
    3. brown,       
    4. fox      
    +
    5. jumps,
    6. over,
    7. the,
    8. lazy,
    9. dog
}

\newcommand{\wordsII}
{   7. the
    8. lazy
    9. dog         
    5. jumps
    6. over
    1. the
    2. quick
    3. brown
    4. fox
}

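Expected output: (image omitted)

Answer 1

In my opinion, the most interesting thing about TeX is its typesetting and the worst is its programming facilities, so it is best to do this kind of programming outside TeX (as far away as possible!) and use TeX only for typesetting. Everything may be possible with TeX, but that is not necessarily the simplest or most maintainable solution.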
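As a rough illustration of the "outside TeX" route, here is a minimal Python sketch. It assumes whitespace-separated words, no repeated words, and case-sensitive matching; the function names `superscript_positions` and `to_latex` are made up for this example, and the output format simply imitates the `$\text{word}^{n}$` markup from the question.

```python
def superscript_positions(original, jumbled):
    """Pair each word of `original` with its 1-based position in `jumbled`."""
    position = {}
    for i, word in enumerate(jumbled.split(), start=1):
        if word in position:
            # A plain word -> position map cannot represent repeated words.
            raise ValueError(f"duplicate word: {word}")
        position[word] = i
    # A word missing from `jumbled` raises KeyError here.
    return [(word, position[word]) for word in original.split()]


def to_latex(pairs):
    """Format (word, position) pairs as math-mode superscripts (needs amsmath)."""
    return " ".join(rf"$\text{{{w}}}^{{{p}}}$" for w, p in pairs)


original = "The quick brown fox jumps over the lazy dog"
jumbled = "The lazy dog jumps over the quick brown fox"
pairs = superscript_positions(original, jumbled)
print(to_latex(pairs))
```

The printed line could be pasted into the document or written to a file that is read with `\input`; which of the two is more convenient depends on your workflow.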

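Still, if you do use TeX, this kind of programming is easier with LuaTeX (at least for me, and I think for most people). Compile the following file with lualatex (I made your "tags" optional: you can tag every word, e.g. the(1) quick(2) ..., or tag only the repeated words):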

\documentclass[12pt]{memoir}
\usepackage{amsmath} % For \text

\newcommand{\printword}[2]{$\text{#1} ^ {#2}$\quad} % Or whatever formatting you like.
\newcommand{\linesep}{\newline}

\directlua{dofile('jumble.lua')}
\newcommand{\printjumble}[2]{
  \directlua{get_sentence1_lines()}{#1}
  \directlua{get_sentence2_words()}{#2}
  %
  \noindent
  Actual sentence:
  \newline
  \directlua{print_sentence1_lines()}

  \noindent
  Jumbled sentence:
  \textbf{\directlua{print_sentence2()}}
}

\begin{document}
\printjumble{
  the(1) quick brown fox
  +
  jumps over the(7) lazy dog
}{
  the(7) lazy dog jumps over the(1) quick brown fox
}
\end{document}

where jumble.lua (which could be inlined into the same .tex file, but I prefer to keep it separate) is as follows:

-- Expected from TeX: before calling print_sentence1_lines(),
--     call get_sentence1_lines() and get_sentence2_words()
--     define \printword and \linesep.
-- Globals: sentence2_words, position_for_word, sentence1_lines

function get_sentence1_lines()
   sentence1_lines = token.scan_string()
end

function get_sentence2_words()
   local sentence2 = token.scan_string()
   sentence2_words = {}
   position_for_word = {}
   local i = 0
   for word in string.gmatch(sentence2, "%S+") do
      i = i + 1
      assert(position_for_word[word] == nil, string.format('Duplicate word: %s', word))
      sentence2_words[i] = without_tags(word)
      position_for_word[word] = i
   end
end

function print_sentence2()
   for i, word in ipairs(sentence2_words) do
      tex.print(word)
   end
end

function print_sentence1_lines()
   for line in string.gmatch(sentence1_lines, "[^+]+") do
      for word in string.gmatch(line, "%S+") do
         local position = position_for_word[word]
         assert(position ~= nil, string.format('New word: %s', word))
         tex.print(string.format([[\printword{%s}{%s}]], without_tags(word), position))
      end
      tex.print([[\linesep]])
   end
end

function without_tags(word)
   local new_word = string.gsub(word, "%(.*%)", "")
   return new_word
end

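This produces the same output as requested in the question.

Note that you could make this shorter by moving things around (see, for example, the first revision of this answer), but I find it cleanest to keep (as far as possible) the typesetting instructions in the .tex file and the programming in the .lua file.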
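To see how the optional tags resolve the repeated "the", the lookup logic of jumble.lua can be mirrored outside TeX. The following is only a sketch in Python for illustration, not part of the solution: `annotate` is a hypothetical name, and unlike the Lua code it returns data instead of printing `\printword` calls and skips whitespace-only lines.

```python
import re

# Same idea as without_tags in jumble.lua: drop a parenthesised tag such as "(7)".
TAG = re.compile(r"\(.*\)")


def without_tags(word):
    return TAG.sub("", word)


def annotate(sentence1, sentence2):
    """For each '+'-separated line of sentence1, return (display word, position in sentence2)."""
    position_for_word = {}
    for i, word in enumerate(sentence2.split(), start=1):
        # Tagged words are the lookup keys, so "the(1)" and "the(7)" stay distinct.
        assert word not in position_for_word, f"Duplicate word: {word}"
        position_for_word[word] = i
    lines = []
    for line in sentence1.split("+"):
        words = line.split()
        if words:
            lines.append([(without_tags(w), position_for_word[w]) for w in words])
    return lines


lines = annotate(
    "the(1) quick brown fox + jumps over the(7) lazy dog",
    "the(7) lazy dog jumps over the(1) quick brown fox",
)
for line in lines:
    print(" ".join(f"{w}^{p}" for w, p in line))
```

The tag is used only as part of the lookup key and is removed before display, which is why any label works as long as it is the same in both sentences.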

Answer 2

Something like this?

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn

\seq_new:N \l_jsp_sentence_temp_seq
\seq_new:N \l_jsp_sentence_original_seq
\seq_new:N \l_jsp_sentence_jumbled_seq
\prop_new:N \l_jsp_sentence_original_ind_prop
\prop_new:N \l_jsp_sentence_jumbled_ind_prop
\int_new:N \l_jsp_sentence_word_int

\NewDocumentCommand{\parseoriginalsentence}{m}
 {
  \seq_set_split:Nnn \l_jsp_sentence_temp_seq { + } { #1 }
  \seq_clear:N \l_jsp_sentence_original_seq
  \prop_clear:N \l_jsp_sentence_original_ind_prop
  \int_zero:N \l_jsp_sentence_word_int % count words across all lines, not per line
  \seq_map_inline:Nn \l_jsp_sentence_temp_seq
   {
    \clist_map_inline:nn { ##1 }
     {
      \int_incr:N \l_jsp_sentence_word_int
      \seq_put_right:Nn \l_jsp_sentence_original_seq { ####1 }
      \prop_put:Nnx \l_jsp_sentence_original_ind_prop
       { ####1 } { \int_to_arabic:n { \l_jsp_sentence_word_int } }
     }
    \seq_put_right:Nn \l_jsp_sentence_original_seq { + }
   }
 }
\NewDocumentCommand{\parsejumbledsentence}{m}
 {
  \prop_clear:N \l_jsp_sentence_jumbled_ind_prop
  \seq_set_split:Nnn \l_jsp_sentence_jumbled_seq { , } { #1 }
  \int_zero:N \l_jsp_sentence_word_int
  \seq_map_inline:Nn \l_jsp_sentence_jumbled_seq
   {
    \int_incr:N \l_jsp_sentence_word_int
    \prop_put:Nnx \l_jsp_sentence_jumbled_ind_prop
     { ##1 } { \int_to_arabic:n { \l_jsp_sentence_word_int } }
   }
 }

\NewDocumentCommand{\printoriginalsentence}{s}
 {
  \IfBooleanTF{#1}
   {
    \jsp_sentence_print_from_original:
   }
   {
    \jsp_sentence_print_from_jumbled:
   }
 }

\cs_new_protected:Nn \jsp_sentence_print_from_original:
 {
  \seq_map_inline:Nn \l_jsp_sentence_original_seq
   {
    \tl_if_eq:nnTF { ##1 } { + }
     {
      \par
     }
     {
      \prop_item:Nn \l_jsp_sentence_original_ind_prop { ##1 }.\nobreakspace ##1 ~
     }
   }
 }

\cs_new_protected:Nn \jsp_sentence_print_from_jumbled:
 {
  \seq_map_inline:Nn \l_jsp_sentence_original_seq
   {
    \tl_if_eq:nnTF { ##1 } { + }
     {
      \par
     }
     {
      \prop_item:Nn \l_jsp_sentence_jumbled_ind_prop { ##1 }.\nobreakspace ##1 ~
     }
   }
 }
\ExplSyntaxOff

\begin{document}

\parseoriginalsentence{
  The,
  quick,
  brown,
  fox
  +
  jumps,
  over,
  the,
  lazy,
  dog
}
\parsejumbledsentence{
  The,
  lazy,
  dog,
  jumps,
  over,
  the,
  quick,
  brown,
  fox
}

\printoriginalsentence*

\bigskip

\printoriginalsentence

\end{document}
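As a reading aid, the idea behind the code above is to store each word's index in a property list and look it up while mapping over the original sequence. A rough Python analogue, with dicts standing in for the property lists and `index_map` as a made-up helper name, might look like the sketch below. It also shows one limitation to keep in mind: putting a key that already exists replaces the earlier entry, so this approach relies on "The" and "the" being different keys.

```python
def index_map(words):
    """Map each word to its 1-based index; a repeated word keeps only its last index."""
    return {word: i for i, word in enumerate(words, start=1)}


original = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
jumbled = ["The", "lazy", "dog", "jumps", "over", "the", "quick", "brown", "fox"]

# Counterpart of the jumbled-sentence property list.
jumbled_ind = index_map(jumbled)
print(" ".join(f"{jumbled_ind[w]}. {w}" for w in original))

# If both articles were lowercase, the second "the" would replace the first,
# and every "the" in the original would receive the same index.
clash = index_map(w.lower() for w in jumbled)
print(clash["the"])
```

For the case with two lowercase "the" from the question's edit, the words would therefore need distinguishing tags before being used as keys.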
