How to append an incremental count to every occurrence of a predefined word in a text file?

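Similar to this question: How to append incremental count to every line of a text file?

I want to add an incremental count to a text file. But instead of adding the count to every line, I want to add it to a predefined word.

For example, if I want to count the word "cinema" in a text, I would like every occurrence of "cinema" changed to "cinemaN", where N is the incrementing number. The maximum value of N is the number of times the word "cinema" occurs in the text.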

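So an input text file containing this text:

He drove his car to the cinema. He then went inside the cinema to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema.

produces an output file with this content:

He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.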

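Ideally, I would also like to be able to number the selected word in reverse order.

That is, this would produce a second output file with this content:

He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.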

Answer 1

I prefer perl for this:

$ cat ip.txt 
He drove his car to the cinema. He then went inside the cinema to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema.

$ # forward counting is easy
$ perl -pe 's/\bcinema\b/$&.++$i/ge' ip.txt 
He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.
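  • \bcinema\b is the word to search for, with word boundaries so that it is not matched as part of another word. For example, \bpar\b will not match apar, park or spar
  • ge: the g flag is for global replacement; the e flag allows Perl code in the replacement section
  • $&.++$i is the concatenation of the matched word and the pre-incremented value of $i, whose default value is 0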


For reverse order, we need to get the count first...

$ c=$(grep -ow 'cinema' ip.txt | wc -l) perl -pe 's/\bcinema\b/$&.$ENV{c}--/ge' ip.txt 
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.
  • c becomes an environment variable, accessible via the %ENV hash

Or, with perl alone, slurping the whole file:

perl -0777 -pe '$c=()=/\bcinema\b/g; s/\bcinema\b/$&.$c--/ge' ip.txt 

Answer 2

Using GNU awk for multi-char RS, case-insensitive matching and word boundaries:

$ awk -v RS='^$' -v ORS= -v word='cinema' '
    BEGIN { IGNORECASE=1 }
    { cnt=gsub("\\<"word"\\>","&"); while (sub("\\<"word"\\>","&"cnt--)); print }
' file
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.
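In this answer, gsub() with "&" as the replacement only counts the occurrences. The while loop then ends on its own. Once an occurrence has become cinema3, \> no longer matches between a and 3 because both are word characters, so each sub() call picks the next unnumbered occurrence. The counting step can be checked with any POSIX awk on made-up input. \<, \>, IGNORECASE and RS='^$' are GNU awk features and are not used here.

```shell
# gsub() returns how many replacements it made; "&" re-inserts the match,
# so the record is counted without being changed
echo 'the cinema and the cinema' | awk '{ n = gsub(/cinema/, "&"); print n ": " $0 }'
# 2: the cinema and the cinema
```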

Answer 3

This takes into account punctuation following the word.
Forward numbering:

word="cinema"
awk -v word="$word" '
    { 
      for (i = 1; i <= NF; i++) 
        if ($i ~ word "([,.;:)]|$)") { 
          gsub(word, word "" ++count,$i) 
        }
      print 
    }' input-file
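The test $i ~ word "([,.;:)]|$)" accepts the word when it is followed by one of , . ; : ) or by the end of the field. Running a few made-up tokens through the same expression shows what it accepts. cinemas is rejected, but because the regex has no leading anchor, multicinema is accepted too:

```shell
# One token per line, filtered with the same expression as in the answer
printf '%s\n' cinema cinema. 'cinema,' cinemas '(cinema)' multicinema |
    awk -v word=cinema '$0 ~ word "([,.;:)]|$)"'
# cinema
# cinema.
# cinema,
# (cinema)
# multicinema
```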

Backward numbering:

word="cinema"
count="$(awk -v word="$word" '
    { count += gsub(word, "") }
    END { print count }' input-file)"
awk -v word="$word" -v count="$count" '
    { 
      for (i = 1; i <= NF; i++) 
        if ($i ~ word "([,.;:)]|$)") { 
          gsub(word, word "" count--, $i) 
        }
      print 
    }' input-file
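A possible variation, not from the answer above: awk can read the same file twice in one call. It counts on the first pass (NR == FNR) and numbers on the second, so both passes use exactly the same field test. number_backward is a hypothetical helper name, and the input is made up:

```shell
# number_backward FILE WORD: hypothetical helper, not part of the answer.
# awk reads FILE twice: pass 1 counts, pass 2 numbers from the count down.
number_backward() {
    awk -v word="$2" '
        # Pass 1 (NR == FNR holds only while reading the first copy):
        # count the fields that pass 2 will number
        NR == FNR {
            for (i = 1; i <= NF; i++)
                if ($i ~ word "([,.;:)]|$)") total++
            next
        }
        # Pass 2: number those fields, counting down from total
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ word "([,.;:)]|$)")
                    sub(word, word "" total--, $i)
            print
        }' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' 'the cinema, a cinema.' 'no cinemas here, cinema' > "$tmp"
number_backward "$tmp" cinema
# the cinema3, a cinema2.
# no cinemas here, cinema1
rm -f "$tmp"
```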

Answer 4

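To label the words in descending order, we reverse the regex and reverse the data, then reverse the data again at the end to complete the transformation: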

perl -l -0777pe '$_ = reverse reverse =~ s/(?=\bamenic\b)/++$a/gre' input.data

Result

He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.

To label the words in ascending order, we do a lookbehind-style match on the word with \K:

perl -lpe 's/\bcinema\b\K/++$a/eg' input.data

Result

He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.
