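How to append an incremental count to every predefined word of a text file?
Just like this question: How to append an incremental count to every line of a text file?
I want to add an incremental count to a text file. But instead of adding it to every line, I want to add it to a predefined word.
For example, if I want to count the word "cinema" in a text, I would like every occurrence of "cinema" to be changed to "cinemaN", where N is the incremental number; the maximum value of N depends on how many times the word "cinema" occurs in the text.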
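So that an input text file containing this text:
He drove his car to the cinema. He then went inside the cinema to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema.
produces an output file with this content:
He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.
Preferably, I would also like to be able to number the selected word in backward order.
That is, this would produce a second output file with this content:
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.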
Answer 1
I'd prefer perl for this:
$ cat ip.txt
He drove his car to the cinema. He then went inside the cinema to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema.
$ # forward counting is easy
$ perl -pe 's/\bcinema\b/$&.++$i/ge' ip.txt
He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema3.
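\bcinema\b is the word to search for, with word boundaries so that it is not matched as part of another word. For example, \bpar\b will not match apart, park or spar.
ge: the g flag is for global replacement, and e allows Perl code in the replacement section.
$&.++$i is the concatenation of the matched word and the pre-incremented value of $i; $i starts out undefined, which perl treats as 0, so the first match gets 1.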
For the reverse order, we need to get the count first:
$ c=$(grep -ow 'cinema' ip.txt | wc -l) perl -pe 's/\bcinema\b/$&.$ENV{c}--/ge' ip.txt
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.
c becomes an environment variable of the perl process, accessible through the %ENV hash.
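Or, with perl alone, slurping the whole file with -0777 so that the matches can be counted first:
perl -0777 -pe '$c = () = /\bcinema\b/g; s/\bcinema\b/$&.$c--/ge' ip.txt
$c = () = ... counts the matches by evaluating a list assignment in scalar context. The pattern is spelled out again in s/// rather than left empty (s//): an empty pattern reuses the last successfully matched regex, but when the file contains no match at all it becomes a genuine empty pattern that matches at every position.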
Answer 2
Using GNU awk for multi-char RS, case-insensitive matching and word boundaries:
$ awk -v RS='^$' -v ORS= -v word='cinema' '
BEGIN { IGNORECASE=1 }
{ cnt=gsub("\\<"word"\\>","&"); while (sub("\\<"word"\\>","&"cnt--)); print }
' file
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and afterwards discovered that it was more then two years since he last visited the cinema1.
Answer 3
This takes punctuation after the word into account.
Forward numbering:
word="cinema"
awk -v word="$word" '
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ word "([,.;:)]|$)") {
        gsub(word, word "" ++count, $i)
      }
    print
  }' input-file
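The field test above is not anchored at the start of the field, so a token such as thecinema. would also be numbered. A minimal variant sketch, assuming any POSIX awk and a word made only of letters (it is used as a regex): it anchors the test at both ends, allows an optional opening parenthesis, and numbers only the first occurrence in the field with sub(). The function name awk_forward and the sample line are hypothetical.

```shell
#!/bin/sh
# awk_forward WORD  (hypothetical helper): reads stdin and numbers each
# field that is exactly WORD, optionally preceded by ( and optionally
# followed by one of , . ; : )
awk_forward() {
    awk -v word="$1" '
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ ("^[(]?" word "[,.;:)]?$")) {
                count++
                sub(word, word count, $i)  # first occurrence in the field only
            }
        print
    }'
}

printf 'the cinema. Not thecinema, nor cinemas; (cinema)\n' | awk_forward cinema
# prints: the cinema1. Not thecinema, nor cinemas; (cinema2)
```

Assigning to a field makes awk rebuild the line with single spaces, so runs of whitespace collapse on lines that get numbered; the answer's code behaves the same way.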
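Backward numbering (the counting pass uses the same field test as the numbering pass, so that words such as cinemas are not counted without being numbered):
word="cinema"
count="$(awk -v word="$word" '
  {
    # count with the same field test that the numbering pass uses
    for (i = 1; i <= NF; i++)
      if ($i ~ word "([,.;:)]|$)")
        count++
  }
  END { print count + 0 }' input-file)"
awk -v word="$word" -v count="$count" '
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ word "([,.;:)]|$)") {
        gsub(word, word "" count--, $i)
      }
    print
  }' input-file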
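The two awk runs can also be folded into one by passing the file twice: while NR == FNR the first copy is only counted, and the second copy is numbered downward. A sketch assuming any POSIX awk and a word made only of letters (it is used as a regex); awk_backward and qualifies are hypothetical names, and the input must be a regular file rather than a pipe, since it is read twice.

```shell
#!/bin/sh
# awk_backward WORD FILE  (hypothetical helper)
# FILE is read twice: pass 1 counts qualifying fields, pass 2 numbers them N..1.
awk_backward() {
    awk -v word="$1" '
    function qualifies(f) { return f ~ ("^[(]?" word "[,.;:)]?$") }
    NR == FNR {                     # first copy of FILE: count only
        for (i = 1; i <= NF; i++)
            if (qualifies($i)) total++
        next
    }
    {                               # second copy: number downward
        for (i = 1; i <= NF; i++)
            if (qualifies($i)) {
                sub(word, word total, $i)
                total--
            }
        print
    }' "$2" "$2"
}

tmp=$(mktemp)
printf 'the cinema, a (cinema) and cinemas\nback to the cinema.\n' > "$tmp"
awk_backward cinema "$tmp"
# prints:
# the cinema3, a (cinema2) and cinemas
# back to the cinema1.
rm -f "$tmp"
```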
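Answer 4
To number the words in descending order, we reverse the data and search for the reversed word (amenic instead of cinema), then reverse the result back again: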
perl -l -0777pe '$_ = reverse reverse =~ s/(?=\bamenic\b)/++$a/gre' input.data
Result:
He drove his car to the cinema3. He then went inside the cinema2 to purchase tickets, and
afterwards discovered that it was more then two years since he last visited the cinema1.
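To number the words in ascending order, we match the word and use \K, which acts like a lookbehind, so that only the number is inserted after it: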
perl -lpe 's/\bcinema\b\K/++$a/eg' input.data
Result:
He drove his car to the cinema1. He then went inside the cinema2 to purchase tickets, and
afterwards discovered that it was more then two years since he last visited the cinema3.