Get the number of occurrences of each word in a file

Answer 1

Try this:

grep -o '\w*' doc.txt | sort | uniq -c | sort -nr

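-o prints each match on its own line instead of the whole matching line.
\w* matches runs of word characters (letters, digits and underscore).
sort sorts the matches before they are piped to uniq. This is required because uniq only collapses adjacent duplicate lines.
uniq -c prints each distinct line once, prefixed with its number of occurrences.
sort -nr sorts numerically in reverse order, so the most frequent words come first.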
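To see what each stage of the pipeline contributes, here is a sketch that runs it one stage at a time on a small input. The sample text is made up, since the question's actual doc.txt is not shown, and the sketch assumes GNU grep because -o and \w are not part of POSIX grep.

```shell
#!/bin/sh
# Hypothetical sample input standing in for doc.txt (the real file is not shown).
sample='word second
word really, third'

# Stage 1: grep -o prints each run of word characters on its own line;
# the comma after "really" is not a word character, so it is dropped.
words=$(printf '%s\n' "$sample" | grep -o '\w*')
echo "after grep -o:"; printf '%s\n' "$words"

# Stage 2: sort makes identical words adjacent, which uniq relies on.
sorted=$(printf '%s\n' "$words" | sort)
echo "after sort:"; printf '%s\n' "$sorted"

# Stage 3: uniq -c collapses each run of identical lines and prefixes a count.
counted=$(printf '%s\n' "$sorted" | uniq -c)
echo "after uniq -c:"; printf '%s\n' "$counted"

# Stage 4: sort -nr orders the lines by that count, largest first.
ranked=$(printf '%s\n' "$counted" | sort -nr)
echo "after sort -nr:"; printf '%s\n' "$ranked"
```

With this input, "word" is the only token that occurs twice, so it ends up on the first line of the final stage.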

Output:

  2 word
  1 third
  1 second
  1 really

Alternative:

Use awk to get the exact output format (each word followed by its count):

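awk tallies each word in the seen array. Because for (s in seen) visits the keys in no particular order, the result is sorted afterwards with sort -k2,2nr, which compares the second field (the count) numerically, largest first.

$ grep -o '\w*' doc.txt \
| awk '{seen[$0]++} END{for(s in seen){print s,seen[s]}}' \
| sort -k2,2nr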

word 2
really 1
second 1
third 1
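Whether the count field is sorted as text (sort -k2r) or as a number (sort -k2,2nr) only matters once a count reaches two digits. The sketch below uses invented "word count" lines to show the difference; the words and counts are made up for illustration.

```shell
#!/bin/sh
# Invented "word count" lines; apple has a two-digit count.
counts='apple 10
pear 9
fig 2'

# Text comparison on field 2: "9" > "2" > "10" character by character.
by_text=$(printf '%s\n' "$counts" | sort -k2r)
echo "sort -k2r:"; printf '%s\n' "$by_text"

# Numeric comparison restricted to field 2: 10 > 9 > 2.
by_number=$(printf '%s\n' "$counts" | sort -k2,2nr)
echo "sort -k2,2nr:"; printf '%s\n' "$by_number"
```

The same reasoning applies to the counts printed by uniq -c, which is why the first pipeline ends with sort -nr rather than a plain sort -r.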

Answer 2

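perl -lnE '
  # count every run of letters on the current line
  $count{$_}++ for /[[:alpha:]]+/g;
  END {
    # print "word count" pairs: highest count first, ties alphabetically
    say "@$_" for
      sort {$b->[1] <=> $a->[1] || $a->[0] cmp $b->[0]}
      map {[$_, $count{$_}]}
      keys %count
  }
' doc.txt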

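This will consume more memory than pLumo's initial solution (the grep | sort | uniq -c pipeline), because the Perl script keeps every distinct word and its count in an in-memory hash until the end of input.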
