Count the lines where the number is greater than 100

Answer 1

Consider this test file:

$ cat myfile
98
99
100
101
102
103
104
105

Now, let's count the lines where the number is greater than 100:

$ awk '$1>100{c++} END{print c+0}' myfile
5

How it works

$1>100{c++}

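Every time the number on a line is greater than 100, the variable c is incremented by 1.

END{print c+0}

After the whole file has been read, the variable c is printed.

Adding 0 to c forces awk to treat c as a number. If any line contained a number greater than 100, c is already a number. If none did, c is still uninitialized and prints as an empty string (hat tip: 伊魯卡). Adding zero turns that empty string into 0, which gives a more correct output.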
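The effect of the +0 is easiest to see on input with no number above 100. The following is a minimal sketch. The function names count_plain and count_plus0 are hypothetical wrappers introduced only for this illustration; the awk programs themselves are the one from the answer and the same program without the +0.

```shell
# Hypothetical wrappers around the two awk variants, reading from stdin.
count_plain() { awk '$1>100{c++} END{print c}'; }
count_plus0() { awk '$1>100{c++} END{print c+0}'; }

# No value exceeds 100, so c is never assigned.
printf '%s\n' 98 99 100 | count_plain   # prints an empty line
printf '%s\n' 98 99 100 | count_plus0   # prints 0
```

When at least one line matches, both variants print the same count.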

Answer 2

A similar solution with perl:

$ seq 98 105 | perl -ne '$c++ if $_ > 100; END{print $c+0 ."\n"}'
5

Speed comparison: numbers reported for 3 consecutive runs

Random file:

$ perl -le 'print int(rand(200)) foreach (0..10000000)' > rand_numbers.txt
$ perl -le 'print int(rand(100200)) foreach (0..10000000)' >> rand_numbers.txt

$ shuf rand_numbers.txt -o rand_numbers.txt 
$ tail -5 rand_numbers.txt 
114
100
66125
84281
144
$ wc rand_numbers.txt 
20000002 20000002 93413515 rand_numbers.txt
$ du -h rand_numbers.txt 
90M rand_numbers.txt

With awk:

$ time awk '$1>100{c++} END{print c+0}' rand_numbers.txt 
14940305

real    0m7.754s
real    0m8.150s
real    0m7.439s

With perl:

$ time perl -ne '$c++ if $_ > 100; END{print $c+0 ."\n"}' rand_numbers.txt 
14940305

real    0m4.145s
real    0m4.146s
real    0m4.196s

And just for fun, grep (update: with LC_ALL=C it is faster than even Perl):

$ time grep -xcE '10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}' rand_numbers.txt 
14940305

real    0m10.622s

$ time LC_ALL=C grep -xcE '10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}' rand_numbers.txt
14940305

real    0m0.886s
real    0m0.889s
real    0m0.892s
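The grep pattern spells out the digit shapes of integers greater than 100: 101 to 109, 110 to 199, anything with three or more digits starting with 2 to 9, and anything with four or more digits starting with 1. As a rough sanity check, the sketch below compares it with the awk count on a small range. It assumes plain non-negative integers without leading zeros, and the names re, count_grep and count_awk are hypothetical helpers, not part of the original answer.

```shell
# Pattern copied from the grep command above; -x makes it match whole lines.
re='10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}'

# Hypothetical wrappers, both reading numbers from stdin.
count_grep() { LC_ALL=C grep -xcE "$re"; }
count_awk()  { awk '$1>100{c++} END{print c+0}'; }

# 101..20000 is 19900 numbers, so both commands should print 19900.
seq 0 20000 | count_grep
seq 0 20000 | count_awk
```

Note that the pattern only understands digit strings, so input with signs, decimals, or leading zeros would need the awk or perl version instead.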

sed is no fun at all:

$ time sed -nE '/^10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}$/p' rand_numbers.txt | wc -l
14940305

real    0m11.929s

$ time LC_ALL=C sed -nE '/^10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}$/p' rand_numbers.txt | wc -l
14940305

real    0m6.238s
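One detail of the sed pattern above: the alternation is not wrapped in parentheses, so ^ applies only to the first alternative and $ only to the last. For a file of plain non-negative integers the count comes out the same, but a line containing other characters can slip through. The sketch below shows the difference on a tiny input. The function names ungrouped and grouped are hypothetical, and the grouped pattern is a suggested variant, not the command that was timed.

```shell
# Pattern as used above: the anchors bind only to the outer alternatives.
ungrouped() {
  LC_ALL=C sed -nE '/^10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,}$/p' | wc -l | tr -d ' '
}

# Suggested variant: parentheses make ^ and $ apply to every alternative.
grouped() {
  LC_ALL=C sed -nE '/^(10[1-9]|1[1-9][0-9]|[2-9][0-9]{2,}|1[0-9]{3,})$/p' | wc -l | tr -d ' '
}

printf '%s\n' 99 150 a150b | ungrouped   # prints 2: counts 150 and a150b
printf '%s\n' 99 150 a150b | grouped     # prints 1: counts only 150
```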
