Splitting a file: replacing `egrep` with `sed`

Answer 1

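Using wc, head, and tail:

half=$(( $(wc -l < "$file") / 2 ))   # read from stdin so wc prints only the number
# note: with an odd line count, the middle line falls in neither half
head -n "$half" "$file" | egrep -c dead | xargs echo "$file" "$half" > log_1
tail -n "$half" "$file" | egrep -c dead | xargs echo "$file" "$half" > log_2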
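
With an odd number of lines, taking the first and the last `$half` lines skips the middle line. Below is a minimal sketch of one way around that, assuming the extra line should go to the second half. The file `sample.txt`, its contents, and the variables `total` and `rest` are made up for illustration, and `grep -c` stands in for `egrep -c` because `dead` is a plain string:

```shell
# Hypothetical sample data: 5 lines, 3 of them contain "dead".
file=sample.txt
printf '%s\n' 'dead one' 'alive' 'dead two' 'alive' 'dead three' > "$file"

total=$(wc -l < "$file")        # reading from stdin keeps the file name out of the output
half=$(( total / 2 ))           # size of the first half (rounded down)
rest=$(( total - half ))        # size of the second half (gets the extra line)

# First half: lines 1..half. Second half: everything from line half+1 onward.
echo "$file" "$half" "$(head -n "$half" "$file" | grep -c dead)" > log_1
echo "$file" "$rest" "$(tail -n +"$(( half + 1 ))" "$file" | grep -c dead)" > log_2
```

For this sample, `log_1` contains `sample.txt 2 1` and `log_2` contains `sample.txt 3 2`.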

Using split:

split -a1 --numeric-suffixes=1 -n 'l/2' "$file" "$file"_   # GNU split: writes "$file"_1 and "$file"_2
echo "$file" "$file"_1 $(egrep -c dead "$file"_1) > log_1
echo "$file" "$file"_2 $(egrep -c dead "$file"_2) > log_2
rm "$file"_[12]
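
Note that `split -n l/2` (a GNU extension) balances the two pieces by byte size without breaking lines, so the pieces need not contain the same number of lines as the `head`/`tail` halves. A small sketch that makes this visible, assuming GNU coreutils; `sample.txt`, its contents, and the variables `n1` and `n2` are invented for illustration:

```shell
# Hypothetical sample: one long line followed by three short ones.
file=sample.txt
{ printf 'dead %0100d\n' 0; printf '%s\n' 'b' 'dead c' 'd'; } > "$file"

# Same split invocation as above: two pieces with suffixes _1 and _2.
split -a1 --numeric-suffixes=1 -n 'l/2' "$file" "$file"_

# Count the lines in each piece; the counts can differ when line lengths are uneven.
n1=$(wc -l < "$file"_1)
n2=$(wc -l < "$file"_2)
echo "piece 1: $n1 lines, piece 2: $n2 lines"
rm "$file"_[12]
```

If equal line counts matter more than equal byte sizes, the `wc`/`head`/`tail` approach is the safer choice.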

Answer 2

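Here is an awk solution.

awk '/dead/ { a[++n] = NR }    # remember the line number of every match
    END { for (i=1; i<=n; i++) if (a[i] > NR/2) break
        # ARGV[1] is the input file name (printing the bare ARGV array is an error)
        print ARGV[1], int(NR/2), i-1 >"log_1";
        print ARGV[1], int(NR/2)+(int(NR/2)!=NR/2), n-i+1 >"log_2" }' file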

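We collect the line numbers of the matches into the array `a`. Then we work out how many of the line numbers in the array are smaller than the mid-most line number; that count goes to the first partition. (We have to use `i-1` because by the time we `break` out of the loop we have already gone past the partition point.)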
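
A worked run on made-up data may make the counting easier to follow. This is only a sketch: `sample.txt` and its contents are invented, and the file name is printed via `ARGV[1]`, which assumes the file is the first command-line argument.

```shell
# Hypothetical sample data: 6 lines; matches on lines 1, 3 and 6.
printf '%s\n' 'dead' 'x' 'dead' 'x' 'x' 'dead' > sample.txt

awk '/dead/ { a[++n] = NR }              # a holds 1, 3, 6 and n is 3
    END { for (i=1; i<=n; i++) if (a[i] > NR/2) break   # NR/2 is 3, so the loop stops at i = 3
        print ARGV[1], int(NR/2), i-1 >"log_1"          # i-1 = 2 matches in lines 1..3
        print ARGV[1], int(NR/2)+(int(NR/2)!=NR/2), n-i+1 >"log_2" }' sample.txt

cat log_1 log_2
```

For this sample the two log lines are `sample.txt 3 2` and `sample.txt 3 1`: two matches in the first three lines, one in the last three.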

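In general, you want to avoid reading the same file more than once, especially when it is large; and, secondarily, to keep the number of processes to a minimum.

It is not clear what you expect the middle output field to contain. If the file has an odd number of lines, the first "half" will be one line shorter than the second partition. (This is not hard to change, but you have to decide on one way or the other.)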
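
If the opposite choice is preferred, one possible variant rounds the midpoint up so that the first half takes the middle line. This is a sketch under that assumption; the variable `mid` and the file `sample.txt` are introduced here for illustration and are not part of the original script.

```shell
# Hypothetical sample data: 5 lines; matches on lines 1, 3 and 5.
printf '%s\n' 'dead' 'x' 'dead' 'x' 'dead' > sample.txt

awk '/dead/ { a[++n] = NR }
    END { mid = int((NR+1)/2)            # round up: the first half takes the middle line
        for (i=1; i<=n; i++) if (a[i] > mid) break
        print ARGV[1], mid, i-1 >"log_1"
        print ARGV[1], NR-mid, n-i+1 >"log_2" }' sample.txt

cat log_1 log_2
```

With five lines, `mid` is 3, so the logs read `sample.txt 3 2` and `sample.txt 2 1`. Here the middle field is the size of each partition.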
