Search for a pattern and create files of the same name

Question 1

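Your file consists of a set of JSON objects. Each object contains a .location_country key. From each object we can create a shell command that writes a serialized copy of the object itself to a file named after the value of the .location_country key. These shell commands can then be executed by a shell.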

Using jq,

jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt

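The serialized object is created with the @json operator in jq, which emits a JSON-encoded string containing its input document (in this case, the current object). That string is then passed through @sh so that it is properly quoted for the shell. The @sh operator is also used to build part of the output filename from the value of the .location_country key.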

The command essentially generates shell code that calls printf, outputs the current object, and redirects the output to a specific file.

Given the example data in file.txt, this would emit the following:

printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

You could redirect this to a separate file and run that file with sh to execute the commands, or you could use eval directly in the shell:

eval "$( jq ...as above... )"
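As an illustration of the first option (saving the commands and running them with sh), here is a minimal sketch. It skips the jq step and writes one of the generated commands by hand, so it runs without jq; the scratch directory and the file name commands.sh are assumptions made for this example only.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for: jq -r '...' file.txt > commands.sh
# (one generated command, copied from the output shown above)
cat > commands.sh <<'EOF'
printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
EOF

# Execute the saved commands with sh.
sh commands.sh

# The quoted name and the unquoted .txt suffix form a single file name.
cat 'united kingdom.txt'
```

Because the quoted country name and the .txt suffix are adjacent, the shell joins them into one word, so a name containing spaces still produces a single file.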

Since we are using a proper JSON parser, jq, the above works even if the input JSON document is not formatted with one object per line.

$ cat file.txt
{
  "full_name": "name1",
  "location_country": "united kingdom"
}
{
  "full_name": "name2",
  "location_country": "united states"
}
{
  "full_name": "name3",
  "location_country": "china"
}

$ jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt
printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

$ eval "$( jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt )"
$ ls
china.txt           file.txt            united kingdom.txt  united states.txt
$ cat 'united kingdom.txt'
{"full_name":"name1","location_country":"united kingdom"}

Question 2

Using awk

Input

$ cat input_file
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

awk -F"[\"|:]" '$10~/[A-Za-z]/ {print > ($10 ".txt")}' input_file

Output

$ cat china.txt
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

$ cat united\ kingdom.txt
{"full_name":"name1","location_country":"united kingdom"}

$ cat united\ states.txt
{"full_name":"name2","location_country":"united states"}
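To see why the country ends up in field 10, it can help to print every field that awk produces with this field separator. The sketch below does that for the first sample line; the variable names line and fields are only for this illustration.

```shell
line='{"full_name":"name1","location_country":"united kingdom"}'

# Split on each double quote, pipe character, or colon, then print every
# field with its index. Adjacent separators produce empty fields.
fields=$(printf '%s\n' "$line" |
    awk -F'["|:]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Each separator character splits on its own, so the quote-colon-quote run between a key and its value yields two empty fields. That is why the value of the second key is field 10 rather than field 8.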

Question 3

Given your comments below, this should do what you want. It uses GNU awk for the third argument to match() and for handling many simultaneously open files:

awk 'match($0,/"location_country":"([^"]+)"/,a) { print > (a[1] ".txt") }' file

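For speed of execution, a decorate/sort/use/undecorate approach would probably be best, e.g.:

awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) { print substr($0,RSTART+20,RLENGTH-21), $0 }' file |
sort -t'"' -k1,1 |
awk -F'"' '$1!=prev { close(out); out=$1 ".txt"; prev=$1 } { sub(/^[^"]*"/,""); print > out }'

This works with any sort and awk. The decoration (the country name and the double quote after it) is stripped inside the final awk, since that command writes straight to the per-country files and produces nothing on standard output.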
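To make the decorate and undecorate steps concrete, the following sketch runs them on a single sample line and prints the intermediate form. The variable names are only for this illustration, and the undecorate step is shown on standard output rather than on the output files.

```shell
line='{"full_name":"name3","location_country":"china"}'

# Decorate: prefix the line with the country value and a double quote.
# RSTART+20 skips the 20 characters of "location_country":" and
# RLENGTH-21 drops those 20 characters plus the closing quote.
decorated=$(printf '%s\n' "$line" |
    awk -v OFS='"' 'match($0, /"location_country":"[^"]+"/) {
        print substr($0, RSTART + 20, RLENGTH - 21), $0
    }')
printf '%s\n' "$decorated"

# Undecorate: drop everything up to and including the first double quote.
restored=$(printf '%s\n' "$decorated" | cut -d'"' -f2-)
printf '%s\n' "$restored"
```

The decorated line starts with the sort key, so sort -t'"' -k1,1 can group lines by country without having to parse the JSON.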

Original answer:

If your data is always that simple/regular, then all you need is this, using GNU awk (which handles many simultaneously open output files):

awk -F'"' '{ print > ($5 ".txt") }' file

Or, with any awk:

awk -F'"' '{
    out = $5 ".txt"
    if ( !seen[out]++ ) {
        printf "" > out
    }
    print >> out
    close(out)
}' file

The above works no matter how large your input file is, as long as you have enough disk space for the output files.
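Which field holds the country depends on the exact layout of the input, which is not shown in this answer. As a rough check, assuming the one-object-per-line layout used in the other answers (an assumption), splitting on double quotes puts the country in field 8, so the field number in the scripts above may need adjusting. The sketch below runs the portable variant with $8 in a scratch directory; the name4 line is made up to show appending.

```shell
# Work in a scratch directory so the demo does not clobber real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed layout: one JSON object per line (name4 is invented for the demo).
cat > file <<'EOF'
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full_name":"name4","location_country":"china"}
EOF

# Same logic as the portable script, but using field 8 for this layout.
awk -F'"' '{
    out = $8 ".txt"
    if ( !seen[out]++ ) {
        printf "" > out      # truncate the file the first time it is seen
    }
    print >> out
    close(out)               # keep at most one output file open at a time
}' file

cat china.txt
```

Closing the file after every write is what keeps this safe for awks that allow only a few open files, at the cost of reopening a file for each input line.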

If you like, you can do this more efficiently by sorting on the country name first:

sort -t'"' -k5,5 file |
awk -F'"' '$5 != prev{ close(out); out=$5 ".txt"; prev=$5 } { print > out }'

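That last script works with any sort and any awk, but it may rearrange the order of the input lines within each country. If you care about that and have GNU sort, add the -s argument. If you care and do not have GNU sort, let me know, as there is a very simple workaround.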
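The workaround is not spelled out here. One common way to keep the input order without a stable sort, which may or may not be what the author had in mind, is to prefix each line with its line number and use that as a secondary numeric sort key. The sketch below assumes the one-object-per-line layout from the other answers, where the country becomes field 9 once the line number is prepended; the sample names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample: two china lines whose original order (zed, then abe)
# differs from their byte order.
cat > file <<'EOF'
{"full_name":"zed","location_country":"china"}
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"abe","location_country":"china"}
EOF

# Decorate with the input line number, sort by country and then by line
# number, and strip the line number again while writing the files.
awk -v OFS='"' '{ print NR, $0 }' file |
sort -t'"' -k9,9 -k1,1n |
awk -F'"' '$9 != prev { close(out); out = $9 ".txt"; prev = $9 }
           { sub(/^[^"]*"/, ""); print > out }'

cat china.txt
```

Since the line numbers are unique, the two keys fully determine the order, so the result does not depend on whether the sort implementation is stable.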
