Search for a pattern and create a file with the same name

Question 1

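The file consists of a set of JSON objects, each of which contains a .location_country key. From each object, we can create a shell command that writes a serialized copy of the object itself to a file named after the value of the .location_country key. These shell commands can then be executed by the shell.

Using jq: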

jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt

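The serialized object can be created using the @json operator in jq. This operator outputs a JSON-encoded string containing the input document (in this case, the current object). This string is then passed through @sh, which quotes it properly for the shell. The @sh operator is also used to create part of the output filename from the value of the .location_country key.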

In essence, the command creates shell code that calls printf to output the current object and redirects that output to a specific file.

Given the sample data in file.txt, it outputs the following:

printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

You can redirect this output to a separate file and run that file with sh, or you can use eval directly in the shell:

eval "$( jq ...as above... )"
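As a rough sketch of the first option (saving the generated commands and running them with sh), something like the following should work. The script name commands.sh is only a placeholder chosen for this illustration, and the sample records are the ones shown in this answer.

```shell
# Sample records from the answer, written to file.txt.
cat > file.txt <<'EOF'
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
EOF

# Write the generated printf commands to a script (commands.sh is a
# hypothetical name), so they can be inspected before anything is run.
jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt > commands.sh

# Run the generated commands with sh.
sh commands.sh

ls
```

Keeping the intermediate script around makes it possible to review exactly what will be executed, which eval does not offer.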

Because jq is a proper JSON parser, the jq command works even if the input JSON document is not formatted with one object per line:

$ cat file.txt
{
  "full_name": "name1",
  "location_country": "united kingdom"
}
{
  "full_name": "name2",
  "location_country": "united states"
}
{
  "full_name": "name3",
  "location_country": "china"
}

$ jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt
printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

$ eval "$( jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt )"
$ ls
china.txt           file.txt            united kingdom.txt  united states.txt
$ cat 'united kingdom.txt'
{"full_name":"name1","location_country":"united kingdom"}
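To illustrate why the @sh quoting matters, here is a small sketch using a made-up record whose country name contains a single quote. The file name quote_demo.json and the record itself are assumptions for this example only, not part of the original data.

```shell
# Hypothetical one-record input; the apostrophe would break naive quoting.
cat > quote_demo.json <<'EOF'
{"full_name":"name4","location_country":"cote d'ivoire"}
EOF

# Generate the shell command with the same jq program as above.
cmd=$(jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' quote_demo.json)

# Show the generated command, then run it.
printf '%s\n' "$cmd"
eval "$cmd"

# The output file name and its contents both keep the apostrophe intact.
cat "cote d'ivoire.txt"
```

In the printed command, @sh has escaped the embedded single quote in both the serialized object and the file name, so the shell reads each of them as a single word.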

Question 2

Using awk

Input

$ cat input_file
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

awk -F"[\"|:]" '$10~/[A-Za-z]/ {print > ($10 ".txt")}' input_file

Output

$ cat china.txt
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

$ cat united\ kingdom.txt
{"full_name":"name1","location_country":"united kingdom"}

$ cat united\ states.txt
{"full_name":"name2","location_country":"united states"}
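To see why the command refers to $10, the sketch below prints every field that awk produces for one of the input lines when the separator is any single double quote, pipe, or colon. The variable names line and fields are just for this illustration, and the separator is written in single quotes here, which passes awk the same regular expression as the double-quoted form used above.

```shell
line='{"full_name":"name1","location_country":"united kingdom"}'

# Print every field with its index so the position of the country is visible.
fields=$(printf '%s\n' "$line" |
    awk -F'["|:]' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Field 10 holds the country name. Fields 3, 4, 8 and 9 are empty because two separators sit next to each other at those positions, so this field numbering only holds for lines with exactly this layout.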

Question 3

Given the comments below, this should do what you need, using GNU awk for the third argument to match() and for its handling of many simultaneously open files:

awk 'match($0,/"location_country":"([^"]+)"/,a) { print > (a[1] ".txt") }' file

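However, if execution speed matters, a decorate/sort/use/undecorate approach is probably best, e.g.:

awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) { print substr($0,RSTART+20,RLENGTH-21), $0 }' file |
sort -t'"' -k1,1 |
awk -F'"' '$1!=prev { close(out); out=$1 ".txt"; prev=$1 } { print substr($0,length($1)+2) > out }'

The last awk removes the decorating key (the first field and the double quote after it) as it writes each line to its output file. This works with any sort and awk.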
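To make the decorate step concrete, here is a small sketch that runs only the first awk stage on a single record taken from the sample data. The offsets follow from the literal prefix "location_country":" being 20 characters long, with one more character for the closing double quote. The variable name decorated is just for this illustration.

```shell
record='{"full_name":"name3","location_country":"china"}'

# Prepend the country name to the record, separated by a double quote,
# so that a later sort can use it as the first key.
decorated=$(printf '%s\n' "$record" |
    awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) { print substr($0,RSTART+20,RLENGTH-21), $0 }')

printf '%s\n' "$decorated"
# prints: china"{"full_name":"name3","location_country":"china"}
```

Because the key comes first and is followed by a double quote, sort -t'"' -k1,1 can group the lines by country without having to parse the JSON.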

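Original answer:

If your data is always that simple/regular, with the country name as the fifth double-quote-separated field, then all you need is this, using GNU awk (for handling many simultaneously open output files):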

awk -F'"' '{ print > ($5 ".txt") }' file

Or this with any awk:

awk -F'"' '{
    out = $5 ".txt"
    if ( !seen[out]++ ) {
        printf "" > out
    }
    print >> out
    close(out)
}' file
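The following sketch exercises the same truncate-once, append, then close pattern on a few made-up records. It makes two assumptions that are not in the original: the records and the file name people.jsonl are hypothetical, and the field number is passed in through a variable n, because for lines shaped like these the country is the 8th double-quote-separated field.

```shell
# Hypothetical records; the country is field 8 when splitting on '"'.
cat > people.jsonl <<'EOF'
{"full_name":"name5","location_country":"japan"}
{"full_name":"name6","location_country":"france"}
{"full_name":"name7","location_country":"japan"}
EOF

split_by_country() {
    # n is a hypothetical variable naming the field that holds the country.
    awk -F'"' -v n=8 '{
        out = $n ".txt"
        if ( !seen[out]++ ) {
            printf "" > out      # first sighting in this run: empty the file
        }
        print >> out             # append the current record
        close(out)               # never keep more than one output file open
    }' "$1"
}

split_by_country people.jsonl
split_by_country people.jsonl    # a second run rewrites the files, it does not grow them

wc -l < japan.txt
```

Closing the file after every line is what keeps this portable to awks with a low limit on open files, at the cost of reopening the file for each record.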

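The above will work no matter how large the input file is, as long as you have the disk space to create the output files.

If you like, you can do it more efficiently by sorting by country name first: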

sort -t'"' -k5,5 file |
awk -F'"' '$5 != prev{ close(out); out=$5 ".txt"; prev=$5 } { print > out }'

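The last script will work with any sort and any awk, but it may reorder the input lines within each country. If you care about that and have GNU sort, add the -s argument. If you care about that and do not have GNU sort, let me know, as there is a very simple workaround.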
