awk: extract columns from rows that meet a condition

I am writing a script that prepares a CSV file. It should keep columns 5, 6, 7, 8, 10 and 13, and only the rows where column 44 equals 7 and, at the same time, column 3 equals 1.

Input:

"ID_Bcn_2019","ID_Bcn_2016","Codi_Principal_Activitat","Nom_Principal_Activitat","Codi_Sector_Activitat","Nom_Sector_Activitat","Codi_Grup_Activitat","Nom_Grup_Activitat","Codi_Activitat_2019","Nom_Activitat","Codi_Activitat_2016","Nom_Local","SN_Oci_Nocturn","SN_Coworking","SN_Servei_Degustacio","SN_Obert24h","SN_Mixtura","SN_Carrer","SN_Mercat","Nom_Mercat","SN_Galeria","Nom_Galeria","SN_CComercial","Nom_CComercial","SN_Eix","Nom_Eix","X_UTM_ETRS89","Y_UTM_ETRS89","Latitud","Longitud","Direccio_Unica","Codi_Via","Nom_Via","Planta","Porta","Num_Policia_Inicial","Lletra_Inicial","Num_Policia_Final","Lletra_Final","Solar","Codi_Parcela","Codi_Illa","Seccio_Censal","Codi_Barri","Nom_Barri","Codi_Districte","Nom_Districte","Referencia_cadastral","Data_Revisio"
1059038,"68849","1","Actiu","2","Serveis","14","Restaurants, bars i hotels (Inclòs hostals, pensions i fondes)","1400002","Restaurants","1400002","QUATRE COSES","1","1","1","1","1","0","1","","1","","1","","0","Rambla Catalunya","430088.542","4582365.352","41.38978196","2.16378361","089004, 329-329, LOC 10","089004","CONSELL DE CENT","LOC","10","329","","329","","114142","019","60490","079","07","la Dreta de l'Eixample","02","Eixample","0125419DF3802E","20190509"
1075454,"","1","Actiu","2","Serveis","16","Altres","1600400","Serveis a les empreses i oficines","16004","SORIGUE","1","1","1","1","1","0","1","","1","","1","","1","","427229.272","4577543.637","41.34610100","2.13016600","222206, 19-19, LOC 10","222206","MOTORS","LOC","10","19","","19","","","","","025","12","la Marina del Prat Vermell","03","Sants-Montjuïc","","20190925"
1075453,"","1","Actiu","2","Serveis","16","Altres","1600102","Activitats emmagatzematge","1600102","CEJIDOS SIVILA S.A","1","1","1","1","1","0","1","","1","","1","","1","","427178.393","4577526.160","41.34593900","2.12956000","222206, 278-282, LOC 10","222206","MOTORS","LOC","10","278","","282","","","","","025","12","la Marina del Prat Vermell","03","Sants-Montjuïc","","20190925"

Output:

"Codi_Sector_Activitat","Nom_Sector_Activitat","Codi_Grup_Activitat","Nom_Grup_Activitat","Nom_Activitat","SN_Oci_Nocturn"
"2","Serveis","14","Restaurants, bars i hotels (Inclòs hostals, pensions i fondes)","Restaurants","1"

At the moment my script contains the following:

#!/bin/awk -f

BEGIN { FS = OFS = "," }

NR == 1 { print $5, $6, $7, $8, $10, $13 }

NR != 1 {
         if ($44 == 7) {print}
         if ($3 == 1) {print}
}

However, I am not sure about the last part. So my question is: how can I extract only the rows that satisfy both conditions, ($44 == 7) and ($3 == 1)?

Answer 1

A note to start with: none of the cells in field 44 is equal to 7. The value is 07.
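
To make that note concrete, here is a minimal sketch of how awk sees such a cell when the line is split on plain commas. The helper name compare_raw is hypothetical and only serves the illustration. It feeds one raw cell to awk and prints 1 or 0 for two comparisons: the cell as read, and the cell with its quotes removed and forced to a number with "+ 0".

```shell
# compare_raw VALUE: show how awk compares one raw CSV cell with 7.
# (compare_raw is a hypothetical helper used only for this illustration.)
compare_raw() {
  printf '%s\n' "$1" | awk '{
    bare = $0
    gsub(/"/, "", bare)            # drop the surrounding double quotes
    # $0 still carries its quotes, so "$0 == 7" is a string comparison.
    # "bare + 0" forces a numeric comparison, so 07 counts as 7.
    printf "raw=%d numeric=%d\n", ($0 == 7), (bare + 0 == 7)
  }'
}
```

Calling compare_raw '"07"' prints raw=0 numeric=1: the quoted cell never equals 7 as read, but it does once the quotes are stripped and the value is treated as a number.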

This is not awk, but I think Miller could be useful here:

mlr --csv -N filter -S '$3=="1" && $44=="07" || $1=~"ID"' then cut -f 5,6,7,8,10,13 input.csv >output.csv

A few comments:

  • filter selects the rows that match the condition and also lets the header row through (the $1=~"ID" part);
  • cut extracts the required fields.

The output will be:

Codi_Sector_Activitat Nom_Sector_Activitat Codi_Grup_Activitat Nom_Grup_Activitat Nom_Activitat SN_Oci_Nocturn
2 Serveis 14 Restaurants, bars i hotels (Inclòs hostals, pensions i fondes) Restaurants 1

Answer 2

You can combine all the conditions into a single pattern rule. Note, however, that the CSV fields are wrapped in double quotes, so the pattern has to take that into account.

$ cat prepare.awk
#!/bin/awk -f

BEGIN { FS = OFS = "," }

NR == 1 || $44 == "\"7\"" || $3 == "\"1\"" {
  print $5, $6, $7, $8, $10, $13
}

Then:

$ ./prepare.awk Input
"Codi_Sector_Activitat","Nom_Sector_Activitat","Codi_Grup_Activitat","Nom_Grup_Activitat","Nom_Activitat","SN_Oci_Nocturn"
"2","Serveis","14","Restaurants, pensions i fondes)","1400002"
"2","Serveis","16","Altres","Serveis a les empreses i oficines","1"
"2","Serveis","16","Altres","Activitats emmagatzematge","1"

Note that every non-header line of the sample input has "1" in the third column.

If you want to select a line only when both of the non-header conditions match, change the condition to:

NR == 1 || ( $44 == "\"7\"" && $3 == "\"1\"" )
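
Two details of the sample data affect this rule. With FS = ",", awk also splits on commas inside quoted values such as "Restaurants, bars i hotels (...)", which shifts every later field in that row. And column 44 holds "07" rather than "7". The following is a minimal sketch that works around both, assuming no field contains an embedded newline. The names select_rows, split_csv and quoted are hypothetical helpers, not part of awk. It parses each record with a small quote-aware splitter written in plain POSIX awk and compares the two columns numerically.

```shell
# select_rows FILE...: print columns 5,6,7,8,10,13 of the header and of every
# row whose column 3 is 1 and whose column 44 is 7 (stored as "07").
# (select_rows, split_csv and quoted are hypothetical names for this sketch.)
select_rows() {
  awk '
    # Split one CSV record into out[1..n] and return n. Handles quoted fields
    # with embedded commas and doubled quotes, but not embedded newlines.
    function split_csv(line, out,    n, field, more) {
      split("", out)                             # clear leftovers
      n = 0
      do {
        if (match(line, /^"([^"]|"")*"/)) {
          field = substr(line, 2, RLENGTH - 2)   # strip the outer quotes
          gsub(/""/, "\"", field)                # unescape doubled quotes
        } else {
          match(line, /^[^,]*/)                  # unquoted field, may be empty
          field = substr(line, 1, RLENGTH)
        }
        out[++n] = field
        line = substr(line, RLENGTH + 1)         # drop the field just read
        more = (substr(line, 1, 1) == ",")
        line = substr(line, 2)                   # drop the separating comma
      } while (more)
      return n
    }

    # Re-quote a value for output, doubling any embedded quotes.
    function quoted(s) {
      gsub(/"/, "\"\"", s)
      return "\"" s "\""
    }

    BEGIN { ncols = split("5,6,7,8,10,13", cols, ",") }

    {
      split_csv($0, f)
      # "+ 0" forces a numeric comparison, so 07 compares equal to 7.
      if (NR == 1 || (f[3] + 0 == 1 && f[44] + 0 == 7)) {
        row = ""
        for (i = 1; i <= ncols; i++) {
          sep = (i > 1 ? "," : "")
          row = row sep quoted(f[cols[i]])
        }
        print row
      }
    }
  ' "$@"
}
```

Usage would be something like select_rows input.csv > output.csv. On the sample input from the question this should print the header line followed by the single "Restaurants" row, since only that row has 07 in column 44.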
