script awk para preparar arquivo csv

script awk para preparar arquivo csv

Estou preso ao criar um script awk que prepara um arquivo csv antes da análise. Preciso criar um arquivo de saída com as colunas 1-2, 10, 13-15, 19-21. Também preciso substituir os números da coluna 2 pelos dias da semana (então, 1 = segunda-feira, 2 = terça-feira...) e converter a 21ª coluna de milhas náuticas para km; e exclusão ""das colunas 10, 13 e 14.

Entrada:

"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Saída:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

Até agora, recebi o comando para obter as colunas necessárias:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

E também o código para substituir os números da coluna 2 pelos respectivos dias da semana:

awk 'BEGIN {FS = OFS = ","} 
     $2 == 1 {$2 = "Monday"} 
     $2 == 2 {$2 = "Tuesday"} 
     $2 == 3 {$2 = "Wednesday"} 
     $2 == 4 {$2 = "Thursday"} 
     $2 == 5 {$2 = "Friday"} 
     $2 == 6 {$2 = "Saturday"} 
     $2 == 7 {$2 = "Sunday"} 
     {print}' 
file.csv

Também estou sentindo falta de uma maneira de agrupar todo o código no script para executá-lo mais tarde.

Responder1

awk '
    BEGIN {
        split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday",days)
        FS=OFS=","
    }
    NR > 1 {
        gsub(/"/,"")
        $2 = days[$2]
        $21 *= 1.852
    }
    { print $1, $2, $10, $13, $14, $15, $19, $20, $21 }
' file
"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

Responder2

#!/bin/awk -f
BEGIN {
    dow[1] = "Monday"
    dow[2] = "Tuesday"
    dow[3] = "Wednesday"
    dow[4] = "Thursday"
    dow[5] = "Friday"
    dow[6] = "Saturday"
    dow[7] = "Sunday"

    FS=OFS=","
}

NR == 1 {print $1, $2, $10, $13, $14, $15, $19, $20, $21}

NR != 1 {
    $2 = dow[$2]
    $21 *= 1.852
    gsub(/"/, "", $10)
    gsub(/"/, "", $13)
    gsub(/"/, "", $14)
    print $1, $2, $10, $13, $14, $15, $19, $20, $21
}

Salve isso em um arquivo, diga: sample.awk. Torne-o executável: chmod +x sample.awke execute como ./sample.awk data.

Para salvar a saída em outro arquivo, adicione o operador de redirecionamento de saída conforme a seguir:./sample.awk data > out.csv

informação relacionada