Estou preso ao criar um script awk que prepara um arquivo csv antes da análise. Preciso criar um arquivo de saída com as colunas 1-2, 10, 13-15, 19-21. Também preciso substituir os números da coluna 2 pelos dias da semana (então, 1 = segunda-feira, 2 = terça-feira...) e converter a 21ª coluna de milhas náuticas para km; e exclusão ""
das colunas 10, 13 e 14.
Entrada:
"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,
Saída:
"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61
Até agora, recebi o comando para obter as colunas necessárias:
cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'
E também o código para substituir os números da coluna 2 pelos respectivos dias da semana:
awk 'BEGIN {FS = OFS = ","}
$2 == 1 {$2 = "Monday"}
$2 == 2 {$2 = "Tuesday"}
$2 == 3 {$2 = "Wednesday"}
$2 == 4 {$2 = "Thursday"}
$2 == 5 {$2 = "Friday"}
$2 == 6 {$2 = "Saturday"}
$2 == 7 {$2 = "Sunday"}
{print}'
file.csv
Também estou sentindo falta de uma maneira de agrupar todo o código no script para executá-lo mais tarde.
Responder1
awk '
BEGIN {
split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday",days)
FS=OFS=","
}
NR > 1 {
gsub(/"/,"")
$2 = days[$2]
$21 *= 1.852
}
{ print $1, $2, $10, $13, $14, $15, $19, $20, $21 }
' file
"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61
Responder2
#!/bin/awk -f
BEGIN {
dow[1] = "Monday"
dow[2] = "Tuesday"
dow[3] = "Wednesday"
dow[4] = "Thursday"
dow[5] = "Friday"
dow[6] = "Saturday"
dow[7] = "Sunday"
FS=OFS=","
}
NR == 1 {print $1, $2, $10, $13, $14, $15, $19, $20, $21}
NR != 1 {
$2 = dow[$2]
$21 *= 1.852
gsub(/"/, "", $10)
gsub(/"/, "", $13)
gsub(/"/, "", $14)
print $1, $2, $10, $13, $14, $15, $19, $20, $21
}
Salve isso em um arquivo, diga: sample.awk
. Torne-o executável: chmod +x sample.awk
e execute como ./sample.awk data
.
Para salvar a saída em outro arquivo, adicione o operador de redirecionamento de saída conforme a seguir:./sample.awk data > out.csv