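I have a destination that consumes a CSV file. The sixth field contains words, but its maximum length is 16 characters. When the field is longer than 16 characters, I want to duplicate the row and split the field across the copies without breaking any word.
Current file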
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
Desired output
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
Answer 1
Using GNU Awk (gawk) and fold, via the getline/variable/coprocess form:
gawk -F, '
BEGIN {
    OFS = FS;
    cmd = "fold -sw 16";
}
# if total length (16 + 2 for quotes) is within limit, print as-is
length($NF) <= 18 { print; next }
# else
{
    # trim the quotes, then fold
    print substr($NF, 2, length($NF) - 2) |& cmd;
    close(cmd, "to");
    NF--;
    while ((cmd |& getline var) > 0) {
        # (optional) trim trailing whitespace
        sub(/[ \t]+$/, "", var);
        print $0, "\"" var "\"";
    }
    close(cmd, "from");
}
' file.csv
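The sub call removes the trailing whitespace from fold's output.
Note that to get exactly the output shown, you need fold -sw17, so that each line breaks at 16 characters plus a trailing space (which is then removed). Doing so, however, means the last line of the folded output can exceed the 16-character limit.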
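The difference between the two widths can be seen by folding the long field from the sample on its own. This is only an illustrative sketch: the function name fold_field is made up, and it assumes a fold that supports -s and -w (as GNU coreutils fold does).

```shell
# Hypothetical helper: fold one string at the given width, breaking at
# blanks (-s), then strip the trailing blank that fold leaves on each line.
fold_field() {
    printf '%s\n' "$2" | fold -sw "$1" | sed 's/[[:space:]]*$//'
}

long='AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890'

fold_field 16 "$long"   # four lines: "W" and "7890" are pushed onto later lines
fold_field 17 "$long"   # three lines, matching the desired output

# The caveat: with width 17, a final piece that is 17 characters long
# has no trailing blank to strip, so it exceeds the 16-character limit.
fold_field 17 'AB CDE F GHI JK ABCDEFGHIJKLMNOPQ'
```

Only the last piece is at risk, because every earlier piece ends at a blank that is stripped afterwards.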
Answer 2
I wrote a rather imperfect awk script that preserves the double quotes. Here it is:
{
    for ( i=0; i<= length($6); i+=16 )
    {
        if ( i+17 < length($6) )
        {
            if ( i == 0 )
                # start at 1: awk strings are 1-based, and substr() with a start of 0 is not portable
                printf ("%s,%s,%s,%s,%s,%s\"\n", $1, $2, $3, $4, $5, substr($6,1,16))
            else
                printf ("%s,%s,%s,%s,%s,\"%s\"\n", $1, $2, $3, $4, $5, substr($6,i+1,16))
        }
        else
        {
            if ( i == 0 )
                printf ("%s,%s,%s,%s,%s,%s\n", $1, $2, $3, $4, $5, substr($6,1,16))
            else
                printf ("%s,%s,%s,%s,%s,\"%s\n", $1, $2, $3, $4, $5, substr($6,i+1,16))
        }
    }
}
The output looks like this:
$ awk -F, -f awks csvfields
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5"," LMNOP Q RS TUV "
"1","2","3","4","5","W XYZ 12 3456 78"
"1","2","3","4","5","90"
"9","8","7","6","5","LMN O PQ R"
$
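The remaining problems are that the field is cut at a fixed width, so a word can be split across rows (7890 becomes 78 and 90), and that a space falling on a boundary is kept rather than removed as in the example.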
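The fixed-width cut (7890 split into 78 and 90 above) and the spaces kept at the boundaries both go away if the field is rebuilt word by word instead of sliced by position. The following is a minimal sketch of that idea, not the script above. The function name wrap_last_field is made up, and it assumes the last field is quoted and contains no commas or embedded quotes. Runs of blanks collapse to a single space, and a single word longer than the limit is still emitted on its own row, over the limit.

```shell
# Hypothetical helper: greedy word wrap of the last CSV field.
# Usage: wrap_last_field [max] < input.csv
wrap_last_field() {
    awk -F, -v OFS=, -v max="${1:-16}" '
        {
            text = substr($NF, 2, length($NF) - 2)   # field without its quotes
            n = split(text, word, " ")
            if (n == 0) { print; next }              # empty field: nothing to wrap
            line = word[1]
            for (i = 2; i <= n; i++) {
                if (length(line) + 1 + length(word[i]) <= max) {
                    line = line " " word[i]          # word still fits on this row
                } else {
                    $NF = "\"" line "\""; print      # emit the completed row
                    line = word[i]                   # start the next row
                }
            }
            $NF = "\"" line "\""; print              # emit the last (or only) row
        }
    '
}
```

Run as wrap_last_field 16 < file.csv. Because pieces are built from whole words, there is no leading or trailing blank left to trim.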
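Answer 3
I tried the code below and it worked on the sample. Note that it cuts at fixed 17-character steps rather than at word boundaries, so it produces the desired rows only because the breaks in the sample happen to fall at those positions.
awk -F, -v OFS=, -v k=16 '
length($NF) <= k + 2 { print; next }     # field fits (k characters plus 2 quotes)
{
    f = $NF
    for (j = 1; j <= length(f); j += k + 1) {
        piece = substr(f, j, k)          # fixed-width slice, not word-aware
        gsub(/"/, "", piece)             # drop the original quotes
        $NF = "\"" piece "\""
        print
    }
}' filename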
Output
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
Answer 4
A shell-only approach (tested in Bash and Ksh93). That said, I like the fold approach, since it uses an existing tool.
# read from stdin, output to stdout
# Note: there is no shebang line, which makes it easy to try bash or ksh as the interpreter
OIFS="$IFS"
IFS=,
while read -r f1 f2 f3 f4 f5 f6; do
    f6=${f6#\"}
    f6=${f6%\"} # strip DQs
    if ((${#f6}<17)); then # no action
        IFS="$OIFS"
        echo "$f1,$f2,$f3,$f4,$f5,\"$f6\""
        IFS=","
        continue
    else
        IFS="$OIFS"
        while ((${#f6}>17)); do
            n6=${f6:0:16}
            f6=${f6#"$n6"}
            n6=${n6# }
            n6=${n6% }
            echo "$f1,$f2,$f3,$f4,$f5,\"$n6\""
        done
        echo "$f1,$f2,$f3,$f4,$f5,\"${f6# }\""
    fi
    IFS=","
done
IFS="$OIFS"
exit
Result:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
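To solve the word-break problem without fold or a similar tool, replace the n6=${f6:0:16} line above with the following code (the original line is shown commented out at the end). Also replace the second echo command as described after the code.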
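c6="$f6"
n6=""
nw=${c6%% *}                 # first word of the remaining text
while (((${#n6}+${#nw})<=16)); do
    n6="$n6${c6%% *} "       # append the next word and a separating space
    n6=${n6# }
    c6=${c6#"${c6%% *} "}    # drop that word from the remaining text
    nw=${c6%% *}
done
#n6=${f6:0:16} ### replace by above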
Replace
echo "$f1,$f2,$f3,$f4,$f5,\"${f6# }\""
with
((${#f6}>0)) && echo "$f1,$f2,$f3,$f4,$f5,\"${f6# }\""
so that an empty remainder is not printed as field 6.
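For reference, a word-aware loop in the same spirit might look like the sketch below. It is not the exact script from this answer: it is restructured to use only POSIX sh constructs and wrapped in a function named split_field6 purely for illustration. It assumes that words are separated by single spaces and that fields contain no embedded commas or quotes. A single word longer than 16 characters is still printed on its own row, over the limit.

```shell
# Hypothetical helper: read CSV rows on stdin, split field 6 at word
# boundaries so that each piece is at most 16 characters.
split_field6() (
    while IFS=, read -r f1 f2 f3 f4 f5 f6; do
        f6=${f6#\"}; f6=${f6%\"}                 # strip the double quotes
        if [ "${#f6}" -le 16 ]; then             # short enough: print unchanged
            printf '%s\n' "$f1,$f2,$f3,$f4,$f5,\"$f6\""
            continue
        fi
        while [ "${#f6}" -gt 16 ]; do
            n6=${f6%% *}                         # the first word always goes on the row
            c6=${f6#"$n6"}; c6=${c6# }           # text still to be placed
            while [ -n "$c6" ]; do
                nw=${c6%% *}                     # next candidate word
                [ $(( ${#n6} + 1 + ${#nw} )) -le 16 ] || break
                n6="$n6 $nw"
                c6=${c6#"$nw"}; c6=${c6# }
            done
            printf '%s\n' "$f1,$f2,$f3,$f4,$f5,\"$n6\""
            f6=$c6
        done
        if [ -n "$f6" ]; then                    # skip an empty remainder
            printf '%s\n' "$f1,$f2,$f3,$f4,$f5,\"$f6\""
        fi
    done
)
```

Run as split_field6 < file.csv. The function body is a subshell, so the variables it sets do not leak into the calling shell.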
The following test file was used:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
"1","2","3","4","5","A BB CCC DDD EEEE FFFFF GGGGGG HHHHHHH"
Result:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
"1","2","3","4","5","A BB CCC DDD"
"1","2","3","4","5","EEEE FFFFF"
"1","2","3","4","5","GGGGGG HHHHHHH"
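However, using an existing tool such as fold is much simpler and follows the UNIX philosophy of building on existing simple tools. Still, if you like shell programming, the above is one way to get to a solution. If you would like the code explained, get in touch with me.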
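Whichever approach is used, its output can be checked against the 16-character limit by measuring the last field of every row. The following is a minimal sketch, assuming the fields never contain embedded commas or escaped quotes. The function name check_field_width is made up for illustration.

```shell
# Hypothetical helper: report rows whose last field (without its quotes)
# is longer than the given limit. Assumes no commas inside fields.
# Usage: check_field_width [max] < output.csv
check_field_width() {
    awk -F, -v max="${1:-16}" '
        {
            field = $NF
            gsub(/^"|"$/, "", field)      # drop the surrounding quotes
            if (length(field) > max) {
                printf "line %d: %d chars: %s\n", NR, length(field), field
                bad = 1
            }
        }
        END { exit bad + 0 }              # non-zero exit status if any row is too long
    '
}
```

The exit status makes it usable in a pipeline or a test script, for example check_field_width 16 < output.csv && echo OK.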