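I have a CSV file whose 6th field contains words, and that field may be at most 16 characters long. A longer field should be split at word boundaries onto additional rows that repeat the first five fields.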
Current file:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
Desired output:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
Answer 1
Using GNU Awk (gawk) to run fold as a coprocess (|&) and read each folded line back with getline var:
gawk -F, '
BEGIN{
OFS=FS;
cmd="fold -sw 16";
}
# if total length (16 + 2 for quotes) is within limit, print as-is
length($NF) <= 18 {print; next}
# else
{
# trim the quotes, then fold
print substr($NF,2,length($NF)-2) |& cmd;
close(cmd,"to");
NF--;
while((cmd |& getline var) > 0){
# (optional) trim trailing whitespace
sub(/[ \t]+$/,"",var);
print $0, "\"" var "\"" ;
}
close(cmd,"from");
}
' file.csv
The sub() call in the action removes the trailing whitespace that fold leaves at each break.
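Note that to get exactly the output shown, you need fold -sw17, so that each line breaks at 16 characters plus a trailing space (which is then removed). However, this can make the last line of the folded output exceed the 16-character limit.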
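For comparison, the same fold -s step plus trailing-blank trim can be driven from a plain bash read loop instead of a gawk coprocess. This is a minimal sketch, assuming field 6 is the last field and contains no commas; wrap_field6 is a hypothetical helper name whose only argument is the fold width.

```shell
#!/usr/bin/env bash
# wrap_field6 WIDTH (hypothetical helper): reads CSV rows on stdin, wraps the
# quoted 6th field with fold -s, and repeats fields 1-5 on every output row.
# Assumes field 6 is the last field and contains no commas.
wrap_field6() {
    local width=$1 row head f6
    while IFS= read -r row; do
        head=${row%,*}                    # fields 1-5 (everything before the last comma)
        f6=${row##*,}                     # field 6, still quoted
        f6=${f6#\"}; f6=${f6%\"}          # strip the double quotes
        printf '%s\n' "$f6" |
            fold -s -w "$width" |
            sed 's/[[:blank:]]*$//' |     # drop the blank fold -s leaves at each break
            while IFS= read -r part; do
                printf '%s,"%s"\n' "$head" "$part"
            done
    done
}
```

Run as wrap_field6 17 < file.csv, it should print the desired output for the sample file. With a width of 16, fold moves the lone W down a line, so the long row comes out as AB CDE F GHI JK, LMNOP Q RS TUV, W XYZ 12 3456 and 7890.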
Answer 2
I created a rather clunky awk script that preserves the double quotes. Here it is:
{
for ( i=0; i< length($6); i+=16 )
{
if ( i+17 < length($6) )
{
if ( i == 0 )
printf ("%s,%s,%s,%s,%s,%s\"\n", $1, $2, $3, $4, $5, substr($6,i+1,16)) # start at 1: awks disagree on substr() with a start of 0
else
printf ("%s,%s,%s,%s,%s,\"%s\"\n", $1, $2, $3, $4, $5, substr($6,i+1,16))
}
else
{
if ( i == 0 )
printf ("%s,%s,%s,%s,%s,%s\n", $1, $2, $3, $4, $5, substr($6,i+1,16))
else
printf ("%s,%s,%s,%s,%s,\"%s\n", $1, $2, $3, $4, $5, substr($6,i+1,16))
}
}
}
The output is:
$ awk -F, -f awks csvfields
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5"," LMNOP Q RS TUV "
"1","2","3","4","5","W XYZ 12 3456 78"
"1","2","3","4","5","90"
"9","8","7","6","5","LMN O PQ R"
$
The only problem is that a space at a slice boundary is kept, whereas the example drops it.
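One way around the boundary-space problem is to take the quotes off field 6 before slicing, then trim each slice and re-quote it. This is a minimal sketch in POSIX awk, wrapped in a hypothetical bash function split_fixed, assuming exactly six comma-separated fields with no commas inside field 6. It still cuts at fixed 16-character positions, so words can be split: on the sample line the last two slices come out as XYZ 12 3456 789 and 0.

```shell
#!/usr/bin/env bash
# split_fixed (hypothetical helper): reads CSV rows on stdin and cuts field 6
# into fixed 16-character slices (no word awareness), repeating fields 1-5.
# Assumes exactly six comma-separated fields with no commas inside field 6.
split_fixed() {
    awk -F, -v OFS=, -v w=16 '
    {
        v = $6
        gsub(/^"|"$/, "", v)            # take the quotes off field 6
        if (v == "") { print; next }    # nothing to cut: keep the row
        for (i = 1; i <= length(v); i += w) {
            s = substr(v, i, w)
            gsub(/^ +| +$/, "", s)      # drop blanks at the slice edges
            $6 = "\"" s "\""            # re-quote; assigning $6 rebuilds $0 with OFS
            print
        }
    }'
}
```

Usage: split_fixed < csvfields.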
Answer 3
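Try the code below; it also works on the sample data. It cuts $NF at fixed character offsets with substr(), so the splits only land on word boundaries when the data lines up as it does here:
k=16; for ((j=1; j<=50; j++)); do awk -v j="$j" -v k="$k" -F, -v OFS=, '{ if (length($NF) > 16) { s = substr($NF, j, k); gsub(/"/, "", s); if (s != "") { $NF = "\"" s "\""; print NR, j, $0 } } else if (j == 1) print NR, j, $0 }' filename; j=$(($j+16)); done | sort -t, -k1,1n -k2,2n | cut -d, -f3-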
Output:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
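The for loop in this answer changes j in two places: j=$(($j+16)) in the body and j++ in the header. This quick bash check prints the start offsets the loop actually visits (loop_offsets is only an illustrative name):

```shell
#!/usr/bin/env bash
# Reproduces the loop header and body of the answer above and prints
# each start offset j that the awk command would receive.
loop_offsets() {
    local j
    for ((j = 1; j <= 50; j++)); do
        printf '%s\n' "$j"
        j=$((j + 16))        # combined with j++ this advances j by 17
    done
}

loop_offsets
```

It prints 1, 18 and 35. Because $NF still begins with the opening double quote, substr($NF,1,16), substr($NF,18,16) and substr($NF,35,16) cover characters 1-15, 17-32 and 34-49 of the unquoted text, skipping one character between slices. In the sample line, those skipped characters happen to be blanks.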
Answer 4
A shell-only approach (tested in Bash and ksh93). That said, I do like the fold approach, because it uses existing tools.
# read from stdin, output to stdout
# Note: there is no shebang line at the top, which made it easier to try bash and ksh as interpreters
OIFS="$IFS"
IFS=,
while read f1 f2 f3 f4 f5 f6; do
f6=${f6#\"}
f6=${f6%\"} # strip DQs
if ((${#f6}<17)); then # no action
IFS="$OIFS"
echo "$f1,$f2,$f3,$f4,$f5,\"$f6\""
IFS=","
continue
else
IFS="$OIFS"
while ((${#f6}>17)); do
n6=${f6:0:16}
f6=${f6#"$n6"} # quoted so n6 is matched literally, not as a glob pattern
n6=${n6# }
n6=${n6% }
echo "$f1,$f2,$f3,$f4,$f5,\"$n6\""
done
echo "$f1,$f2,$f3,$f4,$f5,\"${f6# }\""
fi
IFS=","
done
IFS="$OIFS"
exit
Result:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
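In the script above, the line that removes the printed slice from f6 uses n6 as the pattern. The word after # in ${var#word} is a glob pattern unless it is quoted, so a slice containing *, ? or [ can strip the wrong prefix. Here is a small bash demonstration with made-up strings:

```shell
#!/usr/bin/env bash
# The word after # in ${var#word} is a glob pattern unless it is quoted.
f6='a*b*cdef'
n6='a*b*'
unquoted=${f6#$n6}      # glob a*b*: removes the shortest matching prefix, "a*b"
quoted=${f6#"$n6"}      # literal "a*b*": removes exactly those four characters
printf '%s\n' "$unquoted" "$quoted"
```

It prints *cdef and then cdef.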
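To handle the word-splitting problem without fold or a similar tool, the following code should replace the line n6=${f6:0:16} (shown commented out at the end of the snippet). The second echo command line also needs to be replaced, as described after the snippet: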
c6="$f6"
n6=""
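nw=${c6%% *} # first word of what is left
while (((${#n6}+${#nw})<=16)); do
n6="$n6${c6%% *} " # append the next word and a space
n6=${n6# }
c6=${c6#"${c6%% *}" } # drop that word and its space (no eval needed)
nw=${c6%% *}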
done
#n6=${f6:0:16} ### replace by above
Then replace
echo "$f1,$f2,$f3,$f4,$f5,\"${f6# }\""
with
((${#f6}>0)) && echo "$f1,$f2,$f3,$f4,$f5,\"${f6# }\""
so that no row with an empty field-6 remainder is printed.
Using the following test file:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
"1","2","3","4","5","A BB CCC DDD EEEE FFFFF GGGGGG HHHHHHH"
Result:
"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"
"1","2","3","4","5","A BB CCC DDD"
"1","2","3","4","5","EEEE FFFFF"
"1","2","3","4","5","GGGGGG HHHHHHH"
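The same greedy word packing can also be written without eval by splitting field 6 into a bash array. This is a minimal sketch that assumes bash (for read -a) and that field 6 is the last field with no embedded commas; wrap_words and wrap_csv are hypothetical helper names.

```shell
#!/usr/bin/env bash
# wrap_words WIDTH TEXT (hypothetical helper): greedy word wrap. Words are
# added to the current line while it stays within WIDTH characters; a single
# word longer than WIDTH is printed on a line of its own.
wrap_words() {
    local width=$1 line='' word
    local IFS=$' \t\n'                 # split on blanks even if the caller changed IFS
    local -a words
    read -r -a words <<< "$2"
    for word in "${words[@]}"; do
        if [[ -z $line ]]; then
            line=$word
        elif (( ${#line} + 1 + ${#word} <= width )); then
            line+=" $word"
        else
            printf '%s\n' "$line"
            line=$word
        fi
    done
    if [[ -n $line ]]; then printf '%s\n' "$line"; fi
}

# wrap_csv (hypothetical helper): reads rows on stdin and wraps the quoted
# last field at 16 characters, repeating fields 1-5 on every output row.
wrap_csv() {
    local row head f6
    while IFS= read -r row; do
        head=${row%,*}                 # fields 1-5
        f6=${row##*,}                  # field 6, still quoted
        f6=${f6#\"}; f6=${f6%\"}       # strip the double quotes
        if [[ -z ${f6//[[:blank:]]/} ]]; then
            printf '%s\n' "$row"       # nothing to wrap: keep the row as-is
            continue
        fi
        wrap_words 16 "$f6" | while IFS= read -r part; do
            printf '%s,"%s"\n' "$head" "$part"
        done
    done
}
```

Fed the four-line test file above on stdin, wrap_csv should print the same eight rows as the result listing.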
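However, using an existing tool such as fold is much easier, and it follows the UNIX philosophy of building on existing simple tools. But if you enjoy shell programming, the above is one way to reach a solution. If anyone needs the code explained, let me know.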