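Maybe I'm just unlucky, because my double-quoted, comma-separated CSV file contains double quotes and commas inside the actual text.
So I want to turn this: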
"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""
into this:
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
Note that I removed the inner double quotes, turning name ""2"" into name 2, but kept the double quotes of the empty field in row 3: ,""
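Answer 1
Use csvformat to convert the delimiters to tabs (csvformat -T), delete all double quotes (tr -d '"'), and then convert the delimiters back to commas while quoting every field (the last stage of the pipeline):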
$ csvformat -T file.csv | tr -d '"' | csvformat -t -U1
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
csvformat is part of csvkit.
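One caveat with the tab round trip: it assumes that no field contains a literal tab, because once the quotes are stripped such a tab would be read as a delimiter by the last stage. A minimal sketch of a pre-check follows; the helper name has_tab and the throwaway sample file are hypothetical, used only for illustration.

```shell
# Hypothetical helper: exits 0 if the given file contains a literal tab.
has_tab() {
    tab=$(printf '\t')
    grep -q "$tab" "$1"
}

# Demo on a throwaway sample file (not the asker's real data).
sample=$(mktemp)
printf '"record 1","name ""1""","a, b"\n' > "$sample"
if has_tab "$sample"; then
    echo "tabs found: the tab round trip is not safe for this file"
else
    echo "no tabs found"
fi
rm -f "$sample"
```

If the check reports tabs, the awk approach in the other answer avoids the intermediate delimiter altogether.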
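Answer 2
This will work no matter which characters your input contains (except for newlines inside quoted fields, but that's a separate problem).
Using GNU awk for FPAT: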
$ awk -v FPAT='("[^"]*")+' -v OFS='","' '{
    for ( i=1; i<=NF; i++ ) {
        gsub(/"/,"",$i)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
Or the equivalent with any awk:
$ awk -v OFS='","' '{
    orig=$0; $0=""; i=0
    while ( match(orig,/("[^"]*")+/) ) {
        $(++i) = substr(orig,RSTART,RLENGTH)
        gsub(/"/,"",$i)
        orig = substr(orig,RSTART+RLENGTH)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""