Remove embedded double quotes from a comma-separated CSV whose fields are enclosed in double quotes


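Maybe I'm just unlucky, but my double-quoted, comma-separated CSV file contains double quotes and commas inside the useful text.

So I want to turn this: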

"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""

into this:

"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

Note that I removed the double quotes from name ""2"" to get name 2, but kept the empty quoted field on line 3: ,""

Answer 1

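Use csvformat to convert the delimiters to tabs (csvformat -T), delete all double quotes (tr -d '"'), and then convert the delimiters back to commas while quoting every field (the last stage of the pipeline):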

$ csvformat -T file.csv | tr -d '"' | csvformat -t -U1
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

csvformat is part of csvkit.

Answer 2

This will work no matter which characters your input contains (except for newlines inside quoted fields, but that's a different problem).

Using GNU awk for FPAT:
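
Since newlines inside quoted fields are the one case not covered, it can help to check for them before processing. The following is a minimal sketch and not part of the original answer: the function name has_multiline_fields is hypothetical, and it assumes the input is otherwise well-formed CSV in which every literal quote is doubled, so a complete record always contains an even number of double quotes.

```shell
# Hypothetical helper: exits 0 if any physical line holds an odd number of
# double quotes, which means a quoted field starts or ends mid-record.
has_multiline_fields() {
    awk '
        { n = gsub(/"/, "&") }      # gsub returns how many quotes it replaced
        n % 2 { found = 1; exit }   # odd count: a quoted field spans lines
        END { exit !found }
    ' "$@"
}

# Example: the second field is split across two physical lines.
printf '%s\n' '"record 1","text 1' 'text 2"' | has_multiline_fields &&
    echo "multi-line field found"
```

If the check succeeds, the line-oriented awk scripts in this answer would treat each physical line as a separate record, so such a file needs a different approach.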

$ awk -v FPAT='("[^"]*")+' -v OFS='","' '{
    for ( i=1; i<=NF; i++ ) {
        gsub(/"/,"",$i)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

Or the equivalent with any awk:

$ awk -v OFS='","' '{
    orig=$0; $0=""; i=0;
    while ( match(orig,/("[^"]*")+/) ) {
        $(++i) = substr(orig,RSTART,RLENGTH)
        gsub(/"/,"",$i)
        orig = substr(orig,RSTART+RLENGTH)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

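Both scripts rely on the same pattern, ("[^"]*")+, to pick out one whole field at a time. As a rough illustration of why a field containing doubled quotes still comes out as a single match, the sketch below prints every match on its own line. The function name show_fields is hypothetical and only used for this demonstration.

```shell
# Hypothetical helper: print each quoted field of every input line on its own line.
show_fields() {
    awk '{
        rest = $0
        # Each match is one whole quoted field; a doubled quote inside a
        # field simply continues the run of "..." pieces.
        while ( match(rest, /("[^"]*")+/) ) {
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}

printf '%s\n' '"record 2","name ""2""","text 2"' | show_fields
```

For that sample line the matches are "record 2", then "name ""2""" and then "text 2": the comma between fields is never preceded by an unfinished quoted piece, so it ends each match, while commas inside a quoted piece are consumed by [^"]*.
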
See also: What's the most robust way to efficiently parse CSV using awk?
