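On Linux I have a file, orig-file.txt. The file currently contains 4 fields, but there could be fewer or more (the file is generated by another application).
What is the best way to convert orig-file.txt into a file like output-file.txt? (A shell script, awk, etc. would be fine.)
orig-file.txt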
CREATE_TIMESTAMP TELEPHONE_NUMBER ID TYPE
------------------- -------------------- ---------- -----------------
24-09-2009 16:17:45 33633333333 20 other_mmm_phone
24-09-2009 17:45:07 33644444444 20 other_mmm_phone
07-10-2009 10:45:49 12312312312 20 legacyphone
07-10-2009 11:46:38 59320000043 20 other_mmm_phone
output-file.txt
CREATE_TIMESTAMP -> 24-09-2009 16:17:45
TELEPHONE_NUMBER -> 33633333333
ID -> 20
TYPE -> other_mmm_phone
---
CREATE_TIMESTAMP -> 24-09-2009 16:17:45
TELEPHONE_NUMBER -> 33633333333
ID -> 20
TYPE -> other_mmm_phone
---
An example in awk (but it does not work):
# awk 'NR>2 {
> printf "\
> %-16s -> %s\n\
> %-16s -> %s\n\
> %-16s -> %s\n\
> %-16s -> %s\
> \n\n\n---\n\n\n",\
> "CREATE_TIMESTAMP", $1" "$2,\
> "TELEPHONE_NUMBER", $3,\
> "ID", $4,\
> "TYPE", $5}\
> ' orig-file.txt
awk: newline in string near line 2
awk: syntax error near line 3
awk: illegal statement near line 3
awk: newline in string near line 7
Answer 1
Here is some simple ksh:
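{
    # Line 1: the four headings
    read t1 t2 t3 t4
    # Width of the longest heading, used to align the arrows
    maxlen=$(printf "%s\n" ${#t1} ${#t2} ${#t3} ${#t4} | sort -n | tail -1)
    fmt=$(printf "%%-%ds -> %%s" $maxlen)
    # Line 2: discard the row of dashes
    read line
    while read date time tel id type; do
        # printf reuses the format for each heading/value pair
        printf "$fmt\n" "$t1" "$date $time" "$t2" "$tel" "$t3" "$id" "$t4" "$type"
        print "\n\n\n---\n\n"
    done
} < orig-file.txt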
Update, for a flexible number of fields:
I replaced the space in the date-time field to make things easier to parse.
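# From line 3 on, join date and time with a placeholder so each record word-splits cleanly
sed '3,$s/ /@@/' orig-file.txt |
{
    read line
    set -A headings $line
    # Find the longest heading to size the left-hand column
    max=0
    for head in "${headings[@]}"; do (( max < ${#head} )) && max=${#head}; done
    fmt=$(printf "%%-%ds -> %%s" $max)
    read line    # skip the row of dashes
    while read line; do
        set -A fields $line
        i=0
        while (( i < ${#headings[@]} )); do
            # Restore the space when printing
            printf "$fmt\n" "${headings[$i]}" "${fields[$i]}" | sed 's/@@/ /'
            (( i=i+1 ))
        done
        print "\n\n\n---\n\n"
    done
}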
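The placeholder step can be checked on its own with one sample line. This is only a sketch in plain sh, not part of the ksh script; the variable names (line, joined, count, first) are made up for illustration.

```shell
# One data line from the sample; @@ is the same placeholder the script uses.
line='24-09-2009 16:17:45 33633333333 20 other_mmm_phone'

# Replace only the first space, so date and time travel as a single word.
joined=$(printf '%s\n' "$line" | sed 's/ /@@/')
printf '%s\n' "$joined"

# Word-split into positional parameters, then restore the space in field 1.
set -- $joined
count=$#
first=$(printf '%s\n' "$1" | sed 's/@@/ /')
printf '%s fields, first = %s\n' "$count" "$first"
```

After the substitution the line splits into as many words as there are headings, which is what lets the loop pair headings[i] with fields[i].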
Answer 2
In this case, this does the job. If more fields are added, a simple modification is needed.
awk 'NR>2{
printf "\
%-16s -> %s\n\
%-16s -> %s\n\
%-16s -> %s\n\
%-16s -> %s\
\n\n\n---\n\n\n",\
"CREATE_TIMESTAMP", $1" "$2,\
"TELEPHONE_NUMBER", $3,\
"ID", $4,\
"TYPE", $5}\
' orig-file.txt > output-file.txt
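"CREATE_TIMESTAMP" needs both $1 and $2 because the date itself is space-separated.
It could be modified to read the field names from the header, but the space-separated date remains a problem. If other fields are also allowed to contain spaces, a manual adjustment will always be needed to compensate, as with $1" "$2 in this case.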
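Reading the names from the header might look roughly like the sketch below. It assumes that line 1 holds the headings, line 2 the dashes, and that only the first column (the timestamp) contains a space; if any other column can contain spaces it will misalign, as described above. The function name to_records is made up for illustration. The format string is built on one line, which avoids a backslash-newline inside a string literal; some awk implementations seem to reject that, and the "newline in string" error in the question looks like that case, though that is a guess.

```shell
# Hypothetical helper: print each record of FILE (or stdin) as "NAME -> value" lines.
# Assumption: only the first column (date and time) contains an embedded space.
to_records() {
    awk '
    NR == 1 {
        # Remember the headings and the width of the longest one.
        n = NF
        for (i = 1; i <= n; i++) {
            head[i] = $i
            if (length($i) > width) width = length($i)
        }
        fmt = "%-" width "s -> %s\n"
        next
    }
    NR == 2 || NF == 0 { next }      # skip the dashed rule and blank lines
    {
        # Heading 1 takes the date and time; the rest are shifted by one field.
        printf fmt, head[1], $1 " " $2
        for (i = 2; i <= n; i++) printf fmt, head[i], $(i + 1)
        print "---"
    }' "$@"
}
```

Usage would be something like `to_records orig-file.txt > output-file.txt`. It prints a bare `---` between records as in the sample output, without the extra blank lines the answers above add.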