
我需要對 LDIF 檔案進行排序,其中有幾行屬於父檔。
例子
dn: 2
attr1: b
attr2: a
attr1: a
attr1: c
dn: 3
attr2: a
attr1: c
attr1: b
attr1: a
dn: 1
attr1: a
attr1: c
attr1: b
attr2: a
到這個
dn: 1
attr1: a
attr1: b
attr1: c
attr2: a
dn: 2
attr1: a
attr1: b
attr1: c
attr2: a
dn: 3
attr1: a
attr1: b
attr1: c
attr2: a
因此,所有以 dn 開頭的父行都被排序,下面所有 attrx 都被排序,如果 attrx 有多個值,那麼也會被排序。我已經使用讀取行完成了此操作,但這在大檔案上需要幾個小時。有沒有更快的方法可以對 bash 指令執行相同的操作?
屬性值始終只佔一行。如果有多個值,每個值佔一行。
答案1
使用您的範例文件
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}' |sort |awk -F'*' 'BEGIN{OFS="\n\n";ORS="\n\n\n"} {print $1,$2,$3,$4,$5;}'
將每個文字區塊轉換為行並使用“*”分隔字段
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file dn: 2*attr1: b*attr2: a*attr1: a*attr1: c dn: 3*attr2: a*attr1: c*attr1: b*attr1: a dn: 1*attr1: a*attr1: c*attr1: b*attr2: a
對行內的字段進行排序並使用“*”分隔字段
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//'
attr1: a *attr1: b *attr1: c *attr2: a *dn: 2 attr1: a *attr1: b *attr1: c *attr2: a *dn: 3 attr1: a *attr1: b *attr1: c *attr2: a *dn: 1
重新排列行中的欄位以將「print dn: x」放在第一位
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}'
dn: 2*attr1: a *attr1: b *attr1: c *attr2: a dn: 3*attr1: a *attr1: b *attr1: c *attr2: a dn: 1*attr1: a *attr1: b *attr1: c *attr2: a
按第一列或欄位對行進行排序
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}' |sort
dn: 1*attr1: a *attr1: b *attr1: c *attr2: a dn: 2*attr1: a *attr1: b *attr1: c *attr2: a dn: 3*attr1: a *attr1: b *attr1: c *attr2: a
將行轉換為一列並插入空白行
awk 'BEGIN {RS="\n\n\n";FS="\n\n";OFS="*";ORS=""} {print $1,$2,$3,$4,$5}' file |awk -F'*' '{for(i=1; i<=NF; i++) c[i]=$i; n=asort(c); for (i=1; i<=n; i++) printf "%s%s*", c[i], (i<n?OFS:RS); delete c}' |sed 's/^*//' |awk -F'*' '{print $5"*"$1"*"$2"*"$3"*"$4}' |sort |awk -F'*' 'BEGIN{OFS="\n\n";ORS="\n\n\n"} {print $1,$2,$3,$4,$5;}'
dn: 1
attr1: a
attr1: b
attr1: c
attr2: a
dn: 2
attr1: a
attr1: b
attr1: c
attr2: a
dn: 3
attr1: a
attr1: b
attr1: c
attr2: a
我知道我使用了太多步驟。