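I want to compare two files: locus_file.txt is a very large file and attr.txt is a small one. I want to match the first two columns of the first file against the second column of attr.txt and print the corresponding attributes alongside each line.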
locus_file.txt: the large file
LOC_Os02g47020, LOC_Os03g57840,0.88725114
LOC_Os02g47020, LOC_Os07g36080,0.94455624
LOC_Os02g47020, LOC_Os03g02590,0.81881344
attr.txt: the attribute file
blue LOC_Os02g47020
red LOC_Os02g40830
blue LOC_Os07g36080
yellow LOC_Os03g57840
red LOC_Os03g02590
Desired output:
LOC_Os02g47020, LOC_Os03g57840,0.88725114,blue, yellow
LOC_Os02g47020, LOC_Os07g36080,0.94455624,blue, blue
LOC_Os02g47020, LOC_Os03g02590,0.81881344,blue, red
Note: in the first line of the desired output, for example, column 4 is the color of LOC_Os02g47020 in attr.txt and column 5 is the color of LOC_Os03g57840 in attr.txt.
Answer 1
Solution with awk:
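$ awk '
FNR == NR {a[$2] = $1; next}    # 1st file (attr.txt): a[locus] = color
{
split($1,f1,",");               # $1 is "LOC_...," so f1[1] is the bare ID
split($2,f2,",");               # $2 is "LOC_...,0.88..." so f2[1] is the bare ID
print $0,a[f1[1]],a[f2[1]];
}' OFS=, attr.txt locus_file.txt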
LOC_Os02g47020, LOC_Os03g57840,0.88725114,blue,yellow
LOC_Os02g47020, LOC_Os07g36080,0.94455624,blue,blue
LOC_Os02g47020, LOC_Os03g02590,0.81881344,blue,red
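For readers new to the idiom: FNR == NR is only true while awk reads the first file on the command line (attr.txt), so that block builds the lookup table and next skips the rest of the program; every later line comes from locus_file.txt. The split calls are needed because the default whitespace FS leaves a trailing comma on each ID, which also means the script relies on the space after the first comma. A minimal sketch of a variant, assuming you want it to work with or without that space: assign FS on the command line just before locus_file.txt, so attr.txt is still split on whitespace. The function name join_attrs and the scratch files are made up for the illustration.

```shell
#!/bin/sh
# join_attrs ATTR LOCUS (hypothetical helper): for each LOCUS line, append the
# colors of its 1st and 2nd IDs, looked up by column 2 of ATTR.
join_attrs() {
    # OFS=, applies to both files; FS=', *' takes effect only after ATTR has
    # been read, so ATTR keeps whitespace splitting while LOCUS is split on a
    # comma followed by any number of spaces.
    awk 'FNR == NR { a[$2] = $1; next }
         { print $0, a[$1], a[$2] }' OFS=, "$1" FS=', *' "$2"
}

# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'blue LOC_Os02g47020' 'red LOC_Os02g40830' \
    'blue LOC_Os07g36080' 'yellow LOC_Os03g57840' \
    'red LOC_Os03g02590' > "$tmp/attr.txt"
printf '%s\n' 'LOC_Os02g47020, LOC_Os03g57840,0.88725114' \
    'LOC_Os02g47020, LOC_Os07g36080,0.94455624' \
    'LOC_Os02g47020, LOC_Os03g02590,0.81881344' > "$tmp/locus_file.txt"

join_attrs "$tmp/attr.txt" "$tmp/locus_file.txt"
```

On the question's sample files this prints the same three lines as Answer 1. One caveat of FNR == NR: if attr.txt is empty, the condition stays true for locus_file.txt as well, so every line is swallowed into the array and nothing is printed.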
Answer 2
This task sounds like a job for awk:
$ cat locus_file.txt
LOC_Os02g47020, LOC_Os03g57840,0.88725114
LOC_Os02g47020, LOC_Os07g36080,0.94455624
LOC_Os02g47020, LOC_Os03g02590,0.81881344
$ cat attr.txt
blue LOC_Os02g47020
red LOC_Os02g40830
blue LOC_Os07g36080
yellow LOC_Os03g57840
red LOC_Os03g02590
$ awk 'BEGIN { while(getline<"attr.txt">0) c[$2]=$1 ; FS=",[ ]*" ; OFS=", " } { print $1,$2,$3,c[$1],c[$2] }' locus_file.txt
LOC_Os02g47020, LOC_Os03g57840, 0.88725114, blue, yellow
LOC_Os02g47020, LOC_Os07g36080, 0.94455624, blue, blue
LOC_Os02g47020, LOC_Os03g02590, 0.81881344, blue, red
If you want "," instead of ", " (or some other separator), just change OFS:
$ awk 'BEGIN { while(getline<"attr.txt">0) c[$2]=$1 ; FS=",[ ]*" ; OFS="," } { print $1,$2,$3,c[$1],c[$2] }' locus_file.txt
LOC_Os02g47020,LOC_Os03g57840,0.88725114,blue,yellow
LOC_Os02g47020,LOC_Os07g36080,0.94455624,blue,blue
LOC_Os02g47020,LOC_Os03g02590,0.81881344,blue,red
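In this answer the BEGIN block reads attr.txt with a bare getline, which overwrites $0 and splits it with whatever FS is current; that is why FS is only changed after the loop. A sketch of the same approach, assuming the attribute file name is passed in with -v: it reads each line into a separate variable, parenthesizes the getline comparison so the parse does not depend on operator precedence, and closes the file when done. The > 0 test is what ends the loop if the file cannot be opened: getline then returns -1, which a bare while (getline < file) would treat as true forever. The names join_attrs_getline, attrs, line and p are illustrative.

```shell
#!/bin/sh
# join_attrs_getline ATTR LOCUS (hypothetical helper): same output as the
# OFS="," version of Answer 2, with the attribute file given via -v.
join_attrs_getline() {
    awk -v attrs="$1" '
    BEGIN {
        # Load the small attribute file first: c[locus] = color.
        while ((getline line < attrs) > 0) {
            split(line, p, " ")    # p[1] = color, p[2] = locus ID
            c[p[2]] = p[1]
        }
        close(attrs)
        FS = ",[ ]*"               # a comma followed by optional spaces
        OFS = ","
    }
    { print $1, $2, $3, c[$1], c[$2] }' "$2"
}

# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'blue LOC_Os02g47020' 'red LOC_Os02g40830' \
    'blue LOC_Os07g36080' 'yellow LOC_Os03g57840' \
    'red LOC_Os03g02590' > "$tmp/attr.txt"
printf '%s\n' 'LOC_Os02g47020, LOC_Os03g57840,0.88725114' \
    'LOC_Os02g47020, LOC_Os07g36080,0.94455624' \
    'LOC_Os02g47020, LOC_Os03g02590,0.81881344' > "$tmp/locus_file.txt"

join_attrs_getline "$tmp/attr.txt" "$tmp/locus_file.txt"
```

With the question's sample files this prints the comma-only output shown just above.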
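Answer 3
How about this (bash 4 or later, which is needed for associative arrays):
declare -A attr
# attr.txt lines are "<color> <locus>": store attr[<locus>]=<color>
while read -r x y; do attr[$y]=$x; done < attr.txt
and then:
# IFS=' ,' splits on spaces and commas, so ", " counts as a single separator
while IFS=' ,' read -r a b c; do
    d=${attr[$a]}
    e=${attr[$b]}
    printf "%s, %s,%s,%s, %s\n" "$a" "$b" "$c" "$d" "$e"
done < locus_file.txt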
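A self-contained sketch of Answer 3 wrapped in a function, assuming bash 4 or later. It uses read -r so backslashes are kept literally, skips blank lines (an empty key is not a valid subscript for an associative array), and prints the placeholder NA when an ID has no entry in attr.txt. The function name join_attrs_bash and the NA value are arbitrary choices, not part of the question.

```shell
#!/usr/bin/env bash
# join_attrs_bash ATTR LOCUS (hypothetical helper); needs bash 4+ for local -A.
join_attrs_bash() {
    local attr_file=$1 locus_file=$2 x y a b c
    local -A attr
    # attr.txt lines are "<color> <locus>": store attr[<locus>]=<color>.
    while read -r x y; do
        [ -n "$y" ] || continue              # skip blank or one-field lines
        attr[$y]=$x
    done < "$attr_file"
    # Split on spaces and commas; ", " counts as one separator.
    while IFS=' ,' read -r a b c; do
        [ -n "$a" ] || continue              # skip blank lines
        # NA is an arbitrary placeholder for IDs missing from attr.txt.
        printf '%s, %s,%s,%s, %s\n' "$a" "$b" "$c" \
            "${attr[$a]:-NA}" "${attr[$b]:-NA}"
    done < "$locus_file"
}

# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'blue LOC_Os02g47020' 'red LOC_Os02g40830' \
    'blue LOC_Os07g36080' 'yellow LOC_Os03g57840' \
    'red LOC_Os03g02590' > "$tmp/attr.txt"
printf '%s\n' 'LOC_Os02g47020, LOC_Os03g57840,0.88725114' \
    'LOC_Os02g47020, LOC_Os07g36080,0.94455624' \
    'LOC_Os02g47020, LOC_Os03g02590,0.81881344' > "$tmp/locus_file.txt"

join_attrs_bash "$tmp/attr.txt" "$tmp/locus_file.txt"
```

Keep in mind that a bash while read loop handles one line per iteration in the shell and is usually far slower than awk, so for a very large locus_file.txt the awk answers are generally the better fit.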