Match and print multiple columns from two files

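I have two files. I need to find the rows they have in common based on column 1 of each file and, for every match, write col1 (common to both files), file1col2, and file2col2 to a new file.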

File 1:

col1                         file1col2
10:100000525-100001560(+)    0.971465226620556
10:100001724-100002618(+)    0.940918504451204
10:100002725-100002970(+)    0.946592696189412
10:100003104-100004184(+)    0.736305487299153
10:100004450-100005051(+)    0.70823022283736
10:100005158-100005876(+)    0.969728923411704
10:100006075-100007551(+)    0.855411430976336
10:100007764-100009009(+)    0.274219271261146
10:100009146-100011362(+)    0.927057564779308
10:100011583-100011887(+)    0.883431738847249

File 2:

col1                         file2col2
10:100000525-100001560(+)    0.943385996874889
10:100001724-100002618(+)    0.981929023174133
10:100002725-100002970(+)    0.955549170283206
10:100003104-100004184(+)    0.736440826679551
10:100004450-100005051(+)    0.689045711238636
10:100005158-100005876(+)    0.964995337925152
10:100006075-100007551(+)    0.873411848029685
10:100007764-100009009(+)    0.37719743446494
10:100009146-100011362(+)    0.943862343124518
10:100011583-100011887(+)    0.902915705720447

Expected output:

col1(common between two files)  file1col2   file2col2
10:100000525-100001560(+)   0.971465227 0.943385997
10:100001724-100002618(+)   0.940918504 0.981929023
10:100002725-100002970(+)   0.946592696 0.95554917
10:100003104-100004184(+)   0.736305487 0.736440827
10:100004450-100005051(+)   0.708230223 0.689045711
10:100005158-100005876(+)   0.969728923 0.964995338
10:100006075-100007551(+)   0.855411431 0.873411848
10:100007764-100009009(+)   0.274219271 0.377197434
10:100009146-100011362(+)   0.927057565 0.943862343
10:100011583-100011887(+)   0.883431739 0.902915706

Answer 1

join + awk solution:

join --header file1 file2 | awk 'NR>1{ $2=sprintf("%1.9f",$2); $3=sprintf("%.9f",$3) }1' > result.txt

cat result.txt
col1 file1col2 file2col2
10:100000525-100001560(+) 0.971465227 0.943385997
10:100001724-100002618(+) 0.940918504 0.981929023
10:100002725-100002970(+) 0.946592696 0.955549170
10:100003104-100004184(+) 0.736305487 0.736440827
10:100004450-100005051(+) 0.708230223 0.689045711
10:100005158-100005876(+) 0.969728923 0.964995338
10:100006075-100007551(+) 0.855411431 0.873411848
10:100007764-100009009(+) 0.274219271 0.377197434
10:100009146-100011362(+) 0.927057565 0.943862343
10:100011583-100011887(+) 0.883431739 0.902915706
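Note that join expects both inputs to be sorted on the join field, and that --header is a GNU coreutils option. The sample files are already in order, but if yours are not, one possible approach is to sort everything except the header line first. The following is only a sketch: the file names (demo_unsorted1.txt and so on) and the tiny data set are made up for illustration.

```shell
# Assumption: GNU join (for --header); all file names and values are hypothetical.
export LC_ALL=C   # make sort and join agree on the collation order

printf '%s\n' 'col1 file1col2' 'b 0.5' 'a 0.25' > demo_unsorted1.txt
printf '%s\n' 'col1 file2col2' 'a 0.75' 'b 0.125' > demo_unsorted2.txt

# Keep the header as the first line and sort the remaining rows on column 1.
{ head -n 1 demo_unsorted1.txt; tail -n +2 demo_unsorted1.txt | sort -k1,1; } > demo_sorted1.txt
{ head -n 1 demo_unsorted2.txt; tail -n +2 demo_unsorted2.txt | sort -k1,1; } > demo_sorted2.txt

# Same pipeline as in the answer, applied to the sorted copies.
join --header demo_sorted1.txt demo_sorted2.txt |
  awk 'NR>1{ $2=sprintf("%.9f",$2); $3=sprintf("%.9f",$3) }1' > demo_result_sorted.txt

cat demo_result_sorted.txt
```

Sorting into separate files keeps the originals untouched; the header is split off so that it is not sorted into the middle of the data.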

Details:

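  • join --header: treats the first line of each file as field headers and prints them without trying to pair them

  • NR>1: processes records from the second one onward (NR is the number of the current record), i.e. skips the header

  • sprintf("%1.9f",$2): formats the argument $2 (the second column) as a floating-point number with 9 digits after the decimal point. The 1 is only a minimum field width and has no effect here, so "%.9f" is equivalent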
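As an alternative that is not part of the answer above, the same matching can be sketched in awk alone, which does not need sorted input. It assumes whitespace-separated columns and that file 1 fits in memory. The file names (demo_file1.txt, demo_file2.txt) are hypothetical; the first three data rows are taken from the question, and the extra row with key 99:1-2(+) is invented to show that unmatched rows are dropped.

```shell
# Assumption: whitespace-separated columns; demo_* names are hypothetical.
cat > demo_file1.txt <<'EOF'
col1 file1col2
10:100000525-100001560(+) 0.971465226620556
10:100001724-100002618(+) 0.940918504451204
10:100002725-100002970(+) 0.946592696189412
99:1-2(+) 0.5
EOF

cat > demo_file2.txt <<'EOF'
col1 file2col2
10:100000525-100001560(+) 0.943385996874889
10:100001724-100002618(+) 0.981929023174133
10:100002725-100002970(+) 0.955549170283206
EOF

awk '
  NR == FNR { a[$1] = $2; next }          # first file: remember column 2 by key
  !($1 in a) { next }                     # second file: skip keys missing from file 1
  FNR == 1 { print $1, a[$1], $2; next }  # header line: print the names unformatted
  { printf "%s %.9f %.9f\n", $1, a[$1], $2 }
' demo_file1.txt demo_file2.txt > demo_result_awk.txt

cat demo_result_awk.txt
```

The output follows the row order of the second file. The header is matched like any other row because both files use col1 as the name of the first column.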
