Extract one occurrence of a pattern from a file

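I have a large file containing log lines like the ones shown below. I want to find every transaction (TR#) affected by this error, so I need to extract one occurrence of each TR# ID.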

How can I do this?

    Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704

Desired output:

    Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704

Answer 1

This is simple with awk:

$ awk '!c[$5]++' file
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704

Or, in Perl:

$ perl -ane '!$k{$F[4]}++ && print' file
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704

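The above assumes that the bracketed number in front of each TR# ID (e.g. [6104]) is part of the ID. If that number can change and you only want one line per TR# ID, use this instead: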

$ awk -F'[:.]' '!c[$7]++' file

or

$ perl -F'[:.]' -ane '!$k{$F[6]}++ && print' file
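
The field-based commands above depend on the TR# ID always landing in the same field. As a minimal sketch of an alternative, assuming only that the ID looks like `TR#` followed by digits, awk's `match()` can locate the ID wherever it sits on the line. The file name `sample.log`, its contents, and the function name `first_per_id` are made up for illustration.

```shell
# Hypothetical sample data: TR#14 appears under two different bracketed
# numbers, and TR#7 appears only once.
cat > sample.log <<'EOF'
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.580 application.crit: [6105]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.590 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.600 application.crit: [6104]:TR#7. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.610 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
EOF

# match() sets RSTART and RLENGTH to the position and length of the first
# "TR#<digits>" on the line. The line is printed only the first time that
# ID is seen; lines with no TR# ID are skipped.
first_per_id() {
  awk 'match($0, /TR#[0-9]+/) && !seen[substr($0, RSTART, RLENGTH)]++' "$1"
}

first_per_id sample.log
```

On this sample the function prints three lines, one each for TR#14, TR#238 and TR#7, in the order they first appear.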

Answer 2

To capture and print the first occurrence of each message, try:

awk '! m[$5] {m[$5]=$0} END{for (e in m) print m[e]}'

I made the timestamps in the sample consecutive so I could test it (and also corrected the truncated error value at the end):

$ awk '! m[$5] {m[$5]=$0} END{for (e in m) print m[e]}' tr2.log
Apr 30 16:51:27.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:31.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704

Thanks to @terdon.
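
One caveat with the `END` loop: awk does not specify the order in which `for (e in m)` visits the keys, so the saved lines may not come out in input order. A minimal sketch of a variant that records the order of first appearance in a second array is shown below. The file `sample.log`, its contents, and the names `first_in_order`, `order` and `n` are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical sample data: TR#238 is seen before TR#14.
cat > sample.log <<'EOF'
Apr 30 16:51:27.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:28.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:30.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
EOF

# Save the first line seen for each key in m, and append the key to order[]
# so that END can replay the keys in the order they first appeared.
first_in_order() {
  awk '!($5 in m) { m[$5] = $0; order[++n] = $5 }
       END { for (i = 1; i <= n; i++) print m[order[i]] }' "$1"
}

first_in_order sample.log
```

Here the TR#238 line is printed before the TR#14 line, matching the input order.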

Answer 3

Here is a Perl script that does what you want:

#!/usr/bin/perl
use strict;
use warnings;

my %trids;    # transaction IDs that have already been printed

# Read each line
while (my $line = <>) {
  # Extract the transaction ID: the text TR# followed by digits
  my ($trid) = $line =~ /(TR#\d+)/;
  # Skip lines that contain no transaction ID
  next unless defined $trid;
  # If we've not seen the ID before, print the line
  print $line unless $trids{$trid};
  # Remember the ID so we don't print it again
  $trids{$trid} = 1;
}

When I run it on your input, this is what I get:

temeraire:ul jenny$ ./extract.pl in.txt 
    Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
    Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704

Answer 4

With GNU sed, from an SO answer:

sed '$!N; /^\(.*\)\n\1$/!P; D' file
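
This sed command drops a line only when it is identical to the line immediately before it, much like `uniq`. It fits the sample in the question, where the duplicates are adjacent and byte-for-byte the same, but it will keep two lines for one TR# ID if anything else on the line differs. The sketch below illustrates that under made-up data. The file `sample.log` and the function name `adjacent_dedupe` are hypothetical.

```shell
# Hypothetical sample data: lines 1 and 2 are identical, line 3 has the
# same TR# ID but a different timestamp.
cat > sample.log <<'EOF'
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.580 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.580 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
EOF

# N appends the next line to the pattern space, P prints the first line
# unless both lines are identical, and D deletes the first line and restarts
# the cycle with the remainder.
adjacent_dedupe() {
  sed '$!N; /^\(.*\)\n\1$/!P; D' "$1"
}

adjacent_dedupe sample.log
```

On this sample the output still contains two TR#14 lines, because their timestamps differ. When that matters, keying on the ID itself, as the awk and Perl answers do, is the safer choice.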
