I have a large file containing logs similar to the ones shown below. I would like to find all the transactions (TR#) that were affected by the error, extracting one occurrence of each TR# ID.
How could I do this?
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Required output:
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Answer 1
This is very simple to do in awk:
$ awk '!c[$5]++' file
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Or, in Perl:
$ perl -ane 'print unless $k{$F[4]}++' file
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
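A minimal sketch of the first-occurrence idiom, wrapped in a hypothetical shell function first_per_field5. seen[$5]++ returns the count before incrementing it, so !seen[$5]++ is true only the first time a key appears, and an ID that is logged just once is still printed.

```shell
# Hypothetical helper: print the first line for each value of
# whitespace-separated field 5 (e.g. "[6104]:TR#14." in the sample log).
# seen[$5]++ yields 0 the first time a key appears, so !seen[$5]++
# is true exactly once per key; later lines with that key are skipped.
first_per_field5() {
    awk '!seen[$5]++' "$@"
}

# Usage: first_per_field5 file
```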
Both commands above key on field 5, so they assume that the number before each TR# ID (the [6104]) is part of the ID. If those numbers can change and you only care about the TR# ID itself, use this:
$ awk -F'[:.]' '!c[$7]++' file
or
$ perl -F'[:.]' -ane 'print unless $k{$F[6]}++' file
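To see why the choice of key matters, here is a sketch with two hypothetical lines for TR#14 that differ only in the process number. Keyed on whitespace field 5, both lines survive; keyed on field 7 of the -F'[:.]' split, only the first one does.

```shell
# Two hypothetical lines for TR#14 that differ only in the process
# number ([6104] vs [6105]).
sample() {
    printf '%s\n' \
        'Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704' \
        'Apr 30 16:51:29.574 application.crit: [6105]:TR#14. Transaction send can not be sent. Error Code: 704'
}

# Field 5 is "[6104]:TR#14." vs "[6105]:TR#14.": two keys, both lines print.
sample | awk '!c[$5]++'

# Splitting on ':' and '.', field 7 is "TR#14" for both lines,
# so only the first line prints.
sample | awk -F'[:.]' '!c[$7]++'
```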
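Answer 2
To get and print the first occurrence of each message, try the following (note that the for (e in m) loop visits the keys in an unspecified order, so the lines may not come out in input order):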
awk '! m[$5] {m[$5]=$0} END{for (e in m) print m[e]}'
I made the timestamps in your example sequential to test it (and also fixed the truncated error value on the last line):
$ awk '! m[$5] {m[$5]=$0} END{for (e in m) print m[e]}' tr2.log
Apr 30 16:51:27.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:31.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Thanks to @terdon.
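A variant of this END-block approach that prints the saved lines in input order, sketched as a hypothetical function first_in_order. It records the position at which each key is first seen in a second array, order (also a name introduced here), and walks that array in the END block.

```shell
# Hypothetical helper: keep the first line for each field-5 key, as in
# the END-block answer, but also record the order in which keys first
# appear so the saved lines are printed in input order.
first_in_order() {
    awk '
        !($5 in m) {            # first time this key is seen
            m[$5] = $0          # save the whole line
            order[++n] = $5     # and the position of its key
        }
        END {
            for (i = 1; i <= n; i++) print m[order[i]]
        }' "$@"
}
```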
Answer 3
Here is a Perl script that does what you want:
#!/usr/bin/perl
use strict;
use warnings;
my %trids;
# Read each line from the files given as arguments (or from STDIN)
while (my $line = <>) {
    # Extract the transaction ID: the text TR# followed by digits;
    # skip lines that contain no ID
    my ($trid) = $line =~ /(TR#\d+)/ or next;
    # Print the line only if we have not seen the ID before;
    # the ++ also remembers the ID so it is not printed again
    print $line unless $trids{$trid}++;
}
When I run it on your input, this is what I get:
temeraire:ul jenny$ ./extract.pl in.txt
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
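For comparison, a sketch of the same regex-based approach in awk, as a hypothetical function first_per_trid. It uses the standard match() function, which sets RSTART and RLENGTH, so the TR# ID is found wherever it sits in the line and lines without one are skipped.

```shell
# Hypothetical helper mirroring the Perl script: key each line on the
# first "TR#<digits>" it contains, wherever it sits in the line, and
# print the line only the first time that ID is seen.
# Lines with no TR# ID do not match the pattern and are skipped.
first_per_trid() {
    awk 'match($0, /TR#[0-9]+/) {
             id = substr($0, RSTART, RLENGTH)
             if (!seen[id]++) print
         }' "$@"
}

# Usage: first_per_trid in.txt
```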
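Answer 4
With GNU sed, stolen from this SO answer:
sed '$!N; /^\(.*\)\n\1$/!P; D' file
This removes only adjacent lines that are completely identical (like uniq), so it works here because each TR# ID's duplicate lines are consecutive and identical.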
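A sketch with hypothetical, interleaved input lines that contrasts the sed loop with an awk filter keyed on the whole line.

```shell
# Hypothetical input: the two "TR#14 failed" lines are not adjacent.
interleaved() {
    printf '%s\n' 'TR#14 failed' 'TR#238 failed' 'TR#14 failed'
}

# The N/P/D loop only compares each line with the next one, so the
# second "TR#14 failed" survives (three lines out, same as uniq).
interleaved | sed '$!N; /^\(.*\)\n\1$/!P; D'

# Keyed on the whole line with awk, every repeated line is dropped.
interleaved | awk '!seen[$0]++'
```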