Parsing a table in a text file and aggregating information

Answer

One way using perl, assuming infile has the content shown in the question (the IDs will not necessarily appear in the same order in the output, because a hash is used to store them).

Content of script.pl:

use strict;
use warnings;

my (%data);

while ( <> ) { 

    ## Omit header.
    next if $. == 1;

    ## Remove last '\n'.
    chomp;

    ## Split the line on whitespace.
    my @f = split;

    ## If this ID was already seen, start from its previous counts.
    ## Otherwise start every counter at zero, so that no column is
    ## left undefined when it is printed.
    my @counts = exists $data{ $f[0] } ? @{ $data{ $f[0] } } : ( 0, 0, 0, 0 );
    $counts[0]++;
    $counts[1]++ if substr( $f[5], 0, 4 ) eq q|Pass|;
    $counts[2] += $f[7];
    $counts[3] += $f[8];

    ## Store the updated counts back under this ID.
    $data{ $f[0] } = [ @counts ];
}

## Format output.
my $print_format = qq|%-15s %-10s %-12s %-10s %-10s\n|;

## Print header.
printf $print_format, qw|Id CountId CountPass CountHe CountHo|;

## For every ID saved in the hash, print its accumulated values.
for my $id ( keys %data ) { 
    printf $print_format, $id, @{ $data{ $id } };
}

Run it like:

perl script.pl infile

With the following output:

Id              CountId    CountPass    CountHe    CountHo   
cm|371443198    1          1            1          0         
cm|371443199    3          3            2          1         
cm|367079424    2          2            0          2
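
If a predictable row order matters, one option is to sort the rows after the fact. The following is only a sketch: sort_body is a hypothetical helper name, not part of the script above, and it assumes that the first input line is the header and that sorting by the whole line (which starts with the ID) is acceptable.

```shell
#!/bin/sh
# sort_body (hypothetical helper): pass the first line through untouched,
# then sort every remaining line. LC_ALL=C makes the ordering byte-wise,
# so it does not depend on the locale of the machine.
sort_body() {
    {
        # read consumes exactly one line from the pipe: the header.
        IFS= read -r header && printf '%s\n' "$header"
        # sort receives whatever is left on standard input.
        LC_ALL=C sort
    }
}

# Demo on the table printed above (column padding shortened for brevity).
printf '%s\n' \
    'Id CountId CountPass CountHe CountHo' \
    'cm|371443198 1 1 1 0' \
    'cm|371443199 3 3 2 1' \
    'cm|367079424 2 2 0 2' | sort_body
```

With the script above this would be used as perl script.pl infile | sort_body. Inside Perl, looping over sort keys %data instead of keys %data should have a similar effect.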

Answer

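Here is an awk solution that uses 4 arrays to count the 4 pieces of information you need. The output of awk is then piped into column, which aligns the columns nicely. (This could also have been done within awk itself, using printf.)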

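awk 'NR>1 {                          # skip the header line
    id[$1]++                         # rows seen for this ID
    if($6 ~ /Pass/) pass[$1]++       # rows whose 6th field contains Pass
    if($8 ~ /1/) he[$1]++            # rows whose 8th field contains 1
    if($9 ~ /1/) ho[$1]++            # rows whose 9th field contains 1
}
END {
   print "Id CountId Countpass CountHe CountHO"
   # An ID that never matched has no array entry, so print 0 for it.
   for(i in id)
      print i" "id[i]" "(pass[i]?pass[i]:0)" "(he[i]?he[i]:0)" "(ho[i]?ho[i]:0)
}' input.txt | column -t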

Output:

Id            CountId  Countpass  CountHe  CountHO
cm|371443198  1        1          1        0
cm|371443199  3        3          2        1
cm|367079424  2        2          0        2
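
The printf variant mentioned in the parenthesis could look roughly like the sketch below. The column widths are an assumption, summarize is a hypothetical wrapper name, and the sample rows are made-up placeholders that merely reproduce the counts shown above; only fields 1, 6, 8 and 9 carry meaning.

```shell
#!/bin/sh
# summarize (hypothetical wrapper): same counting logic as the awk command
# above, but the alignment is done by printf inside awk, so no pipe to
# column is needed.
summarize() {
    awk 'NR > 1 {
        id[$1]++
        if ($6 ~ /Pass/) pass[$1]++
        if ($8 ~ /1/)    he[$1]++
        if ($9 ~ /1/)    ho[$1]++
    }
    END {
        # Assumed column widths; widen them if the IDs are longer.
        fmt = "%-15s %-10s %-12s %-10s %-10s\n"
        printf fmt, "Id", "CountId", "Countpass", "CountHe", "CountHO"
        # Adding 0 turns a missing array entry into the number 0.
        for (i in id)
            printf fmt, i, id[i], pass[i] + 0, he[i] + 0, ho[i] + 0
    }' "$@"
}

# Placeholder input: a header plus six made-up rows with nine fields each.
sample=$(mktemp)
printf '%s\n' \
    'Id f2 f3 f4 f5 Status f7 He Ho' \
    'cm|371443199 - - - - Pass - 1 0' \
    'cm|371443199 - - - - Pass - 1 0' \
    'cm|371443199 - - - - Pass - 0 1' \
    'cm|367079424 - - - - Pass - 0 1' \
    'cm|367079424 - - - - Pass - 0 1' \
    'cm|371443198 - - - - Pass - 1 0' > "$sample"

summarize "$sample"
rm -f "$sample"
```

As with the column version, the order of the rows depends on how awk walks the array in for (i in id), so it may differ between awk implementations.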
