
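Please help me write the following shell script. I need to count, for each lane (col1), the number of variables that are consistent across samples (col2). For example, all values (col4) of variable1 in lane1 are identical (ab) across all three samples, so variable1 counts as a consistent variable. By contrast, variables 2 and 3 of lane2 are inconsistent.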
lane1 sample1 variable1 ab
lane1 sample2 variable1 ab
lane1 sample3 variable1 ab
lane1 sample1 variable2 cd
lane1 sample2 variable2 cd
lane1 sample3 variable2 cd
lane1 sample1 variable3 gh
lane1 sample2 variable3 ab
lane1 sample3 variable3 gh
lane2 sample1 variable1 ac
lane2 sample2 variable1 ac
lane2 sample3 variable1 ac
lane2 sample1 variable2 gt
lane2 sample2 variable2 gt
lane2 sample3 variable2 ac
lane2 sample1 variable3 ga
lane2 sample2 variable3 ga
lane2 sample3 variable3 ac
Output
Number of consistent and inconsistent variables across all three samples
#Consistent #Inconsistent
lane1 2 1
lane2 1 2
Answer 1
Perl solution:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# %values: lane => variable => value => sample => 1
my %values;
while (<>) {
    next unless /\S/;    # Skip empty or whitespace-only lines
    my ($lane, $sample, $var, $val) = split;
    die "Duplicate $lane $sample $var\n" if $values{$lane}{$var}{$val}{$sample};
    $values{$lane}{$var}{$val}{$sample} = 1;
}

for my $lane (sort keys %values) {
    # Start both counters at 0 so a lane lacking one kind still prints a number.
    my %count = (consistent => 0, inconsistent => 0);
    for my $var (keys %{ $values{$lane} }) {
        # A variable is consistent when all samples share a single value.
        my $distinct = keys %{ $values{$lane}{$var} };
        ++$count{ 1 == $distinct ? 'consistent' : 'inconsistent' };
    }
    say join "\t", $lane, @count{qw{ consistent inconsistent }};
}
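
Since the question asks for a shell script, the same idea can also be sketched with awk wrapped in a shell function. This is only a minimal sketch, not part of the Perl answer: the function name count_consistent is made up for illustration, and it assumes whitespace-separated input with exactly the four columns shown in the question. Unlike the Perl version, it does not check for duplicate rows.

```shell
#!/bin/sh
# count_consistent (hypothetical name): read "lane sample variable value" rows
# from the given files or stdin and print, per lane, the number of variables
# with a single distinct value and the number with several.
count_consistent() {
    awk '
        NF < 4 { next }                      # skip blank or short lines
        {
            lanes[$1] = 1
            if (!(($1, $3, $4) in seen)) {   # first time this value is seen
                seen[$1, $3, $4] = 1
                distinct[$1, $3]++           # distinct values per lane/variable
            }
        }
        END {
            for (key in distinct) {
                split(key, part, SUBSEP)     # part[1] = lane, part[2] = variable
                if (distinct[key] == 1) consistent[part[1]]++
                else inconsistent[part[1]]++
            }
            for (lane in lanes)
                printf "%s\t%d\t%d\n", lane, consistent[lane], inconsistent[lane]
        }
    ' "$@" | sort                            # awk array order is unspecified
}
```

Call it as count_consistent data.txt (the file name is hypothetical) or pipe the data into it. For the sample input in the question it should print lane1 2 1 and lane2 1 2, tab-separated, without the header line.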