cat input1.txt
##gff-version 2
##source-version geneious 5.6.4
Xm_ABL1 Geneious CDS 1 168 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 169 334 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 335 628 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 629 901 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 902 985 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 986 1165 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 1166 1350 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 1351 1504 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious BLAST Hit 169 334 . + .
Xm_ABL1 Geneious extracted region 1 168 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="351297 -> 351464"
Xm_ABL1 Geneious extracted region 169 334 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="371785 -> 371950"
Xm_ABL1 Geneious extracted region 335 628 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="372554 -> 372847"
Xm_ABL1 Geneious extracted region 629 901 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="374760 -> 375032"
Xm_ABL1 Geneious extracted region 902 985 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="375230 -> 375313"
Xm_ABL1 Geneious extracted region 986 1165 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="375992 -> 376171"
Xm_ABL1 Geneious extracted region 1166 1350 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="376575 -> 376759"
Xm_ABL1 Geneious extracted region 1351 1504 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="376914 -> 377067"
Check input1.txt: the CDS lines and the extracted region lines must occur the same number of times. If they do, take the values from column $14 of the extracted region lines (351297, 351464, 371785, 371950, ...) and substitute them into columns $4 and $5 of the CDS lines (for example, in the first CDS line replace 1 with 351297 and 168 with 351464; in the second, replace 169 with 371785 and 334 with 371950; and so on). Print only the substituted CDS lines, as follows:
cat output1.txt
##gff-version 2
##source-version geneious 5.6.4
Xm_ABL1 Geneious CDS 351297 351464 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 371785 371950 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 372554 372847 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 374760 375032 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 375230 375313 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 375992 376171 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 376575 376759 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 376914 377067 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
I have another input file, input2.txt:
cat input2.txt
##gff-version 2
##source-version geneious 5.6.3
gi371443188gbJH5566721_extraction_reversed Geneious CDS 1043 1132 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2063 2260 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2336 2593 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 3474 3633 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious extracted region 1 13933 . + . Name=Extracted region from gi|371443188|gb|JH556672.1|;Extracted interval="2010140 <- 2024072"
From $14 of the last line (interval="2010140) I want to take only the number (2010140) and add it to column $4 of the CDS lines (1043, 2063, ..., 3474), i.e. 1043 + 2010140 = 2011183, 2063 + 2010140 = 2012203, and to column $5 (1132 + 2010140 = 2011272, 2260 + 2010140 = 2012400). The last line itself should not be printed.
The output should look like this:
cat output2.txt
##gff-version 2
##source-version geneious 5.6.3
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2011183 2011272 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2012203 2012400 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2012476 2012733 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2013614 2013773 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
But I need a Perl script that, based on the user's input (which may be input1.txt or input2.txt), checks the conditions and produces output1.txt or output2.txt.
Answer 1
I assumed that the extracted region lines follow the CDS lines for each alignment.
Copy this code into script.pl:
use strict;
use warnings;

## Input type: 1 for files like input1.txt, 2 for files like input2.txt.
my $input = 1;
my @field = ('CDS','extracted region');
my (%data);
my (%counter);

&zero;

while ( <> ) {

    ## Omit header.
    next if $. == 1;
    next if $. == 2;

    ## Remove last '\n'.
    chomp;

    ## Split line in tabs.
    my @f = split /\t/;

    ## Is loop over? A new $field[0] line after at least one
    ## $field[1] line starts the next alignment.
    if ( $f[2] =~ /$field[0]/ && $counter{$field[1]} > 0 )
    {
        &comparing;
        &zero;
    }

    ## Count number of $field[0] and $field[1] lines.
    $counter{$f[2]}++;

    ## Storing data.
    @{$data{$f[2]}[$counter{$f[2]}]} = @f;
}

&comparing;

sub zero {
    $data{$field[0]} = [];
    $data{$field[1]} = [];
    $counter{$field[0]} = 0;
    $counter{$field[1]} = 0;
}

sub comparing {
    ## Same number of $field[0] and $field[1] lines?
    ## (Only required when $input == 1.)
    if ( $counter{$field[0]} == $counter{$field[1]} || $input == 2 )
    {
        &recover;
        &stamp;
    }
}

sub recover {
    ## Avoid 'my ... if ...': its behaviour is undefined in Perl.
    my $pos = $input == 2 ? &input2(0,0) : 0;
    for my $i ( 1 .. $#{ $data{$field[0]} } ) {
        &input1($i) if ( $input == 1 );
        &input2($i,$pos) if ( $input == 2 );
    }
}

sub input1 {
    #;Extracted interval="376914 -> 377067"
    $data{$field[1]}[$_[0]][8] =~ m/;Extracted interval="(\d+) /;
    $data{$field[0]}[$_[0]][3] = $1;
    $data{$field[1]}[$_[0]][8] =~ m/;Extracted interval="\d+ -> (\d+)"/;
    $data{$field[0]}[$_[0]][4] = $1;
}

sub input2 {
    if ( $_[0] == 0 )
    {
        #;Extracted interval="2010140 <- 2024072"
        $data{$field[1]}[1][8] =~ m/;Extracted interval="(\d+) /;
        ## Return the offset explicitly.
        return $1;
    }
    else
    {
        $data{$field[0]}[$_[0]][3] = $_[1] + $data{$field[0]}[$_[0]][3];
        $data{$field[0]}[$_[0]][4] = $_[1] + $data{$field[0]}[$_[0]][4];
    }
}

sub stamp {
    for my $i ( 1 .. $#{ $data{$field[0]} } ) {
        ## Join the fields with tabs (no trailing tab).
        print join( "\t", @{ $data{$field[0]}[$i] } ), "\n";
    }
}
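The regular expressions in input1 and input2 all read numbers from the same attribute. As a minimal sketch (the helper name parse_interval is hypothetical and not part of script.pl), a single pattern could capture both ends of the interval and accept either arrow direction:

```perl
use strict;
use warnings;

## Hypothetical helper, not part of script.pl: returns the two numbers
## of an 'Extracted interval' attribute, or an empty list if it is absent.
sub parse_interval {
    my ($attributes) = @_;
    ## Accept both '->' (as in input1.txt) and '<-' (as in input2.txt).
    if ( $attributes =~ /Extracted interval="(\d+) (?:->|<-) (\d+)"/ ) {
        return ( $1, $2 );
    }
    return;
}

my ( $start, $end ) = parse_interval(
    'Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="351297 -> 351464"'
);
print "$start\t$end\n";
```

Checking the return value before using it would also avoid reusing a stale $1 when a line has no interval attribute.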
You can run the Perl script with input1.txt:
perl script.pl input1.txt > output1.txt
If you change the line:
my $input = 1;
to
my $input = 2;
you can run the Perl script with input2.txt:
perl script.pl input2.txt > output2.txt
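For the second input type, the change amounts to one addition per coordinate. A small sketch of that arithmetic, using a hypothetical helper shift_coords (not part of script.pl) and the values from input2.txt:

```perl
use strict;
use warnings;

## Hypothetical helper: adds the interval start to both coordinates
## (columns 4 and 5, i.e. indexes 3 and 4) of one tab-split GFF line.
sub shift_coords {
    my ( $offset, @f ) = @_;
    $f[3] += $offset;
    $f[4] += $offset;
    return @f;
}

## First CDS line of input2.txt, without its attribute column.
my @cds = ( 'gi371443188gbJH5566721_extraction_reversed',
            'Geneious', 'CDS', 1043, 1132, '.', '+', '.' );

## 2010140 is the first number of the 'Extracted interval' attribute.
my @shifted = shift_coords( 2010140, @cds );
print join( "\t", @shifted ), "\n";
```

The shifted coordinates, 2011183 and 2011272, are the ones on the first CDS line of output2.txt.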
The Perl script could also take two arguments: the input file and the type (1 or 2).
EDIT
At https://stackoverflow.com/questions/1730333/how-do-i-use-getoptions-to-get-the-default-argument there are some methods for getting arguments.
If you change the line:
my $input = 1;
to
my $input = 1;
## Take the type off @ARGV so that <> does not try to open it as a file.
$input = pop @ARGV if @ARGV > 1;
you can run the Perl script with input1.txt:
perl script.pl input1.txt > output1.txt
or
perl script.pl input1.txt 1 > output1.txt
and you can run the Perl script with input2.txt:
perl script.pl input2.txt 2 > output2.txt
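Along the lines of the Getopt::Long approach from the linked question, the type could also be passed as a named option. This is only a sketch: the option name --type and the helper parse_type are my own choices for illustration, not something script.pl defines.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

## Hypothetical helper: reads an optional --type option (1 or 2) from the
## given argument list and leaves the remaining file names in that list.
sub parse_type {
    my ($args) = @_;
    my $type = 1;    ## Default, as in the original script.
    GetOptionsFromArray( $args, 'type=i' => \$type )
        or die "Usage: $0 [--type 1|2] file\n";
    die "--type must be 1 or 2\n" unless $type == 1 || $type == 2;
    return $type;
}

my $input = parse_type( \@ARGV );
## @ARGV now holds only file names, so 'while ( <> )' reads just those.
```

With this in place of the "my $input = 1;" line, the call would look like: perl script.pl --type 2 input2.txt > output2.txt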