Perl: adding underscores and sorting lines
As I am a biologist and new to Perl, I would appreciate help from Perl experts.
cat input.txt
##gff-version 2
##source-version geneious 5.6.3
gi371443188gbJH5566721_extraction_reversed Geneious CDS 1043 1132 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2063 2260 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 2336 2593 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious CDS 3474 3633 . + . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed Geneious extracted region 1 13933 . + . Name=Extracted region from gi|371443188|gb|JH556672.1|;Extracted interval="2010140 <- 2024072"
My output.txt
gi371443188gbJH5566721_extraction_reversed CDS 2023029 2022940 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed CDS 2022009 2021812 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed CDS 2021736 2021479 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi371443188gbJH5566721_extraction_reversed CDS 2020598 2020439 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
###
My expected output
gi_371443188_gb_JH5566721 gene 2020598 2023029 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.13412469767431
gi_371443188_gb_JH5566721 CDS 2020598 2020439 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi_371443188_gb_JH5566721 CDS 2021736 2021479 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi_371443188_gb_JH5566721 CDS 2022009 2021812 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
gi_371443188_gb_JH5566721 CDS 2023029 2022940 . - . Name=Xm ITGB3;created by=User;modified by=User;ID=Pa0FVoXpt/GgL1I/VO7LY0UlFAc.1341246976743.1
###
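I need help from Perl experts to reformat the output of the Perl code given below.
1. In the output, I want to add underscores to array[0] (i.e., gi371443188gbJH5566721_extraction_reversed becomes gi_371443188_gb_JH5566721).
2. Sort the CDS lines in ascending order by the values in columns 3 and 4 of the output (see the expected output).
3. Add a line at the top of the file for gi_371443188_gb_JH556672.1 as a gene, containing the minimum and maximum values of the CDS lines (see the expected output).
My Perl code is as follows.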
#usr/bin/perl;
open(FH,"$ARGV[0]");
my %num="";
my %all="";
while(<FH>){
chomp $_;
my @array=split("\t"); #print "$array[2]\n";
if($array[2] eq "extracted region"){
$array[8]=~/.*\w+=\"\d+ <- (\d+)"/gm;
$num{$array[0]}="$1";
}
if($array[2] eq "CDS"){
$all{$array[0]}.="$_\n";
}
}
foreach $i (keys %all){
my @line=split "\n",$all{$i};
for ($j=0;$j<=$#line;$j++){
my @new_line=split "\t",$line[$j];
my $pos1=$num{$i}-$new_line[3];
my $pos2=$num{$i}-$new_line[4]; #print $num{$i}; exit;
$new_line[6] =~ s/\+/-/g;
print "$new_line[0]\t$new_line[2]\t$pos1\t$pos2\t$new_line[5]\t$new_line[6]\t$new_line[7]\t$new_line[8]\n";
}
}
print "###\n";
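Answer 1
This will do it (i.e., my output matches yours), although it is not the cleanest.
The regular expressions that add the underscores could probably be combined. For the sorting, though, you need to push all of the output lines onto a list and then sort that list (it could also be done on the incoming side, but you would still need to get all of the lines into one place first).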
--- test.pl~ 2012-07-13 12:04:36.000000000 -0700
+++ test.pl 2012-07-13 12:17:58.000000000 -0700
@@ -1,4 +1,4 @@
-#usr/bin/perl
+#!/usr/bin/perl
use strict;
open(FH,"$ARGV[0]");
@@ -18,6 +18,7 @@
}
my $i;
+ my @output;
foreach $i (keys %all){
my @line=split "\n",$all{$i};
@@ -27,8 +28,15 @@
my $pos1=$num{$i}-$new_line[3];
my $pos2=$num{$i}-$new_line[4]; #print $num{$i}; exit;
$new_line[6] =~ s/\+/-/g;
- print "$new_line[0]\t$new_line[2]\t$pos1\t$pos2\t$new_line[5]\t$new_line[6]\t$new_line[7]\t$new_line[8]\n";
+ $new_line[0] =~ s/gi/gi_/;
+ $new_line[0] =~ s/gb/_gb_/;
+ $new_line[0] =~ s/_extraction_reversed//;
+ push @output, "$new_line[0]\t$new_line[2]\t$pos1\t$pos2\t$new_line[5]\t$new_line[6]\t$new_line[7]\t$new_line[8]\n";
}
}
+ @output = sort (@output);
+ foreach my $out (@output) {
+ print $out;
+ }
print "###\n";
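
The patch above uses three separate substitutions and a plain string `sort`, and it does not add the `gene` line from point 3 of the question. A minimal sketch of how those pieces might look is given here. The helper names `add_underscores`, `sort_by_start`, and `gene_line` are made up for illustration, the sample lines are abbreviated from the question's data, and the code assumes tab-separated lines in the column order the script prints. A plain `sort` compares strings, which gives the right order here only because every coordinate has the same number of digits; `<=>` compares numerically. The sketch also assumes the gene span should cover both coordinate columns; the expected output in the question shows 2020598 as the gene start, which is the smallest value of column 3 alone, so adjust `gene_line` if that is what is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: one substitution instead of three.
# Assumes IDs look like gi<digits>gb<accession>_extraction_reversed.
sub add_underscores {
    my ($id) = @_;
    $id =~ s/^gi(\d+)gb(\w+?)_extraction_reversed$/gi_${1}_gb_${2}/;
    return $id;
}

# Hypothetical helper: sort tab-separated lines numerically by
# column 3, then by column 4 (array indexes 2 and 3).
sub sort_by_start {
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
           map  { [ $_, (split /\t/)[2, 3] ] } @_;
}

# Hypothetical helper: build a "gene" line spanning all CDS lines.
# Assumption: the span is the min and max over both coordinate columns.
sub gene_line {
    my @lines = @_;
    chomp @lines;
    my @rows   = map { [ split /\t/ ] } @lines;
    my @coords = map { @{$_}[2, 3] } @rows;
    my @first  = @{ $rows[0] };
    return join("\t", $first[0], 'gene', min(@coords), max(@coords),
                @first[4 .. $#first]) . "\n";
}

# Sample lines abbreviated from the question (attributes shortened).
my $id  = add_underscores('gi371443188gbJH5566721_extraction_reversed');
my @out = (
    join("\t", $id, 'CDS', 2023029, 2022940, '.', '-', '.', 'Name=Xm ITGB3') . "\n",
    join("\t", $id, 'CDS', 2020598, 2020439, '.', '-', '.', 'Name=Xm ITGB3') . "\n",
);

# Gene line first, then the CDS lines in ascending order.
print gene_line(@out), sort_by_start(@out), "###\n";
```

In the original script, the lines pushed onto `@output` could be passed to these helpers in place of `sort (@output)`.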