I have this input:
##gff-version 3
chr1 TAIR10 mRNA 3631 5899 . + . ID AT1G01010.1 ;geneID AT1G01010 ;gene_name AT1G01010
chr1 TAIR10 exon 3631 3913 . + . Parent AT1G01010.1
chr1 TAIR10 exon 3996 4276 . + . Parent AT1G01010.1
chr1 TAIR10 exon 4486 4605 . + . Parent AT1G01010.1
chr1 TAIR10 exon 4706 5095 . + . Parent AT1G01010.1
chr1 TAIR10 exon 5174 5326 . + . Parent AT1G01010.1
chr1 TAIR10 exon 5439 5899 . + . Parent AT1G01010.1
I would like the values of ID, geneID, and gene_name (and Parent) to be enclosed in double quotes, as in the output below:
##gff-version 3
chr1 TAIR10 mRNA 3631 5899 . + . ID "AT1G01010.1" ;geneID "AT1G01010" ;gene_name "AT1G01010"
chr1 TAIR10 exon 3631 3913 . + . Parent "AT1G01010.1"
chr1 TAIR10 exon 3996 4276 . + . Parent "AT1G01010.1"
chr1 TAIR10 exon 4486 4605 . + . Parent "AT1G01010.1"
chr1 TAIR10 exon 4706 5095 . + . Parent "AT1G01010.1"
chr1 TAIR10 exon 5174 5326 . + . Parent "AT1G01010.1"
chr1 TAIR10 exon 5439 5899 . + . Parent "AT1G01010.1"
I have been trying:
awk '{sub($10, "\"&\""); print}' file.gtf
Thank you for reading my question.
Answer 1
Quick and dirty:
sed -E 's#(ID|Parent|gene_name) ([0-9A-Za-z.]+)#\1 "\2"#g' file.gtf
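If you would rather not list the key names, here is a minimal awk sketch of the same idea. It assumes the layout shown in the question: eight whitespace-separated fixed columns, followed by alternating key/value tokens whose values contain no spaces. The function name quote_attrs is made up for illustration. Note that assigning to a field makes awk rebuild the line with single spaces, so tab-separated input would come out space-separated.

```sh
# Hypothetical helper: wrap every attribute value (fields 10, 12, 14, ...) in double quotes.
# Assumes 8 fixed columns, then "key value" pairs with no spaces inside the values.
quote_attrs() {
  awk '
    /^#/ { print; next }              # pass "##gff-version" and other comment lines through unchanged
    {
      for (i = 10; i <= NF; i += 2)   # keys sit at fields 9, 11, 13, ...; values at 10, 12, 14, ...
        $i = "\"" $i "\""
      print                           # lines with fewer than 10 fields are printed as they are
    }
  ' "$@"
}
```

Usage would be something like quote_attrs file.gtf > quoted.gtf. This also shows why the attempt in the question only quotes one value: sub($10, ...) treats the contents of field 10 as a regex and replaces only its first match in the line, whereas the loop here touches each value field in turn.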