計算 DNA 檔案中的 DNA 密碼子

Question 1

您得到該輸出是因為腳本的第一行啟動了新的bashshell。

該行應該讀作

#!/bin/bash

（注意#開頭的）。

然後，您awk以永遠行不通的方式將語法與 shell 程式碼混合在一起。

相反，保持簡單並將文件分成三個字元組，對它們進行排序併計算您獲得了多少個獨特的字元：

$ fold -w 3 dnafile | sort | uniq -c
   3 aac
   2 acg
   1 ttt

只要輸入始終包含三個字符的倍數，並且沒有嵌入空格或其他字符，這種方法就可以工作。

Answer

您得到該輸出是因為腳本的第一行啟動了新的bashshell。

該行應該讀作

#!/bin/bash

（注意#開頭的）。

然後，您awk以永遠行不通的方式將語法與 shell 程式碼混合在一起。

相反，保持簡單並將文件分成三個字元組，對它們進行排序併計算您獲得了多少個獨特的字元：

$ fold -w 3 dnafile | sort | uniq -c
   3 aac
   2 acg
   1 ttt

只要輸入始終包含三個字符的倍數，並且沒有嵌入空格或其他字符，這種方法就可以工作。

Question 2

(echo aacacgaactttaacacg ;echo aacacgaactttaacacg ) |
  perl -ne '# Split input into triplets (A3)
            # use each triplet as key in the hash table count
            #   and increase the value for the key
            map { $count{$_}++ } unpack("(A3)*",$_);
            # When we are at the end of the file
            END{ 
                 # Remove the key "" (which is wrong)
                 delete $count{""};
                 # For each key: Print key, count
                 print map { "$_ $count{$_}\n" } keys %count
            }'

Answer

(echo aacacgaactttaacacg ;echo aacacgaactttaacacg ) |
  perl -ne '# Split input into triplets (A3)
            # use each triplet as key in the hash table count
            #   and increase the value for the key
            map { $count{$_}++ } unpack("(A3)*",$_);
            # When we are at the end of the file
            END{ 
                 # Remove the key "" (which is wrong)
                 delete $count{""};
                 # For each key: Print key, count
                 print map { "$_ $count{$_}\n" } keys %count
            }'

Question 3

稍微冗長的awk版本

awk 'BEGINFILE{print FILENAME; delete codon}
     ENDFILE {
     if (NR!=1 || NF!=1 || length($0)%3!=0){
         print "is broken"}
     else{
         for (i=1; i<=length($0); i+=3) codon[substr($0,i,3)]++}; 
         for (c in codon) print c, codon[c]; 
         print ""}' file*

對於這個輸入

文件1：好的

aacacgaactttaacacg

文件2：空格

aacacgaact ttaacacg

文件3：換行符

aacacgaact
ttaacacg

file4：不是 3 個鹼基的倍數

aacacgaactttaacac

你得到

file1
aac 3
ttt 1
acg 2

file2
is broken

file3
is broken

file4
is broken

如果您只想修復文件並且沒有像file4您cat的文件tr從一端awk或另一端通過的文件，就像您的範例一樣

<<< $(cat file[1..3] | tr -d "\n ")

Answer