Eu tenho muitos arquivos que contêm uma string como esta:
/databis/defontis/Dossier_fasta_chrm_avec_piler/SRR6237661_chrm.fasta: N putative CRISPR arrays found
Onde the N
é um número que pode ser um 0
ou mais. Preciso mover todos os arquivos onde está N
para 0
o diretório Sans_crispr
e todos os arquivos onde N
é maior que 0
para o diretório Avec_crispr
.
Também posso ver ls
que todos os arquivos onde nenhum CRISPR foi encontrado (aqueles onde N
está 0
) são menores que 3355 bytes, então talvez isso possa ser usado.
Eu tentei isso:
find . -name "*.out" -type 'f' -size -5k -exec mv {} /databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans_Crispr/ \;
Mas para todos os meus arquivos eu tenho isso
mv: cannot move './SRR5273182_chrm.fasta.fa-pilercr.out' to '/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/': Not a directory
Eu tentei alguns for f in ...do done
ou if then fi
. Tentei grep
o padrão ' 0 putative CRISPR arrays found'
Mas nenhum funcionou, sempre erro ou não encontrei o que queria.
Este é um exemplo dos meus arquivos:
E este é o conteúdo: Com Crispr
Help on reading this report
===========================
This report has three sections: Detailed, Summary by Similarity and Summary by Position.
The detailed section shows each repeat in each putative CRISPR array.
The summary sections give one line for each array.
An 'array' is a contiguous sequence of CRISPR repeats looking like this:
REPEAT Spacer REPEAT Spacer REPEAT ... Spacer REPEAT
Within one array, repeats have high similarity and spacers are, roughly speaking, unique within a window around the array. In a given array, each repeat has a similar length, and each spacer has a similar length. With default parameters, the algorithm allows a fair amount of variability in order to maximize sensitivity. This may allow identification of inactive ("fossil") arrays, and may in rare cases also induce false positives due to other classes of repeats such as microsatellites, LTRs and arrays of RNA genes.
Columns in the detailed section are:
Pos Sequence position, starting at 1 for the first base. Repeat Length of the repeat. %id Identity with the consensus sequence. Spacer Length of spacer to the right of this repeat. Left flank 10 bases to the left of this repeat. Repeat Sequence of this repeat.
Dots indicate positions where this repeat
agrees with the consensus sequence below. Spacer Sequence of spacer to the right of this repeat,
or 10 bases if this is the last repeat.
The left flank sequence duplicates the end of the spacer for the preceding repeat; it is provided to facilitate visual identification of cases where the algorithm does not correctly identify repeat endpoints.
At the end of each array there is a sub-heading that gives the average repeat length, average spacer length and consensus sequence.
Columns in the summary sections are:
Array Number 1, 2 ... referring back to the detailed report. Sequence FASTA label of the sequence. May be truncated. From Start position of array. To End position of array. # copies Number of repeats in the array. Repeat Average repeat length. Spacer Average spacer length. + +/-, indicating orientation relative to first array in group. Distance Distance from previous array. Consensus Consensus sequence.
In the Summary by Similarity section, arrays are grouped by similarity of their consensus sequences. If consensus sequences are sufficiently similar, they are aligned to each other to indicate probable relationships between arrays.
In the Summary by Position section, arrays are sorted by position within the input sequence file.
The Distance column facilitates identification of cases where a single array has been reported as two adjacent arrays. In such a case, (a) the consensus sequences will be similar or identical, and (b) the distance will be approximately a small multiple of the repeat length + spacer length.
Use the -noinfo option to turn off this help. Use the -help option to get a list of command line options.
pilercr v1.06 By Robert C. Edgar
/databis/defontis/Dossier_fasta_chrm_avec_piler/SRR2177954_chrm.fasta: 1 putative CRISPR arrays found.
DETAIL REPORT
Array 1
>SRR2177954.k141_500270 flag=1 multi=9.2309 len=7453
Pos Repeat %id Spacer Left flank Repeat Spacer
========== ====== ====== ====== ========== ==================================== ======
66 36 100.0 25 CAGAAGTATT .................................... CTCACACACGCTGATGCAGACAACA
127 36 100.0 26 GCAGACAACA .................................... GCGAGAGCAGGGATTTGGAACGTAAT
189 36 100.0 26 GGAACGTAAT .................................... ATGTTGATGGAAAAACTCCCACAGAC
251 36 100.0 TCCCACAGAC .................................... ACTGAATGTG
========== ====== ====== ====== ========== ====================================
4 36 25 ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC
SUMMARY BY SIMILARITY
Array Sequence Position Length # Copies Repeat Spacer + Consensus
===== ================ ========== ========== ======== ====== ====== = =========
1 SRR2177954.k141_ 66 221 4 36 25 + ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC
SUMMARY BY POSITION
>SRR2177954.k141_500270 flag=1 multi=9.2309 len=7453
Array Sequence Position Length # Copies Repeat Spacer Distance Consensus
===== ================ ========== ========== ======== ====== ====== ========== =========
1 SRR2177954.k141_ 66 221 4 36 25 ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC
Sem Crispr
Help on reading this report
===========================
This report has three sections: Detailed, Summary by Similarity
and Summary by Position.
The detailed section shows each repeat in each putative
CRISPR array.
The summary sections give one line for each array.
An 'array' is a contiguous sequence of CRISPR repeats
looking like this:
REPEAT Spacer REPEAT Spacer REPEAT ... Spacer REPEAT
Within one array, repeats have high similarity and spacers
are, roughly speaking, unique within a window around the array.
In a given array, each repeat has a similar length, and each
spacer has a similar length. With default parameters, the
algorithm allows a fair amount of variability in order to
maximize sensitivity. This may allow identification of
inactive ("fossil") arrays, and may in rare cases also
induce false positives due to other classes of repeats
such as microsatellites, LTRs and arrays of RNA genes.
Columns in the detailed section are:
Pos Sequence position, starting at 1 for the first base.
Repeat Length of the repeat.
%id Identity with the consensus sequence.
Spacer Length of spacer to the right of this repeat.
Left flank 10 bases to the left of this repeat.
Repeat Sequence of this repeat.
Dots indicate positions where this repeat
agrees with the consensus sequence below.
Spacer Sequence of spacer to the right of this repeat,
or 10 bases if this is the last repeat.
The left flank sequence duplicates the end of the spacer for the preceding
repeat; it is provided to facilitate visual identification of cases
where the algorithm does not correctly identify repeat endpoints.
At the end of each array there is a sub-heading that gives the average
repeat length, average spacer length and consensus sequence.
Columns in the summary sections are:
Array Number 1, 2 ... referring back to the detailed report.
Sequence FASTA label of the sequence. May be truncated.
From Start position of array.
To End position of array.
# copies Number of repeats in the array.
Repeat Average repeat length.
Spacer Average spacer length.
+ +/-, indicating orientation relative to first array in group.
Distance Distance from previous array.
Consensus Consensus sequence.
In the Summary by Similarity section, arrays are grouped by similarity of their
consensus sequences. If consensus sequences are sufficiently similar, they are
aligned to each other to indicate probable relationships between arrays.
In the Summary by Position section, arrays are sorted by position within the
input sequence file.
The Distance column facilitates identification of cases where a single
array has been reported as two adjacent arrays. In such a case, (a) the
consensus sequences will be similar or identical, and (b) the distance
will be approximately a small multiple of the repeat length + spacer length.
Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.
pilercr v1.06
By Robert C. Edgar
/databis/defontis/Dossier_fasta_chrm_avec_piler/ERR1544006_chrm.fasta: 0 putative CRISPR arrays found.
Obrigado pelo seu tempo
Responder1
Simplesmente itere sobre os arquivos e grep
para arquivos : 0 putative CRISPR regions
. Se grep
encontrar uma correspondência, mova o arquivo:
mkdir -p Sans_crispr Avec_crispr
for file in *pilercr.out; do
if grep -q ': 0 putative CRISPR arrays' "$file"; then
mv "$file" Sans_crispr
else
mv "$file" Avec_crispr
fi
done
O -q
sinalizador grep
diz para não imprimir nenhuma saída, mas ainda assim sairá com status de falha se nenhuma correspondência for encontrada e com sucesso se uma correspondência for encontrada. Então aqui usamos isso para mover os arquivos para a pasta apropriada.
O motivo pelo qual você estava recebendo este erro:
mv: cannot move './SRR5273182_chrm.fasta.fa-pilercr.out' to '/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/': Not a directory
É porque o diretório /databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/
não existe. É por isso que o primeiro comando no pequeno script acima mkdir -p Sans_crispr Avec_crispr
significa "criar os diretórios Sans_crispr e Avec_crispr, a menos que eles ainda não existam".