내용에 따라 파일을 다른 디렉터리로 어떻게 이동할 수 있나요?

2024-6-11 • tag-icon

다음과 같은 문자열을 포함하는 파일이 많이 있습니다.

/databis/defontis/Dossier_fasta_chrm_avec_piler/SRR6237661_chrm.fasta: N putative CRISPR arrays found

여기서 는 N둘 중 하나 이상일 수 있는 숫자입니다 0. 가 N있는 모든 파일을 0디렉토리로 이동하고, 그보다 큰 Sans_crispr모든 파일을 디렉토리로 이동해야 합니다 .N0Avec_crispr

ls또한 CRISPR가 발견되지 않은 모든 파일(이 N있는 파일 0)은 3355바이트보다 작으므로 사용할 수 있음을 알 수 있습니다 .

나는 이것을 시도했다:

find . -name "*.out" -type 'f' -size -5k -exec mv {} /databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans_Crispr/ \;

하지만 내 모든 파일에는 이것이 있습니다

mv: cannot move './SRR5273182_chrm.fasta.fa-pilercr.out' to '/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/': Not a directory

나는 일부 for f in ...do done또는 if then fi. grep패턴을 시도했지만 ' 0 putative CRISPR arrays found' 그 중 아무 것도 작동하지 않았습니다. 항상 오류가 발생하거나 원하는 것을 찾지 못했습니다.

다음은 내 파일의 예입니다.

그리고 내용물은 이렇습니다: 크리스퍼와 함께

Help on reading this report
===========================

This report has three sections: Detailed, Summary by Similarity and Summary by Position.

The detailed section shows each repeat in each putative CRISPR array.

The summary sections give one line for each array.

An 'array' is a contiguous sequence of CRISPR repeats looking like this:

    REPEAT Spacer REPEAT Spacer REPEAT ... Spacer REPEAT

Within one array, repeats have high similarity and spacers are, roughly speaking, unique within a window around the array. In a given array, each repeat has a similar length, and each spacer has a similar length. With default parameters, the algorithm allows a fair amount of variability in order to maximize sensitivity. This may allow identification of inactive ("fossil") arrays, and may in rare cases also induce false positives due to other classes of repeats such as microsatellites, LTRs and arrays of RNA genes.


Columns in the detailed section are:

  Pos               Sequence position, starting at 1 for the first base.   Repeat            Length of the repeat.   %id               Identity with the consensus sequence.   Spacer            Length of spacer to the right of this repeat.   Left flank        10 bases to the left of this repeat.   Repeat            Sequence of this repeat.
                      Dots indicate positions where this repeat
                      agrees with the consensus sequence below.   Spacer            Sequence of spacer to the right of this repeat,
                      or 10 bases if this is the last repeat.

The left flank sequence duplicates the end of the spacer for the preceding repeat; it is provided to facilitate visual identification of cases where the algorithm does not correctly identify repeat endpoints.

At the end of each array there is a sub-heading that gives the average repeat length, average spacer length and consensus sequence.

Columns in the summary sections are:

  Array             Number 1, 2 ... referring back to the detailed report.   Sequence          FASTA label of the sequence. May be truncated.   From              Start position of array.   To           End position of array.   # copies          Number of repeats in the array.   Repeat            Average repeat length.   Spacer            Average spacer length.   +                 +/-, indicating orientation relative to first array in group.   Distance          Distance from previous array.   Consensus         Consensus sequence.

In the Summary by Similarity section, arrays are grouped by similarity of their consensus sequences. If consensus sequences are sufficiently similar, they are aligned to each other to indicate probable relationships between arrays.

In the Summary by Position section, arrays are sorted by position within the input sequence file.

The Distance column facilitates identification of cases where a single array has been reported as two adjacent arrays. In such a case, (a) the consensus sequences will be similar or identical, and (b) the distance will be approximately a small multiple of the repeat length + spacer length.

Use the -noinfo option to turn off this help. Use the -help option to get a list of command line options.

pilercr v1.06 By Robert C. Edgar

/databis/defontis/Dossier_fasta_chrm_avec_piler/SRR2177954_chrm.fasta: 1 putative CRISPR arrays found.



DETAIL REPORT



Array 1
>SRR2177954.k141_500270 flag=1 multi=9.2309 len=7453

       Pos  Repeat     %id  Spacer  Left flank    Repeat                                  Spacer
==========  ======  ======  ======  ==========    ====================================    ======
        66      36   100.0      25  CAGAAGTATT    ....................................    CTCACACACGCTGATGCAGACAACA
       127      36   100.0      26  GCAGACAACA    ....................................    GCGAGAGCAGGGATTTGGAACGTAAT
       189      36   100.0      26  GGAACGTAAT    ....................................    ATGTTGATGGAAAAACTCCCACAGAC
       251      36   100.0          TCCCACAGAC    ....................................    ACTGAATGTG
==========  ======  ======  ======  ==========    ====================================
         4      36              25                ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC


SUMMARY BY SIMILARITY



Array          Sequence    Position      Length  # Copies  Repeat  Spacer  +  Consensus
=====  ================  ==========  ==========  ========  ======  ======  =  =========
    1  SRR2177954.k141_          66         221         4      36      25  +  ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC



SUMMARY BY POSITION



>SRR2177954.k141_500270 flag=1 multi=9.2309 len=7453

Array          Sequence    Position      Length  # Copies  Repeat  Spacer    Distance  Consensus
=====  ================  ==========  ==========  ========  ======  ======  ==========  =========
    1  SRR2177954.k141_          66         221         4      36      25              ATCTACAAAAGTAGAAATTTTATAGAGGTATTTGGC

크리스퍼 없이

Help on reading this report
===========================

This report has three sections: Detailed, Summary by Similarity
and Summary by Position.

The detailed section shows each repeat in each putative
CRISPR array.

The summary sections give one line for each array.

An 'array' is a contiguous sequence of CRISPR repeats
looking like this:

    REPEAT Spacer REPEAT Spacer REPEAT ... Spacer REPEAT

Within one array, repeats have high similarity and spacers
are, roughly speaking, unique within a window around the array.
In a given array, each repeat has a similar length, and each
spacer has a similar length. With default parameters, the
algorithm allows a fair amount of variability in order to
maximize sensitivity. This may allow identification of
inactive ("fossil") arrays, and may in rare cases also
induce false positives due to other classes of repeats
such as microsatellites, LTRs and arrays of RNA genes.


Columns in the detailed section are:

  Pos               Sequence position, starting at 1 for the first base.
  Repeat            Length of the repeat.
  %id               Identity with the consensus sequence.
  Spacer            Length of spacer to the right of this repeat.
  Left flank        10 bases to the left of this repeat.
  Repeat            Sequence of this repeat.
                      Dots indicate positions where this repeat
                      agrees with the consensus sequence below.
  Spacer            Sequence of spacer to the right of this repeat,
                      or 10 bases if this is the last repeat.

The left flank sequence duplicates the end of the spacer for the preceding
repeat; it is provided to facilitate visual identification of cases
where the algorithm does not correctly identify repeat endpoints.

At the end of each array there is a sub-heading that gives the average
repeat length, average spacer length and consensus sequence.

Columns in the summary sections are:

  Array             Number 1, 2 ... referring back to the detailed report.
  Sequence          FASTA label of the sequence. May be truncated.
  From              Start position of array.
  To                End position of array.
  # copies          Number of repeats in the array.
  Repeat            Average repeat length.
  Spacer            Average spacer length.
  +                 +/-, indicating orientation relative to first array in group.
  Distance          Distance from previous array.
  Consensus         Consensus sequence.

In the Summary by Similarity section, arrays are grouped by similarity of their
consensus sequences. If consensus sequences are sufficiently similar, they are
aligned to each other to indicate probable relationships between arrays.

In the Summary by Position section, arrays are sorted by position within the
input sequence file.

The Distance column facilitates identification of cases where a single
array has been reported as two adjacent arrays. In such a case, (a) the
consensus sequences will be similar or identical, and (b) the distance
will be approximately a small multiple of the repeat length + spacer length.

Use the -noinfo option to turn off this help.
Use the -help option to get a list of command line options.

pilercr v1.06
By Robert C. Edgar

/databis/defontis/Dossier_fasta_chrm_avec_piler/ERR1544006_chrm.fasta: 0 putative CRISPR arrays found.

시간 내주셔서 감사합니다

답변1

간단히 파일을 반복 grep하고 : 0 putative CRISPR regions. 일치하는 항목을 찾으면 grep파일을 이동합니다.

mkdir -p Sans_crispr Avec_crispr
for file in *pilercr.out; do
    if grep -q ': 0 putative CRISPR arrays' "$file"; then
        mv "$file" Sans_crispr
    else
        mv "$file" Avec_crispr
    fi
done

플래그 -q는 grep출력을 인쇄하지 않도록 지시하지만, 일치하는 항목이 없으면 실패 상태로 종료되고 일치하는 항목이 발견되면 성공으로 종료됩니다. 그래서 여기서는 이를 사용하여 파일을 적절한 폴더로 이동합니다.

이 오류가 발생한 이유:

mv: cannot move './SRR5273182_chrm.fasta.fa-pilercr.out' to '/databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/': Not a directory

디렉토리가 /databis/defontis/Dossier_fasta_chrm_avec_piler/Dossier_fasta_chrm_sortie_pilercr/Sans-Crispr/존재하지 않기 때문입니다. 이것이 위의 작은 스크립트의 첫 번째 명령이 mkdir -p Sans_crispr Avec_crispr"아직 존재하지 않는 경우 Sans_crispr 및 Avec_crispr 디렉터리를 생성합니다"를 의미하는 이유입니다.

답변1

관련 정보