Grep to search for patterns in a file

Answer 1

If you don't mind an extra column with a number, you can use join and grep to do this.

$ join <(grep -of patterns.txt file.txt | nl) \
       <(grep -f patterns.txt file.txt | nl)
1 KO3322 proteinaseK (KO3322)
2 KO3435 Xxxxx KO3435;folding factor
3 KO3435 Yyyyy KO3435,xxxx
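
To make the mechanics easier to follow, here is a minimal sketch of the same idea using temporary files instead of process substitution. The sample data is an assumption reconstructed from the output above, and the names `workdir`, `matches.nl` and `lines.nl` are hypothetical.

```shell
# Hypothetical sample data, reconstructed from the output shown above.
workdir=$(mktemp -d)
printf '%s\n' 'KO3322' 'KO3435' > "$workdir/patterns.txt"
printf '%s\n' 'proteinaseK (KO3322)' \
              'Xxxxx KO3435;folding factor' \
              'Yyyyy KO3435,xxxx' > "$workdir/file.txt"

# Left side: only the matched text (-o), one numbered row per match.
grep -of "$workdir/patterns.txt" "$workdir/file.txt" | nl > "$workdir/matches.nl"
# Right side: the whole matching lines, numbered the same way.
grep -f "$workdir/patterns.txt" "$workdir/file.txt" | nl > "$workdir/lines.nl"

# join pairs the rows that share the same number in their first field.
joined=$(join "$workdir/matches.nl" "$workdir/lines.nl")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

Two things to keep in mind: the pairing only lines up when each matching line contains exactly one match, because `grep -o` prints one row per match. Also, join expects its input sorted on the join field, so with ten or more matches the plain `nl` numbering may no longer be in the order join wants.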

Answer 2

You can use a shell loop:

$ while read pat; do 
    grep "$pat" file | 
        while read match; do
            echo -e "$pat\t$match"
        done
 done < patterns 
KO3435  Xxxxx KO3435;folding factor
KO3435  Yyyyy KO3435,xxxx
KO3322  proteinaseK (KO3322)

I tested this by running it on the UniProt flat file for human (625M) with 1000 UniProt IDs as patterns. It took ~6 minutes on my Pentium i7 laptop, and ~35 seconds when I searched for only 100 patterns.

As pointed out in the comments below, you can make this a bit faster by omitting echo and using grep's --label and -H options:

$ while read pat; do 
    grep "$pat" --label="$pat" -H < file
done < patterns

Running this on your example files produces:

$ while read pat; do 
    grep "$pat" --label="$pat" -H < kegg.annotations; 
  done < allKO.IDs.txt > test1
terdon@oregano foo $ cat test1 
K02217:>aai:AARI_26600  ferritin-like protein; K02217 ferritin [EC:1.16.3.1]
K07448:>aai:AARI_33320  mrr; restriction system protein Mrr; K07448 restriction system protein
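
Both loops start one grep per pattern, so the file is read again for every pattern. As a rough sketch of a different technique (awk instead of a grep loop, not something from the answer above), the following reads the file only once. It assumes the patterns are meant as fixed strings rather than regular expressions, and it has not been timed on the UniProt data. The sample data and the names `workdir` and `tagged` are hypothetical.

```shell
# Hypothetical sample data, reconstructed from the first example's output.
workdir=$(mktemp -d)
printf '%s\n' 'KO3435' 'KO3322' > "$workdir/patterns"
printf '%s\n' 'proteinaseK (KO3322)' \
              'Xxxxx KO3435;folding factor' \
              'Yyyyy KO3435,xxxx' > "$workdir/file"

tagged=$(awk '
    NR == FNR { pats[++n] = $0; next }   # first file: store each pattern
    {
        for (i = 1; i <= n; i++)         # second file: try every pattern
            if (index($0, pats[i]))      # fixed-string test, no regex
                print pats[i] "\t" $0    # pattern, tab, matching line
    }
' "$workdir/patterns" "$workdir/file")
printf '%s\n' "$tagged"

rm -rf "$workdir"
```

Unlike the loop, the output follows the order of the lines in the file rather than the order of the patterns, and a line that contains several patterns is printed once per pattern.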

Answer 3

You can use ack:

$ ack "$(tr '\n' '|' < pattern.txt | sed -e 's/.$//')" --print0 --output='$& $_' file.txt
KO3322 proteinaseK (KO3322)
KO3435 Xxxxx KO3435;folding factor
KO3435 Yyyyy KO3435,xxxx
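
The command substitution in that one-liner is what turns the pattern file into a single regular expression. Here is a small sketch of just that step, with hypothetical sample data. It uses `grep -E` as a stand-in to check the result, so it does not need ack to be installed.

```shell
# Hypothetical sample data, reconstructed from the output shown above.
workdir=$(mktemp -d)
printf '%s\n' 'KO3322' 'KO3435' > "$workdir/pattern.txt"
printf '%s\n' 'proteinaseK (KO3322)' \
              'Xxxxx KO3435;folding factor' \
              'Yyyyy KO3435,xxxx' > "$workdir/file.txt"

# Turn every newline into "|", then drop the last character (the trailing "|").
regex=$(tr '\n' '|' < "$workdir/pattern.txt" | sed -e 's/.$//')
printf '%s\n' "$regex"

# The result is an alternation, so any regex tool can use it; count the hits.
hits=$(grep -cE "$regex" "$workdir/file.txt")
printf '%s\n' "$hits"

rm -rf "$workdir"
```

In the `--output` expression, `$&` stands for the matched text and `$_` for the whole line, which is how the ID ends up in front of each line. Since the patterns are pasted into a regular expression, any regex metacharacters in them will be interpreted as such.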
