Check whether a string is present in a list and generate a third file if the string is present

Question 1

This is neatly expressed in awk:

awk 'FNR==NR { h[$1]; next } { for(i=2; i<=NF; i++) $i = ($i in h)? 1 : 0 } 1' mylist.tab data.tab

Or in a more readable format:

parse.awk

# Collect mylist.tab into the `h` associative array
FNR==NR {
  h[$1]
  next
}

# For all but the first column in data.tab check and record if it is in `h`
{ 
  for(i=2; i<=NF; i++) 
    $i = ($i in h) ? 1 : 0 
}

# Short for { print $0 }
1

Run it like this:

awk -f parse.awk mylist.tab data.tab

Output:

Info_1 0 1 1
Info_2 1 0
Info_3 1
Info_4 1 0 0 0 1
Info_5

Or, for tab-delimited output columns:

awk -v OFS='\t' -f parse.awk mylist.tab data.tab

Output:

Info_1  0   1   1
Info_2  1   0
Info_3  1
Info_4  1   0   0   0   1
Info_5
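
To try the commands above you need the two input files, which are not shown in this answer. The sketch below makes up hypothetical contents for mylist.tab and data.tab, chosen only so that the result has the same shape as the output listed above, and writes the result to a third file. The file name result.tab and the sample strings are assumptions, not part of the original question. It assumes a POSIX shell and awk:

```shell
# Hypothetical sample inputs (not from the original question).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# mylist.tab: one string per line.
printf '%s\n' apple cherry kiwi > mylist.tab

# data.tab: a label in column 1 followed by zero or more strings.
printf 'Info_1\tbanana\tapple\tcherry\n'          >  data.tab
printf 'Info_2\tkiwi\tmango\n'                    >> data.tab
printf 'Info_3\tapple\n'                          >> data.tab
printf 'Info_4\tcherry\tpear\tplum\tfig\tkiwi\n'  >> data.tab
printf 'Info_5\n'                                 >> data.tab

# Same program as the one-liner above, with the result redirected to a third file.
awk -v OFS='\t' '
  FNR == NR { h[$1]; next }                              # first file: remember each string
  { for (i = 2; i <= NF; i++) $i = ($i in h) ? 1 : 0 }   # second file: flag columns 2..NF
  1                                                      # print the (possibly modified) line
' mylist.tab data.tab > result.tab

cat result.tab
```

With these made-up inputs, result.tab holds the same 0/1 pattern as the tab-delimited output shown above. Redirecting with > is just one way to produce the third file; the awk program itself is unchanged.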

Question 2

Perl to the rescue!

Store the list items in a hash, then read the table, split each line on whitespace, and look each cell up in the hash to print 0 or 1.

#!/usr/bin/perl
use warnings;
use strict;

my %in_list;
open my $LIST, '<', 'mylist.tab' or die $!;
while (<$LIST>) {
    chomp;
    $in_list{$_} = 1;
}

open my $TAB, '<', 'data.tab' or die $!;
while (<$TAB>) {
    my @cells = split;
    print shift @cells, "\t";
    print join "\t", map $in_list{$_} ? 1 : 0, @cells;
    print "\n";
}

Question 3

Use sed to create a sed script from mylist.tab and run it on data.tab:

sed \
    -e '1i s/^[ \\t]*//' \
    -e 's@\(.*\)@s/\\([ \\t]\\)\1\\b/\\11/@g' \
    -e '$as/\\([ \\t]\\)[^ \\t]\\{2,\\}\\b/\\10/g' mylist.tab \
    > /tmp/x.sed 
sed -f /tmp/x.sed data.tab

Note: I assume that all strings in "mylist.tab" are at least 2 characters long.

Question 4

Another perl solution

$ perl -lne 'if(!$#ARGV){ $h{$_}=1 }
             else{ s/\h\K\H+/$h{$&} ? 1 : 0/ge; print }
            ' mylist.tab data.tab
Info_1    0     1     1
Info_2    1     0
Info_3    1
Info_4    1     0     0    0    1
Info_5

if(!$#ARGV){ $h{$_}=1 } builds a hash of the words in mylist.tab. While the first file is being read, data.tab is the only name left in @ARGV, so $#ARGV is 0 and the condition is true.
s/\h\K\H+/$h{$&} ? 1 : 0/ge runs on the lines of data.tab and replaces each word with 1 if it is a key in the hash, otherwise with 0. The \h\K part acts like a positive lookbehind for a horizontal whitespace character, which prevents the first column from matching.
Then print the modified line.
