Finding the different possible combinations

Question 1

Using awk:

function wprint() {
    print w[1], w[2], w[3];
}

function wshift(e) {
    w[1] = w[2]; w[2] = w[3]; w[3] = e;
}

BEGIN { FS = OFS = "," }

{
    wshift($1);
    wshift($2);
    wshift($3);
    wprint();

    for (i = 4; i <= NF; ++i) {
        wshift($i);
        wprint();
    }
}

Then:

$ awk -f script data.in
A,B,C
B,C,D
C,D,E
P,Q,R
G,D,V
D,V,K
L,Q,X
Q,X,I
X,I,U
I,U,G

The awk script uses a three-element sliding window, w. For each input line, it fills the three elements of the window with the first three fields and prints them as a comma-separated list (followed by a newline). It then iterates over the remaining fields on the line, shifting each one into the window and printing the window after every shift.
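To make the window mechanics concrete, here is a small self-contained run. The input is an assumption: data.in is not shown in the original, so the four lines below are reconstructed from the output above. The wrapper name `window3` is hypothetical and exists only for this sketch.

```shell
# Hypothetical wrapper: runs the three-element window script on standard input.
window3() {
    awk '
    function wprint() { print w[1], w[2], w[3] }
    function wshift(e) { w[1] = w[2]; w[2] = w[3]; w[3] = e }
    BEGIN { FS = OFS = "," }
    {
        wshift($1); wshift($2); wshift($3)   # fill the window
        wprint()
        for (i = 4; i <= NF; ++i) {          # slide one field at a time
            wshift($i)
            wprint()
        }
    }'
}

# Assumed sample data, reconstructed from the output shown above.
printf '%s\n' 'A,B,C,D,E' 'P,Q,R' 'G,D,V,K' 'L,Q,X,I,U,G' | window3
```

A line with NF fields yields NF - 2 windows, so the five-field first line produces three output lines and the three-field second line produces one.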

If any line in the input data contains fewer than three fields, you will get things like

A,,

or

A,B,

in the output.
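The padding can be reproduced with a two-line input. This is a sketch with made-up input lines; the wrapper name `first_window` is hypothetical, and the loop over fields 4 and up is left out because it never runs for such short lines.

```shell
# Hypothetical wrapper: prints only the first window of each input line.
first_window() {
    awk '
    function wprint() { print w[1], w[2], w[3] }
    function wshift(e) { w[1] = w[2]; w[2] = w[3]; w[3] = e }
    BEGIN { FS = OFS = "," }
    {
        # $2 and $3 are empty strings when the line is short,
        # so the window is padded with empty fields.
        wshift($1); wshift($2); wshift($3)
        wprint()
    }'
}

# Made-up input: one line with a single field, one with two fields.
printf '%s\n' 'A' 'A,B' | first_window
# prints:
# A,,
# A,B,
```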

If you are sure that every input line has at least three fields (or if you want to ignore any line that does not), you can shorten the awk script slightly:

function wprint() {
    print w[1], w[2], w[3];
}

function wshift(e) {
    w[1] = w[2]; w[2] = w[3]; w[3] = e;
}

BEGIN { FS = OFS = "," }

{
    for (i = 1; i <= NF; ++i) {
        wshift($i);
        if (i >= 3) {
            wprint();
        }
    }
}

A generalization of the first variant of the script, with a variable window size:

function wprint(i) {
    for (i = 1; i < n; ++i) {
        printf("%s%s", w[i], OFS);
    }
    print w[n]
}

function wshift(e,i) {
    for (i = 1; i < n; ++i) {
        w[i] = w[i + 1];
    }
    w[n] = e;
}

BEGIN { FS = OFS = "," }

{
    for (i = 1; i <= n; ++i) {
        wshift($i);
    }
    wprint();

    for (i = n + 1; i <= NF; ++i) {
        wshift($i);
        wprint();
    }
}

Using it:

$ awk -v n=4 -f script data.in
A,B,C,D
B,C,D,E
P,Q,R,
G,D,V,K
L,Q,X,I
Q,X,I,U
X,I,U,G
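The same generalization can be applied to the shorter variant, which skips lines with fewer than n fields instead of padding them (so a line like P,Q,R produces no output for n=4 rather than P,Q,R,). This is a sketch, not part of the original answer; the wrapper name `windows` and the default of 3 when no size is given are assumptions.

```shell
# Hypothetical wrapper: "windows N" prints every run of N consecutive fields
# from each comma-separated line on standard input (N defaults to 3).
windows() {
    awk -v n="${1:-3}" '
    function wprint(   i) {
        for (i = 1; i < n; ++i) printf "%s%s", w[i], OFS
        print w[n]
    }
    function wshift(e,   i) {
        for (i = 1; i < n; ++i) w[i] = w[i + 1]
        w[n] = e
    }
    BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; ++i) {
            wshift($i)
            if (i >= n) wprint()   # print only once the window is full
        }
    }'
}

# Made-up input; the middle line is too short for n=4 and is skipped.
printf '%s\n' 'A,B,C,D,E' 'P,Q,R' 'G,D,V,K' | windows 4
# prints:
# A,B,C,D
# B,C,D,E
# G,D,V,K
```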


Question 2

With perl:

perl -F, -le 'BEGIN { $, = "," } while(@F >= 3) { print @F[0..2]; shift @F }' file

With awk:

awk -F, -v OFS=, 'NF>=3 { for(i=1; i<=NF-2; i++) print $i, $(i+1), $(i+2) }' file


Question 3

Using Perl, we can approach it in any of these ways:

perl -lne '/(?:([^,]+)(?=((?:,[^,]+){2}))(?{ print $1,$2 }))*$/' yourfile
perl -F, -lne '$,=","; print shift @F, @F[0..1] while @F >= 3' yourfile
perl -F, -lne '$,=","; print splice @F, 0, 3, @F[1,2] while @F >= 3' yourfile

The regex version can be written in expanded form as follows:

perl -lne '
   m/
      (?:                       # set up a do-while loop
         ([^,]+)                # first field which shall be deleted after printing
         (?=((?:,[^,]+){2}))    # lookahead and remember the next 2 fields
         (?{ print $1,$2 })     # print the first field + next 2 fields
      )*                        # loop back for more
      $                         # till we hit the end of line
   /x;
' yourfile

And with sed, we can do it using a variety of its commands:

sed -e '
   /,$/!s/$/,/     # add a dummy comma at the EOL

   s/,/\n&/3;ta    # while there still are 3 elements in the line jump to label "a"
   d               # else quit processing this line any further

   :a              # main action
   P               # print the leading portion, i.e., that which is left of the first newline in the pattern space
   s/\n//          # take away the marker

   s/,/\n/;tb      # get ready to delete the first field
   :b

   D               # delete the first field, and apply the sed code all over from the beginning to what remains in the pattern space
' yourfile

dc can also do this:

sed -e 's/[^,]*/[&]/g;y/,/ /' gene_data.in |
dc -e '
[q]sq                            # macro for quitting
[SM z0<a]sa                      # macro to store stack -> register "M"
[LMd SS zlk>b c]sb               # macro to put register "M" -> register "S"
[LS zlk>c]sc                     # macro to put register "S" -> stack
[n44an dn44an rdn10anr z3!>d]sd  # macro to print 1st three stack elements
[zsk lax lbx lcx ldx c]se        # macro that initializes & calls all other macros
[?z3>q lex z0=?]s?               # while loop to read in file line by line and run macro "e" on each line
l?x                              # main()
'

Results

A,B,C
B,C,D
C,D,E
D,E,F
E,F,G
P,Q,R
G,D,V
D,V,K
L,Q,X
Q,X,I
X,I,U
I,U,G

