How to extract groups from a column based on the unique combination of two other columns

Answer 1

Is this what you're trying to do?

$ awk '{vals[$2 FS $3] = vals[$2 FS $3] OFS $1} END{for (key in vals) print key vals[key]}' file
Apples Red Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
Apples Green Sample_6 Sample_7 Sample_8 Sample_9 Sample_10
Apples Yellow Sample_11 Sample_12 Sample_13 Sample_14 Sample_15
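
One caveat about the command above: awk does not specify the order in which `for (key in vals)` visits keys, so the groups may come out in a different order than shown. Below is a minimal sketch of one way to keep first-seen order. The `order` array, the `n` counter, the `grouped` variable, and the shortened sample data are my own assumptions for illustration, not part of the original answer.

```shell
# Hypothetical sample data: whitespace-separated sample, fruit, color.
file=$(mktemp)
printf '%s\n' \
  'Sample_1 Apples Red' \
  'Sample_2 Apples Red' \
  'Sample_6 Apples Green' \
  'Sample_11 Apples Yellow' > "$file"

# Record each key the first time it appears, then print in that order.
grouped=$(awk '
  {
    key = $2 FS $3
    if (!(key in vals)) order[++n] = key   # remember first-seen order
    vals[key] = vals[key] OFS $1           # append the sample name
  }
  END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
' "$file")

printf '%s\n' "$grouped"
rm -f "$file"
```

The extra array costs one entry per distinct combination, which is usually negligible next to the grouped values themselves.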

Or maybe this?

$ awk -v fruit='Apples' -v color='Green' '($2==fruit) && ($3==color)' file
Sample_6    Apples  Green
Sample_7    Apples  Green
Sample_8    Apples  Green
Sample_9    Apples  Green
Sample_10   Apples  Green

Answer 2

Here is a simple example gawk script that parses your input and prints a transposed form of the data that seems to suit your needs.

#!/usr/bin/gawk -f

# Runs when the type (column 2) or subtype (column 3)
# differs from the previous line.
(type != $2) || (subtype != $3) {
    # Starts a new output line with the type and subtype.
    # The NR!=1 check keeps a blank line from being
    # printed before the first group.
    printf("%s%s\t%s\t", (NR!=1)?"\n":"", $2, $3);
    type=$2;
    subtype=$3
}
{
    # Appends every sample (column 1) value to the
    # current output line.
    printf("\"%s\" ", $1);
}
# Prints the final newline at the end of the input.
END{
    print "";
}

The output of script.awk < input.lst is shown below, where script.awk is the script above and input.lst is your sample input.

Apples  Red     "Sample_1" "Sample_2" "Sample_3" "Sample_4" "Sample_5" 
Apples  Green   "Sample_6" "Sample_7" "Sample_8" "Sample_9" "Sample_10" 
Apples  Yellow  "Sample_11" "Sample_12" "Sample_13" "Sample_14" "Sample_15"

The script's output can then easily be processed further, as follows.

script.awk < input.lst | while read -r TYPE SUBTYPE LIST
do
    echo "$TYPE"
    echo "$SUBTYPE"
    # $LIST is left unquoted on purpose so that it splits into items.
    for ITEM in $LIST
    do
        echo "execute some command on $ITEM where type is $TYPE and subtype is $SUBTYPE"
    done
done

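Note that this script is very crude. For example, it has no error handling and does not check for spaces or special characters in the input. It also assumes that lines with the same type and subtype are adjacent in the input, because it only compares each line with the previous one.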
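
As a rough illustration of how the whitespace limitation could be worked around, here is a sketch that assumes the input is tab-separated and that no field contains a tab or newline. It joins the grouped values with tabs instead of quoting them, and the shell loop splits on tabs only. The sample data, the `tab`, `input`, and `result` variables, and the `|`-delimited output are assumptions made for this example, not part of the original answer.

```shell
tab=$(printf '\t')

# Hypothetical tab-separated input: sample, type, subtype.
input=$(mktemp)
printf '%s\t%s\t%s\n' \
  'Sample 1' 'Apples' 'Red' \
  'Sample 2' 'Apples' 'Red' \
  'Sample 6' 'Apples' 'Green' > "$input"

# Group adjacent lines by (type, subtype); every output field is tab-separated.
result=$(awk -F'\t' '
  (type != $2) || (subtype != $3) {
    printf("%s%s\t%s", (NR != 1) ? "\n" : "", $2, $3)
    type = $2; subtype = $3
  }
  { printf("\t%s", $1) }
  END { if (NR) print "" }
' "$input" |
while IFS=$tab read -r TYPE SUBTYPE LIST
do
  old_ifs=$IFS
  IFS=$tab   # split the remaining fields on tabs only
  set -f     # disable globbing while $LIST is expanded unquoted
  for ITEM in $LIST
  do
    printf '%s|%s|%s\n' "$TYPE" "$SUBTYPE" "$ITEM"
  done
  set +f
  IFS=$old_ifs
done)

printf '%s\n' "$result"
rm -f "$input"
```

Because tab counts as IFS whitespace, empty fields would be collapsed by `read`, so this sketch only fits data where every field is non-empty.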

Answer 3

I tried it with the script below and it worked fine.

for i in "Apples"; do for j in "Red" "Green" "Yellow"; do awk -v i="$i" -v j="$j" 'BEGIN{print "Below are table contains" " " i " and " " " j}$2==i && $NF==j{print $0}' filename; done; done

Output

Below are table contains Apples and  Red
Sample_1    Apples  Red
Sample_2    Apples  Red
Sample_3    Apples  Red
Sample_4    Apples  Red
Sample_5    Apples  Red
Below are table contains Apples and  Green
Sample_6    Apples  Green
Sample_7    Apples  Green
Sample_8    Apples  Green
Sample_9    Apples  Green
Sample_10   Apples  Green
Below are table contains Apples and  Yellow
Sample_11   Apples  Yellow
Sample_12   Apples  Yellow
Sample_13   Apples  Yellow
Sample_14   Apples  Yellow
Sample_15   Apples  Yellow
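
The loop above hard-codes the fruit and color names. If the combinations are not known in advance, they could be read from the file itself. The following is a minimal sketch under that assumption. The `seen` array, the `fruit`, `color`, and `report` variables, the header wording, and the shortened sample data are hypothetical choices for illustration.

```shell
# Hypothetical sample data: whitespace-separated sample, fruit, color.
file=$(mktemp)
printf '%s\n' \
  'Sample_1 Apples Red' \
  'Sample_6 Apples Green' \
  'Sample_7 Apples Green' > "$file"

# First awk: print each distinct (fruit, color) pair once, in first-seen order.
# Second awk: print the rows that match the current pair.
report=$(awk '!seen[$2 FS $3]++ { print $2, $3 }' "$file" |
while read -r fruit color
do
  printf 'Rows for %s / %s:\n' "$fruit" "$color"
  awk -v fruit="$fruit" -v color="$color" '($2 == fruit) && ($3 == color)' "$file"
done)

printf '%s\n' "$report"
rm -f "$file"
```

This still rescans the file once per combination, as the original loop does, so for large inputs a single-pass grouping would likely be faster.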
