I have data like this:
Sample_1 Apples Red
Sample_2 Apples Red
Sample_3 Apples Red
Sample_4 Apples Red
Sample_5 Apples Red
Sample_6 Apples Green
Sample_7 Apples Green
Sample_8 Apples Green
Sample_9 Apples Green
Sample_10 Apples Green
Sample_11 Apples Yellow
Sample_12 Apples Yellow
Sample_13 Apples Yellow
Sample_14 Apples Yellow
Sample_15 Apples Yellow
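How can I iterate over the groups formed by the combinations of the other two columns and extract the samples from the first column, so that I get samples 1-5, 6-10, and 11-15?
What I ultimately want to do is pass each sample list (such as the groups above) as input to another command, for example: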
comm -23 <(sort <all_samples.txt>) <(sort <[input from above]>) > <difference.txt>
I tried:
awk '{print $2"\t"$3}' <file.txt> | uniq
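to get the unique combinations of the second and third columns, but I can't seem to do anything with that, in particular pull out the first column, which is what I actually need.
Answer 1
Is this what you're trying to do?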
$ awk '{vals[$2 FS $3] = vals[$2 FS $3] OFS $1} END{for (key in vals) print key vals[key]}' file
Apples Red Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
Apples Green Sample_6 Sample_7 Sample_8 Sample_9 Sample_10
Apples Yellow Sample_11 Sample_12 Sample_13 Sample_14 Sample_15
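POSIX leaves the traversal order of for (key in vals) unspecified, so the groups are not guaranteed to come out as Red, Green, Yellow. A minimal sketch that remembers the order in which each column 2/column 3 pair first appears; group_in_order is a hypothetical helper name and the demo lines are a small subset of the question's data:

```shell
#!/usr/bin/env bash
# Hypothetical helper: group column 1 by (column 2, column 3) and print the
# groups in first-appearance order. Reads the files given, else stdin.
group_in_order() {
  awk '
    { key = $2 FS $3 }                      # group key, e.g. "Apples Red"
    !(key in vals) { order[++n] = key }     # first sighting: remember position
    { vals[key] = vals[key] OFS $1 }        # append this sample to its group
    END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
  ' "$@"
}

# Demo on a few lines of the question's data, deliberately interleaved.
result=$(printf '%s\n' 'Sample_1 Apples Red' 'Sample_6 Apples Green' \
                       'Sample_2 Apples Red' 'Sample_11 Apples Yellow' | group_in_order)
printf '%s\n' "$result"
```

The output format matches the answer above (key, then the samples separated by OFS); only the ordering is made deterministic.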
Or maybe this?
$ awk -v fruit='Apples' -v color='Green' '($2==fruit) && ($3==color)' file
Sample_6 Apples Green
Sample_7 Apples Green
Sample_8 Apples Green
Sample_9 Apples Green
Sample_10 Apples Green
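To feed one group into the comm command from the question, the same filter can print only column 1. A sketch under stated assumptions: file, all_samples.txt and difference.txt are placeholder names, and the demo data is small and hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory so the demo files clobber nothing

# Hypothetical demo inputs.
printf '%s\n' 'Sample_6 Apples Green' 'Sample_7 Apples Green' 'Sample_1 Apples Red' > file
printf '%s\n' Sample_1 Sample_6 Sample_7 Sample_42 > all_samples.txt

# Keep the samples of all_samples.txt that are NOT in the Apples/Green group.
# comm needs both inputs sorted with the same collation, hence sort on both sides.
comm -23 <(sort all_samples.txt) \
         <(awk -v fruit='Apples' -v color='Green' '($2==fruit) && ($3==color) { print $1 }' file | sort) \
  > difference.txt
```

Process substitution (<(...)) is a bash feature, as in the question's own command.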
Answer 2
Here is a simple gawk script that parses your input and outputs a transposition of the data that seems to fit your needs.
#!/usr/bin/gawk -f
# Checks whether the type (column 2) or subtype (column 3)
# differs from the previous line.
(type != $2) || (subtype != $3) {
    # Prints the start of a new output line.
    # The NR!=1 check avoids printing a newline
    # before the first output line.
    printf("%s%s\t%s\t", (NR!=1)?"\n":"", $2, $3);
    type=$2;
    subtype=$3;
}
{
    # Prints each sample (column 1) value on the
    # current output line.
    printf("\"%s\" ", $1);
}
# Prints a newline at the end of the file.
END{
    print "";
}
Running script.awk < input.lst, where script.awk is the script above and input.lst is your sample input, produces the following output:
Apples Red "Sample_1" "Sample_2" "Sample_3" "Sample_4" "Sample_5"
Apples Green "Sample_6" "Sample_7" "Sample_8" "Sample_9" "Sample_10"
Apples Yellow "Sample_11" "Sample_12" "Sample_13" "Sample_14" "Sample_15"
The script's output can then be processed easily, for example:
script.awk < input.lst | while read TYPE SUBTYPE LIST
do
    echo $TYPE
    echo $SUBTYPE
    for ITEM in $LIST
    do
        echo execute some command on $ITEM where type is $TYPE and subtype is $SUBTYPE
    done
done
Note that this script is very rough: for example, there is no error handling and no check for spaces or special characters in the input.
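Because the script only compares each line with the one before it, it assumes that all lines of a group are adjacent, as they are in the sample. If they are not, the same group is printed on several output lines. A sketch of one fix, sorting on columns 2 and 3 first; group_adjacent is a hypothetical, condensed stand-in for script.awk (same logic, plain awk) so the example runs on its own:

```shell
#!/usr/bin/env bash
# Hypothetical condensed stand-in for script.awk: start a new output line
# whenever column 2 or column 3 changes from the previous line.
group_adjacent() {
  awk '
    ($2 != type) || ($3 != subtype) {
      printf("%s%s\t%s\t", (NR != 1) ? "\n" : "", $2, $3)
      type = $2; subtype = $3
    }
    { printf("\"%s\" ", $1) }
    END { print "" }
  '
}

input='Sample_1 Apples Red
Sample_6 Apples Green
Sample_2 Apples Red'

# Unsorted input: the Red group is split and appears twice.
unsorted=$(printf '%s\n' "$input" | group_adjacent)
# Sorted on columns 2 and 3 first: each group appears exactly once.
sorted=$(printf '%s\n' "$input" | sort -k2,2 -k3,3 | group_adjacent)
printf '%s\n' "$sorted"
```

Sorting changes the order of the groups (here Green before Red); if the original order matters, the first-appearance approach shown for Answer 1 avoids the sort.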
Answer 3
Try the following script, which works fine:
for i in "Apples"; do for j in "Red" "Green" "Yellow"; do awk -v i="$i" -v j="$j" 'BEGIN{print "Below are table contains" " " i " and " " " j}$2==i && $NF==j{print $0}' filename; done; done
Output
Below are table contains Apples and Red
Sample_1 Apples Red
Sample_2 Apples Red
Sample_3 Apples Red
Sample_4 Apples Red
Sample_5 Apples Red
Below are table contains Apples and Green
Sample_6 Apples Green
Sample_7 Apples Green
Sample_8 Apples Green
Sample_9 Apples Green
Sample_10 Apples Green
Below are table contains Apples and Yellow
Sample_11 Apples Yellow
Sample_12 Apples Yellow
Sample_13 Apples Yellow
Sample_14 Apples Yellow
Sample_15 Apples Yellow
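The loop above hard-codes "Apples" and the three colours. A sketch that instead derives the column 2/column 3 pairs from the data with sort -u and prints only column 1 of each group, which is the list the question wants to feed to comm. filename is the placeholder from the answer; the header wording, groups.txt, and the demo data are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory for the demo files

# Hypothetical demo input (a few lines of the question's data).
printf '%s\n' 'Sample_1 Apples Red' 'Sample_6 Apples Green' \
              'Sample_2 Apples Red' 'Sample_11 Apples Yellow' > filename

# Build the distinct (column 2, column 3) pairs from the data itself,
# then print each group's samples (column 1) under a header.
awk '{ print $2, $3 }' filename | sort -u |
while read -r i j; do
  echo "Samples with $i and $j:"
  awk -v i="$i" -v j="$j" '$2 == i && $3 == j { print $1 }' filename
done > groups.txt
```

Inside the loop, the inner awk | sort pipeline can be swapped for the comm command from the question to get one difference file per group.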