Grep match and extract

Question 1

With grep -o, you have to match exactly what you want to extract. Since you don't want to extract the string proto=, you should not match it.

An extended regular expression that matches either tcp or udp, followed by a slash and a non-empty alphanumeric string, is

(tcp|udp)/[[:alnum:]]+

Applying this to your data:

$ grep -E -o '(tcp|udp)/[[:alnum:]]+' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=:

grep '^proto=' file | grep -E -o '(tcp|udp)/[[:alnum:]]+'
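
The effect of the ^proto= pre-filter can be checked on a small sample. The sketch below assumes a hypothetical input format (the sent= and rcvd= fields and the note= line are made up for illustration) and compares the unfiltered and filtered pipelines.

```shell
# Hypothetical sample input: three proto= lines plus one unrelated line
# that happens to contain a udp/... token.
sample=$(mktemp)
cat > "$sample" <<'EOF'
proto=tcp/http  sent=144 rcvd=52
proto=tcp/https sent=145 rcvd=52
note=udp/ignored sent=1 rcvd=1
proto=udp/dns   sent=144 rcvd=52
EOF

# Without the pre-filter, the note= line also contributes a match.
unfiltered=$(grep -E -o '(tcp|udp)/[[:alnum:]]+' "$sample")

# With the pre-filter, only lines starting with proto= are searched.
filtered=$(grep '^proto=' "$sample" | grep -E -o '(tcp|udp)/[[:alnum:]]+')

printf 'unfiltered:\n%s\n\nfiltered:\n%s\n' "$unfiltered" "$filtered"
rm -f "$sample"
```

Note that the second grep still matches a tcp/ or udp/ token anywhere on a proto= line, not only in the proto= field itself.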

With sed, remove everything up to and including the first =, and everything from the first blank character onward:

$ sed 's/^[^=]*=//; s/[[:blank:]].*//' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=, you could insert the same grep pre-processing step as above, or you could use:

sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' file

Here, the -n option suppresses the default output, and the substitutions and the explicit print (p) are triggered only if the line matches ^proto=.
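
To see what the ^proto= address changes, here is a minimal sketch that runs both sed commands on a hypothetical sample. The comment line and the note= line are assumptions added to show how unrelated lines are handled.

```shell
# Hypothetical sample input with two lines that are not proto= records.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# traffic summary
proto=tcp/http  sent=144 rcvd=52
note=udp/ignored sent=1 rcvd=1
proto=udp/dns   sent=144 rcvd=52
EOF

# Unguarded: every line is edited and printed, including unrelated ones.
unguarded=$(sed 's/^[^=]*=//; s/[[:blank:]].*//' "$sample")

# Guarded: -n turns off automatic printing, and p only runs inside the
# block that is selected by the /^proto=/ address.
guarded=$(sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' "$sample")

printf 'unguarded:\n%s\n\nguarded:\n%s\n' "$unguarded" "$guarded"
rm -f "$sample"
```

On this sample the unguarded command also emits a stray # and udp/ignored, while the guarded one prints only the two proto= values.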

With awk, use the default field separator, then split the first field on = and print its second part:

$ awk '{ split($1, a, "="); print a[2] }' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=, you could insert the same grep pre-processing step as above, or you could use:

awk '/^proto=/ { split($1, a, "="); print a[2] }' file
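
Because awk's default field separator splits on runs of spaces and tabs, the guarded command should not care which of the two separates the columns. The following sketch illustrates this on a hypothetical sample in which one line uses tabs and another uses spaces (the extra fields and the note= line are assumptions).

```shell
# Hypothetical sample input; printf is used so that \t becomes a real tab.
sample=$(mktemp)
printf 'proto=tcp/http\tsent=144\trcvd=52\n' >  "$sample"
printf 'note=udp/ignored sent=1 rcvd=1\n'    >> "$sample"
printf 'proto=udp/dns   sent=144 rcvd=52\n'  >> "$sample"

# The /^proto=/ pattern selects the lines; split() breaks the first
# whitespace-delimited field on "=" and a[2] holds the part after it.
result=$(awk '/^proto=/ { split($1, a, "="); print a[2] }' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

One limitation of this approach: if the value itself contained a further = character, a[2] would hold only the part up to that second =.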

Question 2

If you are using GNU grep (needed for the -P option), you can use:

$ grep -oP 'proto=\K[^ ]*' file
tcp/http
tcp/https
udp/dns

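Here we match the proto= string to make sure that we extract the correct column, but then we discard it from the output with \K, which resets the start of the reported match.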

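The above assumes that the columns are separated by spaces. If tabs are also valid separators, use \S to match non-whitespace characters, so the command becomes: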

grep -oP 'proto=\K\S*' file

If you also want to guard against matching fields in which proto= is only a substring, such as thisisnotaproto=tcp/https, you can add a word boundary with \b, like so:

grep -oP '\bproto=\K\S*' file
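
The three variants can be compared side by side. This is a minimal sketch that assumes a grep built with PCRE support (-P) and a hypothetical sample in which proto= is not the first column, one line is tab-separated, and one line contains the thisisnotaproto= field.

```shell
# Hypothetical sample input; printf is used so that \t becomes a real tab.
sample=$(mktemp)
printf 'id=1 proto=tcp/http sent=144\n'       >  "$sample"
printf 'thisisnotaproto=tcp/https sent=145\n' >> "$sample"
printf 'id=3\tproto=udp/dns\tsent=144\n'      >> "$sample"

# [^ ]* stops only at a space, so on the tab-separated line it runs on
# past the tab and swallows the following field as well.
space_only=$(grep -oP 'proto=\K[^ ]*' "$sample")

# \S* stops at any whitespace, but still matches inside thisisnotaproto=.
any_space=$(grep -oP 'proto=\K\S*' "$sample")

# \b requires a word boundary before proto, which excludes thisisnotaproto=.
bounded=$(grep -oP '\bproto=\K\S*' "$sample")

printf '%s\n---\n%s\n---\n%s\n' "$space_only" "$any_space" "$bounded"
rm -f "$sample"
```

Keep in mind that \b only checks for a word boundary, so a field such as x-proto=... would still match, because - is not a word character.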

Question 3

Using awk:

awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input

$1 ~ "proto" ensures that we only act on lines with proto in the first column

sub(/proto=/, "") removes proto= from the input line

print $1 prints what remains of the first column

$ awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input
tcp/http
tcp/https
udp/dns
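
Since $1 ~ "proto" is a substring test, it also selects fields that merely contain proto. The sketch below contrasts the command above with a stricter, anchored variant. The anchored form is a suggestion rather than part of the original answer, and the sample data is hypothetical.

```shell
# Hypothetical sample input with one field that only contains "proto".
sample=$(mktemp)
cat > "$sample" <<'EOF'
proto=tcp/http sent=144
thisisnotaproto=tcp/https sent=145
proto=udp/dns sent=144
EOF

# Substring test: the middle line is selected too, and sub() removes the
# embedded proto= from it.
loose=$(awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' "$sample")

# Anchored test: only fields that start with proto= are selected.
strict=$(awk '$1 ~ /^proto=/ { sub(/^proto=/, ""); print $1 }' "$sample")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```

On this sample the loose form prints thisisnotatcp/https for the middle line, while the anchored form skips that line.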

Question 4

Just another grep solution:

grep -o '[^=/]\+/[^ ]\+' file

And a similar sed command that prints only the captured group:

sed -n 's/.*=\([^/]\+\/[^ ]\+\).*/\1/p' file
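
Both commands use \+ inside a basic regular expression, which is a GNU extension. As a sketch under that observation, the same patterns can be written as extended regular expressions with -E (supported by current GNU and BSD versions of grep and sed). The sample data here is hypothetical.

```shell
# Hypothetical sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
proto=tcp/http  sent=144 rcvd=52
proto=udp/dns   sent=144 rcvd=52
EOF

# ERE form of the grep command: + needs no backslash.
with_grep=$(grep -E -o '[^=/]+/[^ ]+' "$sample")

# ERE form of the sed command: the group uses plain ( ), and \/ is still
# escaped because / is the delimiter of the s command.
with_sed=$(sed -n -E 's/.*=([^/]+\/[^ ]+).*/\1/p' "$sample")

printf 'grep: %s\n' $with_grep
printf 'sed:  %s\n' $with_sed
rm -f "$sample"
```

Both forms rely on the value being the only field on the line that contains a slash. A later field with a / in it would change what is matched.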
