How can I find the differences between the tokens in two strings using Unix tools?

Question 1

With GNU tools:

s1='token1, token2, token3, token4, token5, token6, token8, token9, token10'
s2='token2, token7, token4, token3, token5, token6, token8, token10, token9'
comm <(grep -oE '\w+' <<< "$s1" | sort) <(grep -oE '\w+' <<< "$s2" | sort)

gives:

token1
                token10
                token2
                token3
                token4
                token5
                token6
        token7
                token8
                token9

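The columns are:

1. tokens only in s1
2. tokens only in s2
3. tokens in both.

You can suppress a column by passing the corresponding option (for example, -3 suppresses the third column).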
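
As a minimal sketch of that column suppression, the snippet below wraps the same pipeline in a helper and passes -3 so that only the tokens unique to one string remain. The function name token_diff is made up for illustration, LC_ALL=C is an added assumption to keep the sort order that comm expects predictable, and `\w` in grep -E is a GNU extension, as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tokens that occur in only one of two strings.
token_diff() {
    # -3 drops column 3 (tokens present in both inputs).
    # comm indents column 2 with a tab, which tr removes again.
    LC_ALL=C comm -3 \
        <(grep -oE '\w+' <<< "$1" | LC_ALL=C sort) \
        <(grep -oE '\w+' <<< "$2" | LC_ALL=C sort) | tr -d '\t'
}

s1='token1, token2, token3, token4, token5, token6, token8, token9, token10'
s2='token2, token7, token4, token3, token5, token6, token8, token10, token9'
token_diff "$s1" "$s2"   # prints token1 and token7, one per line
```

Passing -12 instead would keep only the third column, that is, the tokens common to both strings.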

Question 2

Borrowing the basic idea from Ramesh,

with bash and GNU awk:

awk -v RS='[[:space:]]*,[[:space:]]*' '{x[$0]++}; END{for (y in x) if (x[y] == 1) print y}' \
<(printf "%s" 'token1, token2, token3, token4, token5, token6, token8, token9, token10') \
<(printf "%s" 'token2, token7, token4, token3, token5, token6, token8, token10, token9')
token1
token7
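
If you also want to know which string each leftover token came from, the same counting idea can be extended along these lines. This is only a sketch: the function name token_origin and the array names are made up for illustration, a regular-expression RS needs an awk that supports it (GNU awk does), and both inputs are assumed to be non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<source index> <token>" for every token that
# occurs in only one of the two strings (1 = first string, 2 = second).
token_origin() {
    awk -v RS='[[:space:]]*,[[:space:]]*' '
        FNR == 1 { src++ }                # first record of a new input: next source
        { count[$0]++; from[$0] = src }   # tally the token and remember its source
        END {
            for (t in count)
                if (count[t] == 1) print from[t], t
        }
    ' <(printf '%s' "$1") <(printf '%s' "$2") | LC_ALL=C sort
}

token_origin 'token1, token2, token3' 'token2, token7, token3'
```

The final sort is only there because awk's for (t in count) loop visits the keys in an unspecified order.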

Question 3

You can do something like the following.

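# Concatenate both files (overwrite, rather than append to, any old output).
cat input1 input2 > output
# Split on commas; the unquoted expansion also trims the spaces around tokens.
arr=($(tr ',' '\n' < output))
# Print each distinct token once, space-separated.
printf '%s\n' "${arr[@]}" | sort -u | tr '\n' ' '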

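Explanation

I concatenate the two files into a third file and split the tokens on the comma delimiter. After that, I print only the unique values (that is, each distinct token is listed once, which I believe is what you are looking for).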

Contents of input1

token1, token2, token3, token4, token5, token6, token8, token9, token10

Contents of input2

token2, token7, token4, token3, token5, token6, token8, token10, token9

After running the script above, I get the following output:

token1 token10 token2 token3 token4 token5 token6 token7 token8 token9

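As you can see from the output above, it prints only the unique values from the two files.

However, if you only need the differences, you can use the following command.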

# Tokens with a count of 1 occur in only one of the files.
echo ${arr[@]} | tr ' ' '\n' | sort | uniq -c
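
To print just the differing tokens rather than a count for every token, uniq -u (which keeps only lines that are not repeated) can take the place of uniq -c. The sketch below is one way to package that; the function name tokens_once and the temporary files are made up for illustration, and it assumes that no token is repeated within a single file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tokens that occur exactly once across all
# of the given comma-separated files.
tokens_once() {
    local f
    for f in "$@"; do
        # Turn commas and spaces into newlines, squeezing runs of them.
        tr -s ', ' '\n\n' < "$f"
        echo    # guard against a file that lacks a trailing newline
    done | grep -v '^$' | LC_ALL=C sort | uniq -u
}

tmp=$(mktemp -d)
printf '%s\n' 'token1, token2, token3, token4, token5, token6, token8, token9, token10' > "$tmp/input1"
printf '%s\n' 'token2, token7, token4, token3, token5, token6, token8, token10, token9' > "$tmp/input2"
tokens_once "$tmp/input1" "$tmp/input2"   # prints token1 and token7, one per line
rm -rf "$tmp"
```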
