Removing duplicate words inside brackets

Our input looks like this:

2012-04-17  [GBPGBP]
2012-04-13  [GBP GBP]
2012-04-13  [GBP]
2012-04-11  [GBPGBP]
2012-04-11  [GBP GBP]
2012-04-10  [GBPGBP]
2012-04-06  [GBP GBP GBP]
2012-04-17  [GBPGBP]
2012-04-13  [GBP CDN]
2012-04-13  [GBP]
2012-04-11  [GBPCDN]
2012-04-11  [GBP DL DL]
2012-04-10  [PSGBP]
2012-04-06  [PS PS]

We would like to get output like this:

2012-04-17  [GBP]
2012-04-13  [GBP]
2012-04-13  [GBP]
2012-04-11  [GBP]
2012-04-11  [GBP]
2012-04-10  [GBP]
2012-04-06  [GBP]
2012-04-17  [GBP]
2012-04-13  [GBP CDN]
2012-04-13  [GBP]
2012-04-11  [GBPCDN]
2012-04-11  [GBP DL]
2012-04-10  [PSGBP]
2012-04-06  [PS]

Basically, remove any duplicated string inside the brackets. Any suggestions?

Answer 1

sed -e ': a' -e 's/\(\[[^][]*\)\([A-Z][A-Z][A-Z]*\)\([^][]*\)\2/\1\2\3/' -e 't a'
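  • : a sets a label named a at the start of the script.
  • s/\(wibble\)\(foo\)\(bar\)\2/\1\2\3/ replaces wibblefoobarfoo with wibblefoobar: \2 has to match a second copy of whatever the second group captured, and the replacement leaves that copy out.
  • \[[^][]* matches an opening bracket followed by any characters other than brackets, so the match stays inside a single pair of brackets.
  • [A-Z][A-Z][A-Z]* matches two or more uppercase letters.
  • t a loops back to label a if the preceding s command made a substitution.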
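
The two building blocks described above can be tried in isolation. This is only a minimal sketch: the variable names `step` and `looped` and the `aaaa` test string are made up for illustration and are not part of the original answer.

```shell
# Backreference: \2 must repeat exactly what the second group captured,
# and the replacement \1\2\3 drops that repeated copy.
step=$(printf '%s\n' 'wibblefoobarfoo' | sed -e 's/\(wibble\)\(foo\)\(bar\)\2/\1\2\3/')
printf '%s\n' "$step"    # wibblefoobar

# Label plus conditional branch: rerun the substitution until it no longer applies.
# aaaa -> aaa -> aa -> a, then s fails and t does not branch.
looped=$(printf '%s\n' 'aaaa' | sed -e ':a' -e 's/aa/a/' -e 't a')
printf '%s\n' "$looped"  # a
```

Without the `t a` branch, the second command would perform only one substitution and print `aaa`. That is why the answer wraps its `s` command in a loop: a bracket may contain more than one repeated copy.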
