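I have a text file, and I want everything between the strings `\{{[}` and `{]}\}` deleted, including those strings themselves. The two strings can be on different lines or on the same line. In either case, on the line where the opening `\{{[}` appears, I don't want the text before it (to its left) deleted, and the same goes for the text after the closing `{]}\}`.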
Here is an example. Given a text file with the content
Bla Bla bla bla \{{[} more bla bla
even more bla bla bla bla.
A lot of stuff might be here.
Bla bla {]}\} finally done.
Nonetheless, the \{{[} show {]}\} goes on.
the script should return another text file with the content
Bla Bla bla bla finally done.
Nonetheless, the goes on.
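Unfortunately, this seemingly simple task turned out to be too hard for me with `sed`. I'd be glad of a solution in any language, as long as I don't need to install anything on my standard Linux machine (C and some Java are already installed).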
Answer 1
With `perl`:
perl -0777 -pe 's/\Q\{{[}\E.*?\Q{]}\}\E//gs'
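Note that the whole input is loaded into memory before it is processed.
`\Qsomething\E` causes `something` to be treated as a literal string rather than as a regular expression.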
To modify a regular file in place, add the `-i` option:
perl -0777 -i -pe 's/\Q\{{[}\E.*?\Q{]}\}\E//gs' file.txt
With GNU `awk` or `mawk`:
awk -v 'RS=\\\\\\{\\{\\[}|\\{\\]}\\\\}' -v ORS= NR%2
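There, we define the record separator as either the start marker or the end marker (only `gawk` and `mawk` support a regular expression for `RS` here). We need to escape the characters that are regex operators (backslash, `{` and `[`), and then escape the backslashes once more, because backslash is special in `-v` arguments (where it is used for things like `\n`, `\b`...), hence the large number of backslashes.
Then all we need to do is print every other record, with an empty `ORS` so that nothing is inserted between them: `NR%2` is 1 (true) for every odd-numbered record.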
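The two layers of escaping can be checked by printing `RS` after `awk` has processed the `-v` assignment. The printed value is the regular expression that `gawk` or `mawk` then uses as the separator; printing it should work in any `awk`, even one that would not treat it as a regex. A minimal sketch:

```shell
# Print the record separator as awk stores it after -v processing.
# Each "\\" in the -v value becomes a single "\", leaving a regex in which
# "\\" matches a literal backslash and the escaped braces and brackets
# match themselves.
awk -v 'RS=\\\\\\{\\{\\[}|\\{\\]}\\\\}' 'BEGIN { print RS }'
# prints \\\{\{\[}|\{\]}\\}
```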
For both solutions, we assume the markers are matched and the sections are not nested.
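Since only `gawk` and `mawk` accept a regular expression as `RS`, here is a different technique that should work with a POSIX `awk`: read the input line by line and find the markers as literal strings with `index()`, so no regex escaping is needed. This is a sketch, not the answer's method; the function name `strip_markers_awk` is made up here. Like the solutions above, it assumes the sections are not nested, and if an opening marker is never closed it keeps the text before that marker.

```shell
# Hypothetical alternative: literal marker search with index(), no regex RS.
# otag/ctag are the opening and closing markers ("\\" in -v becomes "\").
strip_markers_awk() {
  awk -v otag='\\{{[}' -v ctag='{]}\\}' '
  {
    line = $0
    while (1) {
      if (!inside) {
        i = index(line, otag)
        if (i == 0) { buf = buf line; break }     # no more openings on this line
        buf = buf substr(line, 1, i - 1)          # keep text left of the opening
        line = substr(line, i + length(otag))
        inside = 1
      } else {
        i = index(line, ctag)
        if (i == 0) break                         # rest of line is inside a section
        line = substr(line, i + length(ctag))     # resume right after the closing
        inside = 0
      }
    }
    # Emit only when the line ends outside a section, so the text before an
    # opening marker is joined with the text after its closing marker.
    if (!inside) { print buf; buf = "" }
  }
  END { if (inside) print buf }                   # unmatched opening: keep its prefix
  '
}
```

For example, `strip_markers_awk < file.txt > out.txt`. On the question's sample it should print the same two lines as the `perl` one-liner, and unlike `perl -0777` it does not load the whole file into memory.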
To modify the file in place with recent versions of GNU `awk`, add the `-i /usr/share/awk/inplace.awk`¹ option.
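¹ Do not use `-i inplace`, as `gawk` first tries to load the `inplace` extension (as `inplace` or `inplace.awk`) from the current working directory, where someone might have planted malware. The path of the `inplace` extension shipped with `gawk` may vary from system to system; see the output of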
gawk 'BEGIN{print ENVIRON["AWKPATH"]}'
Answer 2
sed -e:t -e'y/\n/ /;/\\{{\[}/!b' \
-e:N -e'/\\{{\[.*{\]}\\}/!N' \
-e's/\(\\{{\[}\).*\n/\1/;tN' \
-e'y/ /\n/;s/\\{{\[}/& /;ts' \
-e:s -e's/\(\[} [^ ]*\)\({\]}\\}\)/\1 \2/' \
-ets -e's/..... [^ ]* .....//;s/ //g;bt' \
<<""
#Bla Bla {]}\} bla bla \{{[} more bla bla
#even more bla bla bla bla. \{{[}
#
#A lot of stuff might be here.
#hashes are for stupid syntax color only
#Bla bla {]}\} finally {]}\} done.
#
#Nonetheless, the \{{[} show {]}\} goes \{{[} show {]}\} on.

#Bla Bla {]}\} bla bla finally {]}\} done.
#
#Nonetheless, the goes on.
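This, though, is a better way. There are far fewer substitutions, and each one replaces only a couple of characters rather than running `.*` every time. In fact, the only time `.*` is used is to clear everything in between out of pattern space once the first occurring start is definitely paired with the first following end. The rest of the time `sed` just `D`eletes as much as it needs to reach the next occurring delimiter. Don taught me that.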
sed -etD -e:t -e'/\\{{\[}/!b' \
-e's//\n /;h;D' -e:D \
-e'/^}/{H;x;s/\n.*\n.//;}' \
-ett -e's/{\]}\\}/\n}/' \
-e'/\n/!{$!N;s//& /;}' -eD \
<<""
#Bla Bla {]}\} bla bla \{{[} more bla bla
#even more bla bla bla bla. \{{[}
#
#A lot of stuff might be here.
#hashes are for stupid syntax color only
#Bla bla {]}\} finally {]}\} done.
#
#Nonetheless, the \{{[} show {]}\} goes \{{[} show {]}\} on.

#Bla Bla {]}\} bla bla finally {]}\} done.
#
#Nonetheless, the goes on.
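Note, though, that with a `sed` that does not understand `\n` in the replacement, the `\n` newline on the RHS may need to be replaced with a literal backslash-escaped newline.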
Here is a more general version:
#!/usr/bin/sed -f
####replace everything between START and END
#branch to :Kil if a successful substitution
#has already occurred. this can only happen
#if pattern space has been Deleted earlier
t Kil
#set a Ret :label so we can come back here
#when we've cleared a START -> END occurrence
#and check for another if need be
:Ret
#if no START, don't
/START/!b
#sigh. there is one. get to work. replace it
#with a newline followed by an S and save
#a copy then Delete up to our S marker.
s||\
S|
h;D
#set the :Kil label. we'll come back here from now
#on until we've definitely got END at the head of
#pattern space.
:Kil
#do we?
/^E/{
#if so, we'll append it to our earlier save
#and slice out everything between the two newlines
#we've managed to insert at just the right points
H;x
s|\nS.*\nE||
}
#if we did just clear START -> END we should
#branch back to :Ret and look for another START
t Ret
#pattern space didnt start w/ END, but is there even
#one at all? if so replace it w/ a newline followed
#by an E so we'll recognize it at the next :Kil
s|END|\
E|
#if that last was successful we'll have a newline
#but if not it means we need to get the next line
#if the last line we've got unmatched pairs and are
#currently in a delete cycle anyway, but maybe we
#should print up to our START marker in that case?
/\n/!{
#i guess so. now that i'm thinking about it
#we'll swap into hold space, and Print it
${ x;P;d
}
#get next input line and add S after the delimiting
#newline because we're still in START state. Delete
#will handle everything up to our marker before we
#branch back to :Kil at the top of the script
N
s||&S|
}
#now Delete will slice everything from head of pattern space
#to the first occurring newline and loop back to top of script.
#because we've definitely made successful substitutions if we
#have a newline at all we'll test true and branch to :Kil
#to go again until we've definitely got ^E
D
...and without comments...
#!/usr/bin/sed -f
t Kil
:Ret
/START/!b
s||\
S|
h;D
:Kil
/^E/{
H;x
s|\nS.*\nE||
}
t Ret
s|END|\
E|
/\n/!{
${ x;P;d
}
N
s||&S|
}
D
I copied the commented version to the clipboard and ran it on its own source:
{ xsel; echo; } >se.sed
chmod +x se.sed
./se.sed <se.sed
#!/usr/bin/sed -f
####replace everything between
#branch to :Kil if a successful substitution
#has already occurred. this can only happen
#if pattern space has been Deleted earlier
t Kil
#set a Ret :label so we can come back here
#when we've cleared a occurrence
#and check for another if need be
:Ret
#if no at the head of
#pattern space.
:Kil
#do we?
/^E/{
#if so, we'll append it to our earlier save
#and slice out everything between the two newlines
#we've managed to insert at just the right points
H;x
s|\nS.*\nE||
}
#if we did just clear we should
#branch back to :Ret and look for another , but is there even
#one at all? if so replace it w/ a newline followed
#by an E so we'll recognize it at the next :Kil
s|END|\
E|
#if that last was successful we'll have a newline
#but if not it means we need to get the next line
#if the last line we've got unmatched pairs and are
#currently in a delete cycle anyway, but maybe we
#should print up to our
Answer 3
If your file is test.txt, you can use:
sed ':a;N;$!ba;s/\n/ /g' test.txt|sed 's/\\{{\[}.*{\]}\\}//'
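The first `sed` joins all lines by replacing every newline with a space; the second removes the markers and the text between them. Because `.*` is greedy, if the file contains several pairs of markers, everything from the first opening marker to the last closing marker is removed, and the result is a single line.
I don't know whether you need a more general solution.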