
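I need to delete all of the code contained in
<li class="share"> ... </li>
including the <li> tags themselves. There are several other <li> tags nested inside the <li class="share"> element, so I'm not sure how to approach this. I'm using Notepad++.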
Answer 1
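OK, below is a quickly thrown-together script that seems to work on simple sample data. Do with it what you want.
And yes, this is not something you can do with a simple find and replace in Notepad. Even the regular expressions available in Notepad++'s find and replace don't really give you the power to do this... or if it is possible with a regex, it is way beyond my level. ;)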
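import sys
import re

def get_tag():
    # Read the rest of a tag, up to and including the closing '>'.
    buffer = ""
    while True:
        c = sys.stdin.read(1)
        if not c:
            sys.stderr.write("Unexpected EOF\n")
            break
        buffer += c
        if c == '"' or c == "'":
            # Swallow the quoted attribute value so a '>' inside it is ignored.
            buffer += get_string(c)
        if c == '>':
            break
    return buffer

def get_string(quote = '"'):
    # Read up to and including the closing quote character.
    buffer = ""
    while True:
        c = sys.stdin.read(1)
        if not c:
            sys.stderr.write("Unexpected EOF\n")
            break
        buffer += c
        # The length check avoids an IndexError on an empty value such as class="".
        if c == quote and (len(buffer) < 2 or buffer[-2] != '\\'):
            break
    return buffer

buffer = ""
skip_depth = 0
# Despite the "ul_" prefix, these patterns match <li> tags.
# Note: inside a character class \1 is not a backreference (Python reads it as
# the character \x01), so [^\1] effectively matches any character.
ul_begin = re.compile(r"<\s*li(?:>|\s+.*>)", re.IGNORECASE | re.DOTALL)
ul_begin_share = re.compile(r"<\s*li\s+.*class\s*=\s*([\"'])(?:[^\1]*?\s+)?share(?:\s+[^\1]*?)?(\1).*?>", re.IGNORECASE | re.DOTALL)
ul_end = re.compile(r"</\s*li\s*>", re.IGNORECASE)

while True:
    # A </li> seen outside a skipped region pushes the depth below zero; reset it.
    if skip_depth < 0:
        skip_depth = 0
    c = sys.stdin.read(1)
    if not c:
        #sys.stderr.write("EOF\n")
        break
    if c == '<':
        buffer = c + get_tag()
        if skip_depth > 0 and ul_begin.match(buffer):
            # Nested <li> inside a region that is already being skipped.
            skip_depth += 1
        elif ul_begin_share.match(buffer):
            # <li class="... share ..."> starts a skipped region.
            skip_depth += 1
        elif ul_end.match(buffer):
            skip_depth -= 1
            if skip_depth == 0:
                # This </li> closes the skipped region; drop it as well.
                continue
        c = buffer
    if skip_depth > 0:
        pass
    else:
        sys.stdout.write(c)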
Test data in data.html:
<ul>
<li>do not touch that</li>
<li id="whatever1">or that</li>
<li class="share">delete this</li>
<li class="foo-bar share">delete this</li>
<li class="foobar share foo-bar_">delete this</li>
<li class='share'>delete this</li>
<li class='"wtf" share'>delete this</li>
<li class=" share ">delete this</li>
<li class=" share ">delete this</li>
<li class="foo share">delete this</li>
<li class="share bar">delete this</li>
<li class="foo share bar">delete this</li>
<li class="long foo share short bar">delete this</li>
<li class=" share ">delete this</li>
<li class=" foo share bar ">delete this</li>
<!-- but leave <li class="share">this comment</li> alone -->
<li>This will stay</li>
<li class="share">
<li>delete this</li>
<li>delete this</li>
</li>
<li style="not !important" class="share">delete this</li>
<li>leave this, but
<li class="share">
<li>delete this</li>
<li>delete this</li>
<li>delete this</li>
<li>delete this</li>
</li>
</li>
<li class=" foo share bar ">delete this</li>
<li class="shared">Can't touch this, naaaa-nanana...</li>
</ul>
<em>blablabla</em>
Example run:
$ python test.py < data.html > data.corrected.html
$ cat data.corrected.html
<ul>
<li>do not touch that</li>
<li id="whatever1">or that</li>
<!-- but leave <li class="share">this comment</li> alone -->
<li>This will stay</li>
<li>leave this, but
</li>
<li class="shared">Can't touch this, naaaa-nanana...</li>
</ul>
<em>blablabla</em>
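For experimenting without piping files through stdin, the same depth-counting idea can be sketched as a function that works on a string. This is only a rough sketch and not the script above: the names strip_share_items, has_share_class, TOKEN, LI_OPEN, LI_CLOSE and CLASS_ATTR are made up for illustration, it assumes quoted attribute values, and it treats the class value as a whitespace-separated list of names.

```python
import re

# A token is either a whole comment or a tag; quoted attribute values may contain '>'.
TOKEN = re.compile(r"""<!--.*?-->|<(?:[^>"']|"[^"]*"|'[^']*')*>""", re.DOTALL)
LI_OPEN = re.compile(r"<\s*li\b", re.IGNORECASE)
LI_CLOSE = re.compile(r"</\s*li\s*>", re.IGNORECASE)
CLASS_ATTR = re.compile(r"""\sclass\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)

def has_share_class(tag):
    # True if the tag's class attribute lists "share" as one of its names.
    m = CLASS_ATTR.search(tag)
    return bool(m) and "share" in m.group(2).split()

def strip_share_items(html):
    out = []
    depth = 0  # > 0 while inside an <li class="share"> region
    pos = 0
    for m in TOKEN.finditer(html):
        text, tag = html[pos:m.start()], m.group(0)
        pos = m.end()
        if depth == 0:
            out.append(text)  # plain text outside any skipped region
        if LI_OPEN.match(tag):
            if depth > 0 or has_share_class(tag):
                depth += 1  # start of, or nested inside, a skipped region
                continue
        elif LI_CLOSE.match(tag) and depth > 0:
            depth -= 1  # closing tag that belongs to the skipped region
            continue
        if depth == 0:
            out.append(tag)
    if depth == 0:
        out.append(html[pos:])  # trailing text after the last tag
    return "".join(out)

if __name__ == "__main__":
    sample = '<ul><li>keep</li><li class="foo share">drop<li>nested</li></li></ul>'
    print(strip_share_items(sample))
```

Comments are matched as a single token here, so an <li class="share"> inside a comment is left alone, and class="shared" is kept because "shared" is a different class name.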