Replacing multiple duplicate lines with blanks in Notepad++
I have a text file like this:
```
eeeeeeee6fd6e6e7000000800010884f image_0001.png
eeeeeeee6fd6e6e7000000800010884f image_0002.png
e6eee7afef77c6c7000000808860003b image_0003.png
e6eeefa7cfe777170100000008886033 image_0004.png
e6eeefa7cfe777170100000008886033 image_0005.png
eeeecfe7afcfe7770100000030088c27 image_0006.png
efebefe7a7cfc7e70101080000300c03 image_0007.png
ef6befdf674f97c7000000900200301f image_0008.png
ef6befdf674f97c7000000900200301f image_0009.png
6d6d6faff767479700004008810000e1 image_0010.png
ed6d6dada5f767570000400098830401 image_0011.png
ed6d6dada5f767570000400098830401 image_0012.png
efed6d4da595f7a70202004000181303 image_0013.png
ebececcc2f2797f10000008051043c5b image_0014.png
e9edecce4e6e26ba120101808058042a image_0015.png
e9edecce4e6e26ba120101808058042a image_0016.png
ececeeefcf6f67a61000000080585887 image_0017.png
cc6ceeefcf4f67e710000020000149d8 image_0018.png
cc6cefefefcf6fe71000000040000001 image_0019.png
cc6cefefefcf6fe71000000040000001 image_0020.png
8ceceeefefcfcfe700000000c0000009 image_0021.png
```
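Using Notepad++, I'd like to remove every occurrence of a duplicated string (the hash value on the left) except the first, leaving that part of the line blank and keeping the file name on the right, like this: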
```
eeeeeeee6fd6e6e7000000800010884f image_0001.png
                                 image_0002.png
e6eee7afef77c6c7000000808860003b image_0003.png
e6eeefa7cfe777170100000008886033 image_0004.png
                                 image_0005.png
eeeecfe7afcfe7770100000030088c27 image_0006.png
efebefe7a7cfc7e70101080000300c03 image_0007.png
ef6befdf674f97c7000000900200301f image_0008.png
                                 image_0009.png
6d6d6faff767479700004008810000e1 image_0010.png
ed6d6dada5f767570000400098830401 image_0011.png
                                 image_0012.png
...etc.
```
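Of course, there are many different strings to replace, so it isn't as easy as it sounds (especially when there are thousands of such lines). Is there a regex or some other way to do this? Thanks.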
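For the regex route the question asks about, here is a minimal sketch using Python's `re` module, not Notepad++'s search dialog. Notepad++ uses a different regex engine, and this pattern has not been checked there. It assumes every duplicate hash sits on the line immediately after its first occurrence, as in the sample data; `blank_adjacent_duplicates` is a hypothetical helper name. The pattern captures a hash with `(\S+)`, then uses the backreference `\1` to match one or more following lines that start with the same hash, and a replacement function overwrites those repeated hashes with spaces.

```python
import re

# One "<hash> <rest>" line followed by one or more consecutive lines that start
# with the same hash. Assumes duplicates are adjacent, as in the sample data.
RUN_OF_DUPLICATES = re.compile(
    r"^(\S+)( [^\n]*\n)((?:\1 [^\n]*(?:\n|$))+)", re.MULTILINE
)


def _blank_repeats(match):
    file_hash = match.group(1)
    blank = " " * len(file_hash)
    # Keep the first line intact; overwrite the hash at the start of each repeat.
    repeats = re.sub(
        "^" + re.escape(file_hash), blank, match.group(3), flags=re.MULTILINE
    )
    return match.group(1) + match.group(2) + repeats


def blank_adjacent_duplicates(text):
    """Hypothetical helper: blank every repeated hash that follows its first line."""
    return RUN_OF_DUPLICATES.sub(_blank_repeats, text)


sample = (
    "e6eee7afef77c6c7000000808860003b image_0003.png\n"
    "e6eeefa7cfe777170100000008886033 image_0004.png\n"
    "e6eeefa7cfe777170100000008886033 image_0005.png\n"
    "eeeecfe7afcfe7770100000030088c27 image_0006.png\n"
)
print(blank_adjacent_duplicates(sample), end="")
```

Because the pattern only looks at consecutive lines, a hash that reappears later in the file after a different hash would not be blanked. A set of already-seen hashes, as in the answer below, avoids that restriction.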
Answer 1
There are several ways to do this with Python. Here is one:
```python
# Note: Your output file must be different to your input file!
# Use absolute filepaths unless the files are in the current working directory.
input_filepath = r"C:\Users\Admin\Desktop\file hashes.txt"
output_filepath = r"C:\Users\Admin\Desktop\file hashes (processed).txt"

hashes = set()  # This set keeps track of known file hashes

with open(input_filepath) as fin:
    with open(output_filepath, "w") as fout:
        # After opening both the input and output files,
        # loop over every line in the input file.
        for line in fin:
            # Get the hash, which is between the start of the line and the first space.
            file_hash = line[:line.find(" ")]

            # Check if it is in the set of known hashes.
            # If it is, write the current line without the hash to the output file.
            # If it isn't, write the current line with the hash to the output file,
            # and add the hash to our set of known hashes
            if file_hash in hashes:
                hash_len = len(file_hash)
                fout.write(" " * hash_len + line[hash_len:])
            else:
                fout.write(line)
                hashes.add(file_hash)
```
The resulting `file hashes (processed).txt` looks like this:
```
eeeeeeee6fd6e6e7000000800010884f image_0001.png
                                 image_0002.png
e6eee7afef77c6c7000000808860003b image_0003.png
e6eeefa7cfe777170100000008886033 image_0004.png
                                 image_0005.png
eeeecfe7afcfe7770100000030088c27 image_0006.png
efebefe7a7cfc7e70101080000300c03 image_0007.png
ef6befdf674f97c7000000900200301f image_0008.png
                                 image_0009.png
6d6d6faff767479700004008810000e1 image_0010.png
ed6d6dada5f767570000400098830401 image_0011.png
                                 image_0012.png
efed6d4da595f7a70202004000181303 image_0013.png
ebececcc2f2797f10000008051043c5b image_0014.png
e9edecce4e6e26ba120101808058042a image_0015.png
                                 image_0016.png
ececeeefcf6f67a61000000080585887 image_0017.png
cc6ceeefcf4f67e710000020000149d8 image_0018.png
cc6cefefefcf6fe71000000040000001 image_0019.png
                                 image_0020.png
8ceceeefefcfcfe700000000c0000009 image_0021.png
```
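To try the logic without touching any files, the same set-based idea can be wrapped in a generator. This is a sketch, and `blank_duplicate_hashes` is a hypothetical name. It swaps `line.find(" ")` for `str.partition(" ")`: `find` returns `-1` when a line has no space, so `line[:-1]` would silently be taken as the hash, whereas here such lines are passed through unchanged. Because every hash seen so far is remembered, duplicates do not need to be on adjacent lines.

```python
from typing import Iterable, Iterator


def blank_duplicate_hashes(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line, with the hash replaced by spaces if it was seen earlier."""
    seen = set()
    for line in lines:
        # partition() splits at the first space: ("hash", " ", "name.png\n").
        file_hash, sep, rest = line.partition(" ")
        if not sep:
            # No space on this line: pass it through untouched.
            yield line
        elif file_hash in seen:
            yield " " * len(file_hash) + sep + rest
        else:
            seen.add(file_hash)
            yield line


sample = [
    "eeeeeeee6fd6e6e7000000800010884f image_0001.png\n",
    "eeeeeeee6fd6e6e7000000800010884f image_0002.png\n",
    "e6eee7afef77c6c7000000808860003b image_0003.png\n",
]
print("".join(blank_duplicate_hashes(sample)), end="")
```

With real files, the generator can be plugged into the answer's nested `with` blocks as `fout.writelines(blank_duplicate_hashes(fin))`.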
I'm not sure how Python is set up on your system, but you can copy the code above into a file with a name like `remove_duplicate_hashes.py` and then run it either by double-clicking that file or by typing `python remove_duplicate_hashes.py` at the command prompt.
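If editing the hard-coded paths each time is inconvenient, a variant of the script can take the input and output paths as command-line arguments instead. This is a sketch using the standard-library `argparse` module. The argument names `input_file` and `output_file` are illustrative assumptions, and the same-file check is an added safeguard for the answer's warning that the output file must differ from the input file.

```python
import argparse
import os


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Blank out repeated hashes while keeping the file names."
    )
    # nargs="?" makes both paths optional, so running the script without
    # arguments prints the help text instead of an error.
    parser.add_argument("input_file", nargs="?")
    parser.add_argument("output_file", nargs="?")
    args = parser.parse_args(argv)

    if args.input_file is None or args.output_file is None:
        parser.print_help()
        return

    # Opening the output with "w" truncates it, so it must not be the input.
    if os.path.abspath(args.input_file) == os.path.abspath(args.output_file):
        parser.error("the output file must be different from the input file")

    seen = set()  # hashes already written once
    with open(args.input_file) as fin, open(args.output_file, "w") as fout:
        for line in fin:
            file_hash, sep, rest = line.partition(" ")
            if sep and file_hash in seen:
                # Repeat: replace the hash with the same number of spaces.
                fout.write(" " * len(file_hash) + sep + rest)
            else:
                if sep:
                    seen.add(file_hash)
                fout.write(line)


if __name__ == "__main__":
    main()
```

From the command prompt it would then be run as `python remove_duplicate_hashes.py "file hashes.txt" "file hashes (processed).txt"`. The quotes are needed because the file names contain spaces.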