![Substitua várias linhas duplicadas por espaços no Notepad++](https://rvso.com/image/1595494/Substitua%20v%C3%A1rias%20linhas%20duplicadas%20por%20espa%C3%A7os%20no%20Notepad%2B%2B.png)
Eu tenho um arquivo de texto como este:
eeeeeeee6fd6e6e7000000800010884f image_0001.png
eeeeeeee6fd6e6e7000000800010884f image_0002.png
e6eee7afef77c6c7000000808860003b image_0003.png
e6eeefa7cfe777170100000008886033 image_0004.png
e6eeefa7cfe777170100000008886033 image_0005.png
eeeecfe7afcfe7770100000030088c27 image_0006.png
efebefe7a7cfc7e70101080000300c03 image_0007.png
ef6befdf674f97c7000000900200301f image_0008.png
ef6befdf674f97c7000000900200301f image_0009.png
6d6d6faff767479700004008810000e1 image_0010.png
ed6d6dada5f767570000400098830401 image_0011.png
ed6d6dada5f767570000400098830401 image_0012.png
efed6d4da595f7a70202004000181303 image_0013.png
ebececcc2f2797f10000008051043c5b image_0014.png
e9edecce4e6e26ba120101808058042a image_0015.png
e9edecce4e6e26ba120101808058042a image_0016.png
ececeeefcf6f67a61000000080585887 image_0017.png
cc6ceeefcf4f67e710000020000149d8 image_0018.png
cc6cefefefcf6fe71000000040000001 image_0019.png
cc6cefefefcf6fe71000000040000001 image_0020.png
8ceceeefefcfcfe700000000c0000009 image_0021.png
e eu gostaria de usar o Notepad ++ para me livrar de todas as strings duplicadas, exceto uma (valores hash à esquerda) e deixar essa parte da linha em branco, mantendo os nomes dos arquivos no lado direito, assim:
eeeeeeee6fd6e6e7000000800010884f image_0001.png
image_0002.png
e6eee7afef77c6c7000000808860003b image_0003.png
e6eeefa7cfe777170100000008886033 image_0004.png
image_0005.png
eeeecfe7afcfe7770100000030088c27 image_0006.png
efebefe7a7cfc7e70101080000300c03 image_0007.png
ef6befdf674f97c7000000900200301f image_0008.png
image_0009.png
6d6d6faff767479700004008810000e1 image_0010.png
ed6d6dada5f767570000400098830401 image_0011.png
image_0012.png
...etc.
É claro que existem muitas strings diferentes que precisam ser substituídas, então não é tão fácil quanto se poderia pensar (especialmente para milhares dessas linhas). Existe um regex ou alguma outra maneira de conseguir isso? Obrigado
Responder1
Existem muitas maneiras de fazer isso com Python. Aqui está uma maneira:
# Note: Your output file must be different to your input file!
# Use absolute filepaths unless the files are in the current working directory.
input_filepath = r"C:\Users\Admin\Desktop\file hashes.txt"
output_filepath = r"C:\Users\Admin\Desktop\file hashes (processed).txt"
hashes = set() # This set keeps track of known file hashes
with open(input_filepath) as fin:
with open(output_filepath, "w") as fout:
# After opening both the input and output files,
# loop over every line in the input file.
for line in fin:
# Get the hash, which is between the start of the line and the first space.
file_hash = line[:line.find(" ")]
# Check if it is in the set of known hashes.
# If it is, write the current line without the hash to the output file.
# If it isn't, write the current line with the hash to the output file,
# and add the hash to our set of known hashes
if file_hash in hashes:
hash_len = len(file_hash)
fout.write(" " * hash_len + line[hash_len:])
else:
fout.write(line)
hashes.add(file_hash)
file hashes (processed).txt
parece:
eeeeeeee6fd6e6e7000000800010884f image_0001.png
image_0002.png
e6eee7afef77c6c7000000808860003b image_0003.png
e6eeefa7cfe777170100000008886033 image_0004.png
image_0005.png
eeeecfe7afcfe7770100000030088c27 image_0006.png
efebefe7a7cfc7e70101080000300c03 image_0007.png
ef6befdf674f97c7000000900200301f image_0008.png
image_0009.png
6d6d6faff767479700004008810000e1 image_0010.png
ed6d6dada5f767570000400098830401 image_0011.png
image_0012.png
efed6d4da595f7a70202004000181303 image_0013.png
ebececcc2f2797f10000008051043c5b image_0014.png
e9edecce4e6e26ba120101808058042a image_0015.png
image_0016.png
ececeeefcf6f67a61000000080585887 image_0017.png
cc6ceeefcf4f67e710000020000149d8 image_0018.png
cc6cefefefcf6fe71000000040000001 image_0019.png
image_0020.png
8ceceeefefcfcfe700000000c0000009 image_0021.png
Não tenho certeza de como o Python está configurado em seu sistema, mas você poderá executá-lo copiando o código acima em um arquivo chamado algo como remove_duplicate_hashes.py
e executando-o clicando duas vezes nele ou digitando python remove_duplicate_hashes.py
no prompt de comando.