Find and delete duplicate files on OS X with a script

Question 1

First, you have to reorder the first command line so that the order of the files found by the find command is preserved:

find . -size 20 ! -type d -exec cksum {} \; | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt

(Note: for testing purposes, on my machine I used find . -type f -exec cksum {} \;)
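
As a rough alternative to the sort/uniq/grep chain, the same filtering can be sketched with awk reading the checksum list twice: the first pass counts each "checksum size" pair, the second prints the lines whose pair occurs more than once, in the order find produced them. The function name list_duplicates and the use of a mktemp file instead of /tmp/f.tmp are assumptions made for this illustration, not part of the original command.

```shell
# Sketch: list every file whose "checksum size" pair occurs more than once,
# keeping the order in which find reported the files.
list_duplicates() {
    sums=$(mktemp) || return 1
    # cksum prints one "<checksum> <size> <name>" line per file
    find "${1:-.}" -type f -exec cksum {} + > "$sums"
    # First read: count each pair. Second read: print lines whose pair repeats.
    awk 'NR == FNR { count[$1 " " $2]++; next }
         count[$1 " " $2] > 1' "$sums" "$sums"
    rm -f "$sums"
}
```

Running list_duplicates . > duplicates.txt should give a file in the same three-column format that the loop further down expects. Because the pairs are compared as exact keys rather than as grep patterns, a checksum cannot accidentally match part of a file name.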

Second, one way to print all but the first copy is to use an auxiliary file, say /tmp/f2.tmp. Then we can do something like this:

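while IFS= read -r line; do
    # cksum prints "<checksum> <size> <name>"; checksum and size form the key
    checksum=$(printf '%s\n' "$line" | cut -f 1,2 -d' ')
    # take field 3 onward so file names containing spaces stay intact
    file=$(printf '%s\n' "$line" | cut -f 3- -d' ')

    # -x -F: match the whole line as a fixed string; -q: print nothing
    if grep -qxF "$checksum" /tmp/f2.tmp; then
        # /tmp/f2.tmp already contains the checksum
        # print the file name
        # (printf is safer than echo, when for example "$file" starts with "-")
        printf '%s\n' "$file"
    else
        printf '%s\n' "$checksum" >> /tmp/f2.tmp
    fi
done < duplicates.txt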

Just make sure that /tmp/f2.tmp exists and is empty before you run this, for example with the following commands:

rm -f /tmp/f2.tmp
touch /tmp/f2.tmp
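
If you would rather avoid the auxiliary file, the same "all but the first copy" idea can be sketched as a single awk pass that remembers which "checksum size" pairs it has already seen. The function name print_extra_copies is made up for this example, and it assumes the input has the three-column cksum format of duplicates.txt.

```shell
# Sketch: print every file name except the first one seen for each
# "checksum size" pair. Reads duplicates.txt unless a file is given.
print_extra_copies() {
    awk '{
        key = $1 " " $2
        if (seen[key]++) {
            # Drop the two leading fields; the rest is the file name,
            # including any spaces it contains.
            sub(/^[^ ]+ [^ ]+ /, "")
            print
        }
    }' "${1:-duplicates.txt}"
}
```

The output is one file name per line, so it is worth reading through before feeding it to anything that deletes files.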

Hope this helps =)


Question 2

Another option is to use fdupes:

brew install fdupes
fdupes -r .

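fdupes -r . recursively finds duplicate files under the current directory. Add -d to delete the duplicates; you will be prompted for which files to keep. If you add -dN instead, fdupes always keeps the first file and deletes the others.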
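
Since -dN deletes without asking, it may be worth previewing what would go first. The sketch below wraps the two steps in a small helper; dedupe_dir is a hypothetical name invented here, while -r, -f (--omitfirst, which hides the first file of each set), -d and -N are options from the fdupes manual.

```shell
# Sketch of a hypothetical wrapper: list the redundant copies by default,
# delete them only when asked to explicitly.
dedupe_dir() {
    dir=${1:-.}
    mode=${2:-list}
    case $mode in
        # -f omits the first file of each set, so only the extra copies show
        list)   fdupes -r -f "$dir" ;;
        # -d -N keeps the first file of each set and deletes the rest unprompted
        delete) fdupes -r -d -N "$dir" ;;
        *)      echo "usage: dedupe_dir [dir] [list|delete]" >&2; return 2 ;;
    esac
}
```

For example, dedupe_dir . list shows the files that dedupe_dir . delete would be expected to remove, assuming fdupes orders each set the same way in both runs.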


Question 3

I wrote a script that renames your files to match a hash of their contents.

It uses a subset of each file's bytes, so it is fast, and if there is a collision it appends a counter to the name, like this:

3101ace8db9f.jpg
3101ace8db9f (1).jpg
3101ace8db9f (2).jpg

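This way you can easily review and delete the duplicates yourself, without having to place too much trust in someone else's software to handle your photos.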

Script: https://gist.github.com/SimplGy/75bb4fd26a12d4f16da6df1c4e506562
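
The gist is the actual script. Purely to illustrate the naming scheme, here is an independent sketch that is not taken from it: it hashes only the first 64 KiB of each file and appends " (n)" when the target name is already taken. The function names, the 64 KiB slice, and the use of cksum as the hash are all assumptions chosen for portability, so the names it produces are decimal numbers rather than hex strings like the ones shown above.

```shell
# Sketch: hash a leading slice of the file. 65536 bytes is an arbitrary,
# assumed size; cksum stands in for whatever hash the real script uses.
partial_hash() {
    head -c 65536 < "$1" | cksum | cut -d ' ' -f 1
}

# Rename each given file to "<hash><ext>", adding " (n)" on a name collision.
rename_to_hash() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        dir=$(dirname -- "$f")
        base=$(basename -- "$f")
        case $base in
            ?*.*) ext=.${base##*.} ;;   # keep the extension, if any
            *)    ext= ;;
        esac
        digest=$(partial_hash "$f")
        [ "$base" = "$digest$ext" ] && continue   # already has its hash name
        target=$dir/$digest$ext
        n=0
        while [ -e "$target" ]; do
            n=$((n + 1))
            target="$dir/$digest ($n)$ext"
        done
        mv -- "$f" "$target"
    done
}
```

A call such as rename_to_hash ./photos/*.jpg (a made-up path) would leave likely duplicates next to each other under the same prefix. Since only part of each file is hashed and cksum is a 32-bit CRC, a shared prefix is a hint rather than proof, so comparing the candidates with cmp before deleting anything is a sensible extra step.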

Answer