Test

Question 1

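If you are happy to simply use a command-line tool rather than write a shell script, the fdupes program is available on most distributions and does exactly this.

There is also fslint, a GUI-based tool with the same functionality.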
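As a rough sketch of what that looks like in practice, the wrapper below lists duplicate files under a directory. The function name list_dupes is made up for illustration; the only fdupes option used is -r (recurse into subdirectories), and the guard simply reports when fdupes is not installed.

```shell
#!/bin/bash
# Sketch: list groups of duplicate files under a directory with fdupes.
# list_dupes is a hypothetical helper name, not something fdupes provides.
list_dupes() {
    local dir=${1:-.}
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    # -r makes fdupes descend into subdirectories of "$dir".
    fdupes -r "$dir"
}
```

Calling list_dupes with no argument scans the current directory. fdupes prints each set of identical files as a group of paths, with a blank line between groups.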

Question 2

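This solution finds duplicates in O(n) time, where n is the number of files. A checksum is generated for each file, and each file in turn is compared against the set of checksums already seen, which is kept in an associative array.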

#!/bin/bash
#
# Usage:  ./delete-duplicates.sh  [<files...>]
#
declare -A filecksums

# No args, use files in current directory
test 0 -eq $# && set -- *

for file in "$@"
do
    # Files only (also no symlinks)
    [[ -f "$file" ]] && [[ ! -h "$file" ]] || continue

    # Generate the checksum
    cksum=$(cksum <"$file" | tr ' ' _)

    # Have we already got this one?
    if [[ -n "${filecksums[$cksum]}" ]] && [[ "${filecksums[$cksum]}" != "$file" ]]
    then
        echo "Found '$file' is a duplicate of '${filecksums[$cksum]}'" >&2
        echo rm -f "$file"
    else
        filecksums[$cksum]="$file"
    fi
done

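If you do not specify any files (or wildcards) on the command line, it uses the set of files in the current directory. It will compare files across multiple directories, but it does not recurse into the directories themselves.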
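If recursion is wanted, one possible approach is to let find supply the file list. This is only a sketch under a few assumptions: find_dupes_recursive is a made-up name, the loop only reports duplicates and never deletes, and sort -z (available in GNU and BSD sort) is used so that the "first" file is chosen in a predictable order.

```shell
#!/bin/bash
# Sketch: recursive variant of the checksum loop that only reports duplicates.
# find_dupes_recursive is a hypothetical helper, not part of the script above.
find_dupes_recursive() {
    local dir=${1:-.} file sum
    local -A seen=()

    # -type f selects regular files only (no directories, no symlinks).
    # NUL-delimited names survive spaces and newlines in file names.
    while IFS= read -r -d '' file; do
        sum=$(cksum <"$file" | tr ' ' _)
        if [[ -n "${seen[$sum]:-}" ]]; then
            printf '%s is a duplicate of %s\n' "$file" "${seen[$sum]}"
        else
            seen[$sum]=$file
        fi
    done < <(find "$dir" -type f -print0 | sort -z)
}
```

For example, find_dupes_recursive ~/photos would print one line per duplicate found anywhere below that directory.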

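The "first" file in the set is always treated as the definitive version. File times, permissions and ownership are not taken into account. Only the content is considered.

When you are sure that it does what you want, remove the echo from the line echo rm -f "$file". Note that if you were to replace that line with ln -f "${filecksums[$cksum]}" "$file" you would hard-link the content instead. That saves the same amount of disk space, but you do not lose the file names.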
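A minimal sketch of the hard-link variant follows. The helper name link_duplicate is invented for illustration, and the extra cmp -s check is an addition of this sketch (it guards against two different files sharing a checksum). Hard links only work when both paths are on the same filesystem.

```shell
#!/bin/bash
# Sketch: replace a duplicate with a hard link to the copy that is kept.
# link_duplicate is a hypothetical helper, not part of the script above.
link_duplicate() {
    local keep=$1 dup=$2
    # Refuse to link unless the two files are byte-for-byte identical.
    cmp -s -- "$keep" "$dup" || return 1
    # -f removes the existing "$dup" and recreates it as a link to "$keep".
    ln -f -- "$keep" "$dup"
}
```

After link_duplicate a.txt b.txt succeeds, both names still exist and the test [[ a.txt -ef b.txt ]] is true, because they now refer to the same inode.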

Question 3

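The main problem in your script seems to be that i takes the actual file names as its values, while j is just a number. Putting the names into an array and using both i and j as indices should work: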

files=(*)
count=${#files[@]}
for (( i=0 ; i < count ;i++ )); do 
    for (( j=i+1 ; j < count ; j++ )); do
        if diff -q "${files[i]}" "${files[j]}"  >/dev/null ; then
            echo "${files[i]} and ${files[j]} are the same"
        fi
    done
done

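(Seems to work with Bash and with the ksh/ksh93 that Debian ships.)

The assignment a=(this that) initializes the array a with the two elements this and that (at indices 0 and 1). Word splitting and globbing work as usual, so files=(*) initializes files with the names of all files in the current directory (except dotfiles). "${files[@]}" expands to all elements of the array, and the hash sign asks for a length, so ${#files[@]} is the number of elements in the array. (Note that ${files} would be the first element of the array, and ${#files} is the length of the first element, not of the array!)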
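To make the difference between those expansions concrete, here is a small demonstration. The array contents are made up for the example.

```shell
#!/bin/bash
# Sketch of the array expansions described above; the values are invented.
a=(this that "third item")

echo "${#a[@]}"             # number of elements: 3
echo "${a}"                 # first element only, same as ${a[0]}: this
echo "${#a}"                # length of the first element: 4
printf '<%s>\n' "${a[@]}"   # one line per element, spaces preserved
```

The last line prints <this>, <that> and <third item> on separate lines, which shows that quoting "${a[@]}" keeps an element containing a space as a single word.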

for i in `/folder/*`

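The backticks here are surely a typo? With them, you would be running the first file as a command and passing the rest of the files to it as arguments.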
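What was presumably intended is a plain glob with no command substitution. A sketch, with list_entries as a made-up helper name and the directory passed in as an argument:

```shell
#!/bin/bash
# Sketch: loop over the entries of a directory with a plain glob, no backticks.
# list_entries is a hypothetical helper; pass the directory to scan.
list_entries() {
    local dir=$1 i
    for i in "$dir"/*; do
        # An unmatched glob is left as the literal pattern, so skip
        # anything that does not actually exist.
        [[ -e "$i" ]] || continue
        printf '%s\n' "$i"
    done
}
```

Calling list_entries /folder prints one path per line, and prints nothing when the directory is empty.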

Question 4

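By the way, using a checksum or a hash is a good idea. My script does not use one. But if the files are small and there are not many of them (say 10-20 files), this script runs quite fast. If you have 100 or more files with 1000 lines each, it will take more than 10 seconds.

Usage: ./duplicate_removing.sh files/*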

#!/bin/bash

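for target_file in "$@"; do
    # Drop the current file so the inner loop only sees the files after it
    shift
    for candidate_file in "$@"; do
        # diff exits 0 only when the files are identical; testing the exit
        # status (rather than empty output) avoids treating diff errors,
        # such as a missing file, as a match
        if diff -q "$target_file" "$candidate_file" >/dev/null 2>&1; then
            echo the "$target_file" is a copy "$candidate_file"
            echo rm -v "$candidate_file"
        fi
    done
done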
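One thing to be aware of: when three or more copies of a file exist, the nested loop above also compares the later copies with each other, so the same file can be reported more than once. A possible variant that flags each duplicate only once is sketched below. The function name report_duplicates is invented, cmp -s is swapped in for diff -q, and the sketch only reports, it does not remove anything.

```shell
#!/bin/bash
# Sketch: pairwise comparison that reports each duplicate only once.
# report_duplicates is a hypothetical helper, not part of the script above.
report_duplicates() {
    local -a files=("$@")
    local -A flagged=()
    local i j
    for (( i = 0; i < ${#files[@]}; i++ )); do
        # A file already flagged as a copy needs no comparisons of its own.
        [[ -n "${flagged[$i]:-}" ]] && continue
        for (( j = i + 1; j < ${#files[@]}; j++ )); do
            [[ -n "${flagged[$j]:-}" ]] && continue
            # cmp -s prints nothing and exits 0 only when the contents match.
            if cmp -s -- "${files[i]}" "${files[j]}"; then
                flagged[$j]=1
                echo "the ${files[i]} is a copy ${files[j]}"
            fi
        done
    done
}
```

It is called the same way as the script, for example report_duplicates files/*.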

Testing

Create random files with ./creating_random_files.sh:

#!/bin/bash

file_amount=10
files_dir="files"

mkdir -p "$files_dir"

while ((file_amount)); do
    content=$(shuf -i 1-1000)
    echo "$RANDOM" "$content" | tee "${files_dir}/${file_amount}".txt{,.copied} > /dev/null
    ((file_amount--))
done

Run ./duplicate_removing.sh files/* and you get this output:

the files/10.txt is a copy files/10.txt.copied
rm -v files/10.txt.copied
the files/1.txt is a copy files/1.txt.copied
rm -v files/1.txt.copied
the files/2.txt is a copy files/2.txt.copied
rm -v files/2.txt.copied
the files/3.txt is a copy files/3.txt.copied
rm -v files/3.txt.copied
the files/4.txt is a copy files/4.txt.copied
rm -v files/4.txt.copied
the files/5.txt is a copy files/5.txt.copied
rm -v files/5.txt.copied
the files/6.txt is a copy files/6.txt.copied
rm -v files/6.txt.copied
the files/7.txt is a copy files/7.txt.copied
rm -v files/7.txt.copied
the files/8.txt is a copy files/8.txt.copied
rm -v files/8.txt.copied
the files/9.txt is a copy files/9.txt.copied
rm -v files/9.txt.copied
