
Question 1

If you just want a command-line tool and don't need to write a shell script, the fdupes program is available on most distributions.

There is also fslint, a GUI-based tool with the same functionality.

Question 2

This solution finds duplicates in O(n) time. A checksum is generated for each file, and each file in turn is compared against the set of known checksums via an associative array.

#!/bin/bash
#
# Usage:  ./delete-duplicates.sh  [<files...>]
#
declare -A filecksums

# No args, use files in current directory
test 0 -eq $# && set -- *

for file in "$@"
do
    # Files only (also no symlinks)
    [[ -f "$file" ]] && [[ ! -h "$file" ]] || continue

    # Generate the checksum
    cksum=$(cksum <"$file" | tr ' ' _)

    # Have we already got this one?
    if [[ -n "${filecksums[$cksum]}" ]] && [[ "${filecksums[$cksum]}" != "$file" ]]
    then
        echo "Found '$file' is a duplicate of '${filecksums[$cksum]}'" >&2
        echo rm -f "$file"
    else
        filecksums[$cksum]="$file"
    fi
done

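If you don't specify any files (or wildcards) on the command line, the set of files in the current directory is used. The script compares files across multiple directories, but it is not written to recurse into directories itself.

The "first" file in the set is always considered the definitive version. File times, permissions, and ownership are not taken into account; only the content is.

Once you are sure it does what you want, remove the echo from the rm -f "$file" line. If you replace that line with ln -f "${filecksums[$cksum]}" "$file", you can hard-link the content instead. The disk space saving is the same, but the file names are not lost.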
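
As a rough illustration of the hard-link option, here is a minimal sketch. The temporary directory and the file names original and duplicate are made up for this example. ln -f replaces the second name with a link to the first, so both names remain but refer to the same inode:

```shell
#!/bin/sh
# Hypothetical demo: the directory and file names are illustrative only.
demo_dir=$(mktemp -d)
printf 'same content\n' >"$demo_dir/original"
printf 'same content\n' >"$demo_dir/duplicate"

# Replace the duplicate with a hard link to the original
ln -f "$demo_dir/original" "$demo_dir/duplicate"

# Both names still exist; the inode number in the first column is the same
ls -li "$demo_dir"
```

Hard links only work within a single filesystem, so this variant assumes all the files live on the same one.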

Question 3

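The main problem in your script seems to be that i takes the actual file names as values, while j is just a number. Taking the names into an array and using both i and j as indices should work.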

files=(*)
count=${#files[@]}
for (( i=0 ; i < count ;i++ )); do 
    for (( j=i+1 ; j < count ; j++ )); do
        if diff -q "${files[i]}" "${files[j]}"  >/dev/null ; then
            echo "${files[i]} and ${files[j]} are the same"
        fi
    done
done

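(Seems to work with Bash and with the ksh/ksh93 that Debian has.)

The assignment a=(this that) initializes the array a with the two elements this and that (at indices 0 and 1). Word splitting and globbing work as usual, so files=(*) initializes files with the names of all files in the current directory (except dotfiles). "${files[@]}" expands to all elements of the array, and the hash sign asks for a length, so ${#files[@]} is the number of elements in the array. (Note that ${files} is the first element of the array, and ${#files} is the length of that first element, not of the array!)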
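
A small sketch of those expansions, using the a=(this that) array from the explanation above (Bash syntax; the comments describe what each line prints):

```shell
#!/bin/bash
a=(this that)

echo "${a[@]}"    # all elements: this that
echo "${#a[@]}"   # number of elements: 2
echo "${a}"       # first element only: this
echo "${#a}"      # length of the first element ("this"): 4
```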

for i in `/folder/*`

The backticks here are surely a typo? With them, you would be running the first file as a command and passing the rest of the files to it as arguments.
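
Without the backticks, the shell expands the glob itself and the loop sees one file name per iteration. A minimal sketch, using a temporary directory in place of /folder (list_dir, listed, and the file names are made up for the example):

```shell
#!/bin/sh
# Hypothetical stand-in for /folder
list_dir=$(mktemp -d)
touch "$list_dir/one" "$list_dir/two"

listed=0
# Plain glob, no command substitution: each match becomes a loop value
for i in "$list_dir"/*; do
    echo "$i"
    listed=$((listed + 1))
done
```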

Question 4

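By the way, using a checksum or hash is a good idea. My script doesn't use one. However, if the files are small and there aren't many of them (say 10-20 files), this script runs quite fast. With 100 or more files of 1000 lines each, it takes more than 10 seconds.

Usage: ./duplicate_removing.sh files/*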

#!/bin/bash

for target_file in "$@"; do
    shift
    for candidate_file in "$@"; do
        # Use diff's exit status (0 = identical) so that a diff error is not mistaken for a match
        if diff -q "$target_file" "$candidate_file" >/dev/null; then
            echo the "$target_file" is a copy "$candidate_file"
            echo rm -v "$candidate_file"
        fi
    done
done
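
The script above runs diff once for every pair of files, so the number of comparisons grows as n(n-1)/2. A small sketch of that arithmetic (pair_count is a hypothetical helper, not part of the script):

```shell
#!/bin/sh
# Number of pairwise comparisons for n files: n * (n - 1) / 2
pair_count() {
    echo $(( $1 * ($1 - 1) / 2 ))
}

pair_count 20    # 190 comparisons
pair_count 100   # 4950 comparisons
```

This is why the pairwise approach is comfortable for a few dozen files but slows down noticeably for hundreds, where a checksum-based lookup scales better.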

Testing

Generate random files: ./creating_random_files.sh

#!/bin/bash

file_amount=10
files_dir="files"

mkdir -p "$files_dir"

while ((file_amount)); do
    content=$(shuf -i 1-1000)
    echo "$RANDOM" "$content" | tee "${files_dir}/${file_amount}".txt{,.copied} > /dev/null
    ((file_amount--))
done

Run ./duplicate_removing.sh files/* and you get this output:

the files/10.txt is a copy files/10.txt.copied
rm -v files/10.txt.copied
the files/1.txt is a copy files/1.txt.copied
rm -v files/1.txt.copied
the files/2.txt is a copy files/2.txt.copied
rm -v files/2.txt.copied
the files/3.txt is a copy files/3.txt.copied
rm -v files/3.txt.copied
the files/4.txt is a copy files/4.txt.copied
rm -v files/4.txt.copied
the files/5.txt is a copy files/5.txt.copied
rm -v files/5.txt.copied
the files/6.txt is a copy files/6.txt.copied
rm -v files/6.txt.copied
the files/7.txt is a copy files/7.txt.copied
rm -v files/7.txt.copied
the files/8.txt is a copy files/8.txt.copied
rm -v files/8.txt.copied
the files/9.txt is a copy files/9.txt.copied
rm -v files/9.txt.copied
