How do I remove duplicate files in a directory?

Answer 1

Bash 4.x

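#!/bin/bash
# Requires Bash 4.0+ for associative arrays and globstar.
declare -A arr
shopt -s globstar

for file in **; do
  [[ -f "$file" ]] || continue

  # Hash via stdin so md5sum never has to parse or escape the file name.
  read -r cksm _ < <(md5sum < "$file")
  if ((arr[$cksm]++)); then
    echo "rm $file"   # dry run: prints the command instead of running it
  fi
done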

This is recursive and handles any file name. The downside is that it requires Bash 4.x for associative arrays and recursive globbing. Remove the echo once you are happy with the results.
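Where Bash 4 is not available, the same idea (one checksum per file, keep the first path seen) can be expressed as a pipeline. The following is only a sketch under stated assumptions: list_dupes is a hypothetical helper name, md5sum is the GNU coreutils tool, and file names contain no newlines or backslashes (md5sum escapes those names, which would shift the path column). Unlike ** without dotglob, find also visits hidden files and directories.

```shell
# list_dupes is a hypothetical helper name used only for this sketch.
# Prints every regular file under DIR (default: .) whose content matches
# an earlier file in sorted order; nothing is deleted.
list_dupes() {
  # md5sum prints "<32 hex digits><2 separator chars><path>".
  # Sorting groups identical checksums and orders paths bytewise.
  find "${1:-.}" -type f -exec md5sum {} + |
    LC_ALL=C sort |
    awk 'seen[$1]++ { print substr($0, 35) }'   # path starts at column 35
}
```

Running list_dupes . prints one redundant path per line, so the list can be reviewed before anything is removed.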

gawk version

gawk '
  {
    cmd="md5sum " q FILENAME q
    cmd | getline cksm
    close(cmd)
    sub(/ .*$/,"",cksm)
    if(a[cksm]++){
      cmd="echo rm " q FILENAME q
      system(cmd)
      close(cmd)
    }
    nextfile
  }' q='"' *

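Note that this still breaks on files that have double quotes in their names, because the name is passed to a shell command wrapped in double quotes. Remove the echo once you are happy with the results.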

Answer 2

fdupes is the tool of choice. To find all duplicate files (by content, not by name) in the current directory:

fdupes -r .

To manually confirm deletion of duplicated files:

fdupes -r -d .

To automatically delete all copies but the first of each duplicated file (be warned: this actually deletes files, as requested):

fdupes -r -f . | grep -v '^$' | xargs -d '\n' rm -v --

I'd recommend manually checking the files before deletion:

fdupes -rf . | grep -v '^$' > files
... # check files
xargs -d '\n' -a files rm -v --
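Where GNU xargs options are unavailable, or you would rather see each step, the reviewed list can be consumed line by line. This is a minimal sketch, assuming one path per line in the list file and no newlines inside file names; remove_listed is a hypothetical helper name, not part of fdupes.

```shell
# remove_listed is a hypothetical helper name used only for this sketch.
# Deletes every path named in the list file given as $1.
remove_listed() {
  # IFS= and -r keep leading spaces and backslashes in each path intact.
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip blank lines (group separators)
    rm -v -- "$path"             # "--" protects names that start with "-"
  done < "$1"
}
```

After checking the list, remove_listed files deletes exactly the paths that remain in it.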

Answer 3

I recommend fclones.

fclones is a modern duplicate file finder and remover written in Rust, available on most Linux distributions and macOS.

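Notable features:

supports spaces, non-ASCII and control characters in file paths
allows searching in multiple directory trees
respects .gitignore files
safe: lets you inspect the list of duplicates manually before performing any action on them
offers plenty of options for filtering/selecting files to remove or preserve
very fast

To search for duplicates in the current directory, run: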

fclones group . >dupes.txt

Then you can inspect the dupes.txt file to check that it found the right duplicates (you can also modify that list to your liking).

Finally, remove/link/move the duplicate files with one of:

fclones remove <dupes.txt
fclones link <dupes.txt
fclones move target <dupes.txt
fclones dedupe <dupes.txt   # copy-on-write deduplication on some filesystems

Example:

pkolaczk@p5520:~/Temp$ mkdir files
pkolaczk@p5520:~/Temp$ echo foo >files/foo1.txt
pkolaczk@p5520:~/Temp$ echo foo >files/foo2.txt
pkolaczk@p5520:~/Temp$ echo foo >files/foo3.txt

pkolaczk@p5520:~/Temp$ fclones group files >dupes.txt
[2022-05-13 18:48:25.608] fclones:  info: Started grouping
[2022-05-13 18:48:25.613] fclones:  info: Scanned 4 file entries
[2022-05-13 18:48:25.613] fclones:  info: Found 3 (12 B) files matching selection criteria
[2022-05-13 18:48:25.614] fclones:  info: Found 2 (8 B) candidates after grouping by size
[2022-05-13 18:48:25.614] fclones:  info: Found 2 (8 B) candidates after grouping by paths and file identifiers
[2022-05-13 18:48:25.619] fclones:  info: Found 2 (8 B) candidates after grouping by prefix
[2022-05-13 18:48:25.620] fclones:  info: Found 2 (8 B) candidates after grouping by suffix
[2022-05-13 18:48:25.620] fclones:  info: Found 2 (8 B) redundant files

pkolaczk@p5520:~/Temp$ cat dupes.txt
# Report by fclones 0.24.0
# Timestamp: 2022-05-13 18:48:25.621 +0200
# Command: fclones group files
# Base dir: /home/pkolaczk/Temp
# Total: 12 B (12 B) in 3 files in 1 groups
# Redundant: 8 B (8 B) in 2 files
# Missing: 0 B (0 B) in 0 files
6109f093b3fd5eb1060989c990d1226f, 4 B (4 B) * 3:
    /home/pkolaczk/Temp/files/foo1.txt
    /home/pkolaczk/Temp/files/foo2.txt
    /home/pkolaczk/Temp/files/foo3.txt

pkolaczk@p5520:~/Temp$ fclones remove <dupes.txt
[2022-05-13 18:48:41.002] fclones:  info: Started deduplicating
[2022-05-13 18:48:41.003] fclones:  info: Processed 2 files and reclaimed 8 B space

pkolaczk@p5520:~/Temp$ ls files
foo1.txt

Answer 4

How can we test whether two files have the same content?

if diff "$file1" "$file2" > /dev/null; then
    ...
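
When only a yes/no answer is needed, cmp -s performs the same check without producing any output to discard. This is a small sketch rather than part of the original answer; same_content is a hypothetical helper name.

```shell
# same_content is a hypothetical helper name used only for this sketch.
# Returns 0 when both files are byte-for-byte identical, non-zero otherwise.
same_content() {
  cmp -s -- "$1" "$2"   # -s: silent, only the exit status matters
}
```

It can be used in place of the diff test, for example: if same_content "$file1" "$file2"; then ...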

How can we get a list of the files in a directory?

files="$( find ${files_dir} -type f )"

We can take any 2 files from that list and check whether their names differ and their content is the same.

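#!/bin/bash
# removeDuplicates.sh

files_dir=$1
if [[ -z "$files_dir" ]]; then
    echo "Error: files dir is undefined" >&2
    exit 1
fi

# Same find call as above, read NUL-delimited into an array so that
# names containing spaces survive (mapfile -d needs Bash 4.4+).
mapfile -d '' -t files < <(find "$files_dir" -type f -print0)
for file1 in "${files[@]}"; do
    for file2 in "${files[@]}"; do
        # echo "checking $file1 and $file2"
        # Skip self-comparison and files already removed in an earlier pass.
        if [[ "$file1" != "$file2" && -e "$file1" && -e "$file2" ]]; then
            if diff "$file1" "$file2" > /dev/null; then
                echo "$file1 and $file2 are duplicates"
                rm -v "$file2"
            fi
        fi
    done
done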

For example, we have this directory:

$> ls .tmp -1
all(2).txt
all.txt
file
text
text(2)

So there are only 3 unique files.

Let's run the script:

$> ./removeDuplicates.sh .tmp/
.tmp/text(2) and .tmp/text are duplicates
removed `.tmp/text'
.tmp/all.txt and .tmp/all(2).txt are duplicates
removed `.tmp/all(2).txt'

And we are left with only 3 files.

$> ls .tmp/ -1
all.txt
file
text(2)
