Batch convert PDFs to searchable PDFs

Answer 1

Use CPYCONVERTER.EXE, found in the BIN folder, from the command line (wildcards are supported in eCopy PaperWorks Ver. 9). The help output below is from eCopy Desktop 8.5.

Command Line Cpy Converter Version 8.5 (Build 0.116)
 Copyright (c) 1992 - 2004. All rights reserved.

 Converts CPY to CPY, CPY to TIF or TIF to CPY

Usage:
 cpyconverter.exe [-?] -S=<source path> -D=<dest path> [-P] [-E] [-Q] [-B] [-O] [-T3/T4/TC/C/U]

Note:
 Wildcards are not supported.  Full paths must be used for source and destination

Switches:
--------------------
-?                      : This menu
-Q                      : Turn off logging.
-P                      : Converter pauses after conversion.
-E                      : Converter pauses if there is an error.
-B                      : Converter burns-in Blackout/Whiteout markups (if applicable).
-O                      : Converter OCRs document and creates searchable text (if applicable).
-S="<SOURCE PATH>"      : The path of the file to convert.
-D="<DESTINATION PATH>" : The path of the newly converted file.
-P=<PASSWORD>           : Password for encrypting and decrypting documents.
--------------------
 * If the source document is encrypted CPY converter will attempt to decrypt it to the destination document with the supplied password.
 * If the source document is not encrypted CPY converter will attempt to encrypt the destination document using the supplied password.
 * Please note you cannot encrypt/decrypt tif documents.

-<Conversion Type>      : The type of conversion to be done(T3, T4, TC, C, U)
--------------------
* T4 - Convert CPY to TIF Group4
* T3 - Convert CPY to TIF Group3
* C  - Convert TIF(Any group) to CPY
* U  - Convert CPY to CPY

Ex.1 cpyconverter.exe -S="C:\My Dir\test.tif" -D="C:\My Dir\test.cpy" -C
Convert Tiff to cpy

Ex.2 cpyconverter.exe -S="C:\My Dir\test.cpy" -D="C:\My Dir\test.tif" -T3
Convert Cpy to Tif Group 3

Ex.3 cpyconverter.exe -S="C:\My Dir\test.cpy" -D="C:\My Dir\test.tif" -T4
Convert Cpy to Tif Group 4
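Because version 8.5 rejects wildcards and wants full paths, one way to batch it is to let a shell expand the file list and call the converter once per file. The sketch below assumes a bash on Windows (e.g. Git Bash or Cygwin, which ship cygpath). The names CONVERTER, full_path and convert_tifs_to_cpy are hypothetical. It uses only the -S, -D, -C and -O switches from the help text above, and -O adds searchable text only "if applicable".

```shell
#!/bin/bash
# Hypothetical batch wrapper for cpyconverter.exe 8.5, which rejects wildcards:
# bash expands the glob and the converter is called once per file.
# Assumes bash on Windows (e.g. Git Bash or Cygwin). CONVERTER is a
# placeholder; point it at cpyconverter.exe in your eCopy BIN folder.
CONVERTER="${CONVERTER:-cpyconverter.exe}"

# The converter requires full paths: print an absolute path, converted
# to Windows form when cygpath is available.
full_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -w "$(realpath "$1")"
  else
    realpath "$1"
  fi
}

# Convert every *.tif (any case) in directory $1 to CPY (-C) and request
# OCR (-O). The body runs in a subshell so the shopt changes do not leak.
convert_tifs_to_cpy() (
  shopt -s nullglob nocaseglob
  for f in "$1"/*.tif; do
    src=$(full_path "$f")
    # Destination: same folder and name, .cpy extension.
    "$CONVERTER" -S="$src" -D="${src%.*}.cpy" -C -O
  done
)
```

For example, after saving it as batch_cpy.sh (a hypothetical name), run in Git Bash: source batch_cpy.sh && convert_tifs_to_cpy "/c/My Dir". On PaperWorks 9, where wildcards are supported, the loop should not be needed.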

Answer 2

On Linux

First, you need to OCR the PDF files that have not been OCRed yet. I wrote a very simple way to find all PDFs that cannot be searched with grep and OCR them.

I noticed that a PDF file containing no fonts is usually not searchable. Knowing this, we can check for fonts with pdffonts (part of poppler-utils).

The first 2 lines of pdffonts output are the table header, so a searchable file produces more than 2 lines of output. Knowing this, we can create the following script:

gedit check_pdf_searchable.sh

Then paste this in:

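#!/bin/bash
#set -vx
# pdffonts prints a 2-line table header, so fewer than 3 lines of output
# means the PDF has no fonts (no text layer) and needs OCR.
if (( $(pdffonts "$1" | wc -l) < 3 )); then
  echo "$1"
  pypdfocr "$1"
fi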

Then make it executable:

chmod +x check_pdf_searchable.sh

Then list (and OCR) all non-searchable PDFs in a directory:

ls -1 ./*.pdf | xargs -I {} ./check_pdf_searchable.sh {}
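check_pdf_searchable.sh both prints and OCRs every hit. If you want to review the list before OCRing anything, here is a minimal list-only sketch using the same pdffonts heuristic. The function names needs_ocr and list_unsearchable are hypothetical. A file pdffonts cannot read, or a missing pdffonts, also counts as needing OCR, because the error output is discarded and 0 lines remain.

```shell
#!/bin/bash
# Hypothetical list-only variant of check_pdf_searchable.sh: prints PDFs
# that appear to need OCR but does not run pypdfocr.
# Same heuristic: pdffonts prints a 2-line header, so fewer than 3 lines
# means no embedded fonts. Unreadable files also yield 0 lines and count.
needs_ocr() {
  local lines
  lines=$(pdffonts "$1" 2>/dev/null | wc -l)
  (( lines < 3 ))
}

# Print every *.pdf in directory $1 (default: .) that needs_ocr flags.
list_unsearchable() {
  local f
  for f in "${1:-.}"/*.pdf; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
    if needs_ocr "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example: list_unsearchable . > to_ocr.txt, then check the list before running the OCR script on it.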

Or in a directory and all its subdirectories:

tree -fai . | grep -P "\.pdf$" | xargs -I {} ./check_pdf_searchable.sh {}
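An alternative for the recursive case is to let find do the walk. With -exec, each path is passed as one argument, so names containing spaces or quotes, which can confuse the line-based xargs pipeline, are handled. This is a sketch. The check_tree function name and the CHECKER variable are hypothetical. Using -iname (so .PDF also matches) is a deliberate change from the case-sensitive grep above.

```shell
#!/bin/bash
# Hypothetical recursive driver: find walks the tree instead of tree | grep.
# CHECKER defaults to the check_pdf_searchable.sh script created above.
CHECKER="${CHECKER:-./check_pdf_searchable.sh}"

# Run the checker once per PDF under directory $1 (default: .).
# -exec passes each path as a single argument; -iname also matches .PDF.
check_tree() {
  find "${1:-.}" -type f -iname '*.pdf' -exec "$CHECKER" {} \;
}
```

For example: check_tree ~/Documents. The relative default ./check_pdf_searchable.sh resolves against the directory you run this from.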

Answer 3

The easiest way is to use an online OCR API. The ocr.space API includes support for creating searchable PDFs, and the service has a free tier that allows 25,000 conversions per month.

You can then automate this with PowerShell, batch, or any other scripting language. For example, to trigger a conversion with cURL:

curl -H "apikey:helloworld" --form "file=@<your-file.pdf>" --form "language=eng" --form "isOverlayRequired=true" https://api.ocr.space/Parse/Image
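The cURL call above handles one file. A minimal bash sketch that loops it over a folder is below. Several names are assumptions: OCR_API_KEY (defaulting to the demo key "helloworld" used above), the CURL override and the ocr_space_batch function are hypothetical. To our understanding, isCreateSearchablePdf=true is the ocr.space parameter that requests a searchable PDF, but verify it against the ocr.space API documentation before relying on it.

```shell
#!/bin/bash
# Hypothetical batch loop around the single ocr.space cURL call.
# OCR_API_KEY: your key ("helloworld" is the demo key from the example).
# CURL can be overridden, e.g. for a dry run with CURL=echo.
OCR_API_KEY="${OCR_API_KEY:-helloworld}"
CURL="${CURL:-curl}"

# Upload every *.pdf in directory $1 (default: .) and save each JSON
# response next to its PDF as <name>.json. The body runs in a subshell
# so shopt nullglob does not leak.
ocr_space_batch() (
  shopt -s nullglob
  for f in "${1:-.}"/*.pdf; do
    # ASSUMPTION: isCreateSearchablePdf=true asks the API for a searchable
    # PDF; the response should then include a link to download it.
    "$CURL" -sS -H "apikey:$OCR_API_KEY" \
      --form "file=@$f" \
      --form "language=eng" \
      --form "isCreateSearchablePdf=true" \
      https://api.ocr.space/Parse/Image > "${f%.*}.json"
  done
)
```

For example: OCR_API_KEY=yourkey ocr_space_batch ./scans. Each .json file then holds the API response for the matching PDF. Keep the monthly free-tier quota in mind for large folders.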
