I would like to scan paper documents to PDF files and make the text searchable. I believe Tesseract can help with this, but I don't know how to begin or which program would be best to use.
Is anybody making searchable PDF files successfully?
Answer 1
I can recommend ocrmypdf (see https://github.com/ocrmypdf/OCRmyPDF), which is also packaged for Ubuntu. You can install it by running:
sudo apt install ocrmypdf
You can use it as follows:
ocrmypdf -l eng infile.pdf outfile.pdf
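The above call of ocrmypdf is a simple one that just specifies the language of the document as English (-l eng). Other options are described in the man page; you might want to discover them as needed over time.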
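If you end up with a whole folder of scans, the single-file call above can be wrapped in a small loop. This is only a rough sketch: the function name ocr_dir and the "searchable" output subdirectory are my own choices, not part of ocrmypdf, and it assumes all documents are in English.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ocrmypdf): OCR every PDF in a directory.
# Results go to a "searchable" subdirectory, which is an assumed layout.
ocr_dir() {
    local src_dir="${1:-.}"               # directory holding the scanned PDFs
    local out_dir="$src_dir/searchable"   # assumed output location
    local pdf

    mkdir -p "$out_dir" || return 1

    for pdf in "$src_dir"/*.pdf; do
        [ -e "$pdf" ] || continue         # glob matched nothing, skip
        # Same invocation as the single-file example: -l eng means English
        ocrmypdf -l eng "$pdf" "$out_dir/$(basename "$pdf")" \
            || echo "OCR failed for $pdf" >&2
    done
}
```

After sourcing the script, you would call it as ocr_dir ~/scans (the path is just an example). The originals are left untouched, so you can compare them with the OCRed copies before deleting anything.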