Is there a way to save the content of each page to a separate file?

Question

If you run pdftotext on the PDF file created above, you get a text file whose pages are separated by form feeds (Ctrl-L). It ends with:

 Is this person doing something violent or not?
• Natural language processing (NLP) from digital documents:
– Does this news article belong to the realm of politics or sports?
– Does this query phrase match a particular article in the archive?

^L2

0.1.1

Second Level Head

Another instance of quantitative estimation is estimating a house’s price based
on inputs like current income of the house’s owner, crime statistics for the
neighborhood, and so on. Machines that make such quantitative estimators are
called regressors.

^L
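Because pdftotext ends every page with a form feed, one way to get one file per page is to use the form feed as awk's record separator, so that each record is exactly one page. This is a minimal sketch, not something from the original post: the function name `split_pages` and the `page-NNN.txt` naming scheme are hypothetical choices.

```shell
# Hypothetical helper: split pdftotext output into one file per page.
# pdftotext terminates each page with a form feed (\f, Ctrl-L), so with
# RS = "\f" every awk record is one page, newlines included.
# Usage: split_pages INPUT.txt [PREFIX]   (PREFIX defaults to "page")
split_pages() {
    awk -v prefix="${2:-page}" '
        BEGIN { RS = "\f" }
        {
            # page-001.txt, page-002.txt, ... (naming is an assumption)
            out = sprintf("%s-%03d.txt", prefix, NR)
            printf "%s", $0 > out
            close(out)    # avoid running out of open file descriptors
        }
    ' "$1"
}
```

For example, `split_pages aa072.txt` would write `page-001.txt`, `page-002.txt`, and so on into the current directory. A page with no text still gets its own (empty) file, so file numbers stay aligned with page numbers. Alternatively, pdftotext's `-f` and `-l` options (first and last page to convert) can extract a single page per invocation.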

So if you delete the blank lines and then take every line that sits directly above a form-feed line, you get more or less the desired text:

$ sed -e '/^[ \t]*$/d' aa072.txt |grep -B1 -P "\x0C" 
– Does this query phrase match a particular article in the archive?

2
--
called regressors.
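The sed | grep pipeline also prints the form-feed lines themselves (here the page number `2`) and grep's `--` group separators. A per-page awk sketch, under the same assumption that every page ends with a form feed, prints only the last non-blank line of each page; `last_line_per_page` is a hypothetical name.

```shell
# Hypothetical helper: print the last non-blank line of every page.
# Each awk record is one page (RS = "\f"); the page is split into lines
# and scanned backwards for the first line that is not empty or
# whitespace-only (the same lines sed deletes in the pipeline above).
last_line_per_page() {
    awk '
        BEGIN { RS = "\f" }
        {
            n = split($0, lines, "\n")
            for (i = n; i > 0; i--) {
                if (lines[i] ~ /[^ \t]/) {
                    print lines[i]
                    break
                }
            }
        }
    ' "$1"
}
```

For the file above, `last_line_per_page aa072.txt` should leave just the two text lines, without the `2` and `--` lines.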

