Extract Text from Images in a Multi-page PDF
To extract text from a PDF, you need two programs installed on your machine:
- Ghostscript
- Tesseract OCR
Installing these on Fedora is very easy:
$ sudo yum install -y ghostscript tesseract
Now, if your PDF file is named story.pdf, you can extract the text as follows:
$ ghostscript -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 -sOutputFile="page%03d".png story.pdf
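$ for f in page*.png; do tesseract "$f" "$f.out"; done
The first command renders each page to a 300 dpi PNG (page001.png, page002.png, and so on). Tesseract appends .txt to the output name it is given, so the text of each page ends up in a file such as page001.png.out.txt.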
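The two steps above could be wrapped in a small shell function, roughly as sketched below. The function name pdf_to_text, its two arguments, and the temporary working directory are assumptions made for illustration; the ghostscript and tesseract options are the ones already used above, plus -q to quiet Ghostscript's banner.

```shell
#!/usr/bin/env bash
# Sketch: OCR every page of a PDF and merge the results into one text file.
# pdf_to_text and its arguments are illustrative names, not part of either tool.
pdf_to_text() {
    local pdf="$1" out="$2" workdir f
    workdir="$(mktemp -d)" || return 1

    # Render each page to a 300 dpi PNG: page001.png, page002.png, ...
    ghostscript -q -dNOPAUSE -dBATCH -sDEVICE=pngalpha -r300 \
        -sOutputFile="$workdir/page%03d.png" "$pdf" || return 1

    # Start with an empty output file, then OCR the pages in order.
    # tesseract appends ".txt" to the output base name it is given.
    : > "$out"
    for f in "$workdir"/page*.png; do
        tesseract "$f" "${f%.png}" >/dev/null 2>&1 || return 1
        cat "${f%.png}.txt" >> "$out"
    done

    # Remove the intermediate images and per-page text files.
    rm -rf "$workdir"
}

# Usage: pdf_to_text story.pdf story.txt
```

With the function loaded into a bash session, running pdf_to_text story.pdf story.txt should leave the text of all pages, in page order, in story.txt. The zero-padded page numbers are what keep the shell glob in the right order.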