Fetch Text from Image & PDF Using Selenium Java | Devstringx
In this blog, we will learn how we can fetch data from images and PDFs.
This Blog Contains:
- Read Text From Image Using OCR with Tesseract (tess4j)
- Reading PDF Text Using PDFUtil
- Save PDF as Image Using PDFUtil
- Extract Images From PDF Using PDFUtil
Fetch Text From Image In Selenium
Selenium has no built-in way to read text from an image, so to get text from an image in a Selenium project we use Optical Character Recognition (OCR) with Tesseract through its Java wrapper, tess4j. Tesseract supports UTF-8 Unicode.
- First, create a folder named “tesseract” in your project and put the trained data in that folder. You can find trained data for each supported language at the URL below:
https://github.com/tesseract-ocr/tessdata
For English, download eng.traineddata and put it in the tesseract folder of your project.
- Add the Maven dependency below for Tesseract (tess4j):
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.4</version>
</dependency>
- Below is the Java code to fetch text from the image:
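import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;

// doOCR throws TesseractException, so declare or handle it in the calling method
ITesseract image = new Tesseract();
image.setDatapath("Location for TessData Folder");   // folder that holds eng.traineddata
image.setLanguage("eng");
String str1 = image.doOCR(new File("Location Of Image"));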
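The steps above depend on two files being in place: the trained data file for the chosen language, and the image itself. A small pre-check can make a missing file easier to spot before OCR is attempted. The following is a minimal sketch using only the Java standard library; the class and method names (OcrPreflight, trainedDataFile, isReady) are hypothetical and not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of tess4j: checks the inputs before OCR is attempted.
final class OcrPreflight {

    // Path where the trained data file for a language is expected,
    // e.g. <dataDir>/eng.traineddata for the language "eng".
    static Path trainedDataFile(Path dataDir, String language) {
        return dataDir.resolve(language + ".traineddata");
    }

    // True only if both the trained data file and the image exist as regular files.
    static boolean isReady(Path dataDir, String language, Path image) {
        return Files.isRegularFile(trainedDataFile(dataDir, language))
                && Files.isRegularFile(image);
    }
}
```

A test could call OcrPreflight.isReady(...) with the same folder and language it passes to setDatapath and setLanguage, and fail early with a clear message when the result is false.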
Fetch Text From PDF
- Add the Maven dependency below for PDFUtil:
<dependency>
    <groupId>com.testautomationguru.pdfutil</groupId>
    <artifactId>pdf-util</artifactId>
    <version>0.0.3</version>
</dependency>
- The Java code below reads text from a PDF:
import com.testautomationguru.utility.PDFUtil;

String pdfLocation = "Location where we have PDF File";
PDFUtil pdfUtil = new PDFUtil();
String text = pdfUtil.getText(pdfLocation);
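Text pulled from a PDF (or from an image via OCR) usually carries the line breaks and spacing of the page layout, so comparing it directly against an expected string tends to be brittle. One possible approach, sketched here with the standard library only, is to collapse whitespace before checking for a phrase. The class and method names (ExtractedText, normalize, containsPhrase) are hypothetical and not part of pdf-util or tess4j.

```java
// Hypothetical helper for test assertions; not part of pdf-util or tess4j.
final class ExtractedText {

    // Collapses every run of whitespace (spaces, tabs, line breaks) into a single
    // space and trims both ends.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    // True if the normalized extracted text contains the normalized expected phrase.
    // The comparison is case-sensitive.
    static boolean containsPhrase(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }
}
```

In a Selenium test, the string returned by getText or doOCR could be passed as the first argument and the phrase expected on the page as the second.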
- The Java code below saves a PDF as images:
String folderLocation = "Location Where we need to save Image";
String pdfLocation = "Location where we have PDF File";
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setImageDestinationPath(folderLocation);
pdfUtil.savePdfAsImage(pdfLocation);
- The Java code below extracts the images embedded in a PDF:
String folderLocation = "Location Where we need to save Image";
String pdfLocation = "Location where we have PDF File";
PDFUtil pdfUtil = new PDFUtil();
pdfUtil.setImageDestinationPath(folderLocation);
pdfUtil.extractImages(pdfLocation);
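After saving or extracting images, a test may want to confirm that files actually landed in the destination folder. The sketch below lists the matching files with java.nio only. The class name SavedImages is hypothetical, and the ".png" extension used in the usage note is an assumption about the output format, so adjust it to whatever the library writes in your setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of pdf-util: inspects the image destination folder.
final class SavedImages {

    // Returns the names of regular files in the folder that end with the given
    // lowercase extension (for example ".png"), sorted alphabetically.
    static List<String> list(Path folder, String extension) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder)) {
            for (Path entry : stream) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && name.toLowerCase().endsWith(extension)) {
                    names.add(name);
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers in test code need no throws clause.
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }
}
```

For example, SavedImages.list(Paths.get(folderLocation), ".png") could be called after the PDFUtil call, with the test failing if the returned list is empty.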