Apache Tika is a library for extracting data from documents in many formats (.PDF, .DOCX, etc.). In this tutorial, we extract data using BodyContentHandler. The following Maven dependency is required:
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.26</version>
</dependency>
BodyContentHandler is a content handler decorator that passes on only what is inside the XHTML <body> tag. The <body> and </body> tags themselves are not included in the result. The constructors of this class are as follows:
The methods of this class are as follows:
Implementation:
Example 1: Reading everything into the inner string buffer
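The idea behind this example, collecting only the text inside <body> into an internal buffer and reading it back with toString(), can be sketched with the JDK's SAX API alone. This is a minimal illustration and not Tika's implementation: BodyTextHandler and extractBody are hypothetical names, and a real program would pass a BodyContentHandler to a Tika parser instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BodyTextSketch {

    /** Hypothetical handler (not a Tika class): buffers only the text inside <body>. */
    static class BodyTextHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private int bodyDepth = 0; // greater than 0 while the parser is inside <body>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (bodyDepth > 0) {
                bodyDepth++;               // an element nested inside <body>
            } else if ("body".equals(qName)) {
                bodyDepth = 1;             // entering <body>; the tag itself is not recorded
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (bodyDepth > 0) {
                bodyDepth--;               // reaches 0 again when </body> closes
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (bodyDepth > 0) {
                buffer.append(ch, start, length);
            }
        }

        /** Returns everything collected so far. */
        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    /** Parses an XHTML string and returns the text found inside its <body>. */
    static String extractBody(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BodyTextHandler handler = new BodyTextHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Ignored</title></head>"
                     + "<body><p>Hello, <b>Tika</b>!</p></body></html>";
        System.out.println(extractBody(xhtml)); // prints "Hello, Tika!"
    }
}
```

The title text never reaches the buffer because it lies outside <body>, which mirrors the body-only behaviour described for BodyContentHandler.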
Example 2: Writing content into a file while specifying the maximum content length
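A maximum content length means that the handler stops accepting characters once a limit is reached. In Tika, the no-argument BodyContentHandler constructor bounds its buffer at 100,000 characters, a limit of -1 disables the bound, and reaching the limit is signalled with a SAXException. The sketch below imitates that behaviour with the JDK's SAX API only. LimitedWriteHandler, WriteLimitReached, and copyText are hypothetical names for illustration, not Tika classes, and unlike BodyContentHandler this sketch copies all character data rather than only the body.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    /** Hypothetical marker exception raised when the limit is hit. */
    static class WriteLimitReached extends SAXException {
        WriteLimitReached(int limit) {
            super("Write limit of " + limit + " characters reached");
        }
    }

    /** Hypothetical handler (not a Tika class): copies text to a Writer up to a limit. */
    static class LimitedWriteHandler extends DefaultHandler {
        private final Writer writer;
        private final int writeLimit; // maximum characters to write; -1 means unlimited
        private int written = 0;

        LimitedWriteHandler(Writer writer, int writeLimit) {
            this.writer = writer;
            this.writeLimit = writeLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                if (writeLimit == -1 || written + length <= writeLimit) {
                    writer.write(ch, start, length);
                    written += length;
                } else {
                    // Write the part that still fits, then abort the parse.
                    writer.write(ch, start, writeLimit - written);
                    written = writeLimit;
                    throw new WriteLimitReached(writeLimit);
                }
            } catch (IOException e) {
                throw new SAXException("Could not write to the target", e);
            }
        }
    }

    /** Returns true if all text fit within the limit, false if it was cut off. */
    static boolean copyText(String xml, Writer out, int writeLimit) throws Exception {
        LimitedWriteHandler handler = new LimitedWriteHandler(out, writeLimit);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (WriteLimitReached e) {
            return false;
        } finally {
            out.flush();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p>Hello, Tika!</p></body></html>";
        // The output file name is an arbitrary choice for this sketch.
        try (Writer out = Files.newBufferedWriter(Paths.get("content.txt"), StandardCharsets.UTF_8)) {
            boolean complete = copyText(xml, out, 5);
            System.out.println("Complete: " + complete); // prints "Complete: false"
        }
    }
}
```

With a limit of 5, the file ends up containing only "Hello". Using a dedicated exception type lets the caller tell a reached limit apart from a genuine parse error.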
Output:
Nothing is printed to the console, because the program writes all extracted content to a file instead.
The program produces a ‘.txt’ file containing the text content of the ‘.pdf’ file.
Referred: https://www.geeksforgeeks.org