BodyContentHandler Class in Java

Apache Tika is a library that extracts data from many document formats (.PDF, .DOCX, etc.). In this tutorial, we will extract text using BodyContentHandler. The examples require the following Maven dependency:

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.26</version>
</dependency>

BodyContentHandler is a content handler decorator that passes on only the content found inside the XHTML <body> element. The <body> and </body> tags themselves are not included in the result.
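To make the decorator idea concrete, here is a minimal sketch that uses only the SAX classes shipped with the JDK, not Tika itself. The class names BodyDecoratorSketch, BodyOnlyDecorator and TextCollector are hypothetical and exist only for this illustration; Tika's real implementation works differently internally. The sketch forwards text to a wrapped handler only while the parser is inside <body>.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BodyDecoratorSketch {

    // Hypothetical decorator (not a Tika class): forwards text to the
    // wrapped handler only while the parser is inside the <body> element.
    static class BodyOnlyDecorator extends DefaultHandler {
        private final ContentHandler delegate;
        private boolean insideBody = false;

        BodyOnlyDecorator(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("body".equals(qName)) {
                insideBody = true; // the <body> tag itself is not forwarded
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("body".equals(qName)) {
                insideBody = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
            throws SAXException {
            if (insideBody) {
                delegate.characters(ch, start, length);
            }
        }
    }

    // Hypothetical delegate that simply collects the text it receives.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Parses the given XHTML string and returns only the text inside <body>.
    static String extractBody(String xhtml) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xhtml)),
            new BodyOnlyDecorator(collector));
        return collector.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Ignored</title></head>"
            + "<body><p>Hello</p><p>Tika</p></body></html>";
        // The title text is dropped; only the body text is printed
        System.out.println(extractBody(xhtml));
    }
}
```

The text in <title> never reaches the collector because it sits outside <body>, which mirrors the behaviour described above.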

Let us first look at the constructors of this class:

BodyContentHandler()	Writes all content into an internal string buffer; call toString() to get the content. By default, the maximum content length is 100,000 characters. If this limit is reached, a SAXException is thrown.
BodyContentHandler(int writeLimit)	Writes all content into an internal string buffer; call toString() to get the content. ‘writeLimit’ is the maximum number of characters that can be written; set it to -1 to disable the limit. If this limit is reached, a SAXException is thrown.
BodyContentHandler(OutputStream outputStream)	Writes all content into the given outputStream, without any content limit.
BodyContentHandler(Writer writer)	Writes all content into the given writer, without any content limit.
BodyContentHandler(ContentHandler handler)	Passes all content to the given handler.
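The write-limit behaviour described above can be illustrated with a small JDK-only sketch. This is not Tika code: WriteLimitSketch and LimitedTextHandler are hypothetical names, and the choice to keep the characters that still fit before throwing is an assumption of this sketch. It only shows the general idea of a handler that buffers text, treats -1 as "no limit", and throws a SAXException once the limit is reached.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WriteLimitSketch {

    // Hypothetical handler (not a Tika class) that buffers text up to a limit.
    static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private final int writeLimit; // -1 means "no limit"

        LimitedTextHandler(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length)
            throws SAXException {
            if (writeLimit == -1 || buffer.length() + length <= writeLimit) {
                buffer.append(ch, start, length);
                return;
            }
            // Keep what still fits, then signal that the limit was reached
            buffer.append(ch, start, writeLimit - buffer.length());
            throw new SAXException(
                "Write limit of " + writeLimit + " characters reached");
        }

        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    // Returns the collected text; if the limit is hit, the text is truncated.
    // Note: for brevity this sketch treats every SAXException as
    // "limit reached", including errors caused by malformed XML.
    static String parseWithLimit(String xml, int writeLimit) throws Exception {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Parsing stopped early; the buffer holds the text read so far
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>abcdefghij</doc>";
        System.out.println(parseWithLimit(xml, 4));  // truncated text
        System.out.println(parseWithLimit(xml, -1)); // full text
    }
}
```

In real code the caller usually decides whether a partial result is acceptable or whether the exception should be propagated.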

BodyContentHandler relies on the following helper class:

Class	Action Performed
MatchingContentHandler	Allows you to get data by XPath

Note: The BodyContentHandler class does not override any method of the ContentHandler interface itself; it just supplies the XPath expression that MatchingContentHandler uses to select the XHTML body content.
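To show what "selecting the body content by XPath" means, here is a sketch that uses the JDK's own DOM and XPath APIs rather than Tika's streaming matcher. The class name BodyXPathSketch and the expression /html/body//text() are assumptions made for this illustration and are not necessarily what Tika uses internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BodyXPathSketch {

    // Returns the concatenated text of all text nodes below <body>.
    static String bodyText(String xhtml) throws Exception {
        // Build an in-memory DOM tree (the default factory ignores namespaces)
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));

        // Select every text node that is a descendant of /html/body
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "/html/body//text()", doc, XPathConstants.NODESET);

        StringBuilder text = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            text.append(nodes.item(i).getNodeValue());
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>T</title></head>"
            + "<body><h1>A</h1><p>B</p></body></html>";
        System.out.println(bodyText(xhtml));
    }
}
```

Unlike this DOM-based sketch, a SAX-based matcher does not need to hold the whole document in memory, which matters for large files.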

Implementation:

Example 1: Reading everything into the inner string buffer

Java

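// Java Program to Read Everything into the Internal String Buffer

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Main class
public class GFG {

    // Method 1
    // Parses a classpath resource and returns its body text
    public String parseToStringExample(String fileName)
        throws IOException, TikaException, SAXException
    {

        // Opening the resource in try-with-resources
        // so that the stream is always closed
        try (InputStream stream
             = this.getClass()
                   .getClassLoader()
                   .getResourceAsStream(fileName)) {

            // getResourceAsStream() returns null
            // when the resource is not on the classpath
            if (stream == null) {
                throw new FileNotFoundException(fileName);
            }

            Parser parser = new AutoDetectParser();
            ContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            ParseContext context = new ParseContext();

            // Parsing the document
            parser.parse(stream, handler, metadata, context);

            return handler.toString();
        }
    }

    // Method 2
    // Main driver method
    public static void main(String[] args)
        throws TikaException, IOException, SAXException
    {

        // Creating object of main class in main method
        GFG example = new GFG();

        // Display message for better readability
        System.out.println("Result");

        // Calling method 1 to parse the document by
        // providing the file name as an argument
        System.out.println(example.parseToStringExample(
            "test-reading.pdf"));
    }
}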

Output: the console shows "Result" followed by the text extracted from test-reading.pdf.

Example 2: Writing content into a file (no maximum content length applies)

Java

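// Java Program to Write Content into a File
// (the OutputStream constructor applies no content length limit)

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Main class
// BodyContentHandlerWriteToFileExample
public class GFG {

    // Method 1
    // Main driver method
    public static void main(String[] args)
        throws TikaException, IOException, SAXException
    {

        // Creating an object of the class
        GFG example = new GFG();

        // Calling Method 2 and passing the resource name
        // and the output file path as arguments
        example.writeParsedDataToFile(
            "test-reading.pdf",
            "/Users/ali_zhagparov/Desktop/pdf-content.txt");
    }

    // Method 2
    // Writing parsed data to a file
    public void
    writeParsedDataToFile(String readFromFileName,
                          String writeToFileName)
        throws IOException, TikaException, SAXException
    {

        // Creating an object of File class
        File yourFile = new File(writeToFileName);

        // Creates the file only if it does not exist yet
        yourFile.createNewFile();

        // try-with-resources closes both streams,
        // even if parsing fails
        try (InputStream stream
             = this.getClass()
                   .getClassLoader()
                   .getResourceAsStream(readFromFileName)) {

            // getResourceAsStream() returns null
            // when the resource is not on the classpath
            if (stream == null) {
                throw new FileNotFoundException(readFromFileName);
            }

            // append = false overwrites any existing content
            try (FileOutputStream fileOutputStream
                 = new FileOutputStream(yourFile, false)) {

                Parser parser = new AutoDetectParser();
                ContentHandler handler
                    = new BodyContentHandler(fileOutputStream);
                Metadata metadata = new Metadata();
                ParseContext context = new ParseContext();

                parser.parse(stream, handler, metadata, context);
            }
        }
    }
}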

Output:

Nothing is printed to the console, because the program writes all extracted content to the output file instead.

The program produces a ‘.txt’ file containing the text content of the ‘.pdf’ file.

Referred: https://www.geeksforgeeks.org
