Java Program to Extract Paragraphs From a Word Document

This article demonstrates how to extract paragraphs from a Word document using the getParagraphs() method of the XWPFDocument class provided by the Apache POI library. Apache POI is a project developed and maintained by the Apache Software Foundation that provides Java libraries for reading and writing Microsoft Office files.

To extract paragraphs from a Word file, add the following Apache POI library, along with its dependencies, to the classpath.

poi-ooxml.jar

Approach

  1. Formulate the path of the Word document.
  2. Create a FileInputStream and an XWPFDocument object for the Word document.
  3. Retrieve the list of paragraphs using the getParagraphs() method.
  4. Iterate through the list of paragraphs and print each one.
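
The first step of the approach can also be written with java.nio.file instead of string concatenation. The following is a minimal sketch using only the standard library; the class name DocumentPath and the method inWorkingDirectory are hypothetical names chosen for illustration, and the existence check is an optional extra rather than something Apache POI requires.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Apache POI.
class DocumentPath {

    // Resolves a file name against the current working directory
    // and returns the result as an absolute path.
    static Path inWorkingDirectory(String fileName) {
        return Paths.get(System.getProperty("user.dir"))
                .resolve(fileName)
                .toAbsolutePath();
    }

    public static void main(String[] args) {
        Path path = inWorkingDirectory("WordFile.docx");

        // Checking for the file up front gives a clearer message
        // than a FileNotFoundException raised later.
        if (Files.isRegularFile(path)) {
            System.out.println("Found: " + path);
        } else {
            System.out.println("Missing: " + path);
        }
    }
}
```

The resulting path can be passed to FileInputStream via path.toFile().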

Implementation

  • Step 1: Getting the path of the current working directory where the Word document is located.
  • Step 2: Creating a file input stream for the above-specified path.
  • Step 3: Creating a document object for the Word document.
  • Step 4: Using the getParagraphs() method to retrieve the list of paragraphs from the Word file.
  • Step 5: Iterating through the list of paragraphs.
  • Step 6: Printing the paragraphs.
  • Step 7: Closing the connections.

Sample Input

The program reads a file named WordFile.docx from the current working directory.

Example

Java

// Java program to extract paragraphs from a Word document

// Importing IO classes for basic file handling
import java.io.File;
import java.io.FileInputStream;
import java.util.List;

// Importing Apache POI classes
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Main class to extract paragraphs from a Word document
public class GFG {

    // Main driver method
    public static void main(String[] args) throws Exception
    {

        // Step 1: Getting the path of the current working
        // directory where the Word document is located
        String path = System.getProperty("user.dir");
        path = path + File.separator + "WordFile.docx";

        // Step 2: Creating a file input stream for the
        // above-specified path.
        // Step 3: Creating a document object for the Word
        // document.
        // Step 7: try-with-resources closes the document and
        // the stream, even if an exception is thrown.
        try (FileInputStream fin = new FileInputStream(path);
             XWPFDocument document = new XWPFDocument(fin)) {

            // Step 4: Using the getParagraphs() method to
            // retrieve the list of paragraphs from the Word
            // file.
            List<XWPFParagraph> paragraphs
                = document.getParagraphs();

            // Step 5: Iterating through the list of paragraphs
            for (XWPFParagraph para : paragraphs) {

                // Step 6: Printing the paragraph text,
                // followed by a blank line
                System.out.println(para.getText() + "\n");
            }
        }
    }
}
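
Word documents often contain empty paragraphs, for example blank lines used for spacing, and the loop above prints those too. The following is a minimal sketch of filtering them out, assuming the paragraph texts have already been collected into a List<String> from para.getText(). The class name ParagraphFilter and the method nonBlank are hypothetical and not part of Apache POI; the sample strings stand in for real document content.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Apache POI.
class ParagraphFilter {

    // Returns the texts that contain at least one non-whitespace
    // character, keeping their original order.
    static List<String> nonBlank(List<String> texts) {
        return texts.stream()
                .filter(t -> t != null && !t.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for values gathered via para.getText()
        List<String> texts = Arrays.asList(
                "First paragraph", "", "   ", "Second paragraph");

        for (String text : nonBlank(texts)) {
            System.out.println(text);
        }
    }
}
```

Filtering on the extracted strings leaves the XWPFDocument itself untouched, so the same document object can still be used for other processing.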

Output

The program prints the text of each paragraph in WordFile.docx, each followed by a blank line.

Referred: https://www.geeksforgeeks.org

