Huffman Coding Java - Coding

The Huffman coding is a popular algorithm used for lossless data compression. It works by assigning the variable-length codes to the input characters with the shorter codes assigned to the more frequent characters. This results in the prefix-free code meaning no code is a prefix of any other code. The Huffman coding guarantees an optimal prefix-free encoding minimizing the total number of bits required to encode a message.

Data Structure Organization:

The Huffman coding organizes data using the binary tree called a Huffman tree. The tree is constructed using the priority queue or a min-heap to efficiently find the two lowest frequency nodes and merge them into the new node. This process continues until all nodes are merged into a single root node forming the Huffman tree.

Root

/ \

Node1 Node2

/ \ / \

… … … …

Implementation:

The Huffman coding can be implemented using various data structures such as the arrays linked lists or priority queues. One common approach is to the use a priority queue to efficiently select the two lowest frequency nodes for the merging. This implementation typically involves the following steps:

Calculate the frequency of each input character.
Create a leaf node for each character and add it to the priority queue.
While there is more than one node in the priority queue:
Remove the two nodes with the lowest frequency.
They merge them into a new internal node with the sum of their frequencies.
The Add the new node back to priority queue.
The remaining node in priority queue becomes the root of the Huffman tree.

Basic Operations:

Encoding: The Generate the Huffman codes for the each input character based on the constructed Huffman tree.The Time complexity: O(n log n) where n is the number of the unique characters.
Decoding: The Decode a sequence of the Huffman codes back into the original message using the Huffman tree. The Time complexity: O(n) where n is the length of the encoded message.
Compression: The Encode a message using the generated Huffman codes. The Space complexity: The Depends on the efficiency of the Huffman tree typically close to the entropy of the message.
Decompression: The Decode a compressed message back into the original message. The Space complexity: Depends on the size of the compressed message and efficiency of the Huffman tree.

Program:

Java

// Java Program to Implement
// Huffman Coding
import java.util.PriorityQueue;
import java.util.HashMap;

// Class representing a node in the Huffman Tree
class HuffmanNode {
      // Character data
    char data;           
      
      // Frequency of the character
    int frequency;       
  
      // Left and right child nodes
    HuffmanNode left, right; 

    // Constructor to initialize the node
    HuffmanNode(char data, int frequency) {
        this.data = data;
        this.frequency = frequency;
        left = right = null;
    }
}

// Driver Class
public class HuffmanCoding {
      // Main Function
    public static void main(String[] args) {
      
        // The message to be encoded
        String message = "Huffman coding is a lossless data compression algorithm.";

        // Create a frequency map to count the frequency of each character
        HashMap<Character, Integer> frequencyMap = new HashMap<>();
      
        for (char c : message.toCharArray()) {
            frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1);
        }

        // Create a priority queue to build the Huffman Tree
        PriorityQueue<HuffmanNode> priorityQueue =
          new PriorityQueue<>((a, b) -> a.frequency - b.frequency);

        // Create a Huffman node for each character and add it to the priority queue
        for (char c : frequencyMap.keySet()) {
            priorityQueue.add(new HuffmanNode(c, frequencyMap.get(c)));
        }

        // Build the Huffman Tree
        while (priorityQueue.size() > 1) {
            // Remove the two nodes with the lowest frequency
            HuffmanNode left = priorityQueue.poll();
            HuffmanNode right = priorityQueue.poll();

            // Create a new internal node with these two nodes
              // as children and add it back to the queue
            HuffmanNode newNode =
              new HuffmanNode('$', left.frequency + right.frequency);
          
            newNode.left = left;
            newNode.right = right;
            priorityQueue.add(newNode);
        }

        // The remaining node is the root of the Huffman Tree
        HuffmanNode root = priorityQueue.poll();

        // Print the Huffman codes for each character
        System.out.println("Huffman codes:");
        printCodes(root, new StringBuilder());
    }

    // Method to print the Huffman codes
    public static void printCodes(HuffmanNode root, StringBuilder code) {
        if (root == null) return;

        // If this is a leaf node, print the character and its code
        if (root.data != '$') {
            System.out.println(root.data + ": " + code);
        }
        
        // Traverse the left subtree
        if (root.left != null) {
            printCodes(root.left, code.append('0'));
            code.deleteCharAt(code.length() - 1);
        }
        
        // Traverse the right subtree
        if (root.right != null) {
            printCodes(root.right, code.append('1'));
            code.deleteCharAt(code.length() - 1);
        }
    }
}

Output

Huffman codes:
a: 000
m: 0010
n: 0011
.: 01000
p: 010010
u: 010011
l: 0101
s: 011
 : 100
f: 10100
g: 10101
r: 10110
d: 10111
H: 110000
h: 110001
c: 11001
i: 1101
t: 11100
e: 11101
o: 1111

Applications of Huffman Coding

Data compression: The Huffman coding is widely used in the file compression algorithms like ZIP, gzip and JPEG.
Telecommunication: The Huffman coding is used in the voice and image compression to the reduce bandwidth usage.
Error detection and correction: The Huffman codes with the error-detecting capabilities are used in the applications like QR codes and barcodes.

Conclusion

The Huffman coding provides an efficient method for the lossless data compression by the assigning the variable-length codes to the input characters based on their frequencies. Its applications span across the various domains making it a fundamental algorithm in the computer science and telecommunications. By understanding the Huffman coding and its implementation one can appreciate its significance in the optimizing data storage and transmission.

Reffered: https://www.geeksforgeeks.org

Java

Related
Balanced Binary Tree in Java
Adding JGit to the Project with Maven
What is the difference between ArrayList.clear() and ArrayList.removeAll()?
Java XML Parsers
Java Program to Evaluate an Expression using Stacks

Type:	Geek
Category:	Coding
Sub Category:	Tutorial
Uploaded by:	Admin
Views:	14