Apache Kafka is a popular choice for distributed messaging systems because of its robustness. In this article, we explore strategies to avoid duplicate messages in Apache Kafka consumers.

The Challenge of Duplicate Message Consumption

Apache Kafka's at-least-once delivery guarantee ensures message durability, but it can result in messages being delivered more than once. This becomes particularly challenging during network disruptions, consumer restarts, or Kafka rebalances. It is essential to implement strategies that avoid message duplication without compromising the system's reliability.

Strategies to Avoid Duplicate Messages

Below are some strategies that avoid duplicate messages in an Apache Kafka consumer.

1. Consumer Group IDs and Offset Management

Ensuring unique consumer group IDs is foundational to preventing conflicts between independent consumer applications. Effective offset management is equally important: committing offsets only after a message has been fully processed, or storing them in an external persistent store, allows consumers to resume from the last successfully processed message in the event of a failure. This practice makes Kafka consumers resilient to restarts and rebalances.
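Below is a minimal sketch of this pattern in plain Java. The broker address (localhost:9092), topic name (orders), group ID, and the process() method are illustrative assumptions, not fixed values. Auto-commit is disabled so offsets are committed only after the polled records have been fully processed:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualOffsetConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processing-group");  // unique group ID
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit manually
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical business logic
                }
                // Commit offsets only after the whole batch is processed, so a crash
                // before this point replays (rather than silently skips) messages.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```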
2. Transaction-Aware Consumer

Implementing full idempotency on the consumer side is inherently complex and resource-intensive, so it is advantageous to keep flexibility at the consumer listener level and tailor idempotency handling to specific requirements and operational contexts. The simplest lever is the isolation.level setting: with read_committed, the consumer waits to read transactional messages until the associated transaction has been committed.
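A minimal configuration sketch; the broker address and group ID are assumed placeholders, and the key line is the isolation.level property:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {
    static KafkaConsumer<String, String> buildConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-group");          // assumed group ID
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // read_committed: poll() returns only messages from committed transactions;
        // records from open or aborted transactions are filtered out.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        return new KafkaConsumer<>(props);
    }
}
```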
3. Transaction Support

Kafka's transactional support is a robust strategy for achieving exactly-once semantics. By processing messages within a transaction, consumers can ensure atomicity between message processing and offset commits. In case of processing errors, the transaction is rolled back, preventing offset commits and further consumption of the affected messages until the issue is resolved.
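The sketch below shows the standard consume-transform-produce loop using this mechanism. The broker address, transactional ID, and the input-topic/output-topic names are illustrative assumptions; the groupMetadata() variant of sendOffsetsToTransaction requires Kafka clients 2.5 or newer, and error handling is simplified to a single abort:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class TransactionalPipeline {
    public static void main(String[] args) {
        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "pipeline-tx-1"); // assumed ID
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "pipeline-group");
        cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            producer.initTransactions();
            consumer.subscribe(List.of("input-topic"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        producer.send(new ProducerRecord<>("output-topic", r.key(), r.value()));
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                    }
                    // Offsets are committed in the same transaction as the output records,
                    // so processing and offset commit succeed or fail together.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    // Rolling back leaves offsets uncommitted; the batch is reprocessed.
                    producer.abortTransaction();
                }
            }
        }
    }
}
```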
4. Dead Letter Queues (DLQs)

Implementing dead letter queues for Kafka consumers involves redirecting problematic messages to a separate topic for manual inspection. This approach isolates messages that fail processing, enabling developers to identify and address the root cause before considering reprocessing.
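One possible shape for such a handler, assuming a pre-built producer and a naming convention that appends ".DLQ" to the source topic; both the convention and the header name are illustrative choices, not a fixed Kafka standard:

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterHandler {
    private final KafkaProducer<String, String> dlqProducer;

    public DeadLetterHandler(KafkaProducer<String, String> dlqProducer) {
        this.dlqProducer = dlqProducer;
    }

    void handle(ConsumerRecord<String, String> record) {
        try {
            process(record); // hypothetical business logic
        } catch (Exception e) {
            // Redirect the failing message to a parallel ".DLQ" topic for inspection,
            // instead of retrying it in place and blocking the partition.
            ProducerRecord<String, String> dead =
                    new ProducerRecord<>(record.topic() + ".DLQ", record.key(), record.value());
            dead.headers().add("x-error-message",
                    String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
            dlqProducer.send(dead);
        }
    }

    private void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```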
5. Message Deduplication Filters

A deduplication filter maintains a record of processed message identifiers, allowing the consumer to recognize and discard duplicates efficiently. This approach is particularly effective when strict ordering of messages is not a critical requirement.
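A sketch of an in-memory filter, assuming the producer places a unique, non-null message ID in the record key and that duplicates arrive reasonably close together; the "seen" set is bounded, so very old IDs are eventually evicted:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;

public class DeduplicationFilter {
    // Bounded, insertion-ordered "seen" set so memory use stays flat;
    // the oldest IDs are evicted first once the cap is reached.
    private static final int MAX_TRACKED_IDS = 100_000;

    private final Set<String> seen = Collections.newSetFromMap(
            new LinkedHashMap<String, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_TRACKED_IDS;
                }
            });

    /** Returns true only the first time a given message ID is observed. */
    boolean shouldProcess(ConsumerRecord<String, String> record) {
        // Assumes the unique message ID travels in the record key.
        return seen.add(record.key());
    }
}
```

A persistent store (such as a database table keyed by message ID) would survive consumer restarts, at the cost of an extra lookup per message; the in-memory version above trades durability for speed.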
Referred: https://www.geeksforgeeks.org