Kafka只消费一次保证

Kafka只消费一次保证

问题描述:

我在一些关于堆栈溢出的回答中看到,并且在网络上看到 Kafka 不支持消费确认或很难实现一次消费的想法.

I see in some answers around stack-overflow and in general in the web the idea that Kafka does not support consumption acknowledge or that exactly once consumption is hard to achieve.

在以下条目中作为示例是否有理由在 Kafka 上使用 RabbitMQ?,我可以阅读以下语句:

In the following entry as a sample Is there any reason to use RabbitMQ over Kafka?, I can read the following statements:

RabbitMQ 将保留有关已消费/已确认/未确认消息的所有状态,而 Kafka 不会

RabbitMQ will keep all states about consumed/acknowledged/unacknowledged messages while Kafka doesn't

使用 Kafka 很难获得 Exactly once 保证.

Exactly once guarantees are hard to get with Kafka.

这不是我阅读官方 Kafka 文档所理解的:https://kafka.apache.org/documentation/#design_consumerposition

This is not what I understand by reading the official Kafka documentation at: https://kafka.apache.org/documentation/#design_consumerposition

之前的文档指出 Kafka 不使用传统的确认实现(如 RabbitMQ).相反,他们依赖于关系 partition-consumer 和 offset...

The previous documentation states that Kafka does not use a traditional acknowledge implementation (as RabbitMQ). Instead they rely on the relationship partition-consumer and offset...

这使得消息确认等价物非常便宜

This makes the equivalent of message acknowledgements very cheap

谁能解释一下为什么Kafka中的只消费一次保证"很难实现?以及这与 Kafka 与其他更传统的 Message Broker(如 RabbitMQ)有何不同?我错过了什么?

Could somebody please explain why "only once consumption guarantee" in Kafka is difficult to achieve? and How this differs from Kafka vs other more traditional Message Broker as RabbitMQ? What am I missing?

如果你的意思是恰好一次问题是这样的.您可能知道 Kafka 消费者使用轮询机制,即消费者向服务器询问消息.此外,您需要记住消费者提交消息偏移量,即它告诉集群下一个预期的偏移量是什么.所以,想象一下会发生什么.

If you mean exactly once the problem is like this. Kafka consumer as you may know use a polling mechanism, that is consumers ask the server for messages. Also, you need to recall that the consumer commit message offsets, that is, it tells the cluster what is the next expected offset. So, imagine what could happen.

消费者轮询消息并获取偏移量为 1 的消息.

Consumer poll for messages and get message with offset = 1.

A) 如果消费者在处理消息之前立即提交该偏移量,那么它可能会崩溃并且永远不会再次收到该消息,因为它已经提交,在下一次轮询时,Kafka 将返回偏移量 = 2 的消息.这就是他们所说的最多一次语义.

A) If consumer commit that offset immediately before processing the message, then it can crash and will never receive that message again because it was already committed, on next poll Kafka will return message with offset = 2. This is what they call at most once semantic.

B) 如果消费者先处理消息然后提交偏移量,可能发生的情况是在处理消息之后但在提交之前,消费者崩溃,因此在这种情况下,下一次轮询将再次获得偏移量为 1 的相同消息并且该消息将被处理两次.他们至少有一次这样称呼.

B) If consumer process the message first and then commit the offset, what could happen is that after processing the message but before committing, the consumer crashes, so in that case next poll will get again the same message with offset = 1 and that message will be processed twice. This is what they call at least once.

为了只实现一次,您需要处理消息并在原子操作中提交该偏移量,您总是两者都做或不做.这不是那么容易.执行此操作的一种方法(如果可能)是存储处理结果以及生成该结果的消息的偏移量.然后,当消费者启动时,它会在 Kafka 之外查找最后处理的偏移量并寻找该偏移量.

In order to achieve exactly once, you need to process the message and commit that offset in an atomic operation, where you always do both or none of them. This is not so easy. One way to do this (if possible) is to store the result of the processing along with the offset of the message that generated that result. Then, when consumer starts it looks for the last processed offset outside Kafka and seek to that offset.