StreamX developer contribution merged into Apache Pulsar 4.0.0 release

by Anna Szemiot

Apache Pulsar 4.0.0, a recently released Long-Term Support (LTS) version, includes a significant contribution from one of StreamX's software engineers, Marek Czajkowski.

Marek introduced an alternative event-time-based topic compaction algorithm in response to PIP (Pulsar Improvement Proposal) #352 that he also contributed to.

Apache Pulsar is one of the open-source projects that we use under the hood of our StreamX Digital Experience Mesh. 

Why there was a need for event-time-based compaction

Pulsar's topic compaction is a key feature that helps manage storage and improve efficiency by retaining only the most recent message for each key. However, existing compaction methods, primarily the 'TwoPhaseCompactor', rely on the order in which messages are published.

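This can be problematic when network latency or message redelivery disrupts the intended order of events as perceived by external applications: a delayed message carrying an older state can be published last and survive compaction in place of the newer one.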

Pulsar Improvement Proposal

PIP-352: Event time based topic compactor


Introducing the Event Time Order Compactor

Marek addressed this limitation by developing the 'EventTimeOrderCompactor', a new type of topic compactor that uses the event time associated with each message. This ensures that messages are compacted based on the actual time of the event they represent, regardless of their arrival order in the Pulsar topic.
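The difference between the two strategies can be illustrated with a small conceptual sketch. It is not Pulsar's implementation: the 'Message' class, both functions, and the tie-breaking rule (on equal event times the later-published message wins) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Message:
    key: str
    value: str
    event_time: int  # when the event happened, as set by the producer (ms)


def compact_by_publish_order(log: Iterable[Message]) -> Dict[str, Message]:
    """Keep the last message published for each key."""
    latest: Dict[str, Message] = {}
    for msg in log:
        latest[msg.key] = msg
    return latest


def compact_by_event_time(log: Iterable[Message]) -> Dict[str, Message]:
    """Keep the message with the highest event time for each key."""
    latest: Dict[str, Message] = {}
    for msg in log:
        current = latest.get(msg.key)
        # Assumption: on equal event times the later-published message wins.
        if current is None or msg.event_time >= current.event_time:
            latest[msg.key] = msg
    return latest


# Messages in publish order. The update from event time 100 was delayed
# and reached the topic after the update from event time 200.
log = [
    Message("product-1", "price=12", event_time=200),
    Message("product-1", "price=10", event_time=100),
    Message("product-2", "price=5", event_time=150),
]

by_publish = compact_by_publish_order(log)
by_event = compact_by_event_time(log)

print(by_publish["product-1"].value)  # the stale, late-arriving value
print(by_event["product-1"].value)    # the value of the most recent event
```

In this sketch, publish-order compaction retains the delayed message for 'product-1', while event-time compaction retains the message that represents the most recent event.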

Design and implementation

Key elements of Marek's work included:

  • Abstracting the 'TwoPhaseCompactor' to provide a foundation for different types of compactors, including the new event-time-based one,

  • Developing the 'EventTimeCompactionServiceFactory', which makes it possible to configure and deploy the 'EventTimeOrderCompactor',

  • Implementing the 'EventTimeOrderCompactor', which incorporates the logic for event-time-based message comparison during compaction,

  • Introducing 'MessageCompactionData', which acts as a container for the data relevant to compaction, streamlining the process,

  • Adding unit tests to validate the functionality and behaviour of the new compactor.

Pull Request #22517

[feat] PIP-352: Event time based compaction


How to use the Event Time Order Compactor

The new 'EventTimeOrderCompactor' can be enabled through the existing configuration property 'compactionServiceFactoryClassName', by pointing it at the 'EventTimeCompactionServiceFactory'.

The relevant configuration files ('broker.conf' and 'standalone.conf') have been updated with information to guide users on enabling this new functionality.
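As a rough illustration, the setting might look like the fragment below. The fully qualified class name is an assumption (it presumes the factory lives in the 'org.apache.pulsar.compaction' package), so it is worth checking against the comments in your own 'broker.conf' before relying on it.

```properties
# broker.conf (or standalone.conf)
# Assumed package name; verify against the comments shipped with your Pulsar version.
compactionServiceFactoryClassName=org.apache.pulsar.compaction.EventTimeCompactionServiceFactory
```

Event-time-based compaction compares the event time carried by each message, so producers would need to set that value when publishing.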