Out-of-order events are a challenge not just in stream processing but also in traditional ETL systems. Because a typical ETL job processes its data in batches, ordering can be controlled there, but in a real-time environment with multiple sources it has to be dealt with explicitly.
Out-of-sequence events are frequent and expected in messaging services, microservices, and IoT,
among a wide range of other scenarios. As a result, messages can be delivered to a queue with a delay,
in an order that isn't the correct one.
Although an event (normally) carries its creation date, we need to enable our stream applications to handle those scenarios (by default, they don't...). This typically means the application has to do the following:
- Recognize that an event is out of sequence
- Define a time period during which it will attempt to reconcile out-of-sequence events
- Have an in-band capability to reconcile this event. This is the main difference between streaming apps and batch jobs
- Be able to update results (technology dependent)
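The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the API of any particular framework: the `EventTimeCounter` class, its fields, and the integer timestamps are all hypothetical, and the "result" being updated is just a per-window event count.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventTimeCounter:
    """Counts events per tumbling event-time window (hypothetical helper)."""

    window_size: int        # window length, in the same unit as event_time
    allowed_lateness: int   # how far behind the newest event we still accept
    max_event_time: float = float("-inf")  # newest event time seen so far
    counts: dict = field(default_factory=lambda: defaultdict(int))
    dropped: list = field(default_factory=list)

    def process(self, event_time: int) -> str:
        # Step 1: recognize that the event is out of sequence.
        out_of_sequence = event_time < self.max_event_time
        self.max_event_time = max(self.max_event_time, event_time)

        # Step 2: the reconciliation period. Anything older than this
        # threshold is considered too late to reconcile.
        watermark = self.max_event_time - self.allowed_lateness
        if event_time < watermark:
            self.dropped.append(event_time)  # e.g. route to a side channel
            return "dropped"

        # Steps 3 and 4: reconcile in-band by updating the window the
        # event belongs to, even if newer windows have already been seen.
        window_start = event_time - event_time % self.window_size
        self.counts[window_start] += 1
        return "reconciled" if out_of_sequence else "in-order"


counter = EventTimeCounter(window_size=10, allowed_lateness=5)
statuses = [counter.process(t) for t in [1, 12, 9, 25, 11]]
```

Here the event with timestamp 9 arrives after 12 but within the allowed lateness, so it still updates the first window. The event with timestamp 11 arrives after 25, falls outside the reconciliation period, and is set aside. A larger `allowed_lateness` reconciles more events but means keeping more window state around, and this sketch never evicts old windows, which a real application would need to do.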
Although several frameworks have built-in support for event time and processing time, there are known limitations that should be taken into account (e.g., memory, complexity).
But when you think about event-driven applications, you should be prepared for a lot of situations that can, and will,
occur, like event duplication, consistency, and quality issues. But that is a topic for another discussion.
Find out more on this matter at Confluent and get yourself a free bundle of ebooks.