This post is part of The Software Architecture Chronicles, a series of posts about Software Architecture. In them, I write about what I’ve learned on Software Architecture, how I think of it, and how I use that knowledge. The contents of this post might make more sense if you read the previous posts in this series.
Using events to design applications is a practice that seems to be around since the late 1980s. We can use events anywhere in the frontend or backend. When a button is pressed, when some data changes or some backend action is performed.
But what is it exactly? When should we use it and how? What are the downsides?
Just like classes, components should have low coupling between them but be high cohesive within them. When components need to collaborate, let’s say a component “A” needs to trigger some logic in component “B”, the natural way to do it is simply to have component A call a method in an object of component B. However, if A knows about the existence of B, they are coupled, A depends on B, making the system more difficult to change and maintain. Events can be used to prevent coupling.
Furthermore, as a side effect of using events and having decoupled components, if we have a team working only on component B, it can change how component B reacts to logic in component A without even talking to the team responsible for component A. Components can evolve independently: our application becomes more organic.
Even within the same component, sometimes we have code that needs to be executed as a result of an action, but it doesn’t need to be executed immediately, in the same request/response. The most flagrant example is sending out an email. In this case, we can immediately return a response to the user and send out the email at a later time, in an async fashion and avoid having the user waiting for an email to be sent.
Nevertheless, there are dangers to it. If we use it indiscriminately, we run the risk of ending up with logic flows that are conceptually highly cohesive but wired up together by events which are a decoupling mechanism. In other words, code that should be together will be separated and will be difficult to track its flow (kind of like the goto statement), to understand it and reason about it: It will be spaghetti code!
To prevent turning our codebase into a big pile of spaghetti code, we should keep the usage of events limited to clearly identified situations. In my experience, there are three cases in which to use events:
- To decouple components
- To perform async tasks
- To keep track of state changes (audit log)
1. To decouple components
When component A performs the logic that needs to trigger the component B logic, instead of calling it directly, we can trigger an event sending into an event dispatcher. Component B will be listening to that specific event in the dispatcher and will act whenever the event occurs.
This means that both A and B will be depending on the dispatcher and the event, but they will have no knowledge of each other: they will be decoupled.
Ideally, both the dispatcher and the event should live in neither component:
- The dispatcher should be a library completely independent of our application and therefore installed in a generic location using a dependency management system. In PHP world, this is something installed in the vendor folder, using Composer.
- The event, nevertheless, is part of our application but should live outside both components as to keep them unaware of each other. The event is shared between the components and it is part of the core of the application. Events are part of what DDD calls the Shared Kernel. This way, both components will depend on the Shared Kernel but will remain unaware of each other.
Nevertheless, in a Monolithic application, for convenience, it is acceptable to place it in the component that triggers the event.
[…] Designate with an explicit boundary some subset of the domain model that the teams agree to share. Keep this kernel small. […] This explicitly shared stuff has special status, and shouldn’t be changed without consultation with the other team.
Eric Evans 2014, Domain-Driven Design Reference
2. To perform async tasks
Sometimes we have a piece of logic that we want to have executed, but it might take quite some time executing and we don’t want to keep the user waiting for it to finish. In such a case, it is desirable to have it run as an async job and immediately return a message to the user informing him that his request will be executed later, asynchronously.
For example, placing an order in a webshop could be done in sync, but sending an email notifying the user could be done async.
In such cases, what we can do is trigger an event that will be queued, and will sit in the queue until a worker can pick it up and execute it whenever the system has resources for it.
In these cases, it doesn’t really matter if the associated logic is in the same bounded context or not, either way, the logic is decoupled.
3. To keep track of state changes (audit log)
In the traditional way of storing data, we have entities holding some data. When the data in those entities change, we simply update a DB table row to reflect the new values.
The problem here is that we do not store exactly what changed and when.
We can store events containing the changes, in an audit log kind of construction.
More about this further ahead, in the explanation about Event Sourcing.
Martin Fowler identifies three different types of event patterns:
- Event Notification
- Event-Carried State Transfer
All of these patterns share the same key concepts:
- Events communicate that something has happened (they occur after something);
- Events are broadcasted to any code that is listening (several code units can react to an event).
Let’s suppose we have an application core with clearly defined components. Ideally, those components are completely decoupled from each other but, some of their functionality requires some logic in other components to be executed.
This is the most typical case, which was described earlier: When component A performs the logic that needs to trigger the component B logic, instead of calling it directly, it triggers an event sending it to an event dispatcher. Component B will be listening to that specific event in the dispatcher and will act whenever the event occurs.
It is important to note that, a characteristic of this pattern is that the event carries minimal data. It carries only enough data for the listener to know what happened and carry out their code, usually just entity ID(s) and maybe the date and time that the event was created.
- Greater resilience, if the events are queued the origin component can perform its logic even if the secondary logic can not be performed at that moment because of a bug (since they are queued, they can be executed later, when the bug is fixed);
- Reduced latency, if the event is queued the user does not need to wait for that logic to be executed;
- Teams can evolve the components independently, making their work easier, faster, less prone to problems, and more organic;
- If used without criteria, it has the potential to transform the codebase in a pile of spaghetti code.
Event-Carried State Transfer
Let’s consider again the previous example of an application core with clearly defined components. This time, for some of their functionality, they need data from other components. The most natural way of getting that data is to ask the other components for it, but that means the querying component will know about the queried component: the components will be coupled to each other!
Another way of sharing this data is by using events that are triggered when the component that owns the data, changes it. The event will carry the whole new version of the data with them. The Components interested in that data will be listening to those events and will react to them by storing a local copy of that data. This way, when they need that external data, they will have it locally, they will not need to query the other component for it.
- Greater resilience, since the querying components can function if the queried component becomes unavailable (either because there is a bug or the remote server is unreachable);
- Reduced latency, as there’s no remote call (when the queried component is remote) required to access the data;
- We don’t have to worry about load on the queried component to satisfy queries from all the querying components (especially if it’s a remote component);
- There will be several copies of the same data, although they will be read-only copies and data storage is not a problem nowadays;
- Higher complexity of the querying component, as it will need the logic to maintain a local copy of the external data although this is pretty standard logic.
Maybe this pattern is not necessary if both components execute in the same process, which makes for fast communication between components, but even then it might be interesting to use it for decoupling and maintainability sake or as a preparation for decoupling those components into different microservices, sometime in the future. It all depends on what are our current needs, future needs and how far we want/need to go with decoupling.
Let’s assume an Entity in its initial state. Being an Entity, it has its own identity, it’s a specific thing in the real world, which the application is modelling. Along its lifetime, the Entity data changes and, traditionally, the current state of the entity is simply stored as a row in a DB.
This is fine for most cases, but what happens if we need to know how the Entity reached that state (ie. we want to know the credits and debits of our bank account)? It’s not possible because we only store the current state!
Using event sourcing, instead of storing the Entity state, we focus on storing the Entity state changes and computing the Entity state from those changes. Each state change is an event, stored in an event stream (ie. a table in an RDBMS). When we need the current state of an Entity, we calculate it from all of its events in the event stream.
The event store becomes the principal source of truth, and the system state is purely derived from it. For programmers, the best example of this is a version-control system. The log of all the commits is the event store and the working copy of the source tree is the system state.
Greg Young 2010, CQRS Documents
If we have a state change (event) that is a mistake, we can not simply delete that event because that would change the state change history, and it would go against the whole idea of doing event sourcing. Instead, we create an event, in the event stream, that reverses the event that we would like to delete. This process is called a Reversal Transaction, and not only brings the Entity back to the desired state but also leaves a trail that shows that the object had been in that state at a given point in time.
There are also architectural benefits to not deleting data. The storage system becomes an additive only architecture, it is well known that append-only architectures distribute more easily than updating architectures because there are far fewer locks to deal with.
Greg Young 2010, CQRS Documents
However, when we have many events in an event stream, computing an Entity state will be costly, it will not be performant. To solve this, every X amount of events we will create a snapshot of the Entity state at that point in time. This way, when we need the entity state, we only need to calculate it up to the last snapshot. Hell, we can even keep a permanently updated snapshot of the Entity, that way we have the best of both worlds.
In event sourcing, we also have the concept of a projection, which is the computation of the events in an event stream, from and to specific moments. This means that a snapshot, or the current state of an entity, fits the definition of a projection. But the most valuable idea in the concept of projections, is that we can analyse the “behaviour” of Entities during specific periods of time, which allows us to make educated guesses about the future (ie. if in the past 5 years an Entity had increased activity during August, it’s likely that next August the same will happen), and this can be an extremely valuable capability for the business.
Pros and cons
Event sourcing can be very useful for both the business and the development process:
- we query these events, useful both for business and for development to understand the users and system behaviour (debugging);
- we can also use the event log to reconstruct past states, again useful both for business and for development;
- automatically adjust the state to cope with retroactive changes, great for business;
- explore alternative histories by injecting hypothetical events when replaying, awesome for business.
But not everything is good news, be aware of hidden problems:
When our events trigger updates in external systems, we do not want to retrigger those events when we are replaying the events in order to create a projection. At this point, we can simply disable the external updates when we are in “replay mode”, maybe encapsulating that logic in a gateway.
Another solution, depending on the actual problem, might be to buffer the updates to external systems, performing them after a certain amount of time, when it is safe to assume that the events will not be replayed.
When our events use a query to an external system, ie. getting stock bonds ratings, what happens when we are replaying the events in order to create a projection? We might want to get the same ratings that were used when the events were run for the first time, maybe years ago. So either the remote application can give us those values or we need to store them in our system so we can simulate the remote query, again, by encapsulating that logic in the gateway.
Martin Fowler identifies 3 types of code changes: new features, bug fixes, and temporal logic. The real problem comes when replaying events which should be played with different business logic rules at different moments in time, ie. last year tax calculations are different than this year. As usual, conditional logic can be used but it will become messy, so the advice it to use a strategy pattern instead.
So, I advise caution, and I follow these rules whenever possible:
- Keep events dumb, knowing only about the state change and not how it was decided. That way we can safely replay any event and expect the result to be the same even if the business rules have changed in the meantime (although we will need to keep the legacy business rules so we can apply them when replaying past events);
- Interactions with external systems should not depend on these events, this way we can safely replay events without the danger of retriggering external logic and we don’t need to ensure the reply from the external system is the same as when the event was played originally.
And, of course, like any other pattern, we don’t need to use it everywhere, we should use it where it makes sense, where it brings us an advantage and solves more problems than it creates.
It’s, again, mostly about encapsulation, low coupling and high cohesion.
Events can leverage tremendous benefits to the maintainability, performance and growth of the codebase, but, with event sourcing, also to the reliability and information that the system data can provide.
Nevertheless, it’s a path with its own dangers because both the conceptual and technical complexity increase and a misuse in either of them can have disastrous outcomes.
2005 • Martin Fowler • Event Sourcing
2006 • Martin Fowler • Focusing on Events
2010 • Greg Young • CQRS Documents
2014 • Greg Young • CQRS and Event Sourcing – Code on the Beach 2014
2014 • Eric Evans • Domain-Driven Design Reference
2017 • Martin Fowler • What do you mean by “Event-Driven”?
2017 • Martin Fowler • The Many Meanings of Event-Driven Architecture