There are 3 patterns to handle data persistence to relational DBs:
- Active Record
- Data Mapper
Associated to this patterns, we have a few more design patterns to help implement the previous ones:
- Unit of work
- Identity map
- Lazy loading
And we also have 3 patterns to structure the data in the DB:
- Single table inheritance
- Concrete table inheritance
- Class table inheritance
This is the simplest approach to persisting objects. Each entity represents a table row and knows how to connect to the DB, persist itself to the DB and find other sibling objects from the DB, using static methods.
An object that wraps a row in a DB table or view, encapsulates the DB access, and adds domain logic to that data.
This pattern may be usable in simple projects, however, it has an enormous amount of responsibilities, breaking the SRP. As complexity grows, and we refactor the domain objects into smaller objects, we lose the one-on-one mapping of objects to table rows and the pattern breaks.
The gateway represents a table in the DB and takes care of connecting, persisting and finding objects in the DB.
These gateways should be as simple as possible, generic, mechanic, reusable, maybe even generated. They should know nothing about business logic.
The problem is that, because they are supposed to be as simple as possible, they make a direct mapping from table column to object property. If changes are done to either the model or the DB, those changes need to be reflected in its counterpart.
The data mapper makes the translation between a set of data ans one or several objects and from an object into one or several tables. It can, and should, use some kind of entity management system to cache the objects that were already extracted from the DB.
However, the concrete task of connecting, finding and persisting a set of data in the DB should be left to a data gateway. A common option is to implement a data mapper as a decorator around a data gateway.
How to load and save objects
Repeatedly persisting and loading the same object to/from the DB can have a severe impact in performance, therefore that is something we don’t want to be doing. We also need to prevent concurrently changing and persisting different instances that represent the same data.
To prevent this, we need to keep track of the objects that are loaded, in order to:
- Not read an object from the DB more than once
- Not write an object to the DB more than once
- Always use the same object version for all editing operations (prevent concurrency)
- Limit the entity nested graph loaded
There are a few patterns that can help us with this.
Unit of work
Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.
There are a few ways to implement this pattern, but the only one that allows for transparency (implicit persistence of changed entities) and decoupling of the entity from the UoW is to create a snapshot of the entity when loading it and comparing each managed entity to its snapshot when flushing the managed data into the DB, so that we only persist the modified objects. This adds some computation overhead, but it might be worth it performance wise and it sure is worth it, maintenance wise.
Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.
The identity map is the structure that is used by the unit of work to keep track of the loaded entities. Whenever we ask for an entity that has already been loaded once, the unit of work return a second reference to it.
As a side effect, the identity map works as a caching mechanism, but its primary goal is to maintain unique entities in memory.
An object that doesn’t contain all the data you need but know how to get it.
When we have many nested entities, when loading one entity from the DB, we may end up loading a huge set of data that maybe we don’t even need.
By using lazy loading we store a reference to the nested entity, instead of the entity itself. When we first access the nested entity we load the entity from the DB.
The big problem though is the “ripple loading”, aka “N+1 problem”, which happens when we have a list of entities with nested lazy loaded entities. When we loop through the list, we will be loading the nested entities one by one, instead of all in one go. To solve this we can use eager loading (pre-load the nested entities) when loading lists of entities.
Martin Fowler gives us a few rules of thumb to follow when reading data from the DB. Unfortunately only one of them is not outdated:
- Its often better to pull out more rows and filter out the ones not needed then issuing one query for each row.
Structural Mapping Patterns
Mapping 1-1, 1-N, N-N relationships are, now days, common practise and quite easy with the available ORMs (not the case in 2004). But it is worth mentioning the relation between entities and value objects.
In this later case, we have 2 options:
- If we need to query by the members of the value objects, we can explode the VO members into columns of the entity object to which it belongs to.
- If we don’t need to query by the members of the value object, we can store it as a string in one column of the entity. This string can be any type of object serialization format.
Nevertheless, it is worth mention that, now days, using MySQL 5.7 we can also store a VO as JSON string and still query the VO members, not to talk about all the NO-SQL options currently available.
How do we map an object inheritance tree into a relational model?
As usual there is on silver bullet, we need to look at our context and make the best decision possible, but we have 3 options:
Single Table Inheritance
We store all classes in one same table. The problem here is that some columns are only going to be used by a subset of the class hierarchy, so we end up with a non normalized table.
This option avoids joins and is easy to modify, but we end up wasting space and having a huge table.
Concrete Table Inheritance
Maps each class to a table and each table has the columns of the parent classes.
This option also avoids joins but with any change to the object inheritance we need to update several tables.
Class Table Inheritance
Creates a table for each class, including the classes in the hierarchy. This makes it necessary to use joins to get a full object. Although in the other hand, hierarchy changes have limited repercussion to the tables definitions.