PEAA.3 – Mapping to relational DBs

There are 3 patterns to handle data persistence to relational DBs:

Active Record
Gateway
Data Mapper

Associated to this patterns, we have a few more design patterns to help implement the previous ones:

Unit of work
Identity map
Lazy loading

And we also have 3 patterns to structure the data in the DB:

Single table inheritance
Concrete table inheritance
Class table inheritance

The Patterns

Active Record

This is the simplest approach to persisting objects. Each entity represents a table row and knows how to connect to the DB, persist itself to the DB and find other sibling objects from the DB, using static methods.

An object that wraps a row in a DB table or view, encapsulates the DB access, and adds domain logic to that data.

Pg. 160

This pattern may be usable in simple projects, however, it has an enormous amount of responsibilities, breaking the SRP. As complexity grows, and we refactor the domain objects into smaller objects, we lose the one-on-one mapping of objects to table rows and the pattern breaks.

Gateway

The gateway represents a table in the DB and takes care of connecting, persisting and finding objects in the DB.

These gateways should be as simple as possible, generic, mechanic, reusable, maybe even generated. They should know nothing about business logic.

The problem is that, because they are supposed to be as simple as possible, they make a direct mapping from table column to object property. If changes are done to either the model or the DB, those changes need to be reflected in its counterpart.

Data Mapper

The data mapper makes the translation between a set of data ans one or several objects and from an object into one or several tables. It can, and should, use some kind of entity management system to cache the objects that were already extracted from the DB.

However, the concrete task of connecting, finding and persisting a set of data in the DB should be left to a data gateway. A common option is to implement a data mapper as a decorator around a data gateway.

How to load and save objects

Repeatedly persisting and loading the same object to/from the DB can have a severe impact in performance, therefore that is something we don’t want to be doing. We also need to prevent concurrently changing and persisting different instances that represent the same data.

To prevent this, we need to keep track of the objects that are loaded, in order to:

Not read an object from the DB more than once
Not write an object to the DB more than once
Always use the same object version for all editing operations (prevent concurrency)
Limit the entity nested graph loaded

There are a few patterns that can help us with this.

Unit of work

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

pg. 184

There are a few ways to implement this pattern, but the only one that allows for transparency (implicit persistence of changed entities) and decoupling of the entity from the UoW is to create a snapshot of the entity when loading it and comparing each managed entity to its snapshot when flushing the managed data into the DB, so that we only persist the modified objects. This adds some computation overhead, but it might be worth it performance wise and it sure is worth it, maintenance wise.

Identity Map

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.

pg. 195

The identity map is the structure that is used by the unit of work to keep track of the loaded entities. Whenever we ask for an entity that has already been loaded once, the unit of work return a second reference to it.

As a side effect, the identity map works as a caching mechanism, but its primary goal is to maintain unique entities in memory.

Lazy loading

An object that doesn’t contain all the data you need but know how to get it.

pg. 200

When we have many nested entities, when loading one entity from the DB, we may end up loading a huge set of data that maybe we don’t even need.

By using lazy loading we store a reference to the nested entity, instead of the entity itself. When we first access the nested entity we load the entity from the DB.

The big problem though is the “ripple loading”, aka “N+1 problem”, which happens when we have a list of entities with nested lazy loaded entities. When we loop through the list, we will be loading the nested entities one by one, instead of all in one go. To solve this we can use eager loading (pre-load the nested entities) when loading lists of entities.

Reading data

Martin Fowler gives us a few rules of thumb to follow when reading data from the DB. Unfortunately only one of them is not outdated:

Its often better to pull out more rows and filter out the ones not needed then issuing one query for each row.

Structural Mapping Patterns

Mapping relationships

Mapping 1-1, 1-N, N-N relationships are, now days, common practise and quite easy with the available ORMs (not the case in 2004). But it is worth mentioning the relation between entities and value objects.

In this later case, we have 2 options:

If we need to query by the members of the value objects, we can explode the VO members into columns of the entity object to which it belongs to.
If we don’t need to query by the members of the value object, we can store it as a string in one column of the entity. This string can be any type of object serialization format.

Nevertheless, it is worth mention that, now days, using MySQL 5.7 we can also store a VO as JSON string and still query the VO members, not to talk about all the NO-SQL options currently available.

Inheritance

How do we map an object inheritance tree into a relational model?

As usual there is on silver bullet, we need to look at our context and make the best decision possible, but we have 3 options:

Single Table Inheritance

We store all classes in one same table. The problem here is that some columns are only going to be used by a subset of the class hierarchy, so we end up with a non normalized table.

table_inheritance_single

This option avoids joins and is easy to modify, but we end up wasting space and having a huge table.

Concrete Table Inheritance

Maps each class to a table and each table has the columns of the parent classes.

This option also avoids joins but with any change to the object inheritance we need to update several tables.

table_inheritance_concrete

Class Table Inheritance

Creates a table for each class, including the classes in the hierarchy. This makes it necessary to use joins to get a full object. Although in the other hand, hierarchy changes have limited repercussion to the tables definitions.

table_inheritance_class

This post is part of a set of posts with my personal notes about all the chapters in the book “Patterns of Enterprise Application Architecture” by Martin Fowler. I will do this as I read through the book, and take notes on the concepts I personally find more relevant.

PEAA.3 – Mapping to relational DBs