Documenting Software Architecture

This post is part of The Software Architecture Chronicles, a series of posts about Software Architecture. In them, I write about what I’ve learned regarding Software Architecture, how I think of it, and how I use that knowledge. The contents of this post might make more sense if you read the previous posts in this series.

We learn how to code and we build some cool applications, and then we learn about architecture and how to make the application maintainable for several years…

However, when we need to explain to someone else (new developer, product owner, investor, …) how the application works, we need something more… we need documentation.

But what documentation options do we have that can express the whole application, its building blocks and how it works?!

In this post I’m going to write about:

  • UML
  • 4+1 Architectural view model
  • Architecture Decision Records
  • The C4 Model
  • Dependency diagrams
  • Application Map

UML

There are several diagrams we can create using UML, and we can segregate them into two categories: structural diagrams and behavioural diagrams.

I will not go into the details of each type of diagram because it would be too much to cover in this post, and there are plenty of resources out there documenting these diagram types. To know more about each of these types, you can check the Visual Paradigm guides on them, or check out this blog post.

All in all, UML is cool and very interesting: we can be very expressive with it, and we can easily sketch ideas with it and discuss them with colleagues.

However, to document a whole application architecture using UML, we need several types of diagrams. Furthermore, if we try to use one single class diagram to express the whole application, we are asking for trouble.

An example of good usage of a UML class diagram is to document design patterns:

This is fine; this is actually great! It can express classes, interfaces, usage and inheritance relations, data and behaviours. It is also concise and readable, and because it’s small, it’s also fast to create.

However, the example below is not so useful… It’s very big, so it gets confusing and difficult to follow. Furthermore, it takes so much time to create that, by the time we finish, it will probably be outdated already, because someone will have changed the code in the meantime.

So, we can and should use UML, but only in the situations it suits: to describe patterns, to describe small portions of an application in detail, or to give coarse-grained, high-level views of the application with little detail (not using class diagrams).

But then the question remains, how do we document a full application?!

4+1 Architectural view model

The 4+1 Architectural view model was created by Philippe Kruchten and published, back in 1995, in his paper titled “Architectural Blueprints—The ‘4+1’ View Model of Software Architecture”.

This way of visualising a software application architecture is based on 5 views/perspectives of the application, and it tells us which diagrams can be used to document each of those views.

  1. Logical/Structural view
    Concerns itself with the functionality that is provided by the system and how the code is designed to provide such functionality;
  2. Implementation/Developer view
    Portrays the static organisation of the code, the components, modules and packages;
  3. Process/Behaviour view
    Focuses on the runtime behaviour of the system, how system processes communicate, concurrency, synchronisation, performance and so on;
  4. Deployment/Physical view
    Illustrates the physical organisation of the application; it’s about “what code runs on what hardware”;
  5. Use Case/Scenario view
    The architecture as a whole is explained with the help of a few use cases, which are simply sequences of interactions. Part of the architecture evolves from such use cases.

It’s important to note that the 4+1 architectural view model does not mandate that we use all the mentioned diagrams, or even all the views. We always need to understand the tools, and use no more and no less than what we need.

Architecture Decision Records

Architecture Decision Records (ADRs) are not really about documenting the current, or future, state of an application architecture, but instead the reasons that led to it. They are especially important because they tell others, and our future selves, why the architecture is what it is.

An ADR is a log entry about the architecture decisions that have been made and that led to the state of the architecture as it is now, or as it is intended to be in the future. ADRs contain the why behind the diagrams that describe the architecture.

To start, there are a few artefacts that we need to know:

  • Architecturally-Significant Requirement (ASR): a requirement that has a measurable effect on a software system’s architecture;
  • Architecture Decision (AD): a software design choice that addresses a significant requirement;
  • Architecture Decision Record (ADR): a document that captures an important architectural decision made along with its context and consequences;
  • Architecture Decision Log (ADL): the collection of all ADRs created and maintained for a particular project (or organisation);
  • Architecture Knowledge Management (AKM): the higher sphere of all previous concepts.

I have seen a few templates for creating ADRs, and I saw nice things in several of them, so I created my own template. You can, and maybe should, create yours as well, one that makes sense to you and your team.

For me, the most important thing in a template is that it’s simple and that it includes some guidance to help fill it in, and even to help make pragmatic and unbiased decisions.

The best way to use an ADR is not simply as a document written after having a discussion and making a decision. It is better to use it as the starting point for the discussion, as an RFC (Request For Comments): an idea or proposal that we submit to the other members of the team or department, requesting their input, opinion or approval. The intention is to use it to start a discussion, brainstorm, make the best decision possible, and then use the proposal document itself as the decision log entry (the ADR). The fact that the ADR is written beforehand doesn’t mean that it is immutable; it must be updated and improved as the discussion unfolds. I find it especially important that all the options under consideration are written down with their pros and cons, so as to spark discussion and lead to a clear decision.
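The RFC-style workflow described above can be sketched as a small data structure. The following Python sketch is a hypothetical illustration and not the template presented below: the `ADR`, `Option` and `Status` names, the status values and the example decision are all assumptions. It only captures two rules from the text: a proposal keeps changing while it is open, and every option carries its pros and cons until one of them is chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"      # written up front, as an RFC, and still open to change
    ACCEPTED = "accepted"      # the discussion converged on one option
    REJECTED = "rejected"      # the proposal was turned down
    SUPERSEDED = "superseded"  # a later ADR replaced this decision


@dataclass
class Option:
    name: str
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)


@dataclass
class ADR:
    number: int
    title: str
    context: str
    options: list[Option] = field(default_factory=list)
    status: Status = Status.PROPOSED
    decision: str | None = None
    consequences: list[str] = field(default_factory=list)

    def add_option(self, option: Option) -> None:
        """While the RFC is open, the discussion may surface new options."""
        self._require_proposed()
        self.options.append(option)

    def decide(self, option_name: str, consequences: list[str]) -> None:
        """Record the chosen option; the proposal itself becomes the log entry."""
        self._require_proposed()
        if option_name not in {o.name for o in self.options}:
            raise ValueError(f"unknown option: {option_name}")
        self.decision = option_name
        self.consequences = consequences
        self.status = Status.ACCEPTED

    def _require_proposed(self) -> None:
        if self.status is not Status.PROPOSED:
            raise ValueError(f"ADR {self.number} is {self.status.value}, not open for discussion")


# Hypothetical usage: the ADR is opened before the discussion starts.
adr = ADR(7, "Dispatch domain events asynchronously", "Sending emails slows down requests.")
adr.add_option(Option("Synchronous dispatch", pros=["simple"], cons=["slow requests"]))
adr.add_option(Option("Message queue", pros=["fast requests"], cons=["new infrastructure"]))
adr.decide("Message queue", ["Events must be serializable"])
```

Once `decide()` has been called, any further change raises an error, so revisiting the decision would mean writing a new ADR that supersedes this one; that rule is also an assumption of the sketch.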

So, this is the template I came up with:

Feel free to copy it from Google Docs.

If you want to explore this subject more, I recommend heading to the Joel Parker Henderson github repository about ADRs.

The C4 Model

The C4 model was introduced by Simon Brown, and it’s the best idea about software architecture documentation that I’ve come across so far. I’ll quickly explain the main idea in my own words, although using his own example diagrams.

The idea is to use 4 different granularity (or zoom) levels for documenting software architecture:

  • Level 1: System Context diagram
  • Level 2: Container diagram
  • Level 3: Component diagram
  • Level 4: Code diagram

Level 1: System Context diagram

This is the most coarse-grained diagram. It has little detail, and its main goal is to describe the context in which the application exists. So, it is composed of one single box for the whole application, surrounded by other boxes that represent the external systems and users the application interacts with.

Level 2: Container diagram

Now, we zoom into our application, the blue square in the diagram above which maps to the dashed square in the diagram below.

At this level of granularity, we will see the containers of the application, where a container is any independent technical piece of the application, for example a mobile app, an API or a database. It also documents the major technologies used and how the containers communicate.

Level 3: Component diagram

The component diagram shows us the components inside one container. In this context, each component is a module of the application, not restricted to domain-wise modules (e.g. billing, users, …) but also including purely functional modules (e.g. email, sms, …). So this diagram shows us the main cog wheels of a container and the relations between those cog wheels.

Level 4: Code diagram

The most fine-grained diagram, aimed at describing the code structure inside a component. For this level, we use a UML diagram with class-level artefacts.
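To see how the four levels nest, the hierarchy can be modelled as a tree in which each zoom step reveals the children of one box. This Python sketch only illustrates the zoom idea: the `Element` class, the `zoom` function and the example "Shop" system are hypothetical, and it is not how Structurizr or any other C4 tool represents a model. The external users and systems of level 1, and all relationships between boxes, are left out to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LEVELS = {1: "software system", 2: "container", 3: "component", 4: "code"}


@dataclass
class Element:
    """One box in a C4 diagram; zooming into it reveals its `children`."""
    name: str
    level: int
    children: list[Element] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each zoom step must go exactly one C4 level deeper.
        for child in self.children:
            if child.level != self.level + 1:
                raise ValueError(
                    f"{child.name} is a {LEVELS[child.level]}, "
                    f"but {self.name} is a {LEVELS[self.level]}"
                )


def zoom(element: Element, *path: str) -> list[str]:
    """Follow `path` down the hierarchy and list the boxes one level below it."""
    for name in path:
        element = next(c for c in element.children if c.name == name)
    return [c.name for c in element.children]


# Hypothetical system, using the kinds of containers and components named above.
shop = Element("Shop", 1, [
    Element("Mobile app", 2),
    Element("API", 2, [
        Element("Billing", 3, [Element("InvoiceService", 4), Element("Invoice", 4)]),
        Element("Users", 3),
        Element("Email", 3),
    ]),
    Element("Database", 2),
])
```

Here `zoom(shop)` lists the containers (level 2), `zoom(shop, "API")` lists the components of the API (level 3), and `zoom(shop, "API", "Billing")` lists the code elements of Billing (level 4).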

To know more about it, you can read Simon Brown’s own explanations about it here and here, or even watch him talk about it here.

What is still missing?!

I think the C4 Model is a great way to document application architecture, and it is great for understanding the architecture of the application up to a certain level, but I still find it insufficient, although it took me some time to put my finger on what is missing.

There are three limitations I see in these diagrams:

  1. Save for a few exceptions, like Simon Brown’s Structurizr, they need to be made manually, neither automated nor directly extracted from the code, which means they might not reflect the actual code but, instead, our current understanding of it;
  2. They don’t quite help us see what is wrong in our application codebase with regard to promiscuous code relations and poor structure, which impact modularity and encapsulation, both essential to any engineering product;
  3. They don’t help us understand our codebase as a whole, what the application cog wheels can do and how they interact with each other.

I have found two categories of diagrams that can help us with that.

Dependency diagrams

The dependency diagrams are useful to tell us about the dependencies that exist in the different types of code in our codebase.

It is crucially important that these diagrams are generated automatically, directly from the code; otherwise the diagram will reflect only what we think the code looks like, and if that were accurate we wouldn’t really have much need for this type of documentation.

Furthermore, maybe more important than the diagrams themselves is the ability to use this dependency analysis to stop a build when our predefined dependency rules are broken. So, the tool used to generate these diagrams should also be usable as a testing tool and included in our CI pipeline, just like unit tests are, preventing unwanted dependencies from reaching production. This maintains and enforces modularity, which in turn keeps the code easy to change and therefore keeps the velocity of feature development high.

Within this category, I find it useful to have 3 different types of diagram, each checking a different type of dependency.
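Both properties, building the graph from the code itself and failing the build on a broken rule, can be shown in a few lines. The examples in this post come from deptrac on a PHP project; this sketch is instead in Python and reads Python imports with the standard-library `ast` module, so it illustrates the idea rather than how deptrac works. The three-module `project` and the `check` function are hypothetical.

```python
from __future__ import annotations

import ast


def imports_of(source: str) -> set[str]:
    """Return the names of the modules imported by one Python source file."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)  # a relative "from . import x" has no module name
    return found


def dependency_graph(project: dict[str, str]) -> dict[str, set[str]]:
    """Map each project module to the project modules it imports.

    Standard-library and third-party imports are dropped, since only
    dependencies between our own modules matter here.
    """
    names = set(project)
    return {name: imports_of(source) & names for name, source in project.items()}


def check(project: dict[str, str], forbidden: set[tuple[str, str]]) -> int:
    """Print every forbidden dependency and return a process exit code."""
    graph = dependency_graph(project)
    violations = sorted(
        (module, dep) for module, deps in graph.items() for dep in deps
        if (module, dep) in forbidden
    )
    for module, dep in violations:
        print(f"forbidden dependency: {module} -> {dep}")
    return 1 if violations else 0  # non-zero fails the CI job, like a failing test


# Hypothetical project: {module name: source code}.
project = {
    "domain": "import dataclasses\n",
    "application": "from domain import Post\n",
    "infrastructure": "import application\nimport domain\nimport sqlite3\n",
}
```

In a CI step, the script would end with `sys.exit(check(project, rules))`, so that a forbidden dependency stops the pipeline in the same way a failing unit test does.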

In the case of the examples I have below, they were all generated by deptrac for my pet project (explicit-architecture-php), which I use for experimenting. You can find the configuration used to generate them in the repository root.

Do note, however, that I added the colours myself to make the diagrams easier to read in this blog post. The colours represent different layers in the application, in accordance with the layers I wrote about in previous blog posts:

Layer dependency diagram

The intention of this diagram is to visualise, and make sure, that the code in each layer only depends on the layers inside or below it.

So, in the diagram below we can see, for example, that the Infrastructure layer, being one of the top, outer layers, can depend on any other layer. On the other hand, the Domain layer, being the top central layer, can only depend on the layers below it, namely the SharedKernel-Domain (which is part of the Domain as well) and the PhpExtension (whose code is used as if it were part of the language itself).

Layer dependencies diagram generated by deptrac for https://github.com/hgraca/explicit-architecture-php
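The rule that a layer may only depend on layers inside or below it can be expressed as an ordering. This Python sketch assumes a simplified, linear ordering of the layers named above plus an `Application` layer; the `Layer.Class` naming scheme and the example classes are hypothetical, and this is not the deptrac configuration used in explicit-architecture-php.

```python
from __future__ import annotations

# Layers ordered from innermost/lowest to outermost/highest (a simplification).
LAYERS = ["PhpExtension", "SharedKernel-Domain", "Domain", "Application", "Infrastructure"]
RANK = {layer: rank for rank, layer in enumerate(LAYERS)}


def layer_of(class_name: str) -> str:
    """Assume classes are named '<Layer>.<Class>', e.g. 'Domain.Post'."""
    return class_name.split(".", 1)[0]


def layer_violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the class dependencies that point to an outer (higher) layer."""
    return [
        (source, target) for source, target in edges
        if RANK[layer_of(target)] > RANK[layer_of(source)]
    ]


edges = [
    ("Infrastructure.DoctrinePostRepository", "Domain.Post"),  # outer -> inner: allowed
    ("Domain.Post", "SharedKernel-Domain.PostId"),             # inner -> below: allowed
    ("Domain.Post", "Domain.Comment"),                         # same layer: allowed here
    ("Domain.Post", "Application.PostService"),                # inner -> outer: violation
]
```

Same-layer dependencies pass this check on purpose; catching unwanted dependencies inside a layer is the job of the class and component checks described next.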

Class dependency diagram

The Layer dependency diagram analyses the dependencies between layers, but within a layer there are still dependencies that must not occur.

The Class dependency diagram is useful to analyse the dependencies between the different types of class we have in our codebase, especially if they are in the same layer.

For example, if we want our events to be serializable, so that we can put them in a queue, we probably don’t want them to contain an entity because it would be problematic to unserialize it and persist it using an ORM. It would also not make sense for an event to depend on a service. With this type of diagram, or more accurately with the tool to test dependencies, we can easily detect such cases and prevent them from reaching production.

Class dependencies diagram generated by deptrac for https://github.com/hgraca/explicit-architecture-php
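The rule about events can be checked in the same spirit. In this Python sketch, which is an illustration rather than how deptrac works, a class's type is inferred from its fully qualified name with regular expressions (a hypothetical naming convention), and a table lists which class types each type must never depend on.

```python
from __future__ import annotations

import re

# Hypothetical collectors: the first pattern that matches decides the class type.
COLLECTORS = [
    ("Event", re.compile(r"Event$")),
    ("Service", re.compile(r"Service$")),
    ("Entity", re.compile(r"\.Entity\.")),
]

# For each class type, the class types it must never depend on.
FORBIDDEN = {"Event": {"Entity", "Service"}}


def class_type(name: str) -> str | None:
    for type_name, pattern in COLLECTORS:
        if pattern.search(name):
            return type_name
    return None  # not covered by any rule


def class_violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    return [
        (source, target) for source, target in edges
        if class_type(target) in FORBIDDEN.get(class_type(source), set())
    ]


edges = [
    ("Blog.Event.PostCreatedEvent", "Blog.ValueObject.PostId"),   # allowed
    ("Blog.Event.PostCreatedEvent", "Blog.Entity.Post"),          # event -> entity
    ("Blog.Event.PostDeletedEvent", "Blog.Service.PostService"),  # event -> service
    ("Blog.Service.PostService", "Blog.Entity.Post"),             # services may use entities
]
```

With these rules, the two dependencies from events to an entity and to a service are reported, while the event holding a value object and the service using an entity pass.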

Component dependency diagram

A component is a domain-wise module, a module that contains both the Application and Domain layers. A component can be, for example, “Billing”, containing all its use cases and Domain logic.

Components can be mapped to DDD bounded contexts and/or Microservices, which means they must be completely decoupled, physically and temporally, from other components. If we have a monolithic application with fully decoupled components, it will be fairly easy (code-wise) to transform it into a Microservice Architecture.

Furthermore, applying the same decoupling requirements to the other non domain wise modules, we can guarantee that we can easily replace any module.

The Component dependencies diagram is aimed at making sure that the application components and modules are decoupled.

Note, in the diagram below, how the modules in the same layer (nodes with the same colour) are all unaware of each other, at least directly.

Especially important is that the two components (User and Blog, in mid-blue) are decoupled. If this application had a Microservices Architecture, these two components would be the microservices.

Component dependencies diagram generated by deptrac for https://github.com/hgraca/explicit-architecture-php
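The decoupling rule for components differs from the layer rule: it forbids direct dependencies between modules of the same group, in either direction. A minimal Python sketch, assuming a hypothetical inventory in which each module is tagged with its group (its colour in the diagram); the `Email` and `Persistence` modules are invented for the example.

```python
# Hypothetical module inventory: each module and the group (colour) it belongs to.
MODULE_GROUP = {
    "User": "component",
    "Blog": "component",
    "Email": "infrastructure",
    "Persistence": "infrastructure",
    "SharedKernel": "shared kernel",
}


def coupling_violations(edges):
    """Modules in the same group must not depend on each other directly."""
    return [
        (source, target) for source, target in edges
        if source != target and MODULE_GROUP[source] == MODULE_GROUP[target]
    ]


edges = [
    ("Blog", "SharedKernel"),  # allowed: different group
    ("User", "SharedKernel"),  # allowed
    ("Blog", "User"),          # violation: component -> component
    ("Email", "Persistence"),  # violation: two infrastructure modules
]
```

Under these assumptions, `Blog` and `User` may both depend on `SharedKernel`, but never on each other.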

Application Map

About a year ago, I realised something else that was missing from these documentation options: all these diagrams tell us what the building blocks of the application are, which blocks interact with each other and how they are related, but they don’t tell us what those blocks do, nor how and when they interact with each other. For that, we need to know either the application very well from the user perspective, or the codebase from the developer perspective. The previous diagrams don’t tell us what use cases we have in the application, what events are triggered by which use cases, or what the consequences of those events are. If we show those diagrams to a Product Owner, he will find them mostly useless for his role.

So I came up with an idea for a new documentation diagram, which I call an Application Map, that can replace the C4 Model Component diagram.

The Application Map is aimed at being truly a map of the application, defining its “cities” (Components), its “local roads” (use cases), “highways” (events), etc.

The difference between modules and components is that a module is any modular piece of the application, while a component is a domain-wise module of the application. So, while an ORM is a module of the application, it is not a component because it only deals with technical concerns. On the other hand, a “Billing” module is a component because it deals with domain concerns.

An Application Map starts by defining the components of the application, the domain-wise modules, like “Billing”, “User”, “Company”, “Orders”, “Products”, and so on. In the case of a simple blog application, we could have two components, the “User” and the “Blog” components:

In each of those components, we define the commands that can be issued to it. The “User” component can create and delete users, while the “Blog” component can create and delete posts, and create comments on a post.

Next, in each component, we list any relevant services. These services are relevant because, for example, they trigger an event or are used directly by another component. This is important because the application map should make visible the connections between components, as well as what they mean and any follow-up side effects, and for this we need to expose the services that are wired to other components, together with their names (which should express what they do).

Following the services, we list all the event listeners in each component, even if they are not actually used. This is handy because it lets us detect unused listeners and either fix whatever needs to be fixed or remove the unused code.

By listener I mean a class whose public methods are all triggered, independently, by only one type of event; a listener focuses on the event.

We will also list the event subscribers in each component, for exactly the same reasons as we list the listeners.

An event subscriber is similar to an event listener, except that its public methods are triggered by different events; it focuses on a composite task. An example of a subscriber is a class listening to different framework events in order to control when to start, commit or roll back the request transaction.
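The difference between a listener and a subscriber is easier to see in code. The following Python sketch uses a hypothetical in-memory `EventDispatcher` and made-up event classes; it does not reproduce the event dispatcher API of any particular framework, and the listener only records the user id as a stand-in for actually deleting posts.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


class EventDispatcher:
    """Minimal in-memory dispatcher: each event type maps to a list of handlers."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def add(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: object) -> None:
        # Call every handler registered for this exact event type, in order.
        for handler in self._handlers[type(event)]:
            handler(event)


@dataclass
class UserDeleted:
    user_id: int


class DeletePostsOfUserListener:
    """Listener: its public method reacts to one event type only (UserDeleted)."""

    def __init__(self) -> None:
        self.users_whose_posts_were_deleted: list[int] = []

    def on_user_deleted(self, event: UserDeleted) -> None:
        self.users_whose_posts_were_deleted.append(event.user_id)


# Hypothetical framework events around one request.
class RequestStarted: ...
class RequestSucceeded: ...
class RequestFailed: ...


class RequestTransactionSubscriber:
    """Subscriber: several events drive one composite task, the request transaction."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def subscribed_events(self) -> dict[type, Callable]:
        return {
            RequestStarted: self.begin,
            RequestSucceeded: self.commit,
            RequestFailed: self.rollback,
        }

    def begin(self, _event: RequestStarted) -> None:
        self.log.append("begin")

    def commit(self, _event: RequestSucceeded) -> None:
        self.log.append("commit")

    def rollback(self, _event: RequestFailed) -> None:
        self.log.append("rollback")


dispatcher = EventDispatcher()
listener = DeletePostsOfUserListener()
dispatcher.add(UserDeleted, listener.on_user_deleted)
subscriber = RequestTransactionSubscriber()
for event_type, handler in subscriber.subscribed_events().items():
    dispatcher.add(event_type, handler)
```

The listener is registered for a single event type, while the subscriber declares a whole set of events, and each of its methods handles one step of the same task.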

At this point, we have all the components and their capabilities in the map. This is very valuable because it tells us, or any non technical person, what each component can do.

However, it still doesn’t tell us how all these capabilities relate to each other, for example “what happens as a consequence of a user creating a blog post?”.

In order to achieve that, the first step is to list what happens in a component when a specific capability is triggered.

In the image below, we can see that deleting a post (“DeletePost”) will trigger the deletePost() method in the PostService, which is also triggered by a listener listening to the event that notifies that a user has been deleted. This tells us that our application deletes posts as a result of either a direct command from a user or when a post author has been deleted.

In the User component, we can see that when a post is created, its author is automatically subscribed to that post’s subjects (tags).

Now we have the information about the flow within a component, but we still lack the information about the cross-component flow, so let’s add the events being triggered and listened to:

We can see, for example, that:

  • Deleting a user will trigger an event that will delete the user’s posts;
  • Creating a post will trigger an event that will lead to both subscribing the author to the post’s subjects and increasing the author’s rating;
  • Deleting a post, from whatever use case, triggers an event that will result in decreasing that author’s rating.

With all this information in our map, we can navigate it. Any technical or non-technical person can clearly visualise what happens when any of the use cases of the application is triggered. This can help us clarify our code, and our idea of the application behaviour.
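Navigating the map amounts to following edges in a directed graph. This Python sketch encodes the blog example described above as an adjacency list; the node names (`PostService.deletePost`, `UserDeletedListener`, and so on) are hypothetical stand-ins for whatever the real code is called. `what_happens` answers “what are all the consequences of this use case?”, and `what_triggers` answers the reverse question.

```python
from __future__ import annotations

from collections import deque

# Hypothetical application map: every key "triggers" the entries in its list.
# Prefixes name the component ("Blog:", "User:") or mark an event ("event:").
APP_MAP = {
    "Blog:DeletePost": ["Blog:PostService.deletePost"],
    "Blog:CreatePost": ["Blog:PostService.createPost"],
    "User:DeleteUser": ["User:UserService.deleteUser"],
    "Blog:PostService.deletePost": ["event:PostDeleted"],
    "Blog:PostService.createPost": ["event:PostCreated"],
    "User:UserService.deleteUser": ["event:UserDeleted"],
    "event:UserDeleted": ["Blog:UserDeletedListener"],
    "event:PostCreated": ["User:SubscribeAuthorListener", "User:IncreaseRatingListener"],
    "event:PostDeleted": ["User:DecreaseRatingListener"],
    "Blog:UserDeletedListener": ["Blog:PostService.deletePost"],
}


def what_happens(start: str) -> list[str]:
    """Breadth-first walk over everything `start` triggers, directly or not."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in APP_MAP.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def what_triggers(node: str) -> list[str]:
    """Reverse lookup: everything that directly triggers `node`."""
    return [source for source, targets in APP_MAP.items() if node in targets]
```

For instance, `what_happens("User:DeleteUser")` walks from the command through the `UserDeleted` event to the deletion of the user’s posts, and from there to the `PostDeleted` event and the rating listener, while `what_triggers("Blog:PostService.deletePost")` shows that post deletion is reached both from the `DeletePost` command and from the listener reacting to a deleted user.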

But, when used in a big application, this diagram will still have problems common to the previously mentioned diagrams:

  1. It’s an artefact that will take a lot of effort and time to get done and also to simply keep it up to date;
  2. We will still end up having a big diagram with a lot of lines on it, which is not the most readable.

To solve the first problem, we need to be able to generate the diagram from the code, on demand. This would make creating such a diagram effortless and virtually immediate, and remove the need to maintain it.

To solve the second problem, we need to be able to selectively generate only part of the diagram. For example by providing the name of the use case that we want to analyse, which would result in only generating the sections of the diagram that somehow are related to the given use case.
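Generating only the part of the map related to one use case can be sketched as a graph traversal followed by rendering. This Python sketch assumes the map has already been extracted from the code into a hypothetical adjacency list, and it emits Graphviz DOT text containing only the nodes reachable from the given use case; it does not describe the implementation of the tool mentioned below.

```python
from __future__ import annotations

from collections import deque

# Hypothetical, trimmed-down application map: each node lists what it triggers.
APP_MAP = {
    "CreatePost": ["PostService.createPost"],
    "DeleteUser": ["UserService.deleteUser"],
    "PostService.createPost": ["event:PostCreated"],
    "UserService.deleteUser": ["event:UserDeleted"],
    "event:PostCreated": ["IncreaseRatingListener"],
    "event:UserDeleted": ["DeletePostsListener"],
}


def reachable(app_map: dict[str, list[str]], use_case: str) -> set[str]:
    """All nodes reached by following 'triggers' edges from `use_case`."""
    seen, queue = {use_case}, deque([use_case])
    while queue:
        for target in app_map.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def to_dot(app_map: dict[str, list[str]], use_case: str) -> str:
    """Render only the part of the map related to one use case, as Graphviz DOT."""
    nodes = reachable(app_map, use_case)
    lines = [f'digraph "{use_case}" {{']
    for source in sorted(nodes):
        for target in app_map.get(source, []):
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

`to_dot(APP_MAP, "CreatePost")` leaves out everything related to deleting users, and its output can be rendered with Graphviz, for example with `dot -Tsvg`.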

So we need a tool… which does not exist… yet!

Or does it?! 😀

Some time ago I started creating it, and I got to the point where only the internal flow of each component is missing; it already lists all the commands, services, listeners, subscribers and events. It is still very alpha, because of the missing information but also because it is not flexible regarding the codebase it needs to analyse. Still, from the codebase of the company where I currently work, it can generate something like this:

Example of an (incomplete) application map, as generated by https://gitlab.com/hgraca/app-mapper

If you are curious about the project, you can check it out here; however, be advised that it is still very alpha, it’s just a proof of concept and I haven’t worked on it for a few months. If you feel it’s a worthy project and you have free time to contribute, let me know and I will try to get you up to speed and create tasks that you can pick up to bring it to the next level.

33 thoughts on “Documenting Software Architecture”

  1. Very nice article! Keep going!
    This article is very helpful for understanding Hexagonal Architecture as well, but I am still not able to relate it to my requirements.
    The basic idea is to keep the business logic isolated from the Input/Output and external libraries.
    But I have different input APIs, like REST, File Upload and Database,
    and the same with output: a response callback to the Client service, a File store and a Database.

    Input from all these apis will be first converted to Intermediate object -> then this IO will be processed by actual application -> Response will be put as JSON, File or Database

    One thing that confuses me a little is how I am going to aggregate all these APIs together (will there be a single controller or separate ones)?

    Your feedback is highly appreciated!!

    Thank you

    1. I think they all should be separate, and it should be clear what they are used for.
      Ideally, the namespaces/folder structure should communicate that on its own.

  2. I don’t think Application Map would work well in a functional paradigm, where the focus is on the data that is being transformed, not components exchanging events or dependencies.

    A function to create a post would expect all relevant data to create a Post structure in it.
    It would try to be a pure function and leave details like saving the post somewhere to other modules.

    Most likely one would opt for something like queues or channels to give other functions an opportunity to react to a new post without having to explicitly model events, building up use cases as a data flow where functions participate independently.

    The same goes for events that would just be put on a queue by the source.

    If you can model your problem as a sequence of data transformations a lot of the complexities mentioned above are going away. You are left with a description of modules that take data of form A and return data of form A’.
    Simple.

  3. The application map looks very interesting, especially for events and event handlers.

    Now, I have some trouble with events in my architecture, to the point I started to restrict/limit them until I find a solution.

    I have a similar architecture (decoupled, hexagonal, clean etc.) and it only concerns the domain part.

    Sometimes I send events from my command handler. I like to use event/event handler when I do things that are not strictly related to the original handler or that can be done asynchronously (like sending email/notification) or when different commands have the same consequences.

    Events are great in these cases, but I noticed the business intent of the command handler is less explicit.

    It is less readable: you don’t know exactly what the handler is doing, because when you see `bus.emit(new SomethingHappened())`, you have to search for all the event handlers handling this event to know exactly what the command is doing.

    So now I’m starting to use more services that I reuse across my command handlers until I find a better solution to organize/structure my events.

    But now, some command handlers start to have too many responsibilities; it’s always a tradeoff…

    Also, when I see the size of your application map, it looks way too big and unreadable. Maybe we just need an application map per command.

    What would be useful is to have a list of all event handlers handling an event each time the event is referenced or emitted in the source code (like “find references” in an IDE)…

    I will try to reorganize my package, maybe by merging CommandHandlers and EventHandlers in the same business package (they are very similar after all). Some research to do 🙂

    1. Yes, I feel exactly the same pain as you. It was the difficulty to understand what code was being executed as a result of events that made me come up with the Application Map idea.

      I also feel there is too much information in that Application Map, making it very difficult to read, but I think the CLI application to generate the map could use filters to only put in the map the events or commands we want to visualize.

      Good luck 🙂

      1. Good to know we are facing the same issue.

        I may have made a bad design decision that does not facilitate source code navigation: event handlers are not allowed to have business code.

        They must issue a command, which is then processed by the command handler (where the business logic resides), which adds another level of indirection…

        I made this decision because we may want to trigger the command manually too.

        For example, I have an “open order” command which is triggered when the “paid order” event is issued. But what if I want to open the order independently of the event?

        That’s why I have the “OpenOrderCommand”, and it can be triggered manually or by an event handler. But do all the actions related to events need to be triggered manually? Probably not…

        Now I’m thinking about allowing the event handlers to have access to services/repositories/entities like the command handler.

        Where do you place your event handlers in your code? Do you place them in the same package as the command handlers or separately?

        Now I have :

        domain.commandhandlers.subdomain.MySubdomainCommandHandler.class
        domain.eventhandlers.subdomain.MySubdomainEventHandler.class

        I’m thinking about:

        domain.handlers.subdomain.MySubdomainCommandHandler.class
        domain.handlers.subdomain.MySubdomainEventHandler.class

        All the business logic related to a subdomain would be in the same package and it may help to understand everything the subdomain is doing and when.

        1. As I see it, you are going in the right direction by grouping them into packages, but I go even further.

          What you are calling domain, I call Core, and what you call subdomain I call Component, but this is how I do it:

          Core.Component.Application.Command.SomeUseCase.SomeCommand.class
          Core.Component.Application.Command.SomeUseCase.SomeHandler.class
          Core.Component.Application.Event.WhateverEvent.class
          Core.Component.Application.Listener.WhateverListener.class
          Core.Component.Domain.Entity.AnEntity.class

          The Application and Domain namespaces represent layers, and each component is a vertical slice of the application, as explained by Uncle Bob, Simon Brown and others.

          The command and handler files go together because they work and change together, so I apply the Package Cohesion Principles by Robert C. Martin:
          CCP – The Common Closure Principle – Classes that change together are packaged together
          CRP – The Common Reuse Principle – Classes that are used together are packaged together

          The events and listeners are separated into two different namespaces because a component might trigger an event but not listen to it, and it might listen to an event that it does not trigger, so they are somewhat independent, although I think it would be fine to put them together as well.

          1. I use SharedCore to name my business Shared Kernel namespace, and Domain.Core to name my business domain core interactions (the innermost layer of the architecture, which has no dependencies except the SharedCore).

  4. Have you considered using ArchiMate as a graphical notation for your application map? It provides handy semantics which look suitable for the things you draw on your map, but it’s standardized, more or less common, and in my opinion more readable than just coloured elements.

    Regarding autogeneration, there is a great PlantUML plugin for ArchiMate: https://github.com/ebbypeter/Archimate-PlantUML

    1. I don’t think it’s possible to use those tools to have an auto-generated application map.
      Although they are pretty good for the other diagrams.

  5. You have the right idea that we need to document the dynamic behavior of the system and it would be nice to have a tool that automatically generates this.
    However, the application map you show, is for a very small system and it’s already unusable. No human will understand such a map generated for a mid-to-large sized system.

    The solution is fairly simple and it’s already mentioned in the article:
    – you need to provide an activity diagram for all use cases in your system. In order for this to work, the system needs to be correctly decomposed into components.

    Highly recommend taking Juval Lowy’s Architect’s Master Class, it teaches how to correctly decompose a system based on volatility and how to document it.

    1. Yes, what you say totally resonates with me. That’s why I mention that we would need some way to analyze only the use case we need.

      Thanks for the recommendation; I don’t know it, but I will definitely check it out. 🙂

      1. Great article. I found it very helpful to have these different approaches cataloged together for easy reference.

        I’ll second the recommendation of Juval Lowy’s Architect’s Master Class:
        http://www.idesign.net/Training/Architect-Master-Class

        The use cases are the reason the system exists. They’re key to the architecture.

        Once your documentation is more oriented toward use-cases rather than static allocations, you can focus on dynamic aspects such as transaction, identity, authentication, and authorization boundaries.

        The use of a consistent, color-coded taxonomy is also a very helpful take-away from Juval’s course. I see elements of this in the color-codings of the Application Map. In Juval’s class, it’s applied at the Service/Component level.

  6. How do you mitigate the cost of generation? Are there tools you recommend? Personally I prefer source-trackable documentation formats, but have historically steered away from generated documentation because it seems to lack readability or go out of date.

    1. In my experience, in our line of work, all types of documentation get outdated. The advantage of generated documentation is that we can generate it every time we need it, which basically removes the issue of it getting outdated.
      The readability is indeed a problem, but there are ways of making it more readable, e.g. using filters to generate only what we are interested in.

    1. Yes, I do, but there must always be some way of detecting what types of code we have. In my POC I am currently using regexes, so as long as that is possible…
