Dependency Management is Risk Management

By: Gene Hughson

How well-managed are your dependencies? Are you aware of all of them? Which ones can fail gracefully? Which ones allow the application to continue in a degraded state in the event of a failure? How many dependencies would cause your application to become unavailable in the event of a failure?

It's instructive that the Latin root of the word "depend" means "to hang". Although we may rely on them, dependencies also hang over our heads like the sword of Damocles, an ever-present threat to the well-being of our systems and our peace of mind. Since we cannot live without them, we must find a way to harness the usefulness while minimizing the risk.

It's common to think of code when the subject of dependencies comes up. Issues around how to organize an application into packages, reuse of common components, use of third-party components, and even proper usage of framework classes are all dependency management issues. For these types of dependencies, the anti-patterns tend to be well known, as are techniques for managing them:

My Four Principles of Dependency Management have an order of precedence:
- Minimise Dependencies: the simpler our code, the less "things" we have referring to other "things"
- Localise Dependencies: for the code we have to write, as much as possible, "things" should be packaged - in units of code organisation - together with the "things" they depend on
- Stabilise Dependencies: of course, we can't put our entire dependency network in the same function (that would be silly). For starters, it's at odds with minimising our dependencies, since modularity is the mechanism for removing duplication, and modularisation inevitably requires some dependencies to cross the boundaries between modules (using the most general meaning of "module" to mean a unit of code reuse - which could be a function or could be an entire system in a network of systems). When dependencies have to cross those boundaries, they should point towards things that are less likely - e.g., harder - to change. This can help to localise the spread of changes across our network of dependencies, in much the same way that a run on the banks is less likely if banks only lend to other banks that are less likely to default.
- Abstract Dependencies: when we have to depend on something, but still need to accomodate change into system somehow, the easiest way to that is to make things that are depended upon easier to substitute. It's for much the same reason that we favour modular computer hardware. We can evolve and improve our computer by swapping out components with newer ones. To make this possible, computer components need to communicate through standard interfaces. These industry abstractions make make it possible for me to swap out my memory with larger or faster memory, or my hard drive, or my graphics card. If ATI graphics cards had an ATI-specific interface, and NVidia cards had NVidia-specific interfaces, this would not be possible. Jason Gorman, "Revisiting Unified Principles of Dependency Management (In Lieu Of 100 Tweets)"

Infrastructure dependencies are another common dependency management issue. Database and web servers, middleware, and distributed storage all fall into this category, as does network connectivity. While the focus around code dependencies was on complexity and API stability, the main concern for infrastructure dependencies will be availability. Monitoring, clustering and/or mirroring are common tactics for mitigating risks with these. Other tactics include retries and queuing requests until communications are restored. In some cases, optional functionality can be disabled when unavailable.

Services, particularly third-party services, combine the features of both code and infrastructure dependencies in that API stability and availability are equal concerns. While this presents extra challenges, it also means that both sets of mitigation strategies are available for use. For example, assuming that two providers host an equivalent service, an abstraction layer can be combined with queuing and retries to remain up even when one provider is out of service. Where appropriate, an enterprise service bus can be used to handle translation across multiple message formats, taking that complexity out of the client application.

Third-party services pose a special case of availability concerns - supplier continuity. You must be prepared to deal with the contingency that the provider will either go out of business or discontinue their offering, leaving you to find a permanent replacement. This applies to third-party infrastructure (aka "the Cloud") as well.

Configuration data is a dependency that doesn't come to mind as readily. However, as systems become more complex and more redundant (purposely, for availability), configuration issues can cripple. Jason Gorman's first two principles (minimize and localize) can help. Additionally, automating changes and additions to configuration data will also help ensure that items aren't forgotten or poorly formatted.

A side effect of increased integration is increased reliance on shared data values and mapping from one value to another. If the same item is a "Gadget" on the web site and a "Widget" in the inventory system, there is a potential for problems. By the same token, even when items are named identically, issues can arise when changes are made for business reasons. When the systems involved cross organizational boundaries, the potential for problems increases further. It is critical to identify and understand these dependencies so that you have a plan in place to manage them prior to their becoming an issue.

Understanding and managing dependencies contributes to both the reliability and maintainability of a system. Potential issues can be identified, making them both quicker and easier to debug when a problem occurs. This allows issues to be evaluated for what level of risk is acceptable and which measures are appropriate to mitigate that risk. Where appropriate, the system's architecture can then be structured to handle them with minimal manual intervention. Failing to do the work up front can "leave you hanging" when things go wrong.

Re-posted from Form Follows Function