
Resilient REST APIs: The case for Parallel Change

Abby Sassel · 21 Oct 2024

How might we evolve published APIs safely and with minimal impact to the interface's consumers? This post introduces the Parallel Change approach, and explores its use at Artificial to safely introduce backwards-incompatible changes in its REST API built in Haskell.


This article is an expansion of the talk I gave at Lambda World on October 2nd, 2024.

Parallel Change is an approach to refactoring or changing an interface which splits the introduction of backwards-incompatible change into three phases:

  1. Expand: introduce the new version in parallel with the old version.

  2. Migrate: clients move from the old version to the new.

  3. Contract: once all clients have migrated, remove the old version from the interface.

Let's look at a motivating example using a simplified version of Artificial's API, which provides information about insurance policies as part of Artificial's algorithmic underwriting platform. This simplified API has two operations which allow us to get a single policy or a collection of policies.

Here's a description of the same API using the OpenAPI specification:
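A minimal sketch of such a spec might look like the following; the exact paths and the Policy schema's fields (id, name) are illustrative assumptions:

```yaml
openapi: 3.0.3
info:
  title: Policies API (simplified, illustrative)
  version: "1.0"
paths:
  /policies:
    get:
      summary: Get a collection of policies
      responses:
        "200":
          description: A list of policies
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Policy"
  /policies/{policyId}:
    get:
      summary: Get a single policy
      parameters:
        - name: policyId
          in: path
          required: true
          schema:
            type: integer
      responses:
        "200":
          description: A single policy
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Policy"
components:
  schemas:
    Policy:
      type: object
      properties:
        id:
          type: integer
        name:
          type: string
```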

The problem

Imagine we'd like to change the schema to use NanoIDs, rather than integer IDs. Compared with UUIDs, NanoIDs have a lower chance of collisions and are shorter and friendlier in URLs.

So we're replacing integer IDs, e.g. 12345, with NanoIDs, e.g. policy_abc. Here's the change expressed as a diff on the HTTP request and response:
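Sketched as a unified diff, with an illustrative name field in the response body:

```diff
- GET /policies/12345 HTTP/1.1
+ GET /policies/policy_abc HTTP/1.1

  HTTP/1.1 200 OK
  Content-Type: application/json

- { "id": 12345, "name": "Example policy" }
+ { "id": "policy_abc", "name": "Example policy" }
```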

This illustrates a breaking change, because we've changed the API in a way that forces clients to adapt to the altered response.

Our goal as designers of this API is to provide a working, easy-to-use, unsurprising interface. Changes or new functionality should be introduced with minimal impact to the consumers of the API. This means we:

  • Avoid introducing breaking changes.

  • Provide a smooth migration path from the old behaviour to the new.

Backwards-compatibility and breaking change

Backwards-incompatible changes are inevitable in any real-world interface, but breaking changes are not. In the context of APIs, a change is backwards-incompatible when:

  • The API requires more, e.g.

    • Adding a new validation rule to an existing resource

    • Adding a new required parameter

    • Making a previously optional parameter required

  • The API provides less, e.g.

    • Removing an entire operation

    • Removing a field in the response

    • Removing enum values

  • The API changes in some user-visible way, e.g.

    • Changing the path of an endpoint

    • Changing the type of a parameter or response field

    • Changing authentication or authorisation requirements

Any change that causes a previously valid request to become invalid is a breaking change.

Backwards-incompatible changes become breaking changes when they are applied by default and force the consumer to alter their behaviour.

Potential solutions

What options do we have to release backwards-incompatible changes?

  • Version the API

  • Version the operation

  • Make a Parallel Change

These options are not exhaustive, but illustrate approaches at increasing levels of granularity.

Version the API

One option is to release the change behind an entirely new API version. Common versioning approaches at this level include:

  1. Different base paths (e.g., /v1/, /v2/)

  2. Query parameters (e.g., ?version=1)

  3. Custom headers (e.g., Version: 1)

Let's version the base path and look at how the request/response changes:
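A sketch of the two exchanges; the response bodies are illustrative:

```text
# Existing clients keep calling the original endpoint:
GET /v1/policies/12345
HTTP/1.1 200 OK
{ "id": 12345, "name": "Example policy" }

# New clients opt in to the NanoID behaviour:
GET /v2/policies/policy_abc
HTTP/1.1 200 OK
{ "id": "policy_abc", "name": "Example policy" }
```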

In this example:

  1. The original /v1 endpoint is unchanged, allowing clients to migrate at their own pace.

  2. We've introduced a new /v2 path for the updated endpoint.

This style of versioning is typically used for significant API changes. A more granular approach like versioning individual operations may be more suitable for our example use case.

Version the operation

A more flexible approach is to version individual operations using an HTTP header. This allows for finer-grained control over versioning.

Here's what versioned operations might look like for the client:
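A sketch of the two requests; the header values and response bodies are illustrative:

```text
# Version 1: integer IDs (the default)
GET /policies/12345
X-Operation-Version: 1

HTTP/1.1 200 OK
{ "id": 12345, "name": "Example policy" }

# Version 2: NanoIDs
GET /policies/policy_abc
X-Operation-Version: 2

HTTP/1.1 200 OK
{ "id": "policy_abc", "name": "Example policy" }
```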

In this example:

  1. We've kept the same endpoint path

  2. We define two versions of the Policy schema in the API implementation

  3. Clients can request the Policy version they want using a custom X-Operation-Version header.

The API handles the request differently in code depending on the value of the header, as the sketch after this list illustrates:

  • When the header is X-Operation-Version: 1, the API returns PolicyV1 with an integer ID

  • When the header is X-Operation-Version: 2, it returns PolicyV2 with a NanoID
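Since the API is built in Haskell, here's a minimal sketch of that dispatch; the use of Servant, the route, and the hard-coded values are illustrative assumptions rather than our production code:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators #-}

module Main where

import Data.Aeson (Value, object, (.=))
import Data.Proxy (Proxy (..))
import Data.Text (Text)
import Network.Wai.Handler.Warp (run)
import Servant

-- One endpoint, two representations: the optional
-- X-Operation-Version header selects the Policy schema.
type API =
  "policies" :> Capture "policyId" Text
             :> Header "X-Operation-Version" Int
             :> Get '[JSON] Value

-- Illustrative handler: real code would look the policy up
-- rather than returning hard-coded values.
getPolicy :: Text -> Maybe Int -> Handler Value
getPolicy pid version = case version of
  -- PolicyV2: NanoID, e.g. "policy_abc"
  Just 2 -> pure $ object ["id" .= pid, "name" .= policyName]
  -- PolicyV1 (the default): integer ID
  _ -> pure $ object ["id" .= (12345 :: Int), "name" .= policyName]
  where
    policyName = "Example policy" :: Text

main :: IO ()
main = run 8080 (serve (Proxy :: Proxy API) getPolicy)
```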

This approach offers more flexibility than full API versioning, but we still expect the client to compare and switch between versions. HTTP headers can also introduce complications with client-side caching, which we'd like to avoid if we can. Which brings us to our final option: the Parallel Change approach.

Make a Parallel Change

How might we transform breaking changes into non-breaking ones, without the overhead of multiple API versions? The Parallel Change approach suggests we add new capabilities alongside the old, and deprecate the old ones when they're no longer in use. We do so in three phases:

  1. Expand: introduce the new version in parallel with the old version.

  2. Migrate: clients move from the old version to the new.

  3. Contract: once all clients have migrated, remove the old version from the interface.

Here's how the Parallel Change approach might look from the API consumer's perspective:
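A sketch of the expand phase from the client's point of view; the new nanoId field name is a hypothetical choice:

```text
GET /policies/12345

HTTP/1.1 200 OK
{
  "id": 12345,
  "nanoId": "policy_abc",
  "name": "Example policy"
}
```

The existing id field is untouched; the new capability is simply exposed alongside it.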

Compared to the previous approaches, this change is small and purely additive. We don't:

  • Remove a field from the response

  • Change the field's type or semantics

  • Require the consumer to change their behaviour

This accretive approach is the essence of turning breaking change into non-breaking API evolution. It reduces the migration risk and effort for the API consumer; adopting or reverting a new capability is as simple as accessing a different field in the response payload for the same request, rather than switching to a new version.

However, this approach increases the burden on the API producer; it's more work to maintain multiple code paths, there's a risk of slower client migration, and it requires more effort to monitor usage and deprecate unused capabilities.

How might we mitigate some of the downsides of the Parallel Change approach?

Parallel Change at Artificial

Here are a few things we've learned, using Parallel Change to evolve an external client-facing API at Artificial.

Object types in schema simplify API growth

At the expand phase, we change the interface to support both old and new capabilities.

Defaulting to JSON object types in the response made it easier to evolve the API in a backwards-compatible way. Objects allow us to add new fields without removing older ones and facilitate non-breaking change.

For example, introducing pagination to our example schema is a substantial breaking change, as we change the root schema from an array to an object:
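For example, a diff along these lines, where the items and nextPage field names are illustrative:

```diff
- [
-   { "id": 12345, "name": "Example policy" }
- ]
+ {
+   "items": [
+     { "id": 12345, "name": "Example policy" }
+   ],
+   "nextPage": 2
+ }
```

Every existing consumer that expects a top-level array breaks.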

Whereas designing our schema upfront by wrapping arrays in objects gives us flexibility for future non-breaking additions:
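With the wrapper in place from day one, adding pagination becomes a purely additive change (again with illustrative field names):

```diff
  {
    "items": [
      { "id": 12345, "name": "Example policy" }
-   ]
+   ],
+   "nextPage": 2
  }
```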

Telemetry tools help with sunsetting

In the contract phase, we sunset the old capability once it's no longer used, removing it from the interface and deleting its artefacts such as code, configuration and documentation.

We collect telemetry data to understand API usage and identify capabilities that are no longer used and could be sunset. Currently, our observability tools of choice include BetterStack, OpenTelemetry, and Grafana.

Summary

At a high level, Parallel Change is an approach to refactoring or changing an interface which splits the introduction of backwards-incompatible change into expand, migrate and contract phases.

The overall goal is to evolve the system in a way that minimises the impact on those consuming the interface. We do this by making improvements in small, additive-only increments known as accretion.

The downside of this approach is that it can be more work for the API producer to maintain, migrate and monitor the parallel changes. However, there are ways of mitigating this extra work at each phase, for example:

  • At the expand phase, we can make API evolution easier by defaulting to object types in the schema.

  • At the migrate phase, by generating clear documentation which highlights deprecated parts of the API.

  • At the contract phase, with telemetry that points out where we can sunset unused capabilities.

Further reading

This post is a summary of some things we've learned from a recent application of old ideas, in particular the Parallel Change (expand and contract) pattern described by Danilo Sato on martinfowler.com.
