Navigating backwards compatibility in GraphQL
GraphQL strategies to help you manage interactions between evolving systems more effectively.
Hi, I'm Preston, a full stack engineer at Jam. Today, I'm excited to share how we manage GraphQL, and a few lessons learned - the hard way! GraphQL is a powerful tool, but it's not without its pitfalls. We faced challenges maintaining a seamless interaction between evolving systems.
I hope this breakdown of the issues we encountered and the solutions we arrived at will help you navigate the challenges of GraphQL backwards compatibility.
The need for backwards compatibility
Backwards compatibility is fundamental when two systems, such as a GraphQL client and server, need to communicate seamlessly while the shared interface evolves on both sides.
Jam is a Chrome extension for bug reporting. We use GraphQL to handle submitting bug reports from the extension client as well as reading data from the dashboard client.
A unique challenge we face is the delay in deploying updates to our browser extension, which is subject to Chrome's review process. In most web-based SaaS applications, you can freely deploy changes at any time and clients will get the updated version immediately (barring any caching mechanisms). With a browser-based extension, vendors like Chrome run a review process that isn't automatic. In practice, submitting or updating an extension can take 1-2 business days before the Chrome Web Store approves the change and releases it to end users.
This delay means that any breaking changes in our service can't be immediately rectified, emphasizing the need for a robust approach to maintaining backwards compatibility.
Regardless of how quickly you deploy, at a certain scale, breaking changes will affect users. Unless you can deploy both the client and server at exactly the same time, some users will be running stale clients and requesting data from a newer GraphQL server. In this case, breaking changes can and will compromise your users' experience.
Learning from experience: GraphQL case study
A few years back, a minor change led to a significant disruption, taking about 20-30 minutes to resolve. This incident highlighted the impact of incompatible changes and the need for a solid strategy to handle them.
We've had to invest significant time and resources in developing custom tooling and processes to help us monitor and debug our GraphQL implementation effectively.
More recently, we had another brief incident related to breaking changes, even though we assumed the change was fully behind a feature flag. In short, we wanted to make breaking changes to adapt to changing product requirements for an upcoming feature. We anticipated that end users would not see these changes since they would not have the relevant feature flag enabled.
What happened? We use feature flags to guard new bits of functionality. As we were preparing for a feature, we added new GraphQL properties under a commonly used query called the me query. This provides important context for the logged in user such as user email, the workspaces the user belongs to, user preferences, etc.
Every dashboard page load and every extension installation will consult the me query to provide some important details about the logged in user. This makes it useful for adding on more commonly requested bits of data, but it is a double-edged sword as we'll describe shortly.
After we added more data to this query for an upcoming feature, we failed to realize how later modifications to the new GraphQL objects would affect older GraphQL clients! Any breaking change would cause disruption for end users running an older extension version or keeping a stale tab open on the Jam dashboard.
Our assumption was that a feature flag was enough to guard a new feature and that we would be able to add, remove or modify GraphQL fields. But this proved to be incorrect. We needed to make sure that new GraphQL query paths did not affect the hot path for existing clients.
query Me {
  me {
    teams {
      id
      name
      entitlements { # ← this property is new! Any child properties that change would be breaking changes!
        …
      }
    }
  }
}
In other words, we had gated only the application logic that consumed the resolved data, when what we needed was to vary the query itself. Old clients that don't have the feature flag should not fetch new API data.
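The reason one changed child field can take down the whole me query is that a GraphQL server validates the entire document against its current schema before executing any of it. Here is a minimal TypeScript sketch of that behavior. It uses a toy schema shape rather than a real GraphQL validator, and the child fields `seats` and `seatLimit` are hypothetical names, not our actual API.

```typescript
// Toy model: an object type maps field names to a scalar marker or a nested type.
// This only illustrates whole-document validation; it is not a real GraphQL validator.
type SchemaType = { [field: string]: "scalar" | SchemaType };
type Selection = { [field: string]: true | Selection };

// Walk the selection and collect every field the schema does not know about.
function validate(selection: Selection, schemaType: SchemaType, path: string[] = []): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(selection)) {
    const fieldPath = path.concat(field);
    const sub = selection[field];
    const fieldType = schemaType[field];
    if (fieldType === undefined) {
      errors.push(`Cannot query field "${fieldPath.join(".")}"`);
    } else if (sub !== undefined && sub !== true && fieldType !== "scalar") {
      errors.push(...validate(sub, fieldType, fieldPath));
    }
  }
  return errors;
}

// Hypothetical change: the server renamed entitlements.seats to entitlements.seatLimit.
const newServerSchema: SchemaType = {
  me: { teams: { id: "scalar", name: "scalar", entitlements: { seatLimit: "scalar" } } },
};

// A stale client still sends the query it shipped with.
const staleClientQuery: Selection = {
  me: { teams: { id: true, name: true, entitlements: { seats: true } } },
};

const validationErrors = validate(staleClientQuery, newServerSchema);
// Any validation error rejects the whole request, so id and name are lost too.
const requestSucceeds = validationErrors.length === 0;
```

In this sketch a single unknown leaf is enough to make `requestSucceeds` false, which mirrors why a feature flag in application logic could not protect the hot path.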
To summarize, we needed to adopt one of two conventions:
- Avoid querying paths if they are volatile and subject to iteration. If we introduce a new GraphQL property we must handle backwards compatibility. This tends to be a slower workflow because we need to be mindful of changes and mark properties for deprecation and later delete them after client usage subsides.
- Introduce new queries and objects that are separate from existing queries. As long as these are separated and executed only when a feature flag applies, we can iterate rapidly and make destructive changes if needed.
For this particular example, we ended up with the latter strategy because we were still rapidly iterating. At Jam, we try our best to ship features through incremental changes even if they are incomplete, because shipping large change-sets incurs more risk. In our experience, shipping small, inert changes is ideal, but we must also take into account changes that risk breaking existing clients.
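As a rough illustration of the second convention, here is a TypeScript sketch in which the volatile data lives in its own query document that is only sent when the flag is on. The `TeamEntitlements` query, its fields, and the `entitlements` flag name are assumptions made up for this example, not our actual schema or flag.

```typescript
// The existing hot-path query stays untouched, so stale clients keep working.
const ME_QUERY = `
  query Me {
    me {
      teams {
        id
        name
      }
    }
  }
`;

// Hypothetical separate query for the in-progress feature. Field names are illustrative.
const TEAM_ENTITLEMENTS_QUERY = `
  query TeamEntitlements {
    teamEntitlements {
      teamId
    }
  }
`;

type FeatureFlags = { entitlements?: boolean };

// Decide which documents a client sends; the volatile query only goes out behind the flag.
function queriesToRun(flags: FeatureFlags): string[] {
  const queries = [ME_QUERY];
  if (flags.entitlements === true) {
    queries.push(TEAM_ENTITLEMENTS_QUERY);
  }
  return queries;
}
```

Because clients without the flag never send the new document, its shape can change destructively without touching anything the me query depends on.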
Strategies for backwards compatibility
- Versioning GraphQL objects: A straightforward method is to version your GraphQL queries and objects, allowing older clients to continue functioning with the API version they were designed for.
- Parameterizing versions: Using version parameters or HTTP headers is another method, though less recommended in GraphQL.
- Adopting a versionless API approach: The most idiomatic way in GraphQL is to create a versionless API, ensuring your API can cater to clients running outdated code.
Implementing the versionless approach
- Avoid removing or renaming types: Instead of removing or renaming types, always add new types or queries.
- Use the @deprecated directive: Marking fields and enum values as deprecated signals to clients, through introspection, that they should migrate, but nothing enforces it, so it requires a mindful approach.
- Mindful communication and staging: Communicate changes clearly and phase them out slowly, allowing clients to adapt smoothly.
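To make the first two points concrete, here is a small TypeScript sketch of an additive change: instead of renaming a field, the old one is kept and marked with `@deprecated`, and the new one is added next to it. The `Team` fields and the `name` to `displayName` rename are hypothetical, and the resolver map is a plain object rather than any particular server library's API.

```typescript
// Hypothetical schema evolution: "name" is kept and marked deprecated, "displayName" is added.
const TEAM_TYPE_DEFS = `
  type Team {
    id: ID!
    name: String! @deprecated(reason: "Use displayName instead.")
    displayName: String!
  }
`;

type TeamRecord = { id: string; displayName: string };

// Both fields resolve from the same stored value, so old and new clients see consistent data.
const teamResolvers = {
  id: (team: TeamRecord): string => team.id,
  name: (team: TeamRecord): string => team.displayName,
  displayName: (team: TeamRecord): string => team.displayName,
};
```

Once usage of the deprecated field subsides, it can be removed in a later, separately communicated change.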
Advanced strategies and testing challenges
- Introspection API: Clients can use the introspection API to understand available APIs and handle errors gracefully. This approach helps ensure applications work as expected without lingering issues.
- Testing complexities: Supporting many versions of clients multiplies testing complexity. We explored using a submodule of our repo pinned at an earlier commit to test interactions, but found it impractical.
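The introspection idea above could look roughly like this in TypeScript. The query uses the standard `__type` introspection field, while the `Team` type, the `entitlements` field, and the helper name `serverSupportsField` are illustrative assumptions. It also assumes introspection is enabled on the server.

```typescript
// Standard introspection query for one type. "Team" is an illustrative type name.
const TEAM_FIELDS_QUERY = `
  query TeamFields {
    __type(name: "Team") {
      fields(includeDeprecated: true) {
        name
        isDeprecated
      }
    }
  }
`;

// Shape of the response data for the query above (__type is null if the type does not exist).
type TypeFieldsResult = {
  __type: { fields: { name: string; isDeprecated: boolean }[] | null } | null;
};

// Returns true only when the server currently exposes the field.
function serverSupportsField(result: TypeFieldsResult, fieldName: string): boolean {
  const fields = result.__type?.fields ?? [];
  return fields.some((field) => field.name === fieldName);
}
```

A client could call this once at startup and skip or fall back for any feature whose fields are missing, rather than sending a query the server will reject.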
Adopting conventions and automation
We've adopted conventions to address backwards compatibility issues. However, remembering these conventions, especially for new engineers, can be challenging. To help, we added GraphQL Inspector to our CI process. This tool detects schema changes, categorizes them as safe, dangerous, or breaking, and blocks builds if necessary. It's a crucial part of our strategy to automate checks and prevent potential issues.
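To give a feel for what such a check does, here is a deliberately simplified TypeScript sketch of classifying schema changes and blocking a build. It is not GraphQL Inspector's implementation or API: the snapshot shape, the rules, and the function names are assumptions for illustration, and it only covers removed types, removed or added fields, and added enum values.

```typescript
type Severity = "safe" | "dangerous" | "breaking";
type Change = { severity: Severity; message: string };

// Toy schema snapshot: object types with their field names, and enums with their values.
type SchemaSnapshot = {
  types: { [typeName: string]: string[] };
  enums: { [enumName: string]: string[] };
};

function diffSchemas(oldSchema: SchemaSnapshot, newSchema: SchemaSnapshot): Change[] {
  const changes: Change[] = [];
  for (const typeName of Object.keys(oldSchema.types)) {
    const oldFields = oldSchema.types[typeName] ?? [];
    const newFields = newSchema.types[typeName];
    if (newFields === undefined) {
      changes.push({ severity: "breaking", message: `Type ${typeName} was removed` });
      continue;
    }
    for (const field of oldFields) {
      if (newFields.indexOf(field) === -1) {
        // Existing clients may still select this field, so their queries would fail.
        changes.push({ severity: "breaking", message: `Field ${typeName}.${field} was removed` });
      }
    }
    for (const field of newFields) {
      if (oldFields.indexOf(field) === -1) {
        changes.push({ severity: "safe", message: `Field ${typeName}.${field} was added` });
      }
    }
  }
  for (const enumName of Object.keys(oldSchema.enums)) {
    const oldValues = oldSchema.enums[enumName] ?? [];
    for (const value of newSchema.enums[enumName] ?? []) {
      if (oldValues.indexOf(value) === -1) {
        // Queries still validate, but clients may not handle a value they have never seen.
        changes.push({ severity: "dangerous", message: `Enum value ${enumName}.${value} was added` });
      }
    }
  }
  return changes;
}

// A CI step would fail the build when any breaking change is present.
function shouldBlockBuild(changes: Change[]): boolean {
  return changes.some((change) => change.severity === "breaking");
}
```

The useful property is the same as with the real tool: an engineer who forgets a convention gets a failed build instead of a production incident.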
Balancing innovation and stability in GraphQL
At Jam, we’ve learned a lot through our journey with GraphQL, and adapted to strike the right balance between innovation and stability. The methods we've developed are not just about maintaining an API, but about ensuring a smooth, uninterrupted experience for our users.
I hope our story helps you in your GraphQL endeavors. Reach out if you want to Jam!