Last year, I put together a few thoughts on what I saw as the emerging
DevOps trends for 2019. As we enter a new year and decade, I thought it might be useful to do the same for 2020. A common theme in this year’s trends is how firms are dealing with delivering services at scale in the cloud, which I think
could be a grand trend for the decade, so it is one I wanted to highlight from the outset. For now, here are four trends for the year ahead.
1. Site Reliability Engineering
As more and more companies leverage cloud to host their services, how do they manage large user bases around the globe without a large 24x7 Operations team? Embracing failure and observing standard setters such as Google, Netflix and Spotify, firms are looking
to site reliability engineering (SRE) for the answers.
Site reliability engineering at its core partners development skill with operational responsibilities. A successful pairing will result in a service that can scale without having to linearly scale the human labour required to maintain it. To achieve this,
SREs eliminate manual effort, or toil, and cope with failure by introducing reliability through writing software and automation.
When failure does occur, much like reviewing a black box recorder, SREs use data to conduct post-mortems that examine what happened before, during and after critical incidents, and to identify repairs that reduce the risk of recurrence. SREs seek
to learn from failure, incorporating valuable lessons learnt back into their service in the form of software improvements. Post-mortem techniques are borrowed from industries, such as health and aerospace, that embrace failure as an opportunity to learn through
This model of handling operations is appealing to ambitious businesses looking for global reach. As a result, there is a trend towards placing Site Reliability Engineers within or alongside engineering teams. In a DevOps-focused model, those teams,
rather than an external operations team, then become responsible for the availability of their service.
For this emerging field, finding the right form of SRE for enterprises of varying sizes and work cultures will remain a challenge. What succeeds at Google may not be the right recipe for another business. The role of SRE and fit within an organisation remain
fluid. However, one thing that is for sure is the business-critical importance of SREs in the years to come, which I see reflected in the job market.
2. Kubernetes adoption
In 2019, I, and the development community, expected an increase in Kubernetes adoption fuelled by the uptake of containerised microservices. Over the past year those expectations have been met, and signs point towards continued growth in adoption.
One of those signs is the emerging popularity of building software with Kubernetes in mind. Instead of having an after-the-fact revelation that Kubernetes would be a good fit for managing a service, more organisations now develop software specifically
for it. In the past, Linux was the focal point for open source development; now Kubernetes is becoming the platform for building solutions.
A driver for this mindset is the platform’s maturing support for different types of workloads. On one end there is support for serverless workloads using Knative – a big talking point at this year’s KubeCon – and on the other the use of Operators gives
vendors an easy way to offer their software on the platform. A marketplace is emerging that makes it easy for vendors to offer containerised software and for consumers to run it in production.
Growth will be driven by business interest in commercial Kubernetes platforms that reduce the operational effort of running the platform. Commercial distributions continue to mature by offering improved developer experience, a shorter cycle to production
readiness and easier maintenance for IT. This year there is hope that federation features will step up to help coordinate Kubernetes clusters and enable multi-cloud and distributed solutions. Given the already large and still developing Kubernetes community, we
should also see an increase in the speed and volume of feature releases.
3. Service Meshes
Adoption of microservices has a wealth of benefits for organisations that use them. Developers use microservices to architect for portability. However, there’s no denying that adoption places strain on DevOps teams. Operators are managing large hybrid and
multi-cloud deployments. The rise of microservices has led to a parallel boom in service meshes promising to reduce the complexity of these deployments.
A service mesh offers the ability to observe and control a network of microservices and the interactions between them. The composition provides an overall view of your service and aids DevOps & SRE teams with complex operational requirements, like A/B testing,
canary rollouts, access control, and end-to-end authentication.
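To make the canary rollout idea concrete, here is a minimal Python sketch of weighted traffic splitting, the basic mechanism behind sending a small share of requests to a new version. It is an illustration only, not how any particular service mesh implements it: the names `pick_backend`, `simulate` and `CANARY_SPLIT`, and the 90/10 split, are assumptions made for this example.

```python
import random
from collections import Counter

# Hypothetical split: 90% of requests go to the stable version, 10% to the canary.
CANARY_SPLIT = {"stable": 90, "canary": 10}


def pick_backend(weights, rng):
    """Choose a backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(requests, weights, seed=0):
    """Route `requests` simulated requests and count how many each backend received."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return Counter(pick_backend(weights, rng) for _ in range(requests))


if __name__ == "__main__":
    print(simulate(10_000, CANARY_SPLIT))
```

In a real mesh the split is usually declared as configuration and enforced by the proxies, so the weight can be raised step by step as confidence in the new version grows, without changing application code.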
Expect growth in the number of offerings and adoption as they become an important ingredient in the success of running microservices. If an enterprise is on the journey from monolith to microservices, it will soon be crossing paths with service meshes.
4. AIOps
IT faces the challenge of trying to monitor and react to ever-increasing amounts of telemetry and alerts within complex, intelligent systems. Humans are not well suited to directly observing these systems, yet, like good parents, engineers and operators
want to know and understand what their systems are doing at all times in case things go wrong. When things do go wrong, cutting through the noise is paramount to determine severity and quickly restore service. DevOps & SRE adoption is promoting continuous
improvement in the incident response process. This is why there is great interest in the use of AI and machine learning to reduce the time it takes to detect and mitigate incidents.
A single alert from one monitoring tool is analogous to a smoke alarm, but a flood of alerts from multiple tools is all but impossible to decipher manually. To truly understand what is going on, the correlation of anomalies and alerts across multiple systems
is essential. Enter AIOps, the use of machine learning and AI to analyse metrics, logs, tracing and alerts holistically and predict where and when incidents are likely to occur. Expect a stronger showing of dedicated AIOps platforms and AIOps features in existing
tools, offering on-call engineers the ability to make quick, accurate and informed decisions.
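As a rough illustration of what correlating alerts across tools means, the Python sketch below groups alerts for the same service that arrive close together in time, then keeps only the groups reported by more than one tool. This is a simple rule-based stand-in, not machine learning and not any vendor's AIOps algorithm. The `Alert` fields, the function names and the 60-second window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str         # hypothetical source, e.g. "metrics", "logs", "tracing"
    service: str      # service the alert refers to
    timestamp: float  # seconds since some reference point


def correlate(alerts, window=60.0):
    """Group alerts per service when each follows the previous one within `window` seconds."""
    incidents = []
    latest_group = {}  # service -> most recent group for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = latest_group.get(alert.service)
        if group is not None and alert.timestamp - group[-1].timestamp <= window:
            group.append(alert)
        else:
            group = [alert]
            latest_group[alert.service] = group
            incidents.append(group)
    return incidents


def multi_tool_incidents(alerts, window=60.0):
    """Keep only the groups that were reported by more than one tool."""
    return [g for g in correlate(alerts, window) if len({a.tool for a in g}) > 1]
```

A group confirmed by several independent tools is a stronger signal than a lone alert, which is the intuition behind treating correlated alerts as one incident rather than paging for each of them.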