This is a sequel to Getting Out of Quicksand, With DevOps!.
You can find other slides and videos here.
I added links where appropriate and tried to attribute sources as well as possible. If you find an error or have a comment, please contact me (see the bottom of the page).
TL;DR: Just give me the code!
You can find the code of the dashboards that I developed here: https://github.com/rompic/Smashing-Flowboard
After putting in countless hours improving the deployment pipeline, investing in automation and deploying new technologies, it is time to ask this fundamental question: “Are we really moving faster?”
This is a story of how we made work visible by applying DevOps and Flow Metrics to discover bottlenecks and improve flow. We did this using dashboards, which are great cultural change tools, as they visualize problems and spark discussion.
I provide concrete steps to implement key metrics, automatically collect and visualize them on an open source dashboard and find an answer to this important question.
Key Takeaways are:
- A brief Intro to Value Stream Mapping
- Actionable DevOps and Flow Metrics
- An Implementation Example using an Open Source Solution
- References and pointers to advanced material
How did I end up here?
My name is Roman Pickl and for the last two years I’ve been a technical project manager at Elektrobit, which is an automotive software supplier. Before that I was CTO of a medium-sized company called Fluidtime, and also a process manager at the Austrian parcel service, which also deals with some kind of continuous delivery, I guess…
I have a background in software engineering, business administration and computer & electronics engineering. CI/CD/DevOps is the sweet spot for me, as I really love how the things I learned in my Production Management and Operations Research courses are nowadays applied in the IT domain.
One aspect that I really liked about my job in the operations department of the Austrian parcel service back in 2009 was the fast physical feedback and visibility of problems.
There were more subtle and hard to find process errors as well of course, but if one of the main systems or processes did not work as expected, boxes started to pile up at the bottleneck, providing a hard to ignore indicator of the problem.
I moved on after about 1.5 years, but ever since I have missed this clear feedback signal in my IT jobs.
I was missing “Ambient Awareness”. I think I first read about this concept in Michael Nygard’s 2007 book Release It! (there is also a 2018 second edition, but I haven’t looked into it yet). The idea is to create an Ambient Display, “an Interface between People and Digital Information”, which represents data, e.g. the health of a system, with the help of sound, visuals, movement or other cues. I had the honor to work with Michael Kieslinger, who has published several papers on this topic at the Interaction Design Institute Ivrea and later founded Fluidtime around this concept.
These kinds of “information radiators”, which should be put in a highly visible location to promote responsibility in the team (nothing to hide) and provoke conversation, can be traced back to, you may have already guessed it, the Toyota Production System (https://www.agilealliance.org/glossary/information-radiators).
When I spoke at DevDays Europe 2019, Steve Poole gave an inspiring keynote about Dashboards and Culture: How openness changes your behavior (newer recording). He told a story about how sharing insights on dashboards closes communication gaps, forces discussion on how to generate accurate data / metrics and changes your culture. Putting data on a dashboard made the problem “real”. Before that it was just data in a spreadsheet. He also talks about measuring end-to-end times. When I asked Gene Kim about tools which revolutionized IT work, he also mentioned dashboards and Steve Poole’s talk.
I had already collected the data that I wanted to show for our weekly status meetings on a Wiki page by hand for a few months. However, I wanted to collect it automatically and have up-to-date data all the time, so I decided to visualize our work on an automated dashboard. I had previous experience putting Redmine agile’s Agile Ajax board, JIRA wallboards, a Jenkins Build Monitor, and Graylog Dashboards on the wall and cycling through them in browser tabs using extensions like Revolver, but this time I was looking for something more integrated. What is more, I remembered a quote from Winston Churchill:
We shape our buildings and afterwards our buildings shape us.
— Winston Churchill
It also reminded me of the Skoda plant and BMW Project House discussed in Thomas J. Allen & Gunter Henn’s book The Organization and Architecture of Innovation - Managing the Flow of Technology, where every employee has to pass certain points of the assembly line or up-to-date prototypes before arriving at his/her workplace.
So, working in a distributed team, I wanted to have the data available on our intranet, but also on a highly visible screen in the entrance area / hallway where everyone passes by a few times a day.
Creating a Dashboard
I already had a Raspberry Pi 3 at hand, but quickly learned that getting my private device on the company network is more or less impossible (get it on a white list to get an IP etc.). What was even more startling was that it was very difficult to get an additional monitor to show a dashboard. I asked every other month, but, given that we were growing, there was always a scarcity of time (also my own) and resources.
In retrospect, what worked very well last year was piggybacking on a planned change: part of our company moved to a new office, down the street from the existing office.
The new office was renovated and a wish list / things-to-do list was created. So I asked IT for the stuff I needed for my dashboard (Raspberry Pi and a monitor).
I think that given that they were in a “change mode” of solving problems, buying hardware and setting up the network, it was easier to get the stuff approved and I got these devices. I may also have been lucky, as there was a plan to use a Raspberry Pi 3 Model B Plus Rev 1.3 (Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC @ 1.4GHz, 1GB LPDDR2 SDRAM) incl. case, SD card, power supply, WiFi Dongle and HDMI cable for something else and it got canceled. In any case, I got this official device and a monitor from IT.
I would get a Raspberry Pi 4 with at least 2 GB RAM now, as I ran into some memory issues.
There were also some logistical problems (I guess that’s normal) with setting up the new office space and I had some time to play around with dashing/smashing. As I had some previous experience with it, I gave it a go and was happy enough to keep it. There are other options like Tipboard or Mozaik, but unfortunately none of them seems to be very active.
Smashing is a Sinatra-based dashboard framework. It comes with a number of pre-installed widgets based on SCSS, HTML, and CoffeeScript. There is also a large number of user-submitted widgets that are easy to adapt. You can also get started easily with writing your own widget by following their workshop. You can either use jobs written in Ruby to update your widgets or push data to the API (see https://github.com/Smashing/smashing/wiki/How-To%3A-update-dashboard-in-Django). As Smashing runs best on Linux (https://github.com/Smashing/smashing/wiki/Installation), I used the following Docker image for testing and development: https://hub.docker.com/r/visibilityspots/smashing. Note that some people have also been using it on Windows lately.
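To make the "push data to the API" option a bit more concrete, here is a minimal Python sketch of such a push. It assumes Smashing's usual convention of a JSON POST to /widgets/<widget id> that carries the auth_token from config.ru. The widget id open_pull_requests, the base URL and the token value are placeholders I made up for illustration, and current is the field a standard Number widget displays.

```python
import json
import urllib.request


def build_widget_update(base_url, widget_id, auth_token, **fields):
    """Build (but do not send) the POST request that updates one widget."""
    # The auth token travels inside the JSON body, next to the widget data.
    payload = {"auth_token": auth_token, **fields}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/widgets/{widget_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values: adjust URL, widget id and token to your setup.
    request = build_widget_update(
        "http://localhost:3030", "open_pull_requests", "YOUR_AUTH_TOKEN", current=7
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it against a running dashboard:
    # urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it keeps the sketch testable without a running dashboard.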
A word of caution: I’m neither a Ruby nor a CoffeeScript dev. So feel free to improve the code and setup.
I was ready to put some (private) time into implementing a first version of the dashboard and set it up in the hallway, ready for the official opening of the office.
So I ran:
docker run -d -p 8080:3030 visibilityspots/smashing
and pointed my browser to http://localhost:8080/.
It was up and running with a first dashboard:
The docker image allows you to map the dashboards, jobs and widgets folder from your local disk which I used extensively to speed up development.
Based on the metrics I had collected by hand for a few months, I wanted to visualize the following things:
- Next milestones, important dates and releases
- Open Pull Requests
- Open Support Tickets
- Status of Jenkins Jobs, including build time and failing tests
- Jira tickets per status as also visible in our Kanban Board
This is what I came up with (Data has been changed to protect the innocent, code can be found here https://github.com/rompic/Smashing-Flowboard ):
All widgets are clickable and lead to the data source.
Putting it on the Raspberry Pi
Running the Dashboard on a Raspberry Pi and connecting it to an external monitor was the next step.
- The Raspberry Pi already had Raspbian installed. If yours does not, you can use NOOBS to install it.
- The first thing I did on the Raspberry Pi was to install log2ram as described in https://github.com/azlux/log2ram#with-apt-recommended. It reduces the number of writes to the SD card and hence prolongs its life. I had run out of write cycles on an earlier setup, so I thought that was a good idea.
- I then installed ruby and bundler on the Raspberry:
sudo apt-get install ruby2.5-dev
sudo gem install bundler
- I then followed the getting-started chapter in the documentation. As indicated by an error message I also had to run
bundle update --bundler
I also did a few other things, but I won’t go into too much detail as you will find a lot of information online. If you have any problems feel free to contact me (see bottom of the page):
- Enabled ssh remote access: https://www.raspberrypi.org/documentation/remote-access/ssh/
- Set up init.d: https://github.com/Smashing/smashing/wiki/Init.d-script (change paths and install daemon first!) and register it with update-rc.d
- Wrote a cron job which turns the monitor on and off: https://www.screenly.io/blog/2017/07/02/how-to-automatically-turn-off-and-on-your-monitor-from-your-raspberry-pi/
- Enabled automatic security updates: https://www.elektronik-kompendium.de/sites/raspberry-pi/2002101.htm / https://www.zealfortechnology.com/2018/08/configure-unattended-upgrades-on-raspberry-pi.html
- Start Chromium in kiosk mode in autostart: https://raspberrypi.stackexchange.com/a/40745
- Disabled screen-sleep https://raspberry-projects.com/pi/pi-operating-systems/raspbian/gui/disable-screen-sleep ([SeatDefaults] is now called [Seat:*])
- You can find some thoughts about reducing burn-in on your screen here: https://www.reddit.com/r/sysadmin/comments/67ty0p/dashboard_screens_without_burnin_issue/ . In the end (we will come to that) I implemented two dashboards and installed https://github.com/vrish88/sinatra_cyclist to be able to cycle through them automatically.
- Added ssh keys to be able to checkout from git without a password
- Added a cron job to pull my dashboard code from our repository from master at 6:00 in the morning in case there are any updates
- Added a cron job to restart at 6:15 in the morning because I was running into memory problems. I might have to revisit this setup.
- Increased Swap from 100 MB to 1 GB (I did not want to do this as it can decrease the lifespan of the SD Card, but I ran into out of memory situations with smashing). This seems to be a known unsolved problem in dashing / smashing. I have created a pull request to update sprockets on ruby >= 2.5 which seems to help a little (see details in https://github.com/Smashing/smashing/pull/156) and updated all the gems (https://github.com/Smashing/smashing/pull/157). Using the techniques described in https://samsaffron.com/archive/2019/10/08/debugging-unmanaged-and-hidden-memory-leaks-in-ruby I couldn’t find a memory leak after that, but I might give it another try in the future.
The dashboard sparked a lot of interesting discussion during the opening party and we also got some great feedback about our innovative ways of working. Ever since, the dashboard has been part of the new office and has evolved into an important indicator of the current status and a source of new change initiatives.
I had succeeded in bringing back ambient awareness. That’s when I noticed a problem.
Applying the Three Ways of DevOps, especially by experimentation and by identifying bottlenecks in the build and test run, we were able to cut the full build/test cycle by a factor of 3 in the first few months of 2018. Moving our code to a git mono repo and containerizing our build environment in 2019 allowed us to provide feedback to our developers on every commit within minutes, not hours. Furthermore, automating our delivery allowed us to provide a new version of our software with the click of a button. This was great and we felt happier and so freaking agile.
This reminded me of a scene in Eliyahu M. Goldratt’s The Goal. Alex, the leading character of the story, is very proud of the increased “productivity” they get in the plant by using robots, when Jonah, the management guru, asks him a few questions. In summary, the dialog goes like this:
Is the company now making more money? : No
Did you ship even one more product? : No
Are plant inventories down? : No
Are employee expenses down? : No
Then you didn’t really increase productivity, your inventories are going through the roof, aren’t they?
Looking at the dashboard, the inventory was staring me in the face:
Imagine all these tickets were boxes lying around in the hallway, they would have been way harder to ignore. They don’t have any value as long as they are not released. Furthermore, it doesn’t really make sense to add more.
Also see the discussion of Done vs. Done Done in Dominica DeGrandis’s - Making Work Visible - How to Unmask Capacity Killing WIP on page 122 ff.:
Think of a box of cereal sitting on a grocery store shelf. Corn flakes don’t provide any value to Kellogg’s until a customer buys them. Like inventory sitting on a shelf, a newly developed feature or bug fix doesn’t provide much value to the requestor until they can get their hands on it. (Dominica DeGrandis in Making Work Visible - How to Unmask Capacity Killing WIP)
The bottleneck in development had shifted to testing and we were creating a lot of inventory.
In the Beyond the Phoenix Project audio book, Gene Kim and John Willis discuss these shifting bottlenecks specifically in the IT domain. They are also discussed in the DevOps Handbook (pages 22, 23 in my copy) and by Gene Kim in an AMA on The Unicorn Project: https://youtu.be/ReROx9-68V8?t=818
They describe five progressions:
- Environment creation: A common first bottleneck is getting a deployment environment. A potential solution is to provide environments on demand and self-service, e.g. through automation / virtualization / infrastructure as code.
- Code deployment: The bottleneck then often moves to code deployment, where the solution is automating, reducing hand-offs and moving towards self-service, single-piece flow and continuous delivery.
- Test setup and run: It then often progresses to testing (takes too long for faster deployments, manual tests, etc.): Massively automate the test process, move from integration tests to unit tests and parallelize.
- Overly tight architecture: It then often moves to architecture. Small changes need a lot of approval from other teams etc.: Move to loosely coupled architectures / components that can be deployed independently.
- If these constraints have been resolved, it moves to development or product managers, running out of great ideas or deciding which ideas to validate with real live customers. Effort should then shift to improving the flow from idea to delivery (“aha to ka-ching!").
When I heard and read this, I was reassured that our efforts were going in the right direction. It also reminded me of the J-curve of automation mentioned in the 2018 State of DevOps report which states that you really need relentless improvement, refactoring and innovation to reach a state of excellence.
DevOps metrics and measuring flow
In the DORA - Accelerate: State of DevOps 2019 report, the authors have identified four key metrics to differentiate low, medium and high performance:
- lead time of code changes from check-in to release
- deployment frequency
- time to restore: from detecting a user-impacting incident to having it remediated
- change fail rate as a measure of the quality of the release process: what percentage of changes degrade the service and require remediation.
While availability is shown in this figure, they do not include it in their cluster analysis as it does not apply in the same way to different software products.
The authors show that these metrics do not represent trade-offs between throughput and stability, but rather that high performers succeed in improving all these four metrics at the same time and stability and speed enable each other.
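As a rough illustration of how these four metrics could be derived from raw data, here is a small Python sketch. The deployment and incident records are invented, the field layout is my own assumption, and using the median is a choice I made for the example, not something prescribed by the report.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment log: (check-in time, release time, caused a failure?)
deployments = [
    (datetime(2020, 3, 2, 9), datetime(2020, 3, 4, 9), False),
    (datetime(2020, 3, 9, 9), datetime(2020, 3, 10, 9), True),
    (datetime(2020, 3, 16, 9), datetime(2020, 3, 19, 9), False),
    (datetime(2020, 3, 23, 9), datetime(2020, 3, 24, 9), False),
]
# Hypothetical user-impacting incidents: (detected, remediated)
incidents = [(datetime(2020, 3, 10, 12), datetime(2020, 3, 10, 16))]


def four_key_metrics(deployments, incidents, period_days):
    """Compute the four key metrics for one observation period."""
    lead_times = [released - checked_in for checked_in, released, _ in deployments]
    restore_times = [fixed - detected for detected, fixed in incidents]
    failures = sum(1 for _, _, failed in deployments if failed)
    return {
        "lead_time": median(lead_times),
        "deployments_per_week": len(deployments) / (period_days / 7),
        "time_to_restore": median(restore_times),
        "change_fail_rate": failures / len(deployments),
    }


if __name__ == "__main__":
    for name, value in four_key_metrics(deployments, incidents, period_days=28).items():
        print(f"{name}: {value}")
```

The sketch expects at least one deployment and one incident in the period. A real job would have to handle empty periods.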
Based on these insights, and looking at our current state, I was especially interested in the throughput part and aimed to measure flow.
The Flow Framework, introduced by Mik Kersten in Project to Product, defines 4 different Flow Items (features, defects, risks and technical debt) which describe all the work in a value stream (Mutually Exclusive and Collectively Exhaustive) and proposes to track the following metrics:
- Flow Load: The number of Flow Items being actively worked on in a value stream, denoting the amount of WIP (work in progress). Monitors over- and underutilization, which can lead to reduced productivity.
- Flow Time: The duration that it takes for a Flow Item to go from being accepted for work into the value stream to completion, including both active and wait time. Monitors if time to value is getting longer.
- Flow Velocity: The number of Flow Items done in a given time. Also referred to as throughput. Gauges whether value delivery is accelerating.
- Flow Efficiency: The proportion of time Flow Items are actively worked on to the total time elapsed. Identifies when waste is increasing or decreasing in the process.
It also encourages looking at Flow Distribution, the allocation of Flow Items in a particular flow state across a measure of time, which helps to prioritize specific types of work during specific time frames in order to meet a desired business outcome or to see trade-offs.
It is business-outcome driven, as it also recommends tracking business value, cost, quality and team happiness (with a survey) and correlating them with the Flow Metrics.
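The definitions above translate fairly directly into code. The following Python sketch is only an illustration: the FlowItem structure and the sample data are invented, active_days would have to come from status transitions in your ticket system, and Flow Distribution is interpreted here simply as the count per item type among completed items.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FlowItem:
    kind: str             # "feature", "defect", "risk" or "debt"
    started: date         # accepted for work into the value stream
    done: Optional[date]  # None while the item is still in progress
    active_days: int      # days somebody actually worked on the item


# Hypothetical sample data
items = [
    FlowItem("feature", date(2020, 1, 6), date(2020, 1, 26), 5),
    FlowItem("defect", date(2020, 1, 13), date(2020, 1, 17), 2),
    FlowItem("feature", date(2020, 1, 20), None, 3),
    FlowItem("debt", date(2020, 1, 22), None, 1),
]


def flow_metrics(items, period_start, period_end):
    """Compute the Flow Metrics for items completed within the period."""
    finished = [i for i in items if i.done and period_start <= i.done <= period_end]
    flow_times = [(i.done - i.started).days for i in finished]
    return {
        "flow_load": sum(1 for i in items if i.done is None),
        "flow_velocity": len(finished),
        "avg_flow_time_days": sum(flow_times) / len(finished),
        "flow_efficiency": sum(i.active_days for i in finished) / sum(flow_times),
        "flow_distribution": dict(Counter(i.kind for i in finished)),
    }


if __name__ == "__main__":
    print(flow_metrics(items, date(2020, 1, 1), date(2020, 1, 31)))
```

With this sample data, two items are still in progress (Flow Load) and two were finished in January (Flow Velocity). The finished items were actively worked on for 7 of the 24 days they spent in the value stream.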
Carmen DeArdo gives a great overview:
If you want to know more about flow metrics you might be interested in watching two videos from last year’s All Day DevOps event:
- Vlatko Ivanovski - DevOps Metrics - Measuring What Matters
- Dominica DeGrandis - [Making Better Business Decisions With Flow Metrics](https://www.youtube.com/watch?v=YS0Axx5SggY&feature=youtu.be&t=7342)
and this one
- Carmen DeArdo - [Use Flow Metrics to drive Business Results NOW](https://www.youtube.com/watch?v=Xkoicoq6dGA&feature=youtu.be&t=4248) from All Day DevOps Spring Break Edition 2020.
So I created another dashboard (Data has been changed to protect the innocent, code can be found here https://github.com/rompic/Smashing-Flowboard ):
Notice that while it provides some insights into features and defects, we currently do not track risks and technical debt that explicitly (some are in the improvement category). Every 60 seconds we rotate between this flow metrics dashboard and the status dashboard using https://github.com/vrish88/sinatra_cyclist.
After the next deployment, I stood in front of the dashboard.
It dawned on me. We were shipping more often, but as we didn’t deploy from master, but rather patches from a release branch, on average we got slower. We had a fast lane for fixes, which were fixed on master and backported to the release branch (which is the way to go, if you use branches at all; see trunk-based development), but it still took us too long to ship features, which were waiting to be released.
It may look like a crisis, but it’s only the end of an illusion
So we looked into cutting our release cycle for major releases from every half-year to each quarter or even more often.
Still it seemed as if we were always late, with priorities / requirements changing in between these cycles.
I felt like we were improving our development process, constantly running, but remaining in the same spot, as in the Red Queen’s race:
“Well, in our country,” said Alice, still panting a little, “you’d generally get to somewhere else—if you run very fast for a long time, as we’ve been doing.”
“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”
— Lewis Carroll - Through the Looking-Glass, and What Alice Found There
Valuable ideas sit in 12m to 18m of big up front planning with no sense of urgency, no questioning if they might add more value than what has already been locked in the plans for the year. As soon as they reach the product development team they are urgent.
I had already heard of a similar phenomenon called Water-Scrum-Fall from Jez Humble in his GOTO 2015 presentation (also see Dave West’s article from 2011). Similarly, in Lean Enterprise (pos2959) it is stated that “making one process block more efficient will have a minimal effect on the overall value stream. Adopting agile in a single function (such as development) has little impact on the value stream and customer outcomes”. So this was not the first time I heard about this difference between agile development and business agility, but the “we are so freaking AGILE, yay!” picture from Klaus Leopold in the above-mentioned article really drove it home for me.
We had a limited system focus and had to turn to a powerful tool: Value Stream Mapping.
A brief Intro to Value Stream Mapping
As stated in the DevOps Handbook (page 60 ff.), work typically starts with the product manager/owner gathering requirements based on a customer request or a business need. A development team adds this feature to their backlog, plans it for an iteration and implements the feature. The code is then built, integrated and tested. Finally it gets deployed and released to the customer where, if everything worked well, it creates the desired value.
Value Stream Mapping is a technique to visualize and understand the relevant and critical steps necessary to create and deliver value. Of course, it can be traced back to Toyota. In a workshop, the current state is documented together with representatives involved in each step as well as people able to authorize the required changes. Lean metrics are identified, and based on these a future state map (one to three years out) is derived and an action plan is created (typically covering three to twelve months). The improvement kata can be used to move towards the future state. This approach helps to uncover bottlenecks and long wait times and to eliminate waste and rework, because in value streams of any complexity no single person knows all the steps that need to be performed to create value for customers.
For a draft agenda of a value stream mapping workshop see the dojo consortium’s website.
According to Chapter 7 in Jez Humble, Joanne Molesky and Barry O’Reilly’s book Lean Enterprise, the goal is not to map every single step in detail, but rather to get an overview with 5-15 process blocks. For each process, the team which performs it, the activity and the name is recorded. Real data is gathered about the current status: the people involved, barriers to flow, amount of work in each process block as well as queues / inventory between processes. Additionally three key metrics are recorded:

| Metric | What it measures |
| --- | --- |
| Lead Time (LT) | The time from the point work is made available to a process to the point it hands that work off to the next downstream process |
| Process Time (PT) | The time spent executing a particular process (with all necessary information, resources and working uninterrupted) |
| Percent complete and accurate (%C/A) | The proportion of times a process receives something from an upstream process that it can use without requiring rework |
Note that some authors also use cycle time as a metric. Karen Martin and Mike Osterling avoid using it at all in Value Stream Mapping: How to Visualize Work and Align Leadership for Organizational Transformation as it has several definitions and is used synonymously with different things.
Based on these metrics, summary metrics like total lead time, total process time, activity ratio (total process time divided by total lead time), accumulated / rolled %C/A are calculated.
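To show how these summary metrics fall out of the per-block data, here is a small Python sketch with an invented four-block value stream. The block names and all numbers are hypothetical. The rolled %C/A is computed as the product of the individual %C/A values.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ProcessBlock:
    name: str
    lead_time_h: float            # LT in hours
    process_time_h: float         # PT in hours
    pct_complete_accurate: float  # %C/A as a fraction between 0.0 and 1.0


# Hypothetical value stream
blocks = [
    ProcessBlock("Requirements", 80, 8, 0.9),
    ProcessBlock("Development", 120, 40, 0.8),
    ProcessBlock("Testing", 160, 24, 0.5),
    ProcessBlock("Release", 40, 8, 1.0),
]


def summarize(blocks):
    """Roll the per-block metrics up into value stream summary metrics."""
    total_lt = sum(b.lead_time_h for b in blocks)
    total_pt = sum(b.process_time_h for b in blocks)
    return {
        "total_lead_time_h": total_lt,
        "total_process_time_h": total_pt,
        "activity_ratio": total_pt / total_lt,
        "rolled_pct_ca": prod(b.pct_complete_accurate for b in blocks),
    }


if __name__ == "__main__":
    for name, value in summarize(blocks).items():
        print(f"{name}: {value:.2f}")
```

In this made-up example only 20% of the total lead time is spent actually working on the item, and only about a third of the work passes through all four blocks without rework.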
Value stream mapping is used in various industries. See https://cloud.google.com/solutions/devops/devops-process-work-visibility-in-value-stream for an IT development related example and Karen Martin providing an overview of her Value Stream Mapping: How to Visualize Work and Align Leadership for Organizational Transformation book here.
Outcome and current status
While process improvements focus on where value is added, Value Stream Analysis focuses on identifying bottlenecks and eliminating waste. It turns out that this approach often has much higher leverage. As described, we found that we had a limited system focus and needed buy-in to influence the process up- and downstream.
Luckily, at the same time the organization identified focus programs to improve flow and based on these started a Continuous Improvement initiative which is rolled out in 2020. We were able to connect to that program and harness what we learned to drive further change.
Due to the COVID-19 pandemic, the schedule has shifted a little and we are still in the middle of the analysis.
However, since the beginning of the year:
- We moved to a light-weight quarterly planning cycle as discussed in Gary Gruver’s book A Practical Approach to Large-Scale Agile Development to address changing priorities and the Urgency Paradox.
- Our planning and status meetings were scattered over the week. Inspired by Hotjar we moved most of our regular meetings to Monday to provide more focus during the week.
- We track Work in Progress closely, plan to work in smaller batches and double our release frequency in 2020, providing monthly patch releases and quarterly minor releases.
- We have started collecting data on employee engagement and psychological safety per value stream using the Westrum typology in a quarterly survey.
- We are heavily investing in test automation and built a new test track as well as simulation/emulation capabilities to test more of our use cases automatically.
While we made considerable progress in our journey, challenges still remain.
- We are discussing reorganizing our teams based on cognitive capacity as discussed in Matthew Skelton and Manuel Pais' book Team Topologies, as the cognitive load of our team was too high (software, hardware, firmware, etc.). We plan to have a platform team, a complicated subsystem team, value stream teams as well as enabling teams.
- We want to establish a common language using the stories from e.g. The Phoenix Project and The Unicorn Project
- We were planning customer visits (gemba walk) to better understand new use cases and needs for improvement, but due to the COVID-19 pandemic we had to put them on hold and find other ways to accomplish this goal.
Further references and information
If you want to know more, you should really read these books, follow these links or watch the talks of the authors:
- Mik Kersten - Project to Product
- Dominica DeGrandis - Making Work Visible - How to Unmask Capacity Killing WIP
- Eliyahu M. Goldratt - The Goal
- Gene Kim, John Willis - Beyond the Phoenix Project
- Jez Humble, Joanne Molesky, Barry O’Reilly - Lean Enterprise
- Gene Kim - The Unicorn Project
- Donald Reinertsen - The Principles of Product Development Flow: Second Generation Lean Product Development
- Gene Kim, George Spafford, Kevin Behr - The Phoenix Project
- Gene Kim, Jez Humble, John Willis, Patrick Debois - DevOps Handbook
- Nicole Forsgren, Jez Humble and Gene Kim - Accelerate: The Science of Lean Software and Devops: Building and Scaling High Performing Technology Organizations
- Karen Martin, Mike Osterling - Value Stream Mapping: How to Visualize Work and Align Leadership for Organizational Transformation, Video
- Vlatko Ivanovski - DevOps Metrics - Measuring What Matters
- Dominica DeGrandis - Making Better Business Decisions With Flow Metrics
- Carmen DeArdo - Use Flow Metrics to drive Business Results NOW
- Klaus Leopold - Rethinking Agile: Why Agile Teams Have Nothing To Do With Business Agility, Video
- Gary Gruver - A Practical Approach to Large-Scale Agile Development
- Matthew Skelton and Manuel Pais - Team Topologies
After I implemented the dashboards I found several professional / open source solutions that cater for the same or similar problem. So if you have a more complex setup or want to do something more serious, you might want to look into these:
- Tasktop Viz - announced October 29, 2019
- HCL Accelerate, previously called Urban Code Velocity - version 1.0 announced June 19, 2018
- Hygieia - open sourced by Capital One in 2015.
Actually, Forrester has recently published an updated report called Elevate Agile-plus-DevOps with VSM, which describes the benefits of the tools available in the emerging Value Stream Management market. They have also published a report named The Forrester Wave™: Value Stream Management Solutions, Q3 2020, which lists 11 leading providers of such tools. At the time of writing it was possible to get a copy of both reports from digital.ai, which was one of the companies listed.
Summary and Outlook
The main business problem we faced is delivering value in a flexible way, at speed and high quality to our internal and external customers. We were hindered by long development cycles, six-month budgeting periods, high workloads and priorities that were often changing. A full cycle of building and testing one of our software and hardware products took more than 24 hours. So when you did something in the afternoon, you sometimes didn’t get feedback the following day, but only the day after. It had a negative impact on developer morale and felt like quicksand: the more we fought it, the more it pulled us in. We knew there must be a better way.
Shortly after moving to a new office in 2019, knowing about the importance of making work visible and having learned about the Flow Framework, I implemented a dashboard using an open source solution (Smashing) which automatically gathered and visualized, among others, Flow Metrics (Flow Load, Flow Time, Flow Efficiency, Flow Distribution, Flow Velocity) for our value stream. After putting in countless hours eliminating waste, improving the deployment pipeline, investing in automation and deploying new technologies, I wanted to answer a fundamental question: “Are we really moving faster?” It took me a while, and listening to Beyond the Phoenix Project and reading The Goal, to understand:
- We were creating a lot of inventory.
- We had a fast lane for fixes, but it still took us too long to ship features.
- We delivered more often, but the bottleneck shifted to testing.
It became clear that we were trapped in local optimization (now described by Jonathan Smart as Local Optimisation & the Urgency Paradox): we had a limited system focus and needed buy-in to influence the process up- and downstream. We were able to connect our efforts to the Continuous Improvement initiative that had just started in the company. While it is nice that a top-down program fits so well with a bottom-up effort, we still have to be aware of and thaw the frozen middle, i.e. middle managers who seem to resist transformation because the way they are incentivised has not changed.
While we are still in the middle of the analysis, we were already able to soften the pain, which is also visible in the flow metrics that we track. At the same time we have to be aware that metrics can also do harm if they become a surrogate for the strategy, as described in the recent HBR article Don’t Let Metrics Undermine Your Business (find a nice sketchnote by Kate Rutter here: https://twitter.com/katerutter/status/1234276317249425408).
Another thing that I’m interested in is looking into better tracking risks and having a more detailed look at the productivity part of the model described in The 2019 Accelerate State of DevOps report.
Thanks for reading this article.