A couple of minor announcements that don’t need a separate post.
I have published a tiny Python helper module to clean JSON responses, especially the ones created by LLMs. You can find it here: https://pypi.org/project/json-repair/. The joy I am finding in writing code reminds me a lot of https://charity.wtf/2017/05/11/the-engineer-manager-pendulum/
Second, I have in beta a small demo page to help people create their self-retrospectives, now that the annual perf cycle is coming (and none of us likes that). If you are interested in trying it out and giving me feedback, let me know!
A few days ago I was talking to a fellow CTO whose company is scaling, and now “is the moment to set KPIs for the Technology department”.
I remember the first time this happened to me: I was utterly lost. Measuring the work of engineers or scientists is hard, and I didn’t have any good anchor at the time.
State-of-the-art
Nowadays there is a lot of literature on measuring productivity; the problem is that most of it is utter garbage.
I won’t go too deep into it; you can read the rebuttals written by Kent Beck and Gergely Orosz against the latest McKinsey bullshit to get an idea1.
Gergely’s article contains a lot of best practices, and if you are interested in the topic, go read it.
I will focus here on my experience landing those ideas into an existing organization.
The obsession with measuring people is old
In the Age of Enlightenment, there was a widely popular view that the world was like clockwork, precise and mechanical. So were peasants: if you were just able to train them, they would be the same as machines.
The view was widely shared in the army, which focused on training foot soldiers by forcing them to memorize the number of steps to take for each command, and in factories, where the same movements were repeated over and over with precision.
It was popular among the elites: it reinforced their classist idea that peasants were ignorant and could only be controlled like dogs or horses, and it offered a simple solution to a complex problem.
Once liberal democracies started forming and the old elites were shattered, a completely different doctrine emerged in both the army and the factories. Mechanization took all the simple tasks away, and the role of the knowledge worker started to emerge.
But the simple solution to a complex problem remained as appealing as it was in the 19th century, just with a different excuse.
The wish to measure everything is not completely unfounded
I don’t want to say that we should just pay software engineers a ridiculous amount of money and whatever happens, happens. That would be financial suicide.
In my experience, it is incredibly hard to explain how hard and expensive it would be to accurately measure everything that happens in a technology department.
And then there’s the problem of how difficult it is for a manager to gather data or, as I call it, the quantum theory of management:
The mere presence of a manager observing or measuring something is enough to make any observation or measurement completely useless.
Not because of Goodhart’s law, although that doesn’t help, but simply because people will try to be their best selves in the presence of the person responsible for their bonus.
And, going back to the beginning, once I made all those points to the CEO or SVP, the answer was “But sales and HR can easily set targets”.
Two bad incentives don’t make a good one
Sales is one of the oldest crafts in the world2, so it seems natural that it has more practice in setting targets and making them work.
In my experience, this is mostly false.
Don’t get me wrong, the oldest incentive of “sales targets” is always there, and it works extremely well when everything goes well and the targets are achieved. But as soon as something goes wrong, or when sales executives are pressed to commit to a specific amount that is challenging to achieve, they will resort to all kinds of alternative metrics: the number of meetings with customers, the number of leads, customer satisfaction, etc. etc.
And sales targets alone are a terrible incentive! You can lie to customers to close a contract, close contracts that are loss-making, or even commit outright fraud. Ask Wells Fargo about it.
Recruitment is even worse. When I was at Booking back in 2016, the recruitment team was put under A LOT of pressure, with challenging targets on the number of hires.
The idea was that the bad incentive of pushing anybody through the door would be counteracted by the engineers doing the interviews. This caused a ton of problems:
More than once during a debrief session I had to justify my rejection to the recruiter in an exchange that became more and more emotionally charged (since their money was on the table).
My group back then was contributing a lot of interviewers, but we were also notoriously stricter than average, so the recruitment team decided to schedule all of us on the same interviews, so that only the least promising candidates would face that wall, instead of distributing us and risking more candidates being rejected.
At one point it turned out that one recruiter outright lied to candidates to recruit as many people as possible, then used the big score to leave and find a better role.
All those problems were eventually resolved by changing the incentives, but a lot of damage was done.
With this in mind, how do you set KPIs for your department? Because, like it or not, the CEO/SVP won’t just accept no from you.
Size matters
Setting KPIs is a delicate matter: it will upset people and cost you a lot of time.
If your team is below 50 people, you probably don’t need KPIs at all. This is because you, as the leader, must be hands-on enough to know if things are not going well: if your team is not shipping frequently enough, not committing daily, and not releasing as frequently as your industry allows3, you must already be aware of it, and be able to fix it.
For a small team, the KPI of “shipping a lot” should be more than enough. You also want to keep an eye on quality, so tracking bugs and test coverage and setting up some automated security scanners is a good idea. But, again, at that size, you should be able to talk to your engineers, ask them “Is our code crap?”, and get a good answer.
At a slightly larger size, you won’t be able to know everything that is going on, so you will need some monitoring. This is how I decided to organize it at Aidence:
First I had to find out what the business cared about and ask the CEO and the CCO what, at a fundamental level, they needed from me.
They will probably say something along the lines of “ship faster, cheaper, and with higher quality”. That is impossible, so press on to figure out which of those matters most to them: if your company is growing quickly, pushing features out faster, even at lower quality, is probably a good compromise; if you are in fintech, you are very likely under pressure for high quality and low margins, etc. etc.
In our case, what mattered to the business was consistency: if we say we will push something out in the next release, then a) we will actually push it out in the next release, and b) the next release will come out on a more or less predictable schedule.
But there were also other stakeholders: the CFO cared about the cloud expenditure, which was growing too fast compared to our margins, and Regulatory was not happy about the absence of any real commitment to resolving bugs or vigilance tickets.
A lot of wishes…
At that size, you still don’t want to be too prescriptive with your KPIs because of Goodhart’s law, so I chose to go with a mix of targets and SLAs:
Cost became a target, but not a total yearly budget. We calculated the “cost per transaction” and used that as a target. Why? Because transactions were the unit we shared with sales: we billed by transaction, so the cost to serve a transaction is a good anchor to improve company margins. You can do this too by calculating your own unit economics. If you are using the public cloud, it is pretty straightforward to calculate cost; if you are using your own hardware, you will have to do a bit more work to calculate the TCO (Total Cost of Ownership). Disclaimer: I favor the public cloud.
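As a minimal sketch of what I mean by unit economics (the figures and variable names here are made up for illustration, not Aidence’s actual numbers), the calculation is little more than dividing a period’s cloud bill by the transactions served in that same period:

```python
# Hypothetical numbers: one month of cloud spend and the transactions served in it.
monthly_cloud_cost_eur = 42_000.00   # from your cloud billing export
monthly_transactions = 350_000       # from your own usage/billing data

cost_per_transaction = monthly_cloud_cost_eur / monthly_transactions
print(f"Cost per transaction: €{cost_per_transaction:.4f}")

# The target is the trend, not the absolute number: this ratio should stay flat
# (or drop) while transaction volume grows.
```

The hard part is not the division, it is agreeing with finance and sales on which costs and which transactions go into it.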
For security, defects, and regulatory vigilance tickets, we created an internal process with an SLA to monitor (roughly 2 weeks for a high-priority ticket). A hard target was too cumbersome: some tickets might be very hard to close, and in general, if you have a hard deadline for tickets, you end up with a lot of “won’t fix”. Incentives again.
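The monitoring for such an SLA does not need to be fancy. A rough sketch, assuming you can export tickets with a priority, an open date, and an optional close date (the ticket fields and the windows below are illustrative, not our exact process):

```python
from datetime import datetime, timedelta

# Illustrative ticket data; in practice this comes from your tracker's API or export.
tickets = [
    {"id": "SEC-101", "priority": "high", "opened": datetime(2023, 10, 2), "closed": None},
    {"id": "BUG-202", "priority": "high", "opened": datetime(2023, 10, 20), "closed": datetime(2023, 10, 27)},
    {"id": "VIG-303", "priority": "low", "opened": datetime(2023, 9, 1), "closed": None},
]

SLA = {"high": timedelta(weeks=2), "low": timedelta(weeks=8)}  # assumed windows

def breaches(tickets, now=None):
    """Yield (ticket id, how far past the SLA) for every ticket over its window."""
    now = now or datetime.now()
    for t in tickets:
        deadline = t["opened"] + SLA[t["priority"]]
        resolved_at = t["closed"] or now
        if resolved_at > deadline:
            yield t["id"], resolved_at - deadline

for ticket_id, overdue in breaches(tickets):
    print(f"{ticket_id} is over its SLA by {overdue.days} days")
```

Reviewing that short list of breaches periodically is enough; the point is monitoring, not punishing.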
Finally, to achieve consistency in releases, we worked a lot on setting up a release process with a target of a release every 6 weeks. If we ran out of time, we could decide to cut the scope of the release and ship it anyway. It was a compromise between the inevitable variability of the work in a startup and the predictability needed by the business.
Whatever you do, don’t use estimation.
All those points above required work: work from my side but also work from my team. This is something that business leaders sometimes miss: each new KPI brings with it a lot of work to set up processes, monitoring, and reporting. So it must be worth it, not just a whim. Otherwise, you are better off not measuring anything.
DORA, SPACE, and other fantastic beasts
If you are familiar with popular frameworks, you are familiar with the acronyms DORA and SPACE. I like the idea behind those metrics, and I think that any company with enough maturity should attempt to deploy them.
A word of caution: measuring those metrics can be very hard unless your practices already align with the ones the authors suggest. For example, how do you measure “change failure rate”? Do you have automated reporting? Do you have someone writing it down in Excel?
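To make the point concrete: change failure rate is conceptually just “failed changes divided by total changes” over some window, and the arithmetic is trivial. A hedged sketch, assuming you can export a list of deployments with a flag for whether each one caused a production failure (the records below are invented):

```python
# Each record is assumed to come from your deployment pipeline joined with incident data.
deployments = [
    {"service": "api", "date": "2024-01-03", "caused_failure": False},
    {"service": "api", "date": "2024-01-10", "caused_failure": True},
    {"service": "worker", "date": "2024-01-12", "caused_failure": False},
    {"service": "api", "date": "2024-01-17", "caused_failure": False},
]

failed = sum(1 for d in deployments if d["caused_failure"])
change_failure_rate = failed / len(deployments)
print(f"Change failure rate: {change_failure_rate:.0%}")  # 25% in this toy example
```

The code is the easy part; the organizational work of agreeing on what counts as a “failure” and recording it consistently is where the real cost lives.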
Follow the money
Regardless of your KPIs and choices, my advice is to always follow the money:
If you can’t explain how your department contributes to the business, you will always be a cost center, a necessary evil for the company. On the other hand, if you can always trace your contribution back to some kind of business metric (ideally $$$), and therefore show the ROI of your team, you will be a place to invest in.
Measure systems not people
One last word of advice: don’t measure people; measure systems and the teams responsible for those systems.
It’s a bit counterintuitive because we don’t give bonuses to teams but to individuals. Yet measuring an individual, without being side by side with them all the time, is impossible. How do you evaluate the individual contribution in pair programming or mob programming? What if someone is amazing at improving user stories and is making the team 100% more efficient?
What I find useful, from a systems perspective, is, in order of priority, to:
Evaluate systems: How often is this specific product/service/system released or deployed? How many bugs are reported? … (see the sketch after this list)
Evaluate team performance: What is the cycle time? Is the team functional (from a collaboration point of view)? What is this team’s contribution to the system/product/service they are responsible for? …
Check-in on individuals: Finally, the people responsible for the team (tech leads, engineering managers) must be side-by-side with their team members to make sense of the above. Is everyone contributing equally? Is there someone hoarding all the knowledge? …
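As an illustration of the first two levels (the field names and records are hypothetical; in practice the data would come from your deployment pipeline and issue tracker), a minimal sketch that computes deployment frequency per system and median cycle time per team:

```python
from datetime import date
from statistics import median

# Hypothetical exports: deployments per system and finished work items per team.
deployments = [
    {"system": "viewer", "date": date(2024, 2, 1)},
    {"system": "viewer", "date": date(2024, 2, 8)},
    {"system": "backend", "date": date(2024, 2, 15)},
]
work_items = [
    {"team": "platform", "started": date(2024, 2, 1), "done": date(2024, 2, 6)},
    {"team": "platform", "started": date(2024, 2, 5), "done": date(2024, 2, 16)},
]

# Deployment frequency per system over the observed window.
per_system = {}
for d in deployments:
    per_system.setdefault(d["system"], []).append(d["date"])
for system, dates in per_system.items():
    print(f"{system}: {len(dates)} deployments between {min(dates)} and {max(dates)}")

# Median cycle time per team, in days.
per_team = {}
for item in work_items:
    per_team.setdefault(item["team"], []).append((item["done"] - item["started"]).days)
for team, days in per_team.items():
    print(f"{team}: median cycle time {median(days)} days")
```

Note that nothing in this sketch points at an individual: the granularity stops at the system and the team, which is exactly the point.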
A very rough example: a long time ago, one manager was arguing that all his team members were superstars who were contributing exceptionally. I pointed out to him that his team as a whole was underperforming, so the logical conclusion was that HE was the team’s problem. That led to a more nuanced conversation about his team…
In the end, measuring productivity affects productivity, so don’t be overzealous or you’ll be performing only on paper …
To be clear, I don’t blame McKinsey. They have been producing this crap for ages, and they are hired precisely because they excel at making sure that CEOs can maximize profits while feeling justified by “science and best practices”. McKinsey also had a considerable role in the opioid crisis in the U.S. and engineered tricks to maximize the profits of pharma companies. There are a lot of articles out there if you are interested.
not the oldest
Ideally, you will be releasing daily, but in regulated industries that is not possible, so you probably need to make an approximation.