Why we focus on a single, reliable metric

Bill Harding February 6, 2019

It's no secret that we've invested a tremendous amount of energy into crafting Line Impact into a single, reliable metric. We want managers to rely upon it to inform their most valuable decisions. In effect, we've bet thousands of hours that there is a holy grail -- a single metric that can accurately represent which developers are having the greatest impact on your repos -- and thus your company. 

To a software developer approaching GitClear for the first time, it might seem like a reckless bet to make. Books could be filled with the internet screeds about the futility of quantifying developer output. Given the current state of GitHub's stats (which focus on raw lines of code and commits made), these screeds have ample evidence to back them. GitHub stats are exactly as ineffectual as advertised.

Even our friends at Gitprime hedge their bets when it comes to embracing a single metric. Their approach is to provide roughly fifteen metrics that they believe combine to paint a picture of how much work a developer is getting done. They have a number of pages like this one:

Evaluation in many dimensions. Source: Getapp's Gitprime profile.

They've probably adopted this approach based on conversations with their customers. Customers begin skeptical that any metric can capture a developer's productive output. The easiest way to work around this problem is to create so many metrics that none of them needs to be perfect, and no tough decisions need to be made about which metrics provide the essential truth. The benefit of this approach is that it's easier to get managers initially on board.

The drawback is that you don't want to rely on a collection of imprecise and poorly explained metrics to make your most business-critical decisions. What you need is fewer, better metrics. You need metrics that are documented, and whose method of calculation can be traced back to their origins, so they earn the trust of the dev team.

On GitClear, we eschew the "many metrics" solution to measuring developer output. Nowhere on GitClear do we explicitly provide "commits made" or "code churn" metrics, though it would be trivial for us to do so. Code churn is one factor in our calculation of Line Impact; commits made, we believe, carries no intrinsic signal. But some customers will want to see these metrics. What's the harm in providing them?

For starters, the more metrics you have to sift through and consider, the less clarity there is in what conclusion to take from them.

Second, spreading data across many metrics makes it impossible to look at data across secondary dimensions. For example, if you want to know who's the most prolific test writer, you can rank all developers' Line Impact on the secondary dimension of "test code written." If you want to compare code velocity, you simply combine the primary metric with "time passed." None of this is of any use if you don't have a reliable metric serving as your primary axis.
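To make the idea concrete, here is a minimal sketch of how a single primary metric supports secondary dimensions. The data shape, field names, and numbers are purely illustrative, not GitClear's actual schema; the point is that filtering and grouping one reliable number answers many questions:

```python
from collections import defaultdict

# Illustrative records: (developer, impact, is_test_code), one week of activity.
changes = [
    ("alice", 120.0, True),
    ("alice", 300.0, False),
    ("bob", 200.0, True),
    ("bob", 150.0, False),
]
DAYS = 7

def total_impact(changes, keep=lambda is_test: True):
    """Sum the primary metric per developer, optionally filtered."""
    totals = defaultdict(float)
    for dev, impact, is_test in changes:
        if keep(is_test):
            totals[dev] += impact
    return dict(totals)

# Secondary dimension "test code written": rank the most prolific test writers.
tests = total_impact(changes, keep=lambda is_test: is_test)
# bob (200.0) out-writes alice (120.0) on tests.

# Combine the primary metric with time passed to get velocity.
velocity = {dev: imp / DAYS for dev, imp in total_impact(changes).items()}
# alice: 420.0 / 7 = 60.0 per day; bob: 350.0 / 7 = 50.0 per day.
```

Every one of these views rests on the same primary axis, which is why that axis has to be trustworthy.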

And there's still another fundamental reason we've chosen to focus on creating a single metric to rule them all™️.


The only way we know to build a code metric that earns developer trust

There's only one way to create a single developer metric that works for every business case. It has to be extremely adaptable. It has to be transparent to the developers (i.e., via code review). And it has to be based on empirical analysis. The last of these is the only way to build enough confidence in a metric to make managers feel comfortable making critical decisions with it.

What do I mean by "empirical analysis" in this context? Any metric purporting to accurately measure "productive code output per developer" must correspond to an expert's evaluation. More specifically, the results provided by Line Impact must correspond to the judgement of a company's most reliable code evaluators: its CTO, VP of Engineering, and other senior tech managers. These are the knowledge holders with the credibility to judge which developers make the biggest positive impact on the code base.

GitClear is a system built to be configured using the judgement of your best software experts.

We recognize that expert opinions will vary. The ideal software engineering metric needs to pick smart defaults. But when the stakes are high, default settings aren't enough. We need to allow experts to tune the system to ensure that their intuition is reflected in the Line Impact values they see. Once your experts have calibrated the system, their confidence in it will carry down to the developers themselves.

A well-tuned measurement tool is what permits managers to evaluate the biggest, highest stakes questions -- like work from home productivity, or the performance of a new hire.


Putting the idea into practice

Every time a new customer imports their repo(s) into GitClear, we ask them the same question: do our results correspond with your prior beliefs?

It's an esoteric question, but it's essential to calibration. This calibration step is what connects the vision of your company's tech leadership to the Line Impact values it reports. A good analogy is purchasing a new clock: the item you've bought is very precise at measuring the passage of time -- but until you've set the initial value (i.e., the current time), the accuracy of your new clock will rely upon the manufacturer being in the same time zone.

Once a technically-minded manager has reviewed the metric, they can begin to predict its results. This is why we describe ourselves as a "Gitprime alternative for technical managers." Once a technical customer has calibrated their code measurement empirically, it becomes a single, reliable metric. They can then leverage it forever thereafter. 


Optional case study: Calibration via file type multipliers

For the nerdy, let's delve into a specific example to make these concepts more tangible. Programmers know that not all file types are created equal when it comes to the ease of adding or removing lines. In web apps, one of the most basic manifestations of this idea is CSS files. A single line of CSS (or its close cousin, SCSS) can typically be written in 30-50% as much time as a line of Ruby, PHP, Java, or Python. Failure to account for this intrinsic property of the CSS file type leads to front-end developers (who spend the most time in such files) being heralded as the most productive developers at the company. To be clear, sometimes they might be the most productive -- but the fact that they tend to add, update, and remove more lines of code than anyone else isn't itself sufficient proof of their dominance.
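The weighting described above can be sketched in a few lines. This is a hypothetical illustration, not GitClear's actual algorithm: the multiplier values and the idea of scaling raw "lines changed" per extension are assumptions, with the 0.4 for CSS mirroring the observation that a line of CSS takes roughly 30-50% as much time as a line of Ruby or Python:

```python
# Hypothetical file type multipliers; real values would be tuned per company.
FILE_TYPE_MULTIPLIERS = {
    ".css": 0.4,   # short lines, high redundancy
    ".scss": 0.4,
    ".rb": 1.0,
    ".py": 1.0,
}

def weighted_impact(changes):
    """Sum lines changed, scaled by each file type's multiplier.

    `changes` is a list of (extension, lines_changed) pairs; unknown
    extensions default to a multiplier of 1.0.
    """
    return sum(
        lines * FILE_TYPE_MULTIPLIERS.get(ext, 1.0)
        for ext, lines in changes
    )

# A front-end heavy week: 500 CSS lines count like 200 back-end lines.
frontend = weighted_impact([(".css", 500)])   # 500 * 0.4 = 200.0
backend = weighted_impact([(".py", 200)])     # 200 * 1.0 = 200.0
```

Without a multiplier like this, the raw line counts would declare the front-end developer 2.5x as productive, for reasons that are a property of the file type rather than the person.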

In the interests of furnishing smart defaults, we automatically reduce the file type multiplier for files like CSS that have short lines and high redundancy. But we also provide a settings page that offers more granular control. This setup page also offers a preview of how a potential change would affect the credit given to a sample set of your committers:

Reduce the Line Impact multiplier for "CSS" file type, and impact varies. Source: GitClear file type settings


This real-time preview gives engineering managers the opportunity to try out a proposed multiplier. If the change brings Line Impact into better alignment with their own intuitions, they can keep it. File type multipliers are one of many ways we allow Line Impact to be calibrated to your tastes -- here are a few others.


When reliability is paramount, get empirical

At this point, I should probably re-emphasize that you don't need to spend hours combing through custom settings to start gaining insights from GitClear. We've gone to great lengths to infer the best default settings for your repo, based on our past experience processing similar code. Compared to classic developer evaluation techniques like "manager intuition," you can get a long way toward discovering new truths about your team's output without tweaking a single setting.

When you're ready to start making serious business decisions based on your code stats, you can do it with confidence by empirically calibrating Line Impact. If you're a CTO, VP of Engineering, or manager with experience writing code, you're ideally equipped to match Line Impact to your own judgement. Better still, if you're at a large company with multiple tech managers spread throughout the enterprise, each manager can individually tune their team's Line Impact calculation to their tastes (we allow per-repo and per-organization configurations that supersede company-wide settings). After you've fine-tuned Line Impact to align with your vision, you no longer need to sink your own time into reading code to manually assess who's stuck, or which meetings/policies are sapping your team's productivity.

If you're a non-technical manager, our technical support team can still help you craft the best data to rely upon for your business-critical decisions. Get in touch via our demo page, or just seize the day and start a free trial when you're ready to move beyond guessing at who your top performers are.

Bill Harding

CEO/Programmer, GitClear

Bill is driven by the challenge of how best to quantify valuable questions that defy quantification. It's possible this instinct may have been awakened in Bill at age 14, when he won a soft, stuffed bunny at the orthodontist for guessing the number of jelly beans in the gumball machine.

