Counting lines of code? It's 95% garbage and we have the data to prove it - GitClear

GitClear was built to quantify developer contributions. Upon learning this, the most frequent question we receive from experienced technical managers is "does your tool use lines of code to measure developer work?" It's common for technical managers to have cultivated the belief that any attempt to quantify developers using lines of code is a fool's errand.

Their skepticism is justified. As we'll illustrate below, using data from real world repos, there is almost nothing of value to be gleaned from counting lines of code. Key word: almost.

Our tl;dr is that while less than 5% of all lines convey meaningful work being done, there are still profound insights to be reaped if one can isolate the tiny fraction of lines that matter, among the vast wasteland that is a typical git repo.

Finding a signal amid deafening noise

Many well-intentioned managers have sought to understand their developers' throughput via metrics like those that GitHub provides:

  • Lines of code changed
  • Commits made over time
  • Code additions and removals

The great thing about these metrics is that they are trivial to capture and they reflect real-time, quantifiable activity that's already occurring within any code base using git.

If we were Mr. Roboto, we would be fully supportive of the implications herein.

Given how easy it is to access these juicy GitHub visualizations, plus their location within a tab labeled "Insights," it’s tempting to use them to judge developer productivity. However, the data presented by GitHub consists of unfiltered, unprocessed commits and lines of code (henceforth, "LoC"). As we'll prove below, these unprocessed metrics hold only the slightest whisper of signal.

Imagine counting plates at a restaurant

Since it’s often non-technical managers who search for tools to measure developer productivity, it is they who must know the peril of using LoC to make management decisions. However, we haven’t the time to administer the requisite four-year Computer Science curriculum to bring them up to speed on why so much LoC is noise. Instead, let’s use a real world example to sidestep the need for first-hand programming experience. 

For our purposes, consider the following analogy:

Measuring developer productivity by lines of code is like measuring restaurant productivity by the number of plates used.

As we'll discover, this analogy can be used to explain, in a non-technical manner, almost every reason that counting LoC is dangerously misleading. 

But let’s start with some good news for restaurants. If they’re using a lot of plates, they’re probably serving a lot of meals. There is signal here. The question is: what is tangled up with it? 

Clearly Restaurant C is the Best Restaurant because they use the most plates.

Come with us as we wade through the details of what makes it not quite impossible to measure a restaurant by its plates used. It requires an executive audit of the pitfalls we encounter counting plates. 

Bonus: in the process of talking about restaurants, we’ll reveal our original data gathered for this blog post. It indicates the top four ways “lines of code” get used, and how each of the most common ways is misleading and inaccurate. It’s essential to understand exactly how your data is being interpreted if you want to draw valid conclusions from the pretty graphs we toolmakers will happily provide you.

Example 1: Empty Plates

Conventions dictate usage

At upscale restaurants, it's conventional for tables to be set with empty plates waiting to greet visitors. These plates contribute to the restaurant’s plate count, but many never even get used. All of the unused plates set out by convention do nothing to help us understand the restaurant's productivity.

Similarly, code lines changed by way of convention add a lot of noise. These are lines like:

  • • Whitespace changes (changing “ word” to “word”)
  • • Blank lines (adding visual readability)
  • • Language-based keywords (“begin”, “end”, brackets)

These types of line changes account for a staggering 54 percent of commits analyzed on GitClear in 2018. Understanding and devaluing these lines gets us a big step closer to identifying the work that made an actual impact.
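To make the three categories above concrete, here is a minimal sketch of how a changed line might be flagged as "convention" noise. It is an illustration only, not GitClear's actual classifier (which this article does not describe); the function name `is_convention_line` and the `STRUCTURAL_TOKENS` set are hypothetical, and a real tool would need a keyword list per language.

```python
from typing import Optional

# Hypothetical token list for illustration; real languages need their own sets.
STRUCTURAL_TOKENS = {"begin", "end", "{", "}", "(", ")", "[", "]", "};", ");"}


def is_convention_line(old: Optional[str], new: str) -> bool:
    """Return True if a changed line looks like convention rather than substance.

    `old` is the previous version of the line (None for a newly added line),
    `new` is the line as it appears after the commit.
    """
    stripped = new.strip()

    # Blank lines add visual readability, not behavior.
    if not stripped:
        return True

    # Lines holding only a structural keyword or bracket.
    if stripped in STRUCTURAL_TOKENS:
        return True

    # Whitespace-only change: identical once all whitespace is removed.
    if old is not None and "".join(old.split()) == "".join(new.split()):
        return True

    return False
```

With this sketch, changing “ word” to “word” is flagged as noise, while a newly added line such as `total = price * qty` is not.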

Example 2: Splitting Plates

Moving stuff around doesn't tell us what we're after

When my wife and I dine out, we enjoy sharing different dishes on the menu. We receive two empty plates for ourselves, as well as several more filled with all the different foods we order. We divide these up throughout the meal, moving food from plate to plate. The plates we use to move food around tell the restaurant nothing about how many delicious meals it is serving.

In code, it’s common for developers to cut and paste lines of code from one file to another. This type of line change happens rapidly, and tells us virtually nothing about the volume of productive work happening.

If you use a code quantification service that doesn’t recognize moved lines, the signal your measurements contain is being diluted by around 15%, the average percentage of "moved lines" among all lines of code encountered within GitClear repos.
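One simple way to approximate moved-line detection is to pair up lines that a commit removes in one place and adds in another. The sketch below assumes you already have the removed and added lines of a commit as two lists; `count_moved_lines` is a hypothetical helper, and the exact-match rule is a simplification rather than a description of how GitClear does it.

```python
from collections import Counter
from typing import Iterable


def count_moved_lines(removed: Iterable[str], added: Iterable[str]) -> int:
    """Count lines that were removed in one place and added back in another.

    Lines are compared after stripping surrounding whitespace, and blank
    lines are ignored so they cannot pair up with each other.
    """
    removed_counts = Counter(line.strip() for line in removed if line.strip())
    added_counts = Counter(line.strip() for line in added if line.strip())

    # Counter intersection keeps the minimum count for each shared line,
    # so a line removed once and added twice counts as one move.
    return sum((removed_counts & added_counts).values())
```

A naive LoC count would report every such line twice (one removal plus one addition), even though nothing new was written.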

Example 3: Fast and Cheap

Copying and Pasting Lines of Code

Say a certain restaurant has the cheapest eats in town. It serves a high number of meals each day, so we consider it to be productive. This only tells us half the story, though. We don’t know anything about the quality of those meals.

All meals are not created equal. So why would we treat them as such?

Almost every developer has worked on a team with someone who creates volumes of “cheap” lines of code. When a developer copies and pastes code, they add lines that serve a purpose and get the job done. But rarely is that code considered impactful. It repeats existing code and becomes a hassle when it needs to be changed in the future.

If we’re measuring developer contributions, we shouldn’t reward this type of “cheap” code – even if it leads to successfully implementing a feature. That’s why GitClear identifies copied-and-pasted code, as well as find-and-replaced code, and reduces its Line Impact.
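As a rough illustration of the idea, the following sketch estimates what share of a commit's added lines already exist verbatim elsewhere in the code base. Everything here is an assumption made for the example: the helper name `copied_line_fraction`, the `min_length` cutoff (used so that short, generic lines are not counted as copies), and the exact-match rule. It is not the detection method GitClear uses.

```python
from typing import Iterable


def copied_line_fraction(
    added_lines: Iterable[str],
    existing_lines: Iterable[str],
    min_length: int = 20,
) -> float:
    """Fraction of substantial added lines that duplicate existing code.

    Only lines of at least `min_length` characters (after stripping) are
    considered, since short lines repeat naturally in any code base.
    """
    existing = {line.strip() for line in existing_lines}
    candidates = [
        line.strip() for line in added_lines if len(line.strip()) >= min_length
    ]
    if not candidates:
        return 0.0

    copied = sum(1 for line in candidates if line in existing)
    return copied / len(candidates)
```

A tool could then scale down the credit given to a commit as this fraction rises, which is the general spirit of reducing the weight of “cheap” lines.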

Example 4: Location & Volume

Accounting for Different Programming Languages

What about a restaurant that serves a lot of meals because it’s located next to a hot tourist spot? Whether the restaurant is good or not, it will benefit from having increased foot traffic. Therefore, the number of plates is a product of the restaurant’s location more than anything else.

In code, certain languages naturally produce more lines of code than others. For verbose languages like HTML and CSS, a developer’s output will appear prolific, especially when implementing a new feature. Conversely, concise programming languages like Python, Ruby, or C# will usually end up with fewer lines of code. PHP, Java and C++ fall somewhere in between.

GitClear makes it easy to customize Line Impact for different languages and file types. Managers can adjust the multipliers that our learning algorithm applies to match their own team measurements. While we’ve found that a line of CSS has 40% of the impact of a line of Ruby (the default setting), we want to be sure that each organization can calibrate it to fit their own needs.
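The effect of per-language multipliers can be sketched in a few lines. Only the CSS-to-Ruby ratio of 0.4 comes from this article; the `LANGUAGE_MULTIPLIERS` table, the `weighted_line_score` function, and the default of 1.0 for unlisted languages are hypothetical, and the real Line Impact calculation involves many more factors than this.

```python
from typing import Dict, Optional

# Ruby is the baseline; CSS at 0.4 reflects the default ratio cited above.
# Any other entries would be an organization's own calibration.
LANGUAGE_MULTIPLIERS: Dict[str, float] = {"ruby": 1.0, "css": 0.4}


def weighted_line_score(
    lines_by_language: Dict[str, int],
    multipliers: Optional[Dict[str, float]] = None,
    default: float = 1.0,
) -> float:
    """Sum meaningful line counts, scaled by a per-language multiplier."""
    if multipliers is None:
        multipliers = LANGUAGE_MULTIPLIERS
    return sum(
        count * multipliers.get(language, default)
        for language, count in lines_by_language.items()
    )
```

Under these assumed defaults, 100 lines of Ruby plus 100 lines of CSS score roughly 140 rather than 200. A team that values CSS work more highly could pass its own `multipliers` dictionary instead.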

Better measurement with Line Impact

It’s a tall order to use lines of code to measure a developer’s impact. Our data suggests that about 95% of all changed lines are noise from the standpoint of whether they result in a lasting impact to the code base. But we’ve determined it can be done -- with sufficient flexibility and attention to detail.

It took our team about 30 months of concerted pre-launch effort to build the system of checks and balances that digests all of the factors above (among others) and consolidates them into a single metric.

We call this metric Line Impact, and we believe that it’s the most accurate and comprehensive solution for teams who want to quantify the impact being made over time by their development team.

Much of our development time has gone toward ensuring that individual teams can adjust the scalars we apply to lines in different code categories, file types, and so on. We’ve worked hard to ensure that our default values are sensible for general use cases – this isn’t a tool you’ll need to spend hours configuring before you begin to draw profound insights. But we also appreciate that reasonable CTOs can come to very different conclusions about how much impact is made by writing a test, writing a line of CSS, or updating a file annotation.

Line Impact is not productivity

Read the next article in our series to learn more about what judgements can (and can't) be made with access to Line Impact.