Diff Delta Glossary

A lot of customers ask us how Diff Delta works, and we offer a variety of answers depending on how much depth the customer wants. The pithiest explanation is our visualization of Diff Delta; the most thorough is our article on counting lines of code. This page sits at the midpoint between the two: it covers a few of the most critical components of Diff Delta, with one or two paragraphs explaining why each one matters.

Ignore Masks. The first step in carving away the noise that accounts for 95% of LoC is to ignore activity in libraries, auto-generated files, and any other files that change without a developer doing meaningful work. Pluralsight Flow and GitClear allow customers to define, through the UI or a .gitignore file, which files, directories, and file types should be excluded when assessing impact. GitClear goes one step further by letting customers write regular expressions that identify specific lines to ignore (e.g., language keywords like end, #pragma, file includes, etc.) within a file that is otherwise processed as usual.
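To make the two levels of masking concrete, here is a minimal Python sketch, assuming glob patterns for file-level exclusions and regular expressions for line-level ones. The pattern lists and the function names is_ignored_file and countable_lines are hypothetical illustrations, not the actual configuration format of GitClear or Pluralsight Flow.

```python
import fnmatch
import re

# Hypothetical file-level mask: globs for vendored libraries,
# generated files, and file types excluded from impact.
IGNORED_FILE_GLOBS = ["vendor/*", "node_modules/*", "*.min.js", "*_pb2.py"]

# Hypothetical line-level mask: regexes for lines that carry little
# meaning on their own (a bare `end`, `#pragma`, file includes).
IGNORED_LINE_PATTERNS = [
    re.compile(r"^\s*end\s*$"),
    re.compile(r"^\s*#pragma\b"),
    re.compile(r"^\s*#include\b"),
]


def is_ignored_file(path: str) -> bool:
    """True if the path matches any file-level ignore glob."""
    return any(fnmatch.fnmatch(path, glob) for glob in IGNORED_FILE_GLOBS)


def countable_lines(path: str, changed_lines: list[str]) -> list[str]:
    """Drop every line of a masked file; otherwise drop only masked lines."""
    if is_ignored_file(path):
        return []
    return [
        line
        for line in changed_lines
        if not any(pattern.match(line) for pattern in IGNORED_LINE_PATTERNS)
    ]


print(countable_lines("app/models/user.rb", ["def name", "  @name", "end"]))
# prints ['def name', '  @name']
```

The file check runs first so a masked file is skipped entirely, which mirrors the order described above: whole files are excluded before any line-level patterns are considered.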

Greenfield Reduction. Every senior developer has experienced the day when they're assigned an urgent bug and spend the entire day tracking down its cause, only to discover that the problem is resolved by changing a single line of code. In contrast, even junior developers have had days where, while roughing out a new feature, they "write" 500 lines of code in a couple of hours (the quotes are because much of that code is often copy/pasted boilerplate). The idea behind a "greenfield reduction," which both GitClear and Pluralsight Flow implement, is that adding or deleting a large swath of LoC over a short time window reflects considerably less value per line than changing a single line of code that hadn't been touched in years.
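A greenfield reduction can be sketched as a cap on how many lines per time window earn full credit. This minimal Python example assumes a two-hour window, a 50-line full-credit threshold, and a 0.25 weight for each line beyond it; these values and the names Change, window_credit, and greenfield_adjusted_lines are hypothetical, not the parameters either product actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tuning values, not the ones GitClear or Flow use.
WINDOW = timedelta(hours=2)
FULL_CREDIT_LINES = 50    # lines per window that earn full credit
GREENFIELD_WEIGHT = 0.25  # credit for each line beyond that threshold


@dataclass
class Change:
    timestamp: datetime
    lines_added: int
    lines_deleted: int


def window_credit(lines: int) -> float:
    """Full credit up to the threshold, reduced credit for the excess."""
    full = min(lines, FULL_CREDIT_LINES)
    excess = max(lines - FULL_CREDIT_LINES, 0)
    return full + excess * GREENFIELD_WEIGHT


def greenfield_adjusted_lines(changes: list[Change]) -> float:
    """Group changes into windows that start at a change and span WINDOW."""
    total = 0.0
    window_start = None
    window_lines = 0
    for change in sorted(changes, key=lambda c: c.timestamp):
        if window_start is None or change.timestamp - window_start > WINDOW:
            total += window_credit(window_lines)  # close the previous window
            window_start = change.timestamp
            window_lines = 0
        window_lines += change.lines_added + change.lines_deleted
    return total + window_credit(window_lines)


start = datetime(2019, 1, 7, 9, 0)
burst = [Change(start + timedelta(minutes=20 * i), 100, 0) for i in range(5)]
print(greenfield_adjusted_lines(burst))  # 162.5
```

Under these assumed numbers, a single 500-line burst earns 50 + 450 × 0.25 = 162.5 credits (about 0.33 per line), while a one-line bug fix keeps its full credit of 1.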

Churn Discount. Related to the Greenfield Reduction, "Churn Discount" describes the various techniques Pluralsight Flow and GitClear use to reduce the value attributed to a newly committed line of code when that line was last changed within the past couple of days. This discount reflects the typical working mode of a developer building a new feature: quickly write a "roughed out" implementation to debug, then continually revise that hot mess until it's submitted as a pull request.
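One way to express a churn discount is a weight that grows with the age of the line being replaced. The Python sketch below reads "the past couple of days" as two days and assumes a linear ramp from a hypothetical floor of 0.25 up to full credit; the ramp shape, the floor, and the line_credit name are illustrative assumptions rather than either product's actual formula.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed values: "the past couple of days" read as two days, and a
# hypothetical floor of 25% credit for a line replaced immediately.
CHURN_WINDOW = timedelta(days=2)
CHURN_FLOOR = 0.25


def line_credit(committed_at: datetime, previously_committed_at: Optional[datetime]) -> float:
    """Credit for one newly committed line, discounted if it replaces recent work."""
    if previously_committed_at is None:
        return 1.0  # no earlier version of this line, so nothing was churned
    # Clamp negative ages (e.g. clock skew) to zero.
    age = max(committed_at - previously_committed_at, timedelta(0))
    if age >= CHURN_WINDOW:
        return 1.0  # replaced code was stable long enough to count fully
    # Linear ramp from CHURN_FLOOR (just written) to 1.0 (CHURN_WINDOW old).
    return CHURN_FLOOR + (1.0 - CHURN_FLOOR) * (age / CHURN_WINDOW)


now = datetime(2019, 3, 1, 12, 0)
print(line_credit(now, now - timedelta(days=1)))  # 0.625
```

A ramp avoids a cliff where a line revised at 47 hours earns a fraction of the credit of one revised at 49 hours; a simple step function would also fit the description above.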

Identify Operations. Vanilla git sees a very binary world: every LoC change is either an addition or a deletion. (Technically, git doesn't even know about additions and deletions, only the contents of the repo at each commit. But since GitHub and all diff-viewing tools show changes as additions and deletions, that has become the de facto "reality.") The real world is richer. Developers add and delete LoC, but they also update lines, move lines, find/replace lines, and copy/paste lines. GitClear's focus on helping developers review code faster dovetailed with the need to identify each of these distinct operations, and to discount the Diff Delta assigned to those that are trivial to perform (e.g., move, find/replace, and copy/paste). As of early 2019, Pluralsight Flow makes no mention of factoring in specific operation types.
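Recovering richer operations from a plain addition/deletion diff might look roughly like the Python sketch below. It assumes that an added line identical to a deleted line in the same change is a move, that an added line already present elsewhere in the repo is a copy/paste, and that an added line sufficiently similar to a remaining deletion (by difflib's SequenceMatcher ratio, with a hypothetical 0.6 threshold) is an update. Find/replace detection is omitted for brevity, and the classify function is an illustration, not GitClear's algorithm.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.6  # hypothetical cutoff between "update" and add+delete


def classify(deleted: list[str], added: list[str], existing: set[str]) -> dict[str, list[str]]:
    """Label added lines as move, copy_paste, update, or add; leftovers as delete.

    `existing` holds stripped lines already present elsewhere in the repo and
    untouched by this change (an assumed input; a real tool would index the repo).
    """
    ops: dict[str, list[str]] = {"move": [], "copy_paste": [], "update": [], "add": []}
    unmatched = [line.strip() for line in deleted]
    for raw in added:
        text = raw.strip()
        if text in unmatched:
            unmatched.remove(text)  # same content deleted in one place, added in another
            ops["move"].append(text)
        elif text in existing:
            ops["copy_paste"].append(text)  # duplicate of code that already exists
        else:
            # Pair with the most similar remaining deletion, if it is close enough.
            scored = [(SequenceMatcher(None, old, text).ratio(), old) for old in unmatched]
            best_score, best = max(scored, default=(0.0, None))
            if best is not None and best_score >= SIMILARITY_THRESHOLD:
                unmatched.remove(best)
                ops["update"].append(text)
            else:
                ops["add"].append(text)
    ops["delete"] = unmatched
    return ops


result = classify(
    deleted=["total = price * qty", "log(total)"],
    added=["log(total)", "total = price * qty * rate", "validate(order)", "notify(user)"],
    existing={"validate(order)"},
)
print(result)
```

Here the relocated log(total) is a move, the edited total line is an update, validate(order) is a copy/paste, and only notify(user) counts as a genuinely new line; a scoring step could then discount the first three as trivial operations.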

File Type Adjustments. Every type of file that a developer works in has its own syntax expectations, which lead to varying levels of cognitive load when writing or updating lines. View-layer files, like CSS and HTML, tend to be heavy on repetitive syntax and short lines. Files written in modern programming languages, like Python, Ruby, and C#, allow complex transformations to be implemented in relatively few lines. Files written in legacy programming languages, like Java and C++, fall somewhere in between: these are powerful languages, but they require a relatively large amount of boilerplate (i.e., low-cognitive-load) LoC. Consequently, GitClear gives managers the option to adjust the Line Impact multiplier for LoC based on file type. For example, if Python is a "10," Java might be a "5" and CSS a "2." As of early 2019, Pluralsight Flow does not yet offer file-type-based impact adjustments.
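A per-file-type multiplier can be as simple as a lookup keyed on the file extension. This Python sketch reuses the example ratings above (Python 10, Java 5, CSS 2) and normalizes them so Python counts at 1.0; the default rating for unlisted extensions and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# The three ratings come from the example in the text; the default for
# unlisted extensions is a hypothetical choice.
FILE_TYPE_RATINGS = {".py": 10, ".java": 5, ".css": 2}
DEFAULT_RATING = 5
REFERENCE_RATING = 10  # Python's rating maps to a 1.0 multiplier


def file_type_multiplier(path: str) -> float:
    """Scale factor for LoC in this file, relative to Python."""
    suffix = PurePosixPath(path).suffix.lower()
    return FILE_TYPE_RATINGS.get(suffix, DEFAULT_RATING) / REFERENCE_RATING


def adjusted_line_impact(path: str, raw_credit: float) -> float:
    """Apply the file-type multiplier to credit computed by earlier steps."""
    return raw_credit * file_type_multiplier(path)


print(adjusted_line_impact("src/Main.java", 40))  # 20.0
```

Keeping the ratings in a plain mapping means a manager's adjustment is a data change rather than a code change.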

This is by no means an exhaustive list of the adjustments GitClear and Pluralsight Flow use to quantify LoC, but it illustrates how LoC can yield a meaningful signal once a thoughtfully constructed interpretation algorithm strips away the noise.