
Lines of Code Breakdown: A Compositional Analysis

Results of analyzing 10 million lines of code across the largest Open Source projects

Last updated July 1, 2025

Many tools report statistics on lines of code (LoC). Conventional wisdom has long held that these metrics are fraught, but absent hard data, critics could only gesture toward the disadvantages of relying on LoC, without statistical proof.

GitClear has previously asserted that only 5% of lines of code meaningfully evolve a repo's code base. Because it is an extraordinary claim that 95% of LoC is noise, it is incumbent upon us to substantiate it with data. That is the purpose of this page.

The funnel below aggregates real-world lines of code measurements across 796,014 commits in 109 open source repos between April 2, 2025 and July 1, 2025.

First step: All changed code lines

50,707,514 changed lines of code factored into analysis

All changed code lines

The total lines of code in our most recent data set. This includes all lines that changed in any commit, so it is equivalent to the "Lines of Code" metric provided by GitHub or Pluralsight Flow.

Distinct commits

26,059,486 lines remain

Distinct: Ignore duplicated fragments

This step removes lines of code that occurred in a branch that was later discarded, as well as code that was committed in multiple branches or repos. Removes 24,648,028 lines.
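As a rough illustration of this kind of de-duplication, the sketch below drops commits from discarded branches and collapses commits whose patches are identical, similar in spirit to what `git patch-id` does. It is not GitClear's implementation: the `Commit` record, its `branch_discarded` flag, and the fingerprinting rule are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical record for one commit; not a GitClear data structure."""
    sha: str
    branch: str
    changed_lines: Tuple[str, ...]
    branch_discarded: bool = False  # assumed flag: branch was never merged


def patch_fingerprint(commit: Commit) -> str:
    """Hash the changed lines so identical patches share one fingerprint."""
    digest = hashlib.sha256()
    for line in commit.changed_lines:
        digest.update(line.strip().encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def distinct_commits(commits: List[Commit]) -> List[Commit]:
    """Keep the first commit per fingerprint, skipping discarded branches."""
    seen = set()
    kept = []
    for commit in commits:
        if commit.branch_discarded:
            continue
        fingerprint = patch_fingerprint(commit)
        if fingerprint in seen:
            continue  # same patch already counted on another branch or repo
        seen.add(fingerprint)
        kept.append(commit)
    return kept


commits = [
    Commit("a1", "main", ("+x = 1",)),
    Commit("b2", "release", ("+x = 1",)),          # cherry-pick of a1
    Commit("c3", "spike", ("+y = 2",), True),      # discarded branch
]
kept = distinct_commits(commits)
```

Here only commit `a1` survives: `b2` repeats the same patch on another branch, and `c3` lives on a branch flagged as discarded.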

Effecting

20,850,402 lines remain

Effecting: Remove semantic lines

This step discards changes that touch only whitespace, blank lines, language keywords (e.g., begin, include), or other kinds of lines that don't contain meaningful code content relative to the file type. Removes 5,209,084 lines.
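A minimal sketch of such a filter might look like the following. The keyword set contains only the two examples given above, and the function name `is_effecting` and its rules are assumptions for illustration; a real tool would need per-language rules keyed to the file type.

```python
from typing import Optional

# Assumed keyword list: only the two examples named in the text.
KEYWORD_ONLY = {"begin", "include"}


def is_effecting(old: Optional[str], new: Optional[str]) -> bool:
    """Return True if a changed line carries meaningful content.

    `old` and `new` are the line before and after the change;
    use None for a pure addition (old) or a pure deletion (new).
    """
    text = new if new is not None else old
    if text is None or not text.strip():
        return False  # blank or whitespace-only line
    if old is not None and new is not None:
        # Whitespace-only edit: identical once all whitespace is removed.
        if "".join(old.split()) == "".join(new.split()):
            return False
    if text.strip().lower() in KEYWORD_ONLY:
        return False  # line consists solely of a language keyword
    return True
```

For example, `is_effecting("x=1", "x = 1")` is `False` because only spacing changed, while `is_effecting(None, "total += price")` is `True`.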

Substantive

15,691,082 lines remain

Substantive: Negate batch operations

Diff Delta approximates cognitive load per commit. Operations like move, cut/paste, and find/replace change many lines but do not represent high cognitive load, so they are discarded by this step. Removes 5,159,320 lines.
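One simple way to approximate move detection, sketched below, is to match removed lines against added lines with the same content inside a single commit. This is an illustrative heuristic, not the Diff Delta algorithm; the function name `split_moved` and the exact-match rule are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def split_moved(removed: List[str], added: List[str]) -> Tuple[int, int, int]:
    """Count moved lines versus genuinely removed or added lines.

    A line is treated as moved when the same stripped content appears
    in both the removed and the added side of one commit.
    Returns (moved, remaining_removed, remaining_added).
    """
    removed_counts = Counter(l.strip() for l in removed if l.strip())
    added_counts = Counter(l.strip() for l in added if l.strip())
    moved = removed_counts & added_counts  # multiset intersection
    moved_total = sum(moved.values())
    remaining_removed = sum((removed_counts - moved).values())
    remaining_added = sum((added_counts - moved).values())
    return moved_total, remaining_removed, remaining_added


result = split_moved(
    removed=["def f():", "    return 1", "old = 0"],
    added=["    return 1", "def f():", "new = 2"],
)
```

In this example two lines are classified as moved, leaving one real removal and one real addition to count.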

Purposeful

1,755,550 lines remain

Purposeful: Rinse commit artifacts

To normalize away the difference between a developer who commits 100 times a day and one who commits once a day, we identify churned code and devalue large-scale additions (such as new libraries). Removes 13,935,532 lines.

💎
Result

3.5% of total

1,756k final LoC

Important code line changes

Once you've cut through all the layers of noise that cloud lines of code, you find that only a fraction of code evolves its repo in a purposeful, substantive way: 1,755,550 impactful lines (3.5%) remain.
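The funnel arithmetic can be checked with a few lines of Python. The figures are the ones reported on this page; the function and variable names are only illustrative.

```python
from typing import List, Tuple

TOTAL_CHANGED_LINES = 50_707_514

# (step name, lines removed by that step), as reported in the funnel.
STEPS = [
    ("Distinct", 24_648_028),
    ("Effecting", 5_209_084),
    ("Substantive", 5_159_320),
    ("Purposeful", 13_935_532),
]


def run_funnel(total: int, steps: List[Tuple[str, int]]):
    """Return (name, removed, remaining, share_of_total) for each step."""
    rows = []
    remaining = total
    for name, removed in steps:
        remaining -= removed
        rows.append((name, removed, remaining, remaining / total))
    return rows


rows = run_funnel(TOTAL_CHANGED_LINES, STEPS)
for name, removed, remaining, share in rows:
    print(f"{name:<12} -{removed:>12,}  {remaining:>12,} remain ({share:.1%})")
```

Each step's remaining count matches the figure shown in the funnel, and the final share works out to roughly 3.5% of all changed lines.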

How much noise does your analysis tool let through?

Since other Git stat tools (including those that profess to offer "Engineering Insights") skip some or all of the steps above, the "insights" that they offer are as likely as not to be false positives or commit artifacts.

If you would like to extract the fractional lines of code that correspond to meaningful work by developers, consider signing up for a free GitClear trial, or a demo.
