Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects

13 points by Sietsebb


Sietsebb

Abstract, linebreaks and emphasis mine:

Large language models (LLMs) have demonstrated the promise to revolutionize the field of software engineering. Among other things, LLM agents are rapidly gaining momentum in their application to software development, with practitioners claiming a multifold productivity increase after adoption. Yet, empirical evidence is lacking around these claims.

In this paper, we estimate the causal effect of adopting a widely popular LLM agent assistant, namely Cursor, on development velocity and software quality. The estimation is enabled by a state-of-the-art difference-in-differences design comparing Cursor-adopting GitHub projects with a matched control group of similar GitHub projects that do not use Cursor.

We find that the adoption of Cursor leads to a significant, large, but transient increase in project-level development velocity, along with a significant and persistent increase in static analysis warnings and code complexity. Further panel generalized method of moments estimation reveals that the increase in static analysis warnings and code complexity acts as a major factor causing long-term velocity slowdown.

Our study carries implications for software engineering practitioners, LLM agent assistant designers, and researchers.
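
For readers who want the mechanics: below is a minimal two-way fixed-effects sketch of the comparison the abstract describes, in Python. All names here (the panel file, the velocity and treated_post columns) are illustrative, not the paper's, and the paper's "state-of-the-art" design is presumably a more robust staggered-adoption estimator than this plain version.

    # Minimal sketch of a two-way fixed-effects DiD regression on a
    # repo-month panel. File and column names are illustrative, not
    # the paper's.
    import pandas as pd
    import statsmodels.formula.api as smf

    panel = pd.read_csv("repo_month_panel.csv")  # hypothetical input

    # treated_post = 1 only for Cursor-adopting repos in months after
    # their adoption; repo and month dummies absorb baseline differences.
    model = smf.ols(
        "velocity ~ treated_post + C(repo) + C(month)",
        data=panel,
    ).fit(cov_type="cluster", cov_kwds={"groups": panel["repo"]})

    print(model.params["treated_post"])  # estimated adoption effect

The coefficient on treated_post is the DiD estimate: the change in velocity for adopters relative to the change for matched non-adopters over the same months. The panel GMM step the abstract mentions is a separate follow-up analysis of how warnings and complexity feed back into velocity.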

mitsuhiko

This is one more study that took place before the big adoption of agentic coding tools in April/May and falls in line with earlier findings.

One question I had was how they picked the reference repo set; this sentence does not say how they excluded other AI code generators:

By scanning Cursor configuration files (e.g., .cursorrules) in GitHub repositories, we identify 807 repositories that adopted Cursor between January 2024 and March 2025. To construct a comparable control group, we use propensity score matching [25] to select 1,380 similar repositories from those never adopting Cursor during observation.
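
For concreteness, the matching step presumably looks something like the sketch below; the covariates are my guess, not the paper's. Notice that it only sees repository metadata, which is why the contamination concern quoted further down arises at all.

    # Hedged sketch of propensity score matching as the quoted passage
    # describes it. The covariate names are guesses, not the paper's.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    repos = pd.read_csv("repos.csv")  # hypothetical: one row per repo
    features = ["stars", "age_days", "commits"]  # guessed covariates

    # Propensity score: modeled probability of adopting Cursor,
    # given the observable repository features.
    repos["ps"] = LogisticRegression(max_iter=1000).fit(
        repos[features], repos["adopted_cursor"]
    ).predict_proba(repos[features])[:, 1]

    treated = repos[repos["adopted_cursor"] == 1]
    controls = repos[repos["adopted_cursor"] == 0]

    # For each treated repo, pick the never-adopter with the closest
    # score. The matching cannot see whether that control repo's
    # commits were written with Copilot or ChatGPT.
    nn = NearestNeighbors(n_neighbors=1).fit(controls[["ps"]])
    _, idx = nn.kneighbors(treated[["ps"]])
    matched_controls = controls.iloc[idx.ravel()]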

Now, the age of the repositories makes it somewhat likely that most AI generation was taking place with Cursor, but it was far from the only AI-assisted generator in that time frame. They also call this out:

Contamination in never-treated controls. Even if DiD leads to unbiased estimates regarding Cursor adoption impact in studied repositories, interpreting specific results remains challenging. Control groups are likely contaminated with LLM-based tools, especially earlier ones like GitHub Copilot and ChatGPT [2], so estimates would be smaller than true LLM agent assistant impact compared to using no LLM at all.

I think the more interesting question for future researchers will be how the adoption of agentic coding tools changes the picture. I routinely come across repositories on GitHub now that are entirely AI-generated, some of them very obviously so. Many of those are unlikely to last very long given their low quality, but this might have knock-on effects if other projects start depending on them.

In general there is a lot of potential for research here, and I'm incredibly curious to see when the first studies will cover Claude Code and similar modern agentic tools.