Browse code by meaning
37 points by Gabriella439
I think a lot about project structure in the age of AI in general. And I don't mean that in the colloquial sense in which people often say that phrase; I really do mean that I routinely, regularly, actually think about project structure in the age of AI. That covers filesystem structure, but also identifier naming and scoping. There are so many facets of it I find interesting:
First, many agentic coding tools allow directory-scoped instructions (e.g. AGENTS.md/CLAUDE.md). A well-formed AGENTS.md can make the difference between perfect first-try agentic behavior and a lot of necessary back-and-forth. In newer projects, I've consciously reorganized some files I would've previously put next to others into separate directories ONLY so I can have a scoped AGENTS.md.
Second, agents work better when they find the right context faster and more reliably. The first thing most agents do is run filesystem operations like ls. So it makes sense to organize your project in a way that is friendly to an agent, perhaps even at the expense of the human. This blog post talks about this, although I don't go so far in my own practice: https://ampcode.com/notes/by-an-agent-for-an-agent
Third, I am more and more often finding what I need in a codebase (even my own that I'm familiar with!) by just prompting an agent. Example from today: https://ampcode.com/threads/T-019c6781-46f6-76db-af2a-22a47ad1376b I always backstop that with "link me to the files you found" so I can open an editor and verify it myself. But it's just SO MUCH FASTER in many cases than manually digging through a bunch of files. So, as this gets better and faster, why does the filesystem structure matter?
Basically, I'm already browsing many codebases by "meaning" as this blog post states.
So, given all that, I've often wondered: should things like the default GitHub view actually be something closer to this blog post? Should it be structured by derived meaning with an AI search bar? By default? Of course, tree view should be available somewhere still.
I think we need search and a semantic directory tree (or something like it).
Search is good when you already have some idea of what you are looking for and a high level of intent, but a semantic directory tree helps users discover "unknown unknowns" and get a general "feel" for a project when exploring it for the first time, without necessarily having a specific intent.
Third, I am more and more often finding what I need in a codebase (even my own that I'm familiar with!) by just prompting an agent.
This. Honestly, it's much easier to remember what a service did in terms of functionality, or how a constant played a specific role in a certain part of the system, than to remember the exact location of the file where the implementation was put (and to pray it's not duplicated). It's so much faster indeed, and much less mental load.
Offtopic: I'm really curious why this post is getting flagged as spam. I feel lobste.rs is becoming a contrarian place where everything that relates to LLMs or agents gets flagged, which I believe isn't even unfair, just obtuse.
One thing I would like to see from programming language tooling is a way to get the project's file dependency graph in a standardized format. It's effectively a third way to consider project organization, alongside the filesystem tree and the semantic view.
Currently, you can use LSPs to find where different functions/classes are used, but they don't provide the higher-level interface that says which modules depend on which other modules. I think this sort of information would be pretty interesting to feed into LLMs for automated refactoring.
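For Python at least, a rough version of that graph can be sketched from import statements alone (a minimal, illustrative example, not any existing tool; it ignores dynamic imports and doesn't resolve relative imports):

    import ast
    from pathlib import Path

    def module_deps(root: str) -> dict[str, set[str]]:
        """Map each .py file under `root` to the modules it imports."""
        deps: dict[str, set[str]] = {}
        for path in Path(root).rglob("*.py"):
            tree = ast.parse(path.read_text(), filename=str(path))
            imports: set[str] = set()
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    imports.update(alias.name for alias in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    imports.add(node.module)
            deps[str(path.relative_to(root))] = imports
        return deps

    if __name__ == "__main__":
        for mod, imported in module_deps("src").items():
            print(mod, "->", ", ".join(sorted(imported)))

A standardized, language-agnostic version of this output is the part that's missing from today's tooling.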
There's LSIF, which builds on top of LSP: https://microsoft.github.io/language-server-protocol/specifications/lsif/0.6.0/specification/
From skimming the code I assume this just uploads every file to OpenAI. Is there a way to run this with a local model at least?
Currently no, but I'm open to supporting this. If you open an issue on the issue tracker I can take a stab at it
This is really sweet! One issue I've had with a lot of LLM based tools is that they pretty heavily break from the Unix principles that we hold dear. They expect to encompass the whole world and wrest control from the user. I love composable tools because they provide some level of skill expression for users. This seems like a great use of LLM tools.
To be clear, this too is an "encompass the whole world" type of thing: to work, it must embed all your files, produce a similarity matrix (an all-pairs distance between embeddings), cluster, etc. The quality of the similarity search depends greatly on the quality of the embedding model: that can be as simple as a bag-of-words index or an LLM embedding like the one used here. I hesitate to call all of this "searching by meaning" because that's highly contextual and should be measured.
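As a rough sketch of that pipeline (not this tool's actual implementation; embed() is a stand-in for whatever embedding model you plug in):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics.pairwise import cosine_similarity

    def cluster_files(texts, embed, n_clusters=8):
        """Embed every file, build the all-pairs similarity matrix, then cluster."""
        vectors = np.array([embed(t) for t in texts])   # one vector per file
        similarity = cosine_similarity(vectors)         # all-pairs similarity
        distance = 1.0 - similarity                     # turn similarity into a distance
        labels = AgglomerativeClustering(
            n_clusters=n_clusters, metric="precomputed", linkage="average"
        ).fit_predict(distance)
        return labels, similarity

How "meaningful" the result feels comes down almost entirely to how good embed() is, which is the point about measuring it.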
Unless I'm misunderstanding the tool, I see this as a frameworks vs libraries type of divide. A framework manages the control flow and you plug in little bits of logic or information. A library provides some functionality and you manage the control flow. Crucially, a library can have deep functionality, it's just not opinionated about the control flow around it.
I see Claude Code as operating more like a framework because it's managing the control flow of a session. I might have misunderstood, but I saw this as operating more like a library because you run it on a directory, it provides the summarization, and you can consume the output. If it's more TUI-like then that's my bad.
Regardless, I think it would be nice to have more LLM based tools that provide deep functionality with a more traditional UNIX style responsibility.
First of all, this looks like a really neat tool and is a wonderful antithesis to the recent flood of anxiety-posts being discussed in a meta thread right now. It also touches on a great point: we need more things out of this space that feel a little more like "here's an application of new technology that quietly, meaningfully improves an experience" rather than "I shoved my chat interface into your app, would you like Copilot with that?"
From how I read your article, the LLM usage boils down to summarizing (possibly large) code files, and then the algorithmic parts take over. Do you have any intuition about what would happen if you scaled down to very small models, like the kind that could run on an NPU? I could imagine this being exactly the sort of quick, quiet background task that hardware block was built for. Would it unacceptably compromise the quality of the results, or would it perhaps still be reasonable?
I'm not sure because I've never really experimented with smaller models much, but I can comment on what I do know.
The quality of the file summarization step doesn't need to be great. Usually an okay summary is enough as long as it captures the overall "vibe" of the file. So that's probably a good opportunity for saving on latency/cost, and it's also where the lion's share of the tokens is spent.
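As a minimal sketch of what that summarization step could look like against a small local model (this is not the tool's actual code; it assumes an OpenAI-compatible endpoint like the one Ollama exposes, and the model name is just a placeholder):

    from openai import OpenAI

    # Any OpenAI-compatible server works; here we assume a local Ollama instance.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def summarize(path: str, contents: str) -> str:
        """Ask a small model for a short, vibe-level summary of one file."""
        response = client.chat.completions.create(
            model="llama3.2:3b",  # placeholder small model
            messages=[
                {"role": "system", "content": "Summarize this file in 2-3 sentences."},
                {"role": "user", "content": f"{path}\n\n{contents[:8000]}"},
            ],
        )
        return response.choices[0].message.content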
The quality of the cluster labeling is way more important to the user experience, and it's also the step where I would push for using the best model available given the requirements, or lean on non-LLM techniques (e.g. keywords/TF-IDF) to improve the labels. The reason is that a core function of the cluster labeler is basically to reverse engineer what makes a cluster a cluster, which is often not obvious and sometimes requires insight on the model's part.
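For the keywords/TF-IDF fallback, something along these lines would pull out the most distinctive terms per cluster (just a sketch, not the algorithm the tool uses today):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def keyword_labels(summaries, labels, top_k=3):
        """Label each cluster with the highest-TF-IDF terms from its file summaries."""
        vectorizer = TfidfVectorizer(stop_words="english")
        tfidf = vectorizer.fit_transform(summaries)
        terms = vectorizer.get_feature_names_out()
        out = {}
        for cluster in set(labels):
            rows = [i for i, label in enumerate(labels) if label == cluster]
            mean_scores = np.asarray(tfidf[rows].mean(axis=0)).ravel()
            out[cluster] = [terms[i] for i in mean_scores.argsort()[::-1][:top_k]]
        return out

A labeler like this would likely have caught the "based" example below, since TF-IDF surfaces exactly the terms that one cluster's summaries share but the rest of the corpus doesn't.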
For example, when test-driving the algorithm on my partner's collection of memes there was one cluster which seemed very poor, meaning that the files seemed unrelated and the cluster label seemed not great. At first I thought it was a problem with the clustering algorithm having grouped together the wrong files, but when I inspected the results to reverse engineer what the semantic embedding had indexed on, it turned out that the cluster was actually good: the semantic indexer had clustered together a bunch of memes using the word "based", and the labeler for the cluster hadn't picked up on that pattern even though the file labels had enough information to make that inference (most of the file labels in that cluster had "based" in the description).
So most of the improvements in the algorithm's footprint will probably come from reworking the cluster labeling algorithm to do more with less.