Translating non-trivial codebases with Claude
15 points by nathell
This is neat! This sort of thing seems to be squarely in the wheelhouse of LLMs. I'm hoping that this also leads to more people getting interested in fuzzers, because fuzzing multiple implementations against each other is one of the best ways to discover divergences. I do wonder whether fuzzing these implementations would turn up divergences, or whether they're close enough (or the test suite is small enough) that they actually match exactly.
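For anyone who hasn't tried it, fuzzing two implementations against each other can be sketched in a few lines. This is only a toy illustration: `reference_impl`, `ported_impl`, and `random_input` are hypothetical stand-ins (nothing here is Morfeusz or the article's code), and the port contains a deliberate bug so that there is something to find.

```python
import random


def reference_impl(s: str) -> str:
    """Hypothetical reference: collapse any run of whitespace to one space."""
    return " ".join(s.split())


def ported_impl(s: str) -> str:
    """Hypothetical port with a deliberate bug: it only splits on ASCII spaces."""
    return " ".join(part for part in s.split(" ") if part)


def random_input(rng: random.Random, max_len: int = 12) -> str:
    """Generate a short string from a small alphabet that includes tabs and newlines."""
    alphabet = "ab \t\n"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))


def find_divergences(impl_a, impl_b, trials: int = 2000, seed: int = 0):
    """Feed the same random inputs to both implementations and collect mismatches."""
    rng = random.Random(seed)
    divergences = []
    for _ in range(trials):
        s = random_input(rng)
        out_a, out_b = impl_a(s), impl_b(s)
        if out_a != out_b:
            divergences.append((s, out_a, out_b))
    return divergences


if __name__ == "__main__":
    found = find_divergences(reference_impl, ported_impl)
    print(f"{len(found)} divergent inputs")
    if found:
        # The shortest failing input is usually the easiest one to debug.
        print(repr(min(found, key=lambda d: len(d[0]))))
```

A real setup would call the original and the translated library instead of these two functions, and a coverage-guided fuzzer would likely explore much more than uniform random strings do.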
Side point:
Somehow compile C++ Morfeusz to JVM bytecode. If there are ways to compile C++ to WASM, there should be some way to compile it to JVM, right?
Right, because you can run wasm on the JVM using Chicory! (Or GraalWasm if you're willing to use a custom JVM, though I think that doesn't work with your "single self-contained JAR" goal.)
Performance is not stellar but should be more than adequate for this application.
I had great success with "fuzzing" like that. I've been implementing ibtool by hand for a while, but then gave up on the slog part (compare huge files, make minimal reproduction cases, fix that one weird property, and repeat...) and handed it over to LLMs. I found at least 3 genuine bugs in Apple's implementation. For example, it deduplicates numbers too aggressively and ends up with a dictionary indexed: 1, 2, 3.0, 4, ... It also produces random results, so some windows would get an extra 17px of width depending on the phase of the moon or something.
The magic prompt turned out to be "add more pathological test cases".
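To make the number-deduplication bug above concrete, here is a toy sketch of how that class of bug can arise. It is purely an assumption for illustration: I have no idea what Apple's code actually does, and `dedupe_values` and `dedupe_values_typed` are hypothetical names. The point is that a table keyed on value equality alone treats the integer 3 and the float 3.0 as the same entry.

```python
def dedupe_values(values):
    """Hypothetical over-eager dedupe: keys compare by value, so 3 and 3.0 collide."""
    table = {}
    for v in values:
        # In Python 3 == 3.0 and hash(3) == hash(3.0), so the first one seen wins.
        table.setdefault(v, len(table))
    return list(table)


def dedupe_values_typed(values):
    """Same idea, but the key also includes the type, so ints and floats stay apart."""
    table = {}
    for v in values:
        table.setdefault((type(v), v), len(table))
    return [v for (_, v) in table]


if __name__ == "__main__":
    sample = [1, 2, 3.0, 3, 4]
    print(dedupe_values(sample))        # the integer 3 is silently dropped
    print(dedupe_values_typed(sample))  # all five values are kept
```

An input that mixes 3 and 3.0 is exactly the kind of "pathological test case" that exposes this sort of thing.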
Funny, I'm going in the opposite direction, sort of.
Clojure works OK, but Opus 4.6 can't handle our old ClojureScript/shadow-cljs codebase (one of the few things it consistently fails on), so I'm looking to migrate all that to TS/JS.