Roko's dancing basilisk

8 points by spc476


hwayne

I've found DeepWiki really useful, but I don't think I've ever actually looked at the generated docs. If you ask "how does feature XYZ work?", it'll generate links to what it thinks are the relevant sections of code. That's great for orienting me when reading the source.

I'm sure Copilot or any other coding LLM can do the same, but with DeepWiki I don't have to download the repo first.

alexandria
  1. I do not have “Unsupported markdown: blockquote” or “Unsupported markdown: list” unary operators.

  2. Oh my God! I can't say how bad this backend matrix table is. It's all sorts of wrong. It's not that it got the supported/non-supported markers backwards, it appears to have just made up the results! [...]

  3. The example of writing an instruction to the various formats is wrong for the RS-DOS version—the type and length should be two bytes each, not one.

  4. The output format for -t is incorrect—it doesn't show a trace of the code being run unless the TRON directives are in use.

  5. Every example of the .ASSERT directive is just wrong as it did not use the proper register references, and memory dereferences need a @ (8-bit) or @@ (16-bit) prefix.

  6. Where you can use the .TRON directive is wrong: it can be used anywhere; it's .OPT TEST TRON that can only be used inside a .TEST directive.

[...]

Overall, this was less obnoxious than having the LLMs write code, but I feel it's still too inaccurate to be let loose on unfamiliar codebases, which I suspect is the selling point.

TL;DR: Does not do a good job. Appears to do a "lot", but still confabulates and repeats itself incessantly, making it worthless to depend on for practical purposes unless you want to spend hours chasing your own tail over something it hallucinated.

If a junior came to you with this they would be summarily fired or there would be an investigation into what on earth is happening.

viraptor

DeepWiki seems to use Devin, but I'm not sure what model that uses under the covers. It would be nice to know, so the results could be compared with (for example) running an Opus agent to generate documentation. What I mean specifically is that there will be a huge difference in quality even between different models, and for some of the points mentioned I'd expect the recent releases to do a much better job.