Human proof for FOSS contributions
16 points by rdg
As much as I dislike these tools, recording people contributing is kind of dystopian. I would never bother contributing even if I was interested, and such thing would make me uncomfortable to use the project: Who knows what else you are recording? (I would trust you didn't, but the nagging feeling would be there.)
Who knows what else you are recording?
It sounds like the proposal is that a contributor would set up and run asciinema themself, and would attach the file to an email themself. So the contributor would know that they are sending only what asciinema records. And asciinema has existed for more than a decade and is open source, so I doubt that it secretly records more than what it claims to.
That said, I also wouldn’t contribute to any project that required an asciinema recording:
So I'm considering it a candidate to provide a proof that a patch was written by a human.
I would hope the consideration is where this stops. There are many ways of writing code by oneself that this would be utterly incompatible with. And that's assuming that recording like this would even be an effective filter, or that it'd be accepted by anyone (e.g., I do not contribute, but I would absolutely not if this was a requirement).
I share your desire to weed this out, but this is not the way.
Besides the good points raised in other comments, I would like to add that I don't think this is a good "community building" [1] move. I firmly believe that putting trust in other human beings is a positive signal to send, dare I even say a politically dissident, almost revolutionary, move, in a world where we are constantly taught otherwise. In my experience, human beings usually try to live up to what you put in them: trust them and they will act honestly and responsibly; ask them for proof that they're not liars or cheaters, and they will try to game whatever "security measure" you put in their way.
One more thing to consider is that energy you put in any security measure such as this one is energy not spent on doing more interesting things. Maybe having an LLM-written contribution by a dishonest contributor slip through the cracks occasionally, and working later to revert it, is less time-consuming than reviewing videos of contributors' coding sessions.
This may sound incredibly naive; I am fine being labelled that way, the world I'm interested in building involves trust between humans.
[1] Disclaimer: I may be out of my depth here, the only "community" I am part of is niche and small and I -unfortunately- still do the largest part of the coding in it…
i agree with sibling comments that this would constitute an unreasonably high bar as a systematic requirement for contribution… that being said, i don’t necessarily think it seems completely untoward as a potential means for establishing trust.
if the person submitting the contribution is simply given the option to include a recording that they created, as an optional means of establishing credibility more quickly, that seems like a good pattern to support. i think it should be expected that submitters would always create these artifacts themselves: using some sort of instrumented system that records on the behalf of the maintainer lends itself too easily to the sort of snooping that u/Aks alludes to.
i don’t think this is remotely satisfying as an absolute or holistic account of human provenance, but i welcome this idea as part of a potential “diversity of tactics” that could be employed for distinguishing between pure-LLM, LLM-inflected, and pure-human works.
i also think focusing too much on using asciinema to cover terminal usage is limiting:
seeing more of the process gives the maintainer a better chance to build a theory of mind for the submitter.
this isn’t a perfect signal, and like the author mentions, it may eventually be more readily replicated by LLMs, but i think it has potential as an analog to artists “showing their process” to dispel claims that their work is the product of generative AI.
perhaps it is only a temporary window, but i think something like this could give us a better window into the problem of code provenance and LLM detection.
wanted to get these thoughts out before i slept… i’m very tired so please be nice if i said anything particularly silly…
seeing more of the process gives the maintainer a better chance to build a theory of mind for the submitter [...] it may eventually be more readily replicated by LLMs
Publishing terminal/editor recordings would also provide richer training data for LLMs. This could make LLMs better at mimicking humans, which is the more obvious concern. More subtle is that by learning to mimic the human process of programming, LLMs might become stronger at programming in general.
I don't know the Dillo project's goals with respect to LLMs, but I can imagine goals to which that would be counterproductive in the long term.
ah yes, turning this from an illegible “i will organically record my workflow” task into a very legible series of text records would definitely be a way to quickly undermine this signal… the illegibility probably helps explain why the idea is (or seems) promising: because when done as described in the OP, it doesn’t line neatly up with any activity that is well represented in the corpus
Have a good night! The only "silly" thing I'd argue is that these are purely technical solutions. Since we are talking about verifying humans, I personally wouldn't want to contribute if someone I don't know is about to judge my messy process.
I fully understand the desire to know if an LLM is involved and maybe more importantly to what degree. I'd expect a potential contributor to understand the code they submit, have it verified themselves and be able to have a conversation about it.
But, as others already said, this certainly isn't the way to do that verification. For starters, I am not a cli native. I am comfortable using a terminal but for programming a full IDE with GUI is where I do my work. I also expect that my workflow would be tedious to rewatch as I like to get a feeling for code by adjusting it, running it, throwing in log/print lines, redo things, etc. and repeat until I have a fairly solid understanding of the code and a satisfying result. You might argue that this would show that I am human, but I also wouldn't feel comfortable showing that process to someone I don't know who will be judging it.
You might argue that this would show that I am human, but I also wouldn't feel comfortable showing that process to someone I don't know who will be judging it.
I think there is something very human about not wanting to be watched and judged.
Yup, unfortunately it also means no contribution from the human. Which sort of defeats the purpose ;)
I don’t know how folks tolerate it at their job to have an always-on keylogger & screenshare required. Even if you get dedicated hardware, it’s just so… yes, dystopian. I don’t even like how often cameras-on is required for meetings (even weirder is these web apps that don’t work if you don’t share a camera & I need to fire up OBS with a black screen). …& this is my feeling despite seeing how sadly important it can be for like students to make sure they are paying attention as it’s so easy to get distracted on a laptop.
I'd expect a potential contributor to understand the code they submit, have it verified themselves and be able to have a conversation about it.
I think this is the key trick - ask some questions about "why did you do it this way", see if the responses pass the smell test. You're probably asking those questions anyway!
Admittedly the LLM contributor is probably just pasting those questions into an LLM, but I'd hope it's able to distinguish that from a human genuinely considering the question and recalling their implementation decisions.
I am comfortable using a terminal but for programming a full IDE with GUI is where I do my work.
The reason I mentioned asciinema is because we have very strong restrictions on what we require from users in order to hack Dillo. We make sure that you can both run and build Dillo on almost any machine. You only need about 150 megabytes of memory to build it, and about 10 minutes on my oldest single-core CPU.
If we ask for a video recording, all those under a metered network connection will be left out. Similarly, if you don't have access to an electricity grid, having a video compressor running in the background would be a waste of precious power.
If you prefer a video, that's completely fine, but you would need to find storage for it because it won't fit as an email attachment.
I also expect that my workflow would be tedious to rewatch as I like to get a feeling for code by adjusting it, running it, throwing in log/print lines, redo things
Yes, this is what we all do, and it is perfectly fine to make mistakes; nobody is going to judge your patch based on how many mistakes you made. In fact, I would argue that seeing which parts of the code cause you to struggle is good information for making them easier for future contributors.
You might argue that this would show that I am human, but I also wouldn't feel comfortable showing that process to someone I don't know who will be judging it.
This is a very good point. In fact, when I was writing the article and recording it myself, I also experienced that slightly uncomfortable feeling of being watched. In my case, I thought it was mostly due to publishing the recording publicly, and that sending it only to a reviewer would reduce that uncomfortable feeling.
I think it is not acceptable to make people uncomfortable, and perhaps it is a good idea to ditch the proposal completely. The main objective of the post was to gather feedback; nobody wants to feel they live in 1984.
I think this is missing the point that programming is not about typing but about problem solving which involves mostly reading, pondering and experimenting.
This mirrors some approaches to assessing coding assignments: requiring students to push to timestamped repos throughout the semester, or having graders inspect individual git commits. The idea is the same: process artifacts are harder to fake than final artifacts. But all we're doing is buying time. The asymmetry might hold today but it doesn't hold by construction.
I am not sure I like this idea, but this is clearly just a blog post, not a formal requirement. What I imagine this idea would evolve into is:
It's going to take human time from maintainers to review the recordings at any reasonable fidelity... at which point I'd rather do something that's actually contributing some value. Perhaps require a live synchronous code review for the first (or first N) contributions from a new person, and take a bit of time to onboard them, including but not limited to whatever your policies for LLM use are.