Lightweight protocol to assert authorship of content and vouch for humanity of others
52 points by beto
AI-generated content is often plausible sounding but wrong.
This, of course, is true of masses of human generated content, too, which is why we have the system of citations and primary sources.
Yes, people often have their own opinions, and in some cases these are demonstrably wrong. But with AI, and more to the point of "plausible sounding but wrong", the worst part is that the writing doesn't reflect anyone's true opinion.
If I read your blog, and you are talking some nonsense, at least I have the added bonus of knowing what you think. I may not agree with what you think, but I can read your perspective.
If I read a blog, and it was written by AI, it is a waste of my personal time. It would be great to have a way of knowing whether the content I am reading was written by a human. I'm not 100% sold on the idea of this standard's viability to be implemented in practice, but the idea is good.
I think the unique thing about AI is that it writes in the style of an academic but thinks in the patterns of an infant. This mismatch is what makes its texts so dangerous. Before LLMs, the quality of the writing was often correlated with the quality of the thinking within the text; this is no longer the case. However, careful readers have always separated the artistic quality of a text from the thoughts within it, so the issue isn't entirely new.
Fair point!
But keep in mind that the protocol has two aspects: (1) people declare their humanity, but also (2) you get to choose who you trust. So you could build a web-of-trust composed of only rigorous authors — people who will only post human-generated content that has linked sources and references and, in addition, that will only vouch for similar people.
My expectation is that small communities will arise as people start using the protocol and vouching for each other, with similar goals and practices. And given that the browser extension stops crawling at 5 hops, and warns you with a different color at 2 (yellow) and 3+ (orange) hops, the trust will likely remain inside these communities.
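The extension behavior described above can be sketched as a depth-limited breadth-first crawl. This is a guess at the mechanics, not the actual implementation: the function names and the `fetch_vouches` helper are assumptions, and only the hop thresholds (yellow at 2, orange at 3+, stop at 5) come from the comment itself.

```python
from collections import deque

MAX_HOPS = 5  # the extension stops crawling at 5 hops

def crawl_vouches(seed_sites, fetch_vouches):
    """Breadth-first crawl of the vouch graph, up to MAX_HOPS.

    `fetch_vouches(site)` is a hypothetical helper that downloads a
    site's human.json and returns the list of domains it vouches for.
    Returns {site: hop_distance} for every reachable site.
    """
    distances = {site: 0 for site in seed_sites}
    queue = deque(seed_sites)
    while queue:
        site = queue.popleft()
        hops = distances[site]
        if hops >= MAX_HOPS:
            continue  # don't expand beyond the hop limit
        for vouched in fetch_vouches(site):
            if vouched not in distances:  # keep the shortest path
                distances[vouched] = hops + 1
                queue.append(vouched)
    return distances

def trust_color(hops):
    # Colors from the comment: yellow at 2 hops, orange at 3+.
    if hops <= 1:
        return "green"
    if hops == 2:
        return "yellow"
    return "orange"
```

With a tiny in-memory graph, `crawl_vouches(["a.example"], lambda s: {"a.example": ["b.example"]}.get(s, []))` would report `b.example` at 1 hop, i.e. still green.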
Why does (1) matter if you've already committed to (2)? If you get to choose who you trust, why not choose to only trust humans?
(1) is important because when you post the human.json file to your website you can also vouch for other people. So someone who trusts you would benefit from that shared trust.
It also provides the necessary metadata for the browser extension to work. When you trust a website with the browser extension it crawls the web of vouches, discovering other websites that are also "human-generated".
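As a rough illustration only (the actual schema isn't shown in this thread, so every field name below is a guess), a human.json file combining the self-declaration and the vouch list might look something like:

```json
{
  "version": "0.1",
  "author": "Jane Example",
  "site": "https://jane.example",
  "declaration": "All content on this site is human-written.",
  "vouches": [
    "https://friend-one.example",
    "https://friend-two.example"
  ]
}
```

The extension would fetch this file from each trusted site and recursively follow the `vouches` list to discover other human-declared sites.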
I guess the "web of trust" thing is interesting, and might work in small cliques, but I'm not sure I've seen examples of a web-of-trust working at large scale.
But I don't really see this proposal working.
human.json file?
Author here. Thanks for the feedback!
To be honest, I don't think this is supposed to scale. It's more focused on the IndieWeb/smolweb/etc., where you build personal relationships with site authors and want to leverage the relationships they have built with other people. I have maybe ~20 sites where I can say I reasonably trust that the authors are writing their own content, and this protocol is a way to grow that number based on the people they trust.
AI-content farms could post a human.json file, but you would still have to trust one of those websites, which you probably wouldn't. For the process to work you need a few "seed" websites where you trust the authors enough to also trust the people they're vouching for, so a high level of trust is needed when adding a seed.
And I think quality is a non-goal here, like you said in your last point, it's more about rewarding human generated content. You stumble upon a website, you see the green dot on the browser extension, and you know that you and the author are connected through a web of trust, and you know that the content you're reading is genuine — even if low quality or factually wrong, someone took the time to write it.
Edit: formatting
If the goal is just to help promote and connect human authors, then I think I'd prefer (speaking as a reader, not as a website author) to just see some type of standardized footer that says something like "hey, I wrote all of this myself, and also here are some other people that I trust, go read them too"
Not the same, but some people have started using /ai to describe their AI policy. I agree a standard footer would be nice!
My biggest concern about the proposal is the scaling, and I think "this isn't supposed to scale" makes it essentially useless to me.
Your proposal does a breadth-first search through all of the items in your web of trust (up to a certain depth). That inherently limits you to a small number of sites that can be certified. Even if you had a few hard-working individuals willing to maintain large lists of trusted-to-be-human sources, you couldn't really add them to your web of trust, because they would make the number of nodes you had to visit too large.
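To put rough numbers on that concern: with a hypothetical average of 20 vouches per site (an illustrative figure, not a measurement), a depth-5 crawl already touches millions of nodes in the worst case:

```python
# Worst-case node count for a breadth-first crawl of the vouch graph:
# the sum of branching_factor ** d for each depth d up to the hop limit.
branching_factor = 20   # assumed average vouches per site
depth = 5               # the extension's hop limit

worst_case = sum(branching_factor ** d for d in range(depth + 1))
print(worst_case)  # 3368421 sites to visit
```

In practice the graph would have shared edges and far fewer unique nodes, but the exponential shape of the problem is the point the comment is making.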
So I think any proposal which relies on "keep or build a local list of all sites known to be human" will be unable to scale to a size which would be useful.
That's fair! But also, there's nothing preventing someone from writing their own crawler and exposing it through an API — moving the database of trusted sources from the browser to a server.
My biggest concern about the proposal is the scaling, and I think "this isn't supposed to scale" makes it essentially useless to me.
This is actually the bit I like best about it. If it were large-scale and comprehensive, the system would easily be usable by LLM webscrapers to avoid model collapse. Even as it is, I'm already hesitant to adopt this protocol for the same reason. I don't want to help those fuckers.
I've been thinking about something similar, but implemented as a search engine crawler, on the assumption that (by and large) human-written web pages only link to other human-written web pages, because there's no reason to link to something of no value. Of course, implementing a search crawler is something of a fraught prospect at the minute.
But god, a search engine that only covered human-written pages would be a godsend these days.
It's not exactly what you're asking for, but Kagi has SlopStop that aims to downrank AI-generated pages flagged by users.
I can't compare it to other search engines (because it's the only one I use), but I'm happy with my search experience and at least they're making an effort to fight slop.
Kagi has smolweb, which has a curated list of domains. https://marginalia-search.com/ is also a thing. There are a few similar projects.
(Unfortunately, they can be a bit hit and miss. Kagi smolweb only does English sites. My blog seems to have very few pages indexed in Marginalia.)
Content from the time before AI should also be considered as human.
It's a valid point, but I'm not sure how to incorporate it in the protocol, since it can be easily spoofed. If you trust someone, the date is irrelevant; and if you don't trust someone (or they're not vouched by someone you trust), then you can't trust the date either.
I have questions. Is one supposed to vouch only for people one knows personally? And what about sites of people who haven't used AI up to a certain date, and then start using it? I don't see myself keeping such a list up to date: after adding a site to it, it would probably stay there even if I've stopped following the site myself so I might not even realize they've started to post slop.
This doesn't address how it will scale and it's the same as "a href linking" but with extra steps. At scale, it'll just create an industry similar to that of SEO where people build a trust chain with one another.
I like this. A while ago I was thinking about such trust mechanics. Webrings but directed graphs rather than proper rings. Ah well.
Regardless of impl, I always worry about how these things can be abused. Other folks have already mentioned the spam generators simply lying, but I also worry about it effectively focusing scraping efforts. I would probably only deploy this with iocaine, and even then, I just generally wonder if the open web must close somewhat...