I see talk here and there about how any company or individual can easily use anything we post on Lemmy however they want. This could include AI training, behavior analysis, or user profiling. With the recent news of Reddit data being sold and licensed for AI training, I thought this would be a great time to preemptively discuss how we feel about this topic and brainstorm ways to discourage unwanted use of the content we post.

I’ve seen some users add a license to the end of each of their comments. One idea might be this: add a feature to Lemmy where each user can choose a content license that applies to everything they post. For example, one user might choose to reserve no rights for their content (like CC0) because they don’t care how their data is used. Another user might not want companies profiting off their posts, so they’d choose a more restrictive license.
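To make the idea a bit more concrete, here is a rough Python sketch of what a per-account license setting could look like. Everything in it is an assumption for illustration: Lemmy has no such feature today, and the `Account` class, the `content_license` field, and the `with_license_footer` helper are made-up names. The license strings are SPDX identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    """Hypothetical account record; not an actual Lemmy data structure."""
    name: str
    # SPDX license identifier chosen by the user, or None if no choice was made.
    content_license: Optional[str] = None


def with_license_footer(account: Account, body: str) -> str:
    """Append the account-wide license notice to a post body.

    This automates what some users already do by hand at the end of
    each comment. Posts from accounts without a license are unchanged.
    """
    if account.content_license is None:
        return body
    return f"{body}\n\nLicense: {account.content_license}"


# One user reserves no rights, another picks a more restrictive license.
permissive = Account("alice", content_license="CC0-1.0")
restrictive = Account("bob", content_license="CC-BY-NC-SA-4.0")
undecided = Account("carol")

print(with_license_footer(permissive, "Use this however you like."))
print(with_license_footer(restrictive, "Please don't sell this."))
print(with_license_footer(undecided, "No license chosen."))
```

A footer like this only states the terms. It does nothing to enforce them, so it would be a legal signal rather than a technical protection.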

I’m eager to hear everyone’s thoughts on the whole topic, so to kick things off:

  1. Do you care how your public data and posted content is used? Why or why not?
  2. What do you think of choosing a content license for your Lemmy account? Does this contradict the FOSS model?
  3. Should Lemmy have features to protect user data/content in this way, or should that be left up to the user to figure out on their own?

Data is becoming an increasingly valuable commodity in the digital world. Hopefully these big-picture conversations can help us see what we value as a community and be more prepared for the future.

  • makeasnek@lemmy.ml
    10 months ago

    These are interesting questions, thank you for getting the discussion started on them.

    Do you care how your public data and posted content is used? Why or why not?

    Not really. If I did, I wouldn’t post it somewhere public like Lemmy. I guess if I were sharing source code or artwork I had made, I would feel differently about somebody taking those and breaking the license terms on that. But I don’t care if they’re used to train AI. Well-trained AI benefits all of humanity, and it’s not like they’re making copies, they’re just learning piecemeal from millions of pieces of content like mine. Whether conventional licensing applies to AI at all is still a question of open legal debate that will probably take years to resolve.

    What do you think of choosing a content license for your Lemmy account? Does this contradict the FOSS model?

    I think this is a great idea. It gives users some degree of additional control and clarifies for people who might want to use the data how they can use it. It could also be an interesting marketing tool to be able to say that on Lemmy you choose who uses your data and how, even if the enforcement mechanism is on the legal side, not the technical side. The default should be public domain or a copyleft license, as that benefits the commons the most, but users should be able to make their own choice.

    Should Lemmy have features to protect user data/content in this way, or should that be left up to the user to figure out on their own?

    Not really, aside from letting users choose licenses for their content. I do think AP should integrate encrypted DMs/messages like Nostr and others have; this is an important feature. But that’s really outside of this particular discussion.

    Edit: An additional thought on licensing fees. If users could post, for example, their Bitcoin Lightning address in their profile, they could automatically “license” their content this way. They could set a flat license fee in their profile, per post or per word or whatever, perhaps adjustable on a given post if the user wanted. If some company wants to come along and use their content, it could automatically pay the license fee for that content. This would be an interesting way for users to get paid for their content. Lemmy and/or the instance could even take a portion of those payments, say 10%, and put it towards development. Having this all done via Lightning would make the process automatable. Companies scraping AP/Lemmy data could search, find content, and then buy the content that suits them best. They might be willing to pay more for rare content types, for example, content from niche communities. Companies get proof, via the Lightning transaction, that payment was made.
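    As a back-of-the-envelope illustration of the fee idea above, here is a small Python sketch of the arithmetic only. It makes no Lightning payment and calls no real Lemmy or ActivityPub API. The `LicenseTerms` class, its fields, and both functions are hypothetical names, and the 10% share is just the example figure from this comment.

```python
from dataclasses import dataclass

INSTANCE_SHARE_PERCENT = 10  # the "say 10%" example figure, not a real setting


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical terms a user might publish in their profile."""
    lightning_address: str   # where payments would be sent (assumed field)
    sats_per_post: int = 0   # flat fee per post, in satoshis
    sats_per_word: int = 0   # additional fee per whitespace-separated word


def quote_fee(terms: LicenseTerms, post_text: str) -> int:
    """Return the total fee in satoshis for licensing one post."""
    word_count = len(post_text.split())
    return terms.sats_per_post + terms.sats_per_word * word_count


def split_fee(total_sats: int, share_percent: int = INSTANCE_SHARE_PERCENT):
    """Split a fee into (author_sats, instance_sats).

    Integer math only; any rounding remainder stays with the author.
    """
    instance_sats = total_sats * share_percent // 100
    return total_sats - instance_sats, instance_sats


terms = LicenseTerms("user@example.com", sats_per_post=50, sats_per_word=2)
total = quote_fee(terms, "one two three four five")  # 50 + 2 * 5 = 60 sats
author_sats, instance_sats = split_fee(total)
print(total, author_sats, instance_sats)  # prints: 60 54 6
```

    Working in whole satoshis avoids floating-point rounding, and giving the remainder to the author is an arbitrary choice for this sketch.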

    As a user, I wouldn’t mind getting a few bucks per year for my content and knowing that the money is also contributing to Lemmy development and the sustainability of this whole fediverse thing. Nostr has similar functionality with tips/zaps and tip pools, though it’s not based around licensing.