A slippery slope to eliminating copyright outright. The argument made is that AI is somehow more special and will otherwise lose to competition with China.
The flaw there is that AI is no more special than any other endeavor, while every other American market must equally compete with China.
What that failure means is that once anything is exceptional, everything becomes exceptional: the economic conditions apply equally, so bypassing copyright protections would apply equally to anybody facing external competition.
How does one define "win" or "lose" in this supposed competition?
There is endless discussion of a "race", but I cannot find a single discussion of how "winning" or "losing" is actually determined (cf. wild speculation about what the future will look like, which may work for marketing purposes but is almost always incorrect).
Copyrights are more often used to defend large corporations than small creators. As long as everybody has a level playing field and individuals benefit from weaker copyright laws, it might actually make the world a better place. I'm not arguing for the complete elimination of copyright protections, but today's laws, in particular copyright duration, are immoral. This is as good a starting point as any, assuming OpenAI isn't the only one who gets to benefit from it.
If a billion-dollar company can use pirated books for its business, should we allow it to use pirated software too? Do you think that requiring a company to pay for a Windows license is "immoral"?
Ideally, software shouldn't be copyrightable or patentable. That's what FOSS is based on (we couldn't remove copyright, so we hacked it via copyleft).
Well, just you try not to pay for ChatGPT...
Of course, how else could we train the neural networks to run the programs?
/largest_company
> Copyrights are more often used to defend large corporations than small creators.
Are there numbers behind this, or is it empty conjecture? The reality is that civil judgments apply the same regardless of owner size, which benefits small owners disproportionately: an award is a windfall relative to their regular revenue. That is OpenAI's principal concern: they don't want to get sued into bankruptcy by numerous small rights owners.
Copyright channels significantly more revenue to corporations than to individuals. Take music: this article shows only 12% goes to individual musicians.
https://www.rollingstone.com/pro/news/music-artists-make-12-...
Surely copyright isn't the problem here. Without copyright, the music industry could pay nothing for the music; it could just copy it with impunity.
The music industry, presumably, takes a bet on many musicians, and only a few make it. The revenues made by the successful ones effectively subsidise the unsuccessful ones.
Also, if musicians are so widely screwed by the bad industry, why don't they create a cooperative agency that treats them well? There's enough money sloshing around in successful musicians' coffers.
...right, let's make sure we protect the little, undercapitalised startup OpenAI from the large corporations holding it back :)
OpenAI has no moat. They're afraid of open source and want the government to protect them.
Microsoft doesn't think they're very cool anymore.
Sam Altman is going to have one of the quickest falls from grace in tech history. It's a shame he's using his time to try to legislate a worse world for the rest of us.
At the rate things are going in the US, "legislate" seems to be largely replaced by "executive directive", so maybe you don't have to worry about legislation. (We will still have the worse world part, of course.)
All of this, plus it's not even AI in the generic sense; it's just very advanced text generation, a certain application of AI. So the Chinese Gemini will offer to summarise e-mails at lower cost. Who cares?
Copyrighted material includes works by authors from outside the US. Under the Berne Convention, any exceptions a country introduces must not "conflict with a normal exploitation of the work" or "unreasonably prejudice the legitimate interests of the author". So if even one French author licenses their work for AI training, then any exception of this kind will harm their legitimate interests and rob them of potential income from the normal exploitation of the work.
If the US can harm authors from other countries, then other countries may be willing to reciprocate against American copyright holders and introduce exceptions which allow free use of US copyrighted material for whatever specific purposes they deem important.
IANAL, but it is a slippery slope, and it may hurt everyone. Who has more to lose?
And I hope that Mistral.AI takes note.
> then any exception of this kind will harm their legitimate interests
Pray tell, what legitimate interest of the author is harmed by an LLM training on that work? No one is publishing the author's book.
The legitimate interest in there not existing a tool that allows any random person to create art in the same style she does? Which would arguably devalue her offering?
What I think the parent meant is the interest in selling licenses to others to train on their data.
Exactly. Some copyright holders do license their work for AI training. It certainly happens in the music industry, but I don't see why texts would be any different. The exception would harm their business.
Example, please? It has always been fair use to train on accessible data; that's how so much research has been going on for decades.
I just wish they understood that they are limited not by the content available, but by the intrinsic characteristics of the architecture and algorithms of LLMs. This is just not the AGI that will magically open its eyes one day. The sooner we stop burning billions of dollars on it, the better.
The follow-on prompt was "add the word freedom a lot more."
... sprinkle in a lot of "strategy" too, to make the reader feel smart. Lay on "America/Americans" even thicker, to combine it with a sense of higher purpose, i.e. patriotism.
I think an AI should be treated like a human. A human can consume copyrighted material (possibly after paying for it), but not reproduce it. I don't see any reason why the same can't be true for an AI.
Then, we should also put the AI in jail when it's breaking copyright laws. Or being an accessory to breaking copyright law.
An AI that's breaking copyright laws shouldn't be legal. So yes, it's kind of like putting it in jail.
The issue is not so much the consumption of copyrighted material as the acquisition of that material.
Like a real person, AI companies need to adhere to IP law and license or purchase the materials that they wish to consume. If AI companies licensed all the materials they acquired for training purposes, this would be a non-issue.
OpenAI are looking for a free pass to break copyright law, and through that, also avoid any issues that would arise through reproduction.
A real person wouldn't have to pay to read a random blog, Reddit comments, StackOverflow answers, or code on GitHub (many open-source licenses do not imply a license for training).
They might have to pay for books, or use a library.
Should these cases be treated differently? If so, it might lead to a more closed internet with even more paywalls.
I think those are less of an issue. They want to train on paywalled news articles, magazines, and books, in addition to other media that the average person would have to pay for or that would otherwise have limitations applied.
In my opinion, if any copyright-related rule is applied to books or other paywalled content, it should apply equally to any Joe Shmoe's blog or code on GitHub.
Yeah, shorten the terms of copyright on original works by about 90%, and call it a win for everyone except for rights holders.
Rights holders are the economically marginal tail wagging the dog due to the disproportionate political power of content industries. All of Hollywood's annual revenues represented 2 weeks of telcos' SMS revenue back when you paid per message.
From https://news.ycombinator.com/newsguidelines.html:
> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.
In this case the appropriate headline would be: "OpenAI's proposals for the U.S. AI Action Plan".
Well, if we finally have hundred-billion-dollar corporations pushing back in the copyfight against the continual expansion of copyright (e.g. Sonny Bono, the congressman for Disney) or abusive laws like the DMCA, that's a welcome development.
They basically stole almost the whole output of humanity, both dead and alive, put it into their Frankenstein monsters' ever-growing brains, and now want to let them roam unsupervised longer and longer (AI agents) and continue to steal things.
Taking away human freedoms and giving them to agents, 101.
What stealing? None of the original content is gone. Perhaps "infringement" is a more apt word.
Yes, and if you infringe like they do, you'll be in jail forever.
Ignoring the non-sequitur on jail, I guess you're affirming that it's not stealing?
Most people will call it stealing, lawyers will find a way to call it differently.
So, you're affirming that you can steal almost the whole creative output of humanity and not sit in jail for the rest of your life?
They didn't just steal or infringe; they profit from it, and they replace and compete with the very people from whom they stole (or whom they infringed upon, as you prefer to call it).
The model is like their private library that they don't allow you to enter or see; instead they have a strict librarian who spits hallucinated quotes at you.
That is the problem. They are not Robin Hoods who steal to share with the poor. They steal from the poor to make the rich richer: to enrich themselves, grab human freedoms, and give those freedoms and more to AI agents.
You cannot steal the whole output of humanity and put it in your brain. AI agents and companies already have massively more rights and freedoms than you, and it's gonna get much worse.
There is a narrow way through the dystopias, because intelligence is inherently static and non-agentic (think of the static 4D spacetime of a universe): we can open the Library and empower people by making models explorable, like 3D games.
You steal from others and make them pay: constant scraping costs money (traffic, server load, scraping protection). Then you should only be allowed to release open-source models.
A ruling that only open source models can freely use copyrighted data for training would be a funny outcome and a big F you to OpenAI. I don’t expect it to happen but an interesting thought nonetheless.
> An export control strategy that exports democratic AI: For countries seeking access to American AI, we propose a strategy that would apply a commercial growth lens—both Total and Serviceable Addressable Markets—to proactively promote the global adoption of American AI systems and with them, the freedoms they create. At the same time, the strategy would use export controls to protect America’s AI lead, including by making updates to the AI diffusion rule.
What a bunch of gibberish hot garbage.
It works for comedy without changing a word. Impressive.
I wonder how much the addition of copyrighted material affects how smart the resulting model is. If it's even 20% better, LLM makers could be forced out of the US into jurisdictions that allow the use of copyrighted data.
I suspect most LLM users will ~always choose the smartest model.
> most LLM users will ~always choose the smartest model
Most LLM users will choose the cheapest model which is good enough.
I think that LLMs' performance is already "good enough" for a lot of applications. We're in the diminishing returns part of the curve.
There are two other concerns:
1. being able to run the model on trusted infrastructure locally, so some jerk won't turn it off on a whim, and the data will remain safe and comply with local data-protection laws and policies (see the sketch after this list)
2. having good tools to create AI applications (e.g. how easy it is to fine-tune the model to customer needs)
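On point 1, a minimal sketch of what "run it locally" can look like, assuming an open-weights model served through the Hugging Face transformers library (the model name below is just an illustrative example):

    # Minimal local-inference sketch; assumes the transformers and torch
    # packages and an example open-weights model from the Hugging Face hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example; swap in any open-weights model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)  # weights are cached on local disk

    prompt = "Summarise this email: ..."
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Everything here runs on your own hardware: nobody can turn the model off remotely, and the prompts never leave your infrastructure.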
> how much the addition of copyrighted material affects how smart the resulting model is
Copyrighted material improves the models, not by making them smarter, but by making them more factually correct, because they will be trained on reputable, reliable, and up-to-date sources.
The jump from Llama 2 to Llama 3 had something to do with Meta downloading every textbook ever published and using it as training data.
Meta's arguments so far in that court case are absolutely terrible, and I'm half expecting to see the world's first trillion-dollar copyright infringement award.
Incorrect. Llama 1 was trained on the Books3 dataset.
All of it is copyrighted material.
If this happens, I hope they get banned in Europe. This is unacceptable.
Can anyone also use copyrighted source code, e.g. from OpenAI?
[dupe]
More discussion:
OpenAI asks White House for relief from state AI rules
https://news.ycombinator.com/item?id=43352531
The arrogance of these people is without end.
If a person can read copyrighted material and produce derivative works, why not an AI?
Because a person can read copyrighted material they legally obtained the rights to, for example by purchasing a hard copy or an electronic copy of the book or magazine. Conversely, under laws worldwide, if a person were to engage in massive theft for the purpose of "reading" all available copyrighted material in the world, obtaining it without the permission and consent of the copyright holders, they would at the very least pay heavy fines, and in most jurisdictions also spend at least a few years in jail. Why should the same not apply to corporations and their executives?
I don't think there is actually a law anywhere that says you need to obtain the rights to copyrighted material in order to read or view it. The person or organisation showing it to you, which might be yourself, needs to have a license. Otherwise things like libraries couldn't exist, and you wouldn't be allowed to lend books or even have books in your house that other family members can read.
That doesn't particularly impact your argument about OpenAI, though, because an LLM in training is not a person; it is transforming data from one format to another for later consumption by people. Therefore they probably would need a license.
I mean, look at it this way. Let's say you purchase a Woody Allen film on DVD. Will anyone seriously prosecute you for watching it at home together with your friends? No, that falls within normal usage. But let's say you now organise a local viewing event with the same DVD for 200 people in a hall somewhere, and charge everyone, whatever, $6, just to cover the hall expenses. Will you be prosecuted? Very likely. Libraries are probably covered by some sort of "fair use" regulation due to public interest and such. They don't generate profit from their line of work; nor should they!
Right, but those 200 people won't be prosecuted for watching it, which was my point. The example I was thinking of when posting was putting up a copy of copyrighted art in a public place. The people in the public place are not breaking the law by looking at it, only the person who placed it... well, even then, would the workers who put it up be liable? Probably not; it's not reasonable to expect someone who puts up billboards to check the copyright license.
I do agree with this example in general. But I guess from my point of view, OpenAI comes across more like the person enabling the use of copyrighted art, and would thus be subject to copyright regulations. Their users I'd see rather as the people viewing the art in public, perhaps unaware of the copyright restrictions. But these discussions also seem like a bit of a distraction in themselves. If LLMs worked exactly as they have been hyped up to for the third year now, I think we would all get behind the effort. Who would care about copyrights if a magic machine could lead us into the so-called post-scarcity world, right? But sadly it appears to be nowhere near that goal, nor will it be, based on what we know about how the technology works. So here we are, discussing whether mechanical parrots should read our books :)
Sure, so, can I make and sell my own Lilo and Stitch movie now? It'll be even better than the one about to be released, and all that means is I'll deviate even less.
This was settled prior to LLMs: you can't do that, because the characters' names are copyrighted. LLMs change nothing here.
Because the AI is not a person. It doesn't seem like we're anywhere near AGI that could be considered a person. Training an LLM means taking existing content and transforming it into another format for later consumption by a person. That person can run prompts against the LLM to create derivative work; the LLM itself doesn't run prompts or do anything at all on its own.
I don't know much about the legal side, but it seems to me, from the above, that copyright law should apply to the company training the LLM as if it were creating a derivative work that it will later sell or license for other people to interact with.
Copyright does not restrict consumption. It only restricts reproduction. To restrict consumption you need a patent.
Good then that LLMs don't reproduce content.
They produce derivative works, which is also an exclusive right of a copyright holder.
If I derive my work from multiple sources, do all the copyright holders of those sources have an exclusive right to my work? How else would people build knowledge on some topic and then apply that knowledge to build a product, if not by reading a bunch of (book) material and studying other, similar products?
If they can prove it in court. That would be much easier to do for an LLM than for a human, one would think.
> It only restricts reproduction
and distribution.
> ... a person can read copyrighted material
Yes, after paying for it.
People have to pay for it
Didn’t read but
No
I don't think I've ever read anything this disingenuous
This is so wrong on so many levels.
But given that Trump clearly seems aligned with technobros, I wouldn't be surprised.
This will be good for the rest of the world, though. Other countries will be less likely to align themselves with the US; the end of US imperialism keeps being sped up little by little.