
Last month, Microsoft’s GitHub announced Copilot, a new AI assistant service for software development. GitHub Copilot supports a variety of languages and frameworks and can suggest entire lines or entire functions right inside an IDE. It is powered by OpenAI Codex, which was trained on billions of lines of open source code. Since the announcement, some copyright enthusiasts have criticized GitHub. Some have even claimed that Copilot is scraping open source code to provide a paid AI service to developers.
1)
Hi. I know you are excited about Copilot.
GitHub has scraped your code. And they plan to charge you for Copilot after you help train it further.
It is truly disappointing to see people take joy in seeing their work and time being used by a company worth billions.
– Brian P. Hogan (@bphogan) July 2, 2021
2)
Some thoughts on Github Copilot and copyright (I’m not a lawyer).
Copilot uses a version of GPT3 trained on GPL-licensed code. The GPL gives everyone the right to copy and to make derivatives. Derivatives inherit the GPL.
Copilot can sometimes memorize and repeat snippets of code. pic.twitter.com/1JLwfQI65l
– Mark O. Riedl (@mark_riedl) June 30, 2021
3)
“I’m leaving GitHub because Copilot is using my OpenSource code for training” is such a strange move. Anyone can access it and GitHub can provide OpenSource code to it from anywhere and US copyright law allows this. I’m also pretty sure we shouldn’t be strengthening copyright laws …
– Armin Ronacher (@mitsuhiko) July 3, 2021
I don’t understand the argument that GitHub Copilot violates the copyright of GPL code. First, machine-generated code should not be viewed as a derivative work. If AI output qualified as a derivative work, you could not create a music recognition app, because its AI model would be trained on copyrighted music. Second, even if Copilot reproduces short snippets of code exactly as they appear in its training data, that should not be considered copyright infringement. For example, consider the code below.
if (i
i = i + 1;
You cannot claim copyright on the above code because it is not original. GitHub Copilot should be able to suggest such snippets to developers without violating copyright law. It will be interesting to see how Microsoft and GitHub respond to these copyright criticisms in the coming days.