GitHub Copilot, the text-to-code AI tool, has been largely revolutionary in changing how people code. Twitter has been erupting with praise for the tool, with organisation heads and developers alike hailing it for saving much of their time.
However, the latest discussion surrounding it suggests that things are murky.
Tim Davis, professor of computer science at Texas A&M University, took to Twitter to express his displeasure at Copilot reproducing his copyrighted code for a particular prompt.
Chris Rackauckas, lead developer of SciML, also shared a July 2021 thread by Armin Ronacher, adding, “Github Copilot spits out the Quake source code. It just repeats its training data often, even without OSS licenses”.
But beyond this, the latest news making the rounds concerns Matthew Butterick. In June 2022, Butterick cautioned organisations creating software products against the use of Copilot, as they would be using someone else’s intellectual property, albeit unintentionally.
Butterick cites a passage from GitHub’s website showing how Microsoft plays it safe by pushing the blame onto the end user:
“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [intellectual property] scanning, and tracking for security vulnerabilities.”
In a recent statement, OpenAI claimed that the training material from public repositories is not meant to be included in the output generated by Copilot. Additionally, its analysis has shown that the vast majority of the output (>90%) does not match the training data.
Opinion is divided (a grey area, if you will) over which of the two parties legally stands in the right. GitHub has made it clear that users need to check that the code they use is free of copyright infringement, but at the same time, open-source communities see the claim that “AI training is fair use” as a disregard for their rights over their copyrighted code. See, for example, this statement by Butterick: “By claiming that AI training is fair use, Microsoft is constructing a justification for training on public code anywhere on the internet, not just GitHub.”
Hence, there is little clarity over who is to be held accountable: is it Copilot or the end users employing the AI-generated code in their products?
GitHub’s claim that AI training comes under fair use needs closer inspection. This is not the first time questions of copyright have sprung up around AI applications; it has been a persistent issue throughout the recent surge in generative AI models.
In a 2017 interview with IPW, Ben Sobel described the problem as a “fair use dilemma”. His argument goes like this:
(i) If machine learning does not come under fair use, then organisations have to pay remedies to the millions of people whose work forms the training data on which machines learn. This would hinder any progress in the field.
(ii) But if it does come under fair use, organisations will likely take liberties in using people’s intellectual labour for their own profit.
Therefore, it would not be a stretch to say that the legal aspects of AI occupy difficult terrain. If Butterick has a case to take the makers of Copilot to court, the outcome of the lawsuit will have a huge impact on the future of open-source communities and generative AI models.