AI Code Is Like Public Domain Code

GitHub’s Copilot tool may well be revolutionary, according to Bradley Kuhn. Trained by reading a massive and unidentified corpus of code (assumed to be mostly open source, and licensed to GitHub for any use under its terms of use), the AI is able to watch what you are coding in your IDE and suggest how to autocomplete the code, potentially at length. It is a kind of Clippy for code. It has just received the ultimate validation: Amazon copied it.

No room for a co-pilot

Sure, quit GitHub

While that may seem an unalloyed good to many programmers, there is an outbreak of moral panic surrounding it, as evidenced by the recent call to boycott GitHub because of it. Now, I am all in favour of people using distributed tools instead of centralised ones. Git itself is intended as a distributed tool and in a way it’s offensive for GitHub to have annexed its name to create a centralised and proprietary control point.

I am also keen for everyone, as far as they are able, to exercise self-sufficiency over their computing and control over their personal data. Given that Git was written in response to the final abridgement of that self-sovereignty by the author of an earlier tool the Linux developers depended on, GitHub is again somewhat offensive. Those would both be fine reasons to encourage people to move on from GitHub and to escape the social honeypot of carefully crafted network-effect funnels that it embodies.

… but not because of Copilot

But Copilot is not a great reason to quit, or at least not for the reasons people insist on articulating. Those reasons seem strong on copyleft maximalism and on the homeopathic thinking that assumes that, because there was GPL vapour in the air, everything written at the time is infused with it. They also seem laced with a residual mistrust of Microsoft.

  • Copilot is unlikely to be infringing copyright. Certainly not in the USA, and probably not in most other places (although see Brown for more nuance). Even for humans, learning patterns doesn’t infringe copyright, and quoting minimal or essential fragments rarely rises to the level needed for copyright protection. Copyrights are not the same as patents, and re-expressing the same idea does not amount to infringement, even if such infringement were possible for a machine. In many jurisdictions it is not, which makes all these considerations moot there.
  • Copilot is unlikely to be breaching the GPL. That could only happen if copyright were being infringed. An author disliking Microsoft’s tool using their code doesn’t somehow create an infringement that triggers the license.
  • Copilot is not morally bankrupt for using open source code for training. The whole point of Open Source and Free Software is to give anyone the unconditional right to study the code and learn from it. If that happens via an automated tool that makes the process more efficient, it makes no difference.

Making a new thing that does the same as my patented widget is always an infringement of my patent, but making a new thing that does the same as my copyrighted code is not. An unfortunate consequence of the propaganda term “Intellectual Property” is that non-specialists munge all the concepts for all of {Copyright, Patents, Trade Secrets, Trademarks, Database rights} into one big hairball and assume anything matching the hairball triggers some form of infringement of any/all of the concepts. So arguments that mix-and-match IP concepts to imply an infringement are … problematic.

You shouldn’t use it for Open Source though

AI code helpers like Copilot are thus very unlikely to infringe rights per se. But that doesn’t mean code made by them should be welcome in Open Source projects.

To summarise a long article, Reda concludes that the output of an AI like Copilot is best understood as Public Domain. Ironically, that’s the real problem with Copilot for an Open Source developer. Public Domain is not Open Source, and AI-generated code introduces friction that works against the Open Source network effect for the same reasons Public Domain code does. As Brown explains, not every jurisdiction reaches its conclusions about AI-generated works with the same certainty, or attaches the same attributes to them, as seems commonly understood in the USA.

So while you may feel comfortable using AI-generated blocks in your code, what will you write in the pull request to give others the same confidence? Even GitHub (and indeed Amazon) are at pains to point out that’s your responsibility, not theirs. Their tool may be a very helpful learning aid, but it’s something of a trap for the responsible Open Source contributor.

Every jurisdiction has its own position on both the origin of the code and the threshold for copyrightability. While the (many) lawyers I have heard from have largely waved a hand and said the arguments would never stand up in court, the mere existence of arguable cases means a community can’t rely on AI-generated code without further advice. Just as with Public Domain code, that added friction makes it non-viable for any community serious about provenance.

The biggest challenges are the ones exerting subtle, systemic steering effects that people don’t take seriously. GitHub may not be a digital scofflaw, but its tool is a Siren tempting you onto rocks that can ruin communities.

(Thanks to the Patreon backers who made this post possible)

A Rights Ratchet Score Card

A draft scorecard for determining whether a software project is open only as bait for a business pivot or is genuinely protecting your freedoms.

Open or closed? You decide.

The seven signs that a project is following the rights-ratchet route to riches, and the framework for going beyond licensing, can be augmented with some straightforward indicators of trouble. None of these alone is necessarily a cause for concern. Here’s a rough-and-ready first draft of a scorecard to check whether your software supplier considers you a community peer and will respect and protect your essential freedoms, or visualises you more like one of those pods in The Matrix. Just count the clicks: the more clicks, the higher the risk that this is a rights-ratchet that will end up closed.

OSPOs As Community Advocates

Is your Open Source Program Office just part of your corporate defences, or is it the community’s advocate inside your company as well?

Supplies?

As we considered before, corporations are not people; they are vehicles for the collective expression of the vision of many individuals, showing the outworking of the processes and systems those individuals devise to embody their vision. So the work of an Open Source Program Office (OSPO) needs to drive social change within the company and address community needs outside it, as well as handling compliance and other corporate self-protection.

A Bit Of A Stretch

The lesson Elastic’s restrictive relicensing teaches is that those using open source to ratchet a software startup will forsake software freedom eventually if they’re aggregating rights. That’s no reason to believe open source needs updating.

“The Mature Age” — Camille Claudel

Many of the responses to Elastic’s decision to drop open source licensing for its products in favour of restrictive commercial licensing have either asked whether market conditions (especially the provision of a service by Amazon Web Services) justified the move, or dived into the minutiae of what Elastic actually did. Some see it as support for the hypothesis that the very definition of “open source” is out of date. But to do so is to swallow the bait of a distracting explanation and overlook the actual value of open source, and why Elastic, like the cloud database companies before it, no longer cares about it.

Going With The Grain

If you’re managing community or developer relationships for your employer, a crucial principle is to “go with the grain” of the community — promote and embrace the freedoms it needs and the expectations it cherishes — rather than take actions that result in easily-anticipated opposition.

More at https://devrel.net/community/going-with-the-grain

5 Reasons Facebook’s React License Was A Mistake

Facebook’s BSD+Patent license combo fails not because of the license itself but because it ignores the deeper nature of open source.

Beware Falling Rocks

In July 2017, the Apache Software Foundation effectively banned the license combination Facebook has been applying to all the projects it has been releasing as open source. They are using the 3-clause BSD license (BSD-3), a widely-used OSI-approved non-reciprocal license, combined with a broad, non-reciprocal patent grant but with equally broad termination rules to frustrate aggressors.

Growing The Community

How can you grow an open source community? Two blog posts from The Document Foundation (TDF) illustrate a proven double-ended strategy to sustain an existing community.

Fern Fiddlehead

Since it was established in 2010, the LibreOffice project has grown steadily under the guidance of The Document Foundation (TDF), where I’ve been a volunteer, most recently as a member of its Board. Starting from a complex political situation and a legacy codebase suffering extensive technical debt, TDF has cultivated both individual and company-sponsored contributors and moved beyond those issues to stability and effectiveness.

Assume Good Faith

You feel slighted by a comment on a mailing list, or your forum post hasn’t made it through moderation promptly. How should you react?

Wolf

A recent exchange on a user forum caught my eye, one that’s typical of many user interactions with open source communities. Someone with a technical question had apparently received the answer they needed and, to help others in the same situation, had posted a summary of the resolution, complete with sample code. When they came back later, the summary was gone.

7 Rules For Engaging Communities On Legal Matters

When you need to discuss a license, a legal document like a CLA or a governance rule with an open source community, what’s the best approach to take?

Having watched a fair number of people attempt to engage both the Open Source Initiative’s license evaluation community and the Apache Software Foundation’s legal affairs committee, I’ve gathered some hints and tips for succeeding when your turn comes to discuss legal terms with an open source community.

Engaging Open Source Communities

At FOSDEM 2017, Simon gave a well-attended talk explaining many of the things that could go wrong for a company trying to engage a large open source project over legal or governance issues. Based loosely on a mailing list thread at the Apache Software Foundation, the talk highlighted seven things to avoid and gave ideas on how to do so.
