AI Code Is Like Public Domain Code

GitHub’s Copilot tool may well be revolutionary, according to Bradley Kuhn. An AI trained by reading a massive and unidentified corpus of code, assumed to mostly be open source and licensed for any use to GitHub under their terms of use, it is able to watch what you are coding in your IDE and make suggestions on how to autocomplete the code – potentially at length. It is a kind of Clippy for code. It has just had the ultimate validation; Amazon copied it.

Spitfire in Guildhall Square, Southampton (ironically with no space for a co-pilot)
No room for a co-pilot

Sure, quit GitHub

While that may seem an unalloyed good to many programmers, there is a good deal of moral panic surrounding it, as evidenced by the recent call to boycott GitHub because of it. Now, I am all in favour of people using distributed tools instead of centralised ones. Git itself is intended as a distributed tool and in a way it’s offensive for GitHub to have annexed its name to create a centralised and proprietary control point.

I am also keen for everyone, as far as they can, to exercise self-sufficiency over their computing and control of their personal data. Given that Git was written as a response to the final abridgement of that self-sovereignty by the author of an earlier tool that the Linux developers were dependent on, GitHub is again somewhat offensive. Those would both be fine reasons to encourage people to move on from GitHub and to escape the social honeypot of carefully crafted network effect funnels that it embodies.

… but not because of Copilot

But Copilot is not a great reason to quit, or at least not for the reasons people insist on articulating. Those reasons seem strong on copyleft maximalism and the homeopathic thinking that assumes because there was GPL vapour in the air everything written at the time is infused. They also seem laced with a residual mistrust of Microsoft.

  • Copilot is unlikely to be infringing copyright. Certainly not in the USA. Probably not in most other places (although see Brown for more nuance). Even for humans, learning patterns doesn’t infringe copyrights, and quoting minimal or essential fragments rarely rises to the level needed for protection by copyright. Copyrights are not the same as patents, and re-expressing the same idea does not amount to infringement – even if such infringement were possible for a machine. Which it is not, so all these considerations are moot in many jurisdictions.
  • Copilot is unlikely to be breaching the GPL. That could only happen if copyright was being infringed. Just because the author of a work doesn’t like use of their code by Microsoft’s tool, that doesn’t somehow create an infringement that triggers the license.
  • Copilot is not morally bankrupt for using open source code for training. The whole point of Open Source Free Software is to give anyone the unconditional right to study the code and learn from it. If that’s via an automated tool that makes the matter more efficient, it makes no difference.

Making a new thing that does the same as my patented widget may be an infringement of my patent, but making a new thing that does the same as my copyrighted code is not. An unfortunate consequence of the propaganda term “Intellectual Property” is that non-specialists munge all the concepts for all of {Copyright, Patents, Trade Secrets, Trademarks, Database rights} into one big hairball and assume anything matching the hairball triggers some form of infringement of any/all of the concepts. So arguments that mix-and-match IP concepts to imply an infringement are … problematic.

You shouldn’t use it for Open Source though

AI code helpers like Copilot are thus very unlikely to infringe rights per se. But that doesn’t mean code made by them should be welcome in Open Source projects.

To summarise a long article, Reda concludes that the output of an AI like Copilot is best understood as Public Domain. But ironically, that’s the real problem with Copilot for an Open Source developer. Public Domain is not Open Source, and AI-generated code introduces friction that works against the Open Source network effect for just the same reasons. As Brown explains, not every jurisdiction has the same degree of certainty or the same attributes to its conclusion about AI-generated works as seems commonly understood in the USA.

So while you may feel comfortable using AI-generated blocks in your code, what will you write in the pull-request to give others the same confidence? Even GitHub (and indeed Amazon) are at pains to point out that’s your responsibility, not theirs. Their tool may be a very helpful learning aid, but it’s something of a trap for the responsible Open Source contributor.

There’s a different case to be understood in every jurisdiction both about the code origin and the threshold for copyrightability. While the (many) lawyers I have heard from have largely waved a hand and said the arguments would never stand up in court, the arguable cases create a context where a community can’t rely on AI-generated code without further advice. Just like Public Domain, that added friction makes it non-viable for any community serious about provenance.

The biggest challenges are the ones exerting subtle, systemic steering effects that people don’t take seriously. GitHub may not be a digital scofflaw, but their tool is a Siren tempting you onto rocks that can ruin communities.

(Thanks to the Patreon backers who made this post possible)

Don’t Call It Relicensing!

Using open source elsewhere is not relicensing, it’s overlaying a second license.

So you’re considering taking some open source code under a minimal, non-reciprocal OSI-approved license and putting it under a different open source license, hopefully in combination with your original code (or another form of larger project). 

Don’t call this “relicensing” – it is not! The original license will continue to apply and you remain responsible for complying with its requirements. Only the copyright holder can change the license. You’re not relicensing – instead you are using the rights the license has given you and applying an additional license to the combination of the earlier work and your work. 


A Rights Ratchet Score Card

A draft scorecard for determining if a software project is open as bait for a business pivot or genuinely keeping your freedoms protected.

Open or closed? You decide.

The seven signs a project is following the rights-ratchet route to riches and the framework for going beyond licensing can be augmented by some straightforward indicators of an issue. None of these alone is necessarily a cause for concern, but the more clicks, the more risks. Here’s a rough-and-ready first draft of a scorecard to check whether your software supplier considers you a community peer and will respect and protect your essential freedoms, or visualises you more like one of those pods in The Matrix. Just count the clicks; the more clicks, the higher the risk this is a rights-ratchet that will end up closed.


Hybrid PDF: Schrödinger’s Document

Need to send a file most people won’t need to edit? Send a file that’s both editable and final form at the same time, a kind of Schrödinger’s Document – a Hybrid PDF.

One of the scourges of e-mail is file attachments, and particularly those from people sending files made by their newly updated word-processor or presentation programme that half the people receiving it can’t open. While proprietary software vendors love this errant behaviour (it keeps up the pressure for people to re-purchase software they don’t really need so they can read other people’s work – AKA “upgrades” – or to subscribe to an online service that keeps them trapped), it’s really anti-social behaviour.


Chalk and Cheese

How similar are open source development and standards development? Not at all, and even the words they have in common mean different things in each.

It’s a traaap

It is often asserted that open source and open standards are in some way similar. For example, in the accompanying letter to a recent submission to the European Commission, a major European-based technology company that is very active in standardisation said:


All open source licenses are permissive. They give you permission in advance to use the software for any purpose, to improve the software any way you wish and to share the software with whoever you want. They are the opposite of proprietary licenses, which place restrictions on each of these freedoms. Any license with restrictions would not be considered OSD compliant.

All open source licenses include conditions. Some relate to attribution. Some relate to reciprocal licensing. None of them restrict how you can use, improve and share the software, although you must comply with the conditions in order to do so. Some people consider some conditions so onerous they rise to the level of restrictions, but the consensus of the community has been they are wrong.

Today’s licensing games are thus mainly about testing where the accumulated burden of conditions is effectively a restriction – “constructive restriction”. There’s certainly a line where that would become true – for example, where the conditions associated with deploying the software as a cloud service are so hard to comply with that the software is effectively unusable in that field of use.

The OSD doesn’t include much to help with this so it’s contentious every time and sometimes leads to sophistry. This is probably the area where the Open Source Initiative needs to do the most work to modernise the license approval process.

All open source licenses are permissive

FOSS vs FRAND is a collision of worldviews

Of late there have been a number of interventions sponsored by the world’s largest and most profitable tech patent holders to muddy the waters about open source and FRAND licensing of patents in standards by arguing contentious minutiae like the intent of the authors of the BSD license. This is happening because of the clash of industries I wrote about in 2016. Companies fundamentally based on extracting patent royalties are unable to imagine any other way of doing business, so they mistake the issue of FRAND as being about license compliance rather than seeing it as an obstacle to the very purpose of open source in commercial software: collaboration with others.

I found an amazing number of experienced and expert colleagues across industries failing to grasp this fundamental, so I’ve written a paper about it. Published today by Open Forum Europe, it explains why compliance legalities are the wrong lens for studying the issue and introduces terms for exploring why representatives from different industry backgrounds fail to understand each other despite apparently using the same terminology (spoiler: they mean different things by the same words).

Many thanks to the colleagues who have made valuable suggestions that have improved the clarity of the document, and to the various patrons who have contributed to covering my time. Get in touch if you’d like me to come to your event or company and talk about these things.

Should we celebrate the anniversary of open source?

Tomorrow here in Portland at OSCON, OSI will be celebrating 20 years of open source. I’ve had a few comments along the lines of “I was saying ‘open source’ before 1998 so why bother with this 20 year celebration?”


That’s entirely possible. The phrase is reputed to have been used descriptively about free software, especially under non-copyleft licenses, from at least 1996 when it appeared in a press release. Given its appropriateness there’s a good chance it was in use earlier, although I’ve not found any reliable citations to support that. It was also in use in another field well before then, to describe military or diplomatic intelligence obtained by studying non-classified sources.

Article 13 – An Existential Threat

The Electronic Frontier Foundation has published a letter from more than 70 leaders in the emerging meshed society (including me) which criticises Article 13 of the European Union’s proposed new copyright regulations. This Article starts from the assumption that the only role of an individual is to consume copyrighted works and hence deduces that any act of publication on the part of an individual must be infringing the copyrights of a corporation unless proven otherwise. The text doesn’t state things that clearly, but the effect is unmistakable. It’s as if a politician was proposing to ban syringes because addicts use them, without considering that hospitals do too.

One Last Push To Save The API

A group of computer experts – including me – asked a US court to think again about fair use of APIs this month.

Tomás Saraceno artworks at SF MOMA: Stillness in Motion—Cloud Cities

It was an unlucky fact that Oracle’s case against Google over Android started with patents. Their initial case fell apart almost immediately, with almost all the patent claims invalidated. The implausible backstop copyright case Oracle made against Android’s use of language-essential definitions in the Java APIs (and thus against the freedom of developers everywhere) carried on though. The initial patent case meant that the appeal when Oracle soundly lost ended up at the Court of Appeals for the Federal Circuit (CAFC), the specialist patent appeals court in the USA, and not at a court competent to dispense copyright justice.