AI Code Is Like Public Domain Code

GitHub’s Copilot tool may well be revolutionary, according to Bradley Kuhn. It is an AI trained on a massive and unidentified corpus of code, assumed to be mostly open source and licensed to GitHub for any use under its terms of use. It watches what you are coding in your IDE and suggests how to autocomplete the code – potentially at length. It is a kind of Clippy for code. It has just had the ultimate validation: Amazon copied it.

Spitfire in Guildhall Square, Southampton (ironically with no space for a co-pilot)
No room for a co-pilot

Sure, quit GitHub

While that may seem an unalloyed good to many programmers, there is an outbreak of moral panic surrounding it, as evidenced by the recent call to boycott GitHub because of it. Now, I am all in favour of people using distributed tools instead of centralised ones. Git itself is intended as a distributed tool, and in a way it’s offensive for GitHub to have annexed its name to create a centralised and proprietary control point.

I am also keen for everyone, as far as they can, to exercise self-sufficiency over their computing and control of their personal data. Given that Git was written in response to the final abridgement of that self-sovereignty by the author of an earlier tool (BitKeeper) on which the Linux developers depended, GitHub is again somewhat offensive. Those would both be fine reasons to encourage people to move on from GitHub and to escape the social honeypot of carefully crafted network-effect funnels that it embodies.

… but not because of Copilot

But Copilot is not a great reason to quit, or at least not for the reasons people insist on articulating. Those reasons seem strong on copyleft maximalism and on the homeopathic thinking that assumes that, because there was GPL vapour in the air, everything written at the time is infused with it. They also seem laced with a residual mistrust of Microsoft.

  • Copilot is unlikely to be infringing copyright. Certainly not in the USA. Probably not in most other places (although see Brown for more nuance). Even for humans, learning patterns doesn’t infringe copyrights, and quoting minimal or essential fragments rarely rises to the level needed for protection by copyright. Copyrights are not the same as patents, and re-expressing the same idea does not amount to infringement – even if such infringement were possible for a machine, which in many jurisdictions it is not, making all these considerations moot there.
  • Copilot is unlikely to be breaching the GPL. That could only happen if copyright was being infringed. Just because the author of a work doesn’t like use of their code by Microsoft’s tool, that doesn’t somehow create an infringement that triggers the license.
  • Copilot is not morally bankrupt for using open source code for training. The whole point of Open Source Free Software is to give anyone the unconditional right to study the code and learn from it. If that happens via an automated tool that makes the process more efficient, it makes no difference.

Making a new thing that does the same as my patented widget is always an infringement of my patent, but making a new thing that does the same as my copyrighted code is not. An unfortunate consequence of the propaganda term “Intellectual Property” is that non-specialists munge all the concepts for all of {Copyright, Patents, Trade Secrets, Trademarks, Database rights} into one big hairball and assume anything matching the hairball triggers some form of infringement of any or all of the concepts. So arguments that mix and match IP concepts to imply an infringement are … problematic.

You shouldn’t use it for Open Source though

AI code helpers like Copilot are thus very unlikely to infringe rights per se. But that doesn’t mean code made by them should be welcome in Open Source projects.

To summarise a long article, Reda concludes that the output of an AI like Copilot is best understood as Public Domain. But ironically, that’s the real problem with Copilot for an Open Source developer. Public Domain is not Open Source, and AI-generated code introduces friction that works against the Open Source network effect for just the same reasons. As Brown explains, not every jurisdiction reaches its conclusion about AI-generated works with the same certainty, or on the same grounds, as seems commonly understood in the USA.

So while you may feel comfortable using AI-generated blocks in your code, what will you write in the pull request to give others the same confidence? Even GitHub (and indeed Amazon) are at pains to point out that’s your responsibility, not theirs. Their tool may be a very helpful learning aid, but it’s something of a trap for the responsible Open Source contributor.

Every jurisdiction has its own case to be understood, both about the origin of the code and about the threshold for copyrightability. While the (many) lawyers I have heard from have largely waved a hand and said the arguments would never stand up in court, the arguable cases create a context where a community can’t rely on AI-generated code without further advice. Just like Public Domain, that added friction makes it non-viable for any community serious about provenance.

The biggest challenges are the ones exerting subtle, systemic steering effects that people don’t take seriously. GitHub may not be a digital scofflaw, but its tool is a Siren tempting you onto rocks that can ruin communities.

(Thanks to the Patreon backers who made this post possible)

Legally Ignoring The License

Perhaps the biggest current challenge to open source software is companies which ignore open source software licenses. That sounds so “yesterday” from an era of license scanners and compliance scares. But the issue is as relevant today as it was 20 years ago – just not the way you think!

Contributor agreements have been a controversial topic throughout their history. The choices by Elastic (and others before them) to relicense previously open source software under a licensing arrangement that discriminates against certain users threw the use of contributor agreements into sharp relief. But the controversy around them focused too much on the wrong problem. The main problem with a copyright-assigning agreement is not that it gives the aggregator the right to relicense the work (although that is a problem, as it enables the end game of a rights ratchet). The main problem is that it allows the aggregator, uniquely in the community, to ignore the license altogether.

A Brief History Of Scareware

All Open Source licenses grant unconditional permission in advance to those who comply with their terms to use, improve and share the software in any way and for any purpose. At a stroke, the scope for artificially making the (inherently non-rivalrous) software scarce is eliminated. Of course, that’s a serious problem if you’re an entrepreneur whose imagination only extends to directly monetising access to the software.

Right from the start, wily entrepreneurs realised that Copyleft licenses scared and confused some people, especially lawyers. So they sold customers the right to replace the open source license with a proprietary one – for some reason something customers’ lawyers found less scary. The pioneer of this approach was probably Sleepycat Software Inc, whose Berkeley DB embeddable database came under a strong copyleft license that left its users in no doubt that they had to make their own private work available to the public. Sleepycat sold a “commercial use” license that didn’t have the same requirement but which also left the user with none of the four freedoms. Selling indulgences had been profitable in the Middle Ages and it also worked for Sleepycat, all the way to acquisition.

Inspired by that success, many other companies sold indulgences. As the market wised up to the GPL and corporate counsel stopped being scared of it, companies moved on to other scareware licenses such as the AGPL, as well as to “open core” approaches where the commercially valuable functions were not in the open source code at all. By using the no-charge availability of the software to gain adoption, they could convert free adopters into paying customers and ultimately lock them in. Some users of this strategy – notably SugarCRM – were able to ratchet back the freedom over time until they had an old-style proprietary software business.

Controlling The “Community”

However, there was an inconvenience. For much software, gaining adoption meant persuading cautious, picky developers to use the code; hand-waving to the boss was no longer enough. Once they were using the source, developers might well improve it. Inspired by the likes of Apache and Mozilla, they might then share their improvements, thus forming a community that could produce better code than you. So it was smart to invite and use their improvements and thus keep control of the community.

But then the presence of these contributions under the GPL would make you subject to the GPL yourself, unable to sell indulgences or ignore the license. The fix was to appeal to contributors’ sense of fairness and desire for an easy life (and the pleasure of recognition), and to claim it was in everyone’s interests for the core company to own all the copyrights. Apache and the FSF helped things along by socialising the idea of copyright assignments (see Footnote 1). All these factors led developers to agree to gift the IP rights in their work to the core company. The document they sign is called a contributor license agreement, or CLA.

Once it has a CLA, a company can aggregate all the rights to the software as if it owned it. This has several consequences for the project:

  • They can sell indulgences, so that some community members are able to ignore the license.
  • They can ignore the license as well, enabling open core models that could otherwise be impossible.
  • They can do secret deals with other companies to treat the code as their own or even sell the complete rights, including to a company that actually wants to end the project. Because they can act secretly, they can potentially preempt forks.
  • They can make releases without community consensus, making it impossible for peers to join in.
  • They can change the license by fiat, including to one that harms bona fide contributors they want to disadvantage.
  • They can end the public project completely, as SugarCRM did.

Socially Unacceptable

An open source licence is a multi-lateral constitution of a community, setting norms that apply equally to all. Having every developer and user subject to the same terms is one of the pillars of community. A copyright assignment provides unqualified and unappealable immunity from all that. The presence of one in a commercially backed project is almost certain to mean someone doesn’t want to be subject to the rules and norms everyone else must abide by, usually as part of a rights ratchet. They and their sham freedoms should no longer be tolerated by open source contributors.


Footnote 1: In both cases the CLA is – at best – marginal to the community. At Apache, the CLA is redundant with section 5 of the Apache License, which many people believe grants all the rights the community needs. Folklore at Apache says that IBM’s lawyers were not sure of that and, just to be certain, insisted there be a CLA as well. At the FSF, the CLA is also redundant with GPLv3 (and likely with GPLv2 as well), but the FSF has long argued that it needs to own the copyrights in the USA in order to pursue license compliance – even though it doesn’t do much enforcement, and the surrender of copyright reduces the ability of the actual developers to choose to enforce. Both are frequently cited by abusers as justification for their actions.

An End To API Gaslighting?

The Supreme Court decision in Oracle vs Google ends a decade-long nightmare for open source developers.

Sunlight or gaslight?*

The decision of the US Supreme Court (SCOTUS) to reverse the erroneous conclusion of the US Court of Appeals for the Federal Circuit (CAFC) that Google’s use of the Java SE API in Android was a copyright infringement comes as a great relief to open source programmers everywhere. Software developers have always assumed that merely including a function prototype in their code does not require copyright permission, as it’s just a fact about the implementation.

The Missing Stakeholders

The coming wave of digital regulation may claim to target “Big Tech” but will inevitably end up harming citizen-innovators most because regulators have forgotten to include them in their process.

Tracks in the snow suggest a bird being captured by a cat.
Stakeholder-Citizen Interaction

Here come the regulators. “Big Tech” companies like Facebook and Google definitely deserve some guide-rails, as well as some consequences for the unwanted impacts they have foisted on society along with the desirable ones. Facebook in particular has some deep, serious consequences of its amorality due soon. But so far, pretty much every regulation relating to the digital realm is defective.

Settling Scores With Giants

A UK national newspaper approached me at short notice to write an opinion piece about the copyright directive vote today. When I asked to be paid for my work, they suddenly decided to treat it as a “written interview” instead. Seems wanting to see authors paid is only an issue when they’re not the ones paying. I stopped as soon as I heard, but already had a rough, unpolished draft, so – here it is.

There’s a vote in Strasbourg this week about copyright rules. The pretext is an update of the original 18-year-old copyright directive to make it fit for the Internet age. But there’s something strange going on, fuelled by legions of lobbyists paid with old money. The thinking behind the copyright directive will only #fixcopyright for the dragons of content who want to harm the giants of tech. That “dragons vs giants” conflict will just leave the rest of us smaller creatives trampled, burned – and unpaid.

Change Due

There’s no doubt that we need to revisit copyright for the digital age and make some adjustments to the rules. When the original directive was negotiated in the 90s the Internet as we know it was in its infancy. But in the midst of all the necessary change, some dark forces have been at work to settle scores with the young upstarts of the Internet age — with no concern for who else gets hurt in the process.

I make part of my living writing and am a published author, so I have the greatest respect for the idea that people who create things should be paid for their work. I’ve also spent my career in the software industry, with a special interest in Open Source Free Software — the community-collaborative approach that’s built the software which today runs the world. Success there depends entirely on copyright law, so you’d expect me to be a massive fan – and I am.

There are many ways to use the rights which creators get to their work. Sometimes simply being paid by other people to enjoy the work is the answer; I am very keen on any publisher paying me to write this for you, for example (narrator: they didn’t)! But there are other approaches.

If I want to collaborate with a group of people to make the computer software that runs my web site, I may prefer to make my copyrighted code freely available to the other people sharing the work with me, and give them the freedom to improve it and share their work with others too. Doing that means I can easily work with the best programmers in all the different companies that use the software. I’ve not “given it away”: I get value from my copyright by sharing in their innovation and maintenance of the software.

I might also prefer to use my copyright in my writing (or if I were a musician, in my music) to excite new readers and listeners who wouldn’t pay for what they don’t know. As publisher Tim O’Reilly once said, “The problem for most artists isn’t piracy, it’s obscurity.” I might choose to make my book or music available freely for download to build a fan base that will buy printed versions or attend concerts. People who downloaded it wouldn’t be “pirates” but prospects.

Not just money for the middle man

So here’s the problem with what’s going on with the copyright directive. It doesn’t recognise that any of these alternative approaches to using copyright exist. The background thinking is infused with the views of a world where only corporations care about copyright, citizens only consume it and any other behaviour must therefore be wrong. It lacks any viable understanding of other worlds – such as mine – where copyright is freely licensed to enable valuable returns like developer collaboration, consumer network effects, small artist exposure and new author visibility. Instead its old-world thinking is anxious to preserve the significant funds skimmed off by the middle men of publishing.

It’s thinking that’s not too worried about most of the actual creators – just the big-money ones. They’re keen to preserve a retired superstar’s pension but not too bothered about my income, because only one of those makes the publisher rich.

The Dark Arts

It’s worse than just self-preservation, though. The proposed effects swing dramatically in the opposite direction. Whoever the dark fingers pulling the strings are, they want to shut down any and all new avenues of creativity.

  • Want to link to a web site? The proposals want you to negotiate a copyright license first (article 11).
  • Want to host writers on your web page? The proposals want to assume everything uploaded is pirated until proven otherwise (article 13).
  • Using open source software on your web site? The proposals can be understood to open you to high compliance and liability costs (article 13).
  • Doing research for a project? Text and data mining may be restricted to formal institutions (article 3).

All these – and many more – are being gleefully pursued in the name of stifling Google and Facebook, but because they arose in minds imbued with the business norms of the industrial revolution, their proponents have no concept that individual citizens might be impacted as well. They accuse people like me of being “paid by GAFA” to oppose them, rather than recognising that their lobbying tramples on our work.

The process at the Parliament is only half of the activity – the European Commission also has a set of proposals it wants to see implemented. It’s quite hard to say who exactly is behind each specific friendly-fire-prone measure on both sides. The text in Parliament is emerging in a constant flurry of drafts and amendments that make it impossible for a normal person to keep track. Even the few people working to defang these proposals in Brussels are struggling to find what text will actually appear in front of the Parliament until it’s too late.

But more ominously, there are insiders supporting the copyright extremists in both places. The official Twitter accounts of a number of Commission bodies have been pumping out partisan misinformation throughout the process, and representatives of the Commission have been remarkably dismissive of concerns, preferring to wave them away as the work of the “tech giants” with no regard for the “content dragons” whose paws seem to occupy every puppet that pushes back at me.

It’s Not Over

That’s why I will remain concerned whatever happens in Strasbourg this week, since the Parliament vote is just one step in a longer process. The next stage after the Parliament decides on its goals will be “trilogues” as the Parliament, Commission and Council meet to harmonise their respective proposals. Given the propensity of insiders to let their thinking be dominated by the “content dragons” and to dismiss the concerns of new-model pioneers like myself as “just the tech giants trying to derail us”, I greatly fear that we’ll see a repeat of GDPR here. That showed us measures supposedly protecting European citizens actually inflicting extensive collateral damage on small innovators while hardly inconveniencing the multinational giants who were supposedly in the cross-hairs. When dragons settle scores with giants, it’s the little people who get trampled and burned.

[Want to keep me writing even though the big guys expect me to do it for free? Please support me on Patreon!]

The Legislative Disconnect Of The Meshed Society

What is the “meshed society”? It is people, joined together by the Internet, able to interact – to collaborate, to create, to transact and to relate directly with each other – without the need for another person to mediate or authorise. As we discover more and more ways to disintermediate our interactions, society is transformed: from a series of hubs with privileged interconnecting spokes intermediating supply to consumers at their tips, into a constantly shifting meshed “adhocracy” of temporary connections, transactions and relationships of varying length. In the adhocracy, individuals play the roles of user, repurposer, maker, buyer, investor and collaborator in a constantly changing spectrum of combinations.

Article 13 – An Existential Threat

The Electronic Frontier Foundation has published a letter from more than 70 leaders in the emerging meshed society (including me) which criticises Article 13 of the European Union’s proposed new copyright directive. This Article starts from the assumption that the only role of an individual is to consume copyrighted works, and hence deduces that any act of publication on the part of an individual must be infringing the copyrights of a corporation unless proven otherwise. The text doesn’t state things that clearly, but the effect is unmistakable. It’s as if a politician were proposing to ban syringes because addicts use them, without considering that hospitals do too.

One Last Push To Save The API

A group of computer experts – including me – asked a US court to think again about fair use of APIs this month.

Tomás Saraceno artworks at SF MOMA: Stillness in Motion—Cloud Cities

It was an unlucky fact that Oracle’s case against Google over Android started with patents. Their initial case fell apart almost immediately, with almost all the patent claims invalidated. The implausible backstop copyright case Oracle made against Android’s use of language-essential definitions in the Java APIs (and thus against the freedom of developers everywhere) carried on, though. Because the case started with patents, when Oracle soundly lost, the appeal ended up at the Court of Appeals for the Federal Circuit (CAFC) – the specialist patent appeals court in the USA – and not at a court competent to dispense copyright justice.

Copyright Needs Radical Reform

Use of copyright today far exceeds the ways its framers imagined. We need reform, not just adjustments.

Cow Pulling Lawn Mower In Delhi

Copyright is back in the news in Europe. In the UK, the Digital Economy Bill proposes to increase the maximum prison sentence for online copyright infringement to ten years. Meanwhile, an extensive modernisation of copyright for the EU is also in progress, with a goal of making the treatment of copyright the same across Europe, especially in relation to digital media.

DRM Is Toxic To Culture

In pursuit of market control now, deployers of DRM are robbing us of our culture in perpetuity by enclosing the future commons.

Dry Stone Wall

Ancient dry stone enclosure wall in Cornwall, England

It’s possible that you think that unauthorised use of copyrighted music, films and books is such a serious problem that it’s worth giving away a little of your convenience and freedom in exchange for stopping it. If you do, I’d like to suggest you think again – and time is running out.