AI Code Is Like Public Domain Code

GitHub’s Copilot tool may well be revolutionary, according to Bradley Kuhn. An AI trained by reading a massive and unidentified corpus of code, assumed to mostly be open source and licensed for any use to GitHub under their terms of use, it is able to watch what you are coding in your IDE and make suggestions on how to autocomplete the code – potentially at length. It is a kind of Clippy for code. It has just had the ultimate validation; Amazon copied it.

Spitfire in Guildhall Square, Southampton (ironically with no space for a co-pilot)
No room for a co-pilot

Sure, quit GitHub

While that may seem an unalloyed good to many programmers, there is a good deal of moral panic surrounding it, as evidenced by the recent call to boycott GitHub because of it. Now, I am all in favour of people using distributed tools instead of centralised ones. Git itself is intended as a distributed tool and in a way it’s offensive for GitHub to have annexed its name to create a centralised and proprietary control point.

I am also keen for everyone, as far as they can, to exercise self-sufficiency over their computing and control of their personal data, and given Git was written as a response to the final abridgement of that self-sovereignty by the author of an earlier tool that the Linux developers were dependent on, GitHub is again somewhat offensive. Those would both be fine reasons to encourage people to move on from GitHub and to escape the social honeypot of carefully crafted network effect funnels that it embodies.

… but not because of Copilot

But Copilot is not a great reason to quit, or at least not for the reasons people insist on articulating. Those reasons seem strong on copyleft maximalism and the homeopathic thinking that assumes because there was GPL vapour in the air everything written at the time is infused. They also seem laced with a residual mistrust of Microsoft.

  • Copilot is unlikely to be infringing copyright. Certainly not in the USA. Probably not in most other places (although see Brown for more nuance). Even for humans, learning patterns doesn’t infringe copyrights, and quoting minimal or essential fragments rarely rises to the level needed for protection by copyright. Copyrights are not the same as patents, and re-expressing the same idea does not amount to infringement – even if such infringement were possible for a machine. Which it is not, so all these considerations are moot in many jurisdictions.
  • Copilot is unlikely to be breaching the GPL. That could only happen if copyright was being infringed. Just because the author of a work doesn’t like use of their code by Microsoft’s tool, that doesn’t somehow create an infringement that triggers the license.
  • Copilot is not morally bankrupt for using open source code for training. The whole point of Open Source Free Software is to give anyone the unconditional right to study the code and learn from it. If that’s via an automated tool that makes the matter more efficient, it makes no difference.

Making a new thing that does the same as my patented widget may well be an infringement of my patent, but making a new thing that does the same as my copyrighted code is not. An unfortunate consequence of the propaganda term “Intellectual Property” is that non-specialists munge all the concepts for all of {Copyright, Patents, Trade Secrets, Trademarks, Database rights} into one big hairball and assume anything matching the hairball triggers some form of infringement of any/all of the concepts. So arguments that mix-and-match IP concepts to imply an infringement are … problematic.

You shouldn’t use it for Open Source though

AI code helpers like Copilot are thus very unlikely to infringe rights per se. But that doesn’t mean code made by them should be welcome in Open Source projects.

To summarise a long article, Reda concludes that the output of an AI like Copilot is best understood as Public Domain. But ironically, that’s the real problem with Copilot for an Open Source developer. Public Domain is not Open Source, and AI-generated code introduces friction that works against the Open Source network effect for just the same reasons. As Brown explains, not every jurisdiction has the same degree of certainty or the same attributes to its conclusion about AI-generated works as seems commonly understood in the USA.

So while you may feel comfortable using AI-generated blocks in your code, what will you write in the pull-request to give others the same confidence? Even GitHub (and indeed Amazon) are at pains to point out that’s your responsibility, not theirs. Their tool may be a very helpful learning aid, but it’s something of a trap for the responsible Open Source contributor.

There’s a different case to be understood in every jurisdiction both about the code origin and the threshold for copyrightability. While the (many) lawyers I have heard from have largely waved a hand and said the arguments would never stand up in court, the arguable cases create a context where a community can’t rely on AI-generated code without further advice. Just like Public Domain, that added friction makes it non-viable for any community serious about provenance.

The biggest challenges are the ones exerting subtle, systemic steering effects that people don’t take seriously. GitHub may not be a digital scofflaw, but their tool is a Siren tempting you onto rocks that can ruin communities.

(Thanks to the Patreon backers who made this post possible)

An End To API Gaslighting?

The Supreme Court decision in Oracle vs Google ends a decade-long nightmare for open source developers.

Sunlight or gaslight?*

The decision of the US Supreme Court (SCOTUS) to reverse the erroneous conclusion of the US Federal Circuit appeals court (CAFC) that Google’s use of the Java SE API in Android was a copyright infringement comes as a great relief to open source programmers everywhere. Software developers have always assumed that merely including a function prototype in their code does not require copyright permission as it’s just a fact about the implementation.

What Did Sun’s OSPO Do?

Started in 1999 and established as an official corporate function in 2005, Sun’s Open Source Program Office (OSPO) was among the first in the industry and maybe the first to use the name.

As I’ve discussed in earlier posts, corporations are the vehicle for the collective expression of many individuals. However, to the outside world they are a monolith, and are expected to be consistent as well as predictable in their actions.  With the many varied, implicit expectations and explicit obligations that different open source projects have, transforming a company’s reputation into that of a good actor in open source is a complex task.  It’s also a necessary one if you expect other actors to invest their time and work in your project, or to give you influence in steering a project together.

Accommodating Open Source In Standards Processes

Holders of zero-tolerance positions on both sides of the divide need to realise that accommodating open source productively inside standards bodies is both viable and happening now.

A fine balance

You’ll recall that open source and open standards are orthogonal concepts where even the words they share (like “open”) are defined differently. That doesn’t mean they are mutually exclusive, nor that they are bad together – they can be cultivated well in the same garden. There is great value from accommodating the two orthogonal concepts so that neither is invalidated by non-mandatory elements of the other. When they combine, great value is unleashed.

Software Freedom For Business Value

Software freedom is important as an idea, but it also creates all the value of open source for business and should be jealously guarded by OSPOs.

In talking about open source, I and others routinely use the expression “software freedom” to refer to the set of rights upon which the open source phenomenon is based. It arises as a synonym for “free software”, an unfortunately ambiguous term that leads people hearing it for the first time to conclude all the primary attributes of open source software relate to money — price, cost-of-ownership, license fee and so on.

“Software freedom” puts the focus in the right place — on the essential liberties required to benefit from the software. One problem with this alternative term is we are becoming accustomed to hearing discussions of “freedom” be limited to activist or political contexts, and consequently regard the term “software freedom” with caution. But a focus on software freedom isn’t just for the revolutionaries.

Permission Beyond Licensing

Is that single-company-controlled project actually open source in the sense of delivering software freedoms to you or just about delivering prospective customers to its host company? Here are 7 tests.

I frequently sum up the nature of open source licensing as granting permission in advance to developers or users to use, improve and share the software for any purpose. But the “Permission In Advance” lens has uses beyond just the rights to copyrights and patents granted in an OSI-approved license. 

In my consulting engagements, I use a “thinking tool” to help clients work through their proposals for new open source community activities. Evaluating a project’s licensing, patent, and community management strategy — both to join it and to host it — should begin with the question: “How confident are community members that they have permission in advance to do whatever they need to succeed?” The more reasons for confidence, the larger the community.

Here are some of the questions community members will be asking, perhaps silently, about single-company open source projects and their own agency as a member of the community:

OSPOs As Community Advocates

Is your Open Source Program Office just part of your corporate defences, or is it the community’s advocate inside your company as well?

Supplies?

As we considered before, corporations are not a person, but are rather a vehicle for the collective expression of the vision of many individuals, showing the outworking of the processes and systems they devise to embody their vision. So the work of an Open Source Program Office (OSPO) needs to address social change within the company and address community needs outside as well as compliance or other corporate self-protection.

Corporate Maturity As Stochastic Outcome

Corporate open source maturity may be better evaluated by considering the actions of individuals and small groups statistically rather than by evaluating the stated corporate strategy.

It’s easy to forget that corporations (and indeed large non-profits) are not a person, but are rather a vehicle for the collective expression of the vision of many individuals, as well as the outworking of the processes and systems they devise to embody their vision. Things happen not because a faceless corporation somehow chooses to act in a certain way from a point in time, but because of the persuasive decisions of actual people, acting within their belief systems and directing the work of others.

Collective Behaviour

Every good – and bad – decision ultimately goes back to an individual somewhere, and corporate effects are in many ways emergent. So fitting a maturity model to a corporation may not be the best way to predict its future outcomes. That is a trailing indicator driven by the behaviours of the individuals within the company.

How Much Interoperability Is Enough?

Interoperability is primarily a matter of the use case, not of the technology. Policymakers considering interoperability mandates need to be watchful for extremes of perfection or compromise, which both offer a game to be exploited by the unscrupulous.

How I Learned To Type

Reviewers of a paper concerning interoperability complained that some sections seemed to imply only 100% functional equivalence would be acceptable, and told us “much smaller percentages are perfectly adequate.” So how much interoperability is enough interoperability? The answer, dear to the heart of every politician, is “it depends”.

The Missing Stakeholders

The coming wave of digital regulation may claim to target “Big Tech” but will inevitably end up harming citizen-innovators most because regulators have forgotten to include them in their process.

Tracks in the snow suggest a bird being captured by a cat.
Stakeholder-Citizen Interaction

Here come the regulators. “Big Tech” companies like Facebook and Google definitely deserve some guide-rails, as well as some consequences for the unwanted impacts they have foisted on society along with the desirable ones. Facebook in particular has some deep, serious consequences of its amorality due soon. But so far, pretty much every regulation relating to the digital realm is defective.
