New issue: Ban contributions that make use of proprietary language models/code analysis

julian · May 22, 2023, 11:34am

This has just been posted, I think it is a good idea:

Ban contributions that make use of proprietary language models/code analysis

opened 11:01AM - 22 May 23 UTC

enhancement

## Motivation

I recently came across the PR https://github.com/supercollider/su…percollider/pull/6014 and I was not sure if it was a meme or a serious contribution, but it does allow for a discussion on how this community should position itself regarding the use of proprietary language models for contributions to this codebase.

While it is fairly safe to assume that technologies such as Microsoft Co-Pilot and OpenAI ChatGPT have already analysed this repository, it does make a difference whether the people who build this codebase allow or condemn this action by banning contributions made with such tools, which is an unsolicited analysis, and therefore sharing of code, that most likely violates the GPL-3 (The Free Software Foundation had a call for papers last year to assess a position on this kind of technology https://www.fsf.org/licensing/copilot). Since context is necessary to understand the code, it is not tied to the atomic contribution of a PR in this case but also affects the source code written by others in a copy-left manner.

Unfortunately, the GPL has not yet adapted to this by issuing a new license, so [to avoid license compatibility problems](https://wiki.debian.org/qa.debian.org/jsonevil) we should perhaps not add a clause to the license/usage of SuperCollider, but prohibit it through the terms of contribution.

There have been many requests to change the license of SuperCollider to something more friendly to use in (commercial) closed source projects, but many people who put their free time into this project don't like the idea of taking but not giving back and have contributed under this condition. Also, the source code is essentially a gift from James McCartney under the terms of this license.

By using the language models, the exploitation of the code base happens not by the functional usage but through the interaction and analysis of the code itself, scanning the knowledge in the code and its contributions, and thus improving the closed source models.

I understand that this technology has some (future) potential, and that a ban may deter new contributors in the future, and I am aware that we are discussing this on GitHub, but I still suggest banning closed-source "assistants" (where the source code includes the model/weights, so the running code in the end). This avoids exploitation, while not banning the use of such an assistant in general, and may also encourage the open source development of such models.

Maybe there are other open source projects that have also positioned themselves towards this technology and maybe we can form an alliance or adapt their position? But I have not found any yet.

## Description of Proposed Feature

Add a disclaimer to [`CONTRIBUTING.md`](https://github.com/supercollider/supercollider/blob/ef627ce2c564fe323125234e4374c9c4b0fc7f1d/CONTRIBUTING.md) that explicitly forbids contributions that use proprietary language models like Co-Pilot or ChatGPT, and also explicitly asks contributors to disable these tools in e.g. vscode while working on this repository (maybe add a `.vscode/settings.json` that disables such plugins?)

## Plan for Implementation

Let's have a discussion about whether this should be implemented and what such a text should look like!
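For the `.vscode/settings.json` idea in the proposal, a minimal sketch might look like the following. It assumes contributors use the GitHub Copilot extension for VS Code and that the extension honors its `github.copilot.enable` setting; other assistants would need their own keys, which are not covered here. A workspace setting like this is a nudge rather than enforcement, since anyone can change it locally.

```json
{
    // Assumption: setting key provided by the GitHub Copilot extension.
    // "*" is meant to switch completions off for every language in this workspace.
    "github.copilot.enable": {
        "*": false
    }
}
```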

What do you think?

jordan · May 22, 2023, 12:26pm

No idea about the legal stuff, nor have I decided how I feel about using/stealing work to train a model…

…that being said, all this is irrelevant if it does a shit job…

Here is some output if anyone wants to see it in action.

SinOsc

Input:

generate a supercollider help document for the class SinOsc

Output:

SinOsc Class Help

Description

The SinOsc class is a unit generator in the SuperCollider audio synthesis language. It generates a sine wave oscillator that can be used for various audio synthesis applications.

Syntax

SinOsc.ar(freq: 440, phase: 0, mul: 1, add: 0)
SinOsc.kr(freq: 440, phase: 0, mul: 1, add: 0)

Class Methods

SinOsc.ar(freq: 440, phase: 0, mul: 1, add: 0)

freq (default: 440) - The frequency of the sine wave oscillator in Hertz.
phase (default: 0) - The phase of the oscillator in radians.
mul (default: 1) - The multiplication factor for the output.
add (default: 0) - The addition factor for the output.
Returns an audio rate sine wave oscillator.

SinOsc.kr(freq: 440, phase: 0, mul: 1, add: 0)

freq (default: 440) - The frequency of the sine wave oscillator in Hertz.
phase (default: 0) - The phase of the oscillator in radians.
mul (default: 1) - The multiplication factor for the output.
add (default: 0) - The addition factor for the output.
Returns a control rate sine wave oscillator.

Instance Methods

none

Example Usage

Audio-rate SinOsc

// Generates an audio-rate sine wave oscillator with a frequency of 440Hz
// and an amplitude of 0.5, and adds 0.1 to the resulting signal.
{
    var osc = SinOsc.ar(440, 0, 0.5, 0.1);
    Out.ar(0, osc);
}.play;

Control-rate SinOsc

// Generates a control-rate sine wave oscillator with a frequency of 1Hz
// and multiplies the resulting signal by a control-rate sine wave oscillator
// with a frequency of 0.2Hz.
{
    var freq = SinOsc.kr(0.2, 0, 0.5, 0.5);
    var osc = SinOsc.kr(1) * freq;
    Out.ar(0, osc);
}.play;

Additional Information

The SinOsc class is a fundamental unit generator in SuperCollider and is commonly used for generating pure sine wave tones. It can be combined with other unit generators and synthesis techniques to create complex audio synthesis patches and musical compositions. For more information and examples, refer to the SuperCollider documentation and community resources.

GrainBuf

Input

generate a supercollider help document for the class GrainBuf

Output

SinOsc Class Help

Description

The SinOsc class is a unit generator in the SuperCollider audio synthesis language. It generates a sine wave oscillator that can be used for various audio synthesis applications.

Syntax

SinOsc.ar(freq: 440, phase: 0, mul: 1, add: 0)
SinOsc.kr(freq: 440, phase: 0, mul: 1, add: 0)

Class Methods

SinOsc.ar(freq: 440, phase: 0, mul: 1, add: 0)

freq (default: 440) - The frequency of the sine wave oscillator in Hertz.
phase (default: 0) - The phase of the oscillator in radians.
mul (default: 1) - The multiplication factor for the output.
add (default: 0) - The addition factor for the output.
Returns an audio rate sine wave oscillator.

SinOsc.kr(freq: 440, phase: 0, mul: 1, add: 0)

freq (default: 440) - The frequency of the sine wave oscillator in Hertz.
phase (default: 0) - The phase of the oscillator in radians.
mul (default: 1) - The multiplication factor for the output.
add (default: 0) - The addition factor for the output.
Returns a control rate sine wave oscillator.

Instance Methods

none

Example Usage

Audio-rate SinOsc

// Generates an audio-rate sine wave oscillator with a frequency of 440Hz
// and an amplitude of 0.5, and adds 0.1 to the resulting signal.
{
    var osc = SinOsc.ar(440, 0, 0.5, 0.1);
    Out.ar(0, osc);
}.play;

Control-rate SinOsc

// Generates a control-rate sine wave oscillator with a frequency of 1Hz
// and multiplies the resulting signal by a control-rate sine wave oscillator
// with a frequency of 0.2Hz.
{
    var freq = SinOsc.kr(0.2, 0, 0.5, 0.5);
    var osc = SinOsc.kr(1) * freq;
    Out.ar(0, osc);
}.play;

Additional Information

The SinOsc class is a fundamental unit generator in SuperCollider and is commonly used for generating pure sine wave tones. It can be combined with other unit generators and synthesis techniques to create complex audio synthesis patches and musical compositions. For more information and examples, refer to the SuperCollider documentation and community resources.

Here are two I made up

MySuperAwesomeUgen

Input

generate a supercollider help document for the class MySuperAwesomeUgen

Output

MySuperAwesomeUgen Class Help

Description

The MySuperAwesomeUgen class is a custom unit generator in SuperCollider that provides unique and powerful audio synthesis capabilities. It is designed to offer a wide range of functionality and flexibility for creating innovative sounds and musical compositions.

Syntax

MySuperAwesomeUgen.ar(arg1, arg2, ..., argN)
MySuperAwesomeUgen.kr(arg1, arg2, ..., argN)

Class Methods

MySuperAwesomeUgen.ar(arg1, arg2, …, argN)

arg1 to argN - The input arguments required by the MySuperAwesomeUgen unit generator.
Returns an audio rate output.

MySuperAwesomeUgen.kr(arg1, arg2, …, argN)

arg1 to argN - The input arguments required by the MySuperAwesomeUgen unit generator.
Returns a control rate output.

Instance Methods

none

Example Usage

Audio-rate MySuperAwesomeUgen

// Generates an audio-rate output using MySuperAwesomeUgen with two input arguments.
{
    var input = SinOsc.ar(440);
    var output = MySuperAwesomeUgen.ar(input, 0.5);
    Out.ar(0, output);
}.play;

Control-rate MySuperAwesomeUgen

// Generates a control-rate output using MySuperAwesomeUgen with three input arguments.
{
    var input1 = LFSaw.kr(0.5);
    var input2 = LFNoise0.kr(1);
    var output = MySuperAwesomeUgen.kr(input1, input2, 0.2);
    Out.ar(0, output);
}.play;

Additional Information

The MySuperAwesomeUgen class is a custom unit generator that extends the capabilities of SuperCollider. It can be used to implement unique synthesis algorithms, audio effects, and signal processing techniques. For detailed information on the specific input arguments and behavior of MySuperAwesomeUgen, consult the documentation or relevant resources related to the specific implementation of the class.

DirtBitCrushingLPF

DirtyBitCrushingLPF Class Help

Description

The DirtyBitCrushingLPF class is a unit generator in SuperCollider that combines bit crushing and low-pass filtering to create a dirty and lo-fi audio effect. It is useful for adding a gritty and vintage character to audio signals.

Syntax

DirtyBitCrushingLPF.ar(input, bitDepth: 8, sampleRateReduction: 1, cutoffFreq: 2000, resonance: 0.5, mul: 1, add: 0)
DirtyBitCrushingLPF.kr(input, bitDepth: 8, sampleRateReduction: 1, cutoffFreq: 2000, resonance: 0.5, mul: 1, add: 0)

Class Methods

DirtyBitCrushingLPF.ar(input, bitDepth: 8, sampleRateReduction: 1, cutoffFreq: 2000, resonance: 0.5, mul: 1, add: 0)

input - The audio input to be processed.
bitDepth (default: 8) - The number of bits used for quantization. Lower values create a more degraded and lo-fi sound.
sampleRateReduction (default: 1) - The degree of sample rate reduction. Values greater than 1 decrease the sample rate, resulting in a more aliasing and distorted sound.
cutoffFreq (default: 2000) - The cutoff frequency of the low-pass filter in Hertz. Higher values allow more high-frequency content.
resonance (default: 0.5) - The resonance or emphasis of the low-pass filter. Values greater than 0 create a more pronounced resonance.
mul (default: 1) - The multiplication factor for the output.
add (default: 0) - The addition factor for the output.
Returns an audio rate output.

DirtyBitCrushingLPF.kr(input, bitDepth: 8, sampleRateReduction: 1, cutoffFreq: 2000, resonance: 0.5, mul: 1, add: 0)

input - The control input to be processed.
bitDepth (default: 8) - The number of bits used for quantization. Lower values create a more degraded and lo-fi sound.
sampleRateReduction (default: 1) - The degree of sample rate reduction. Values greater than 1 decrease the sample rate, resulting in a more aliasing and distorted sound.
cutoffFreq (default: 2000) - The cutoff frequency of the low-pass filter in Hertz. Higher values allow more high-frequency content.
resonance (default: 0.5) - The resonance or emphasis of the low-pass filter. Values greater than 0 create a more pronounced resonance.
mul (default: 1) - The multiplication factor for the output.
add (default: 0) - The addition factor for the output.
Returns a control rate output.

Instance Methods

none

Example Usage

Audio-rate DirtyBitCrushingLPF

// Applies the DirtyBitCrushingLPF effect to an audio input
{
    var input = Mix.ar(Saw.ar([440, 660], mul: 0.5));
    var output = DirtyBitCrushingLPF.ar(input, bitDepth: 4, sampleRateReduction: 0.5, cutoffFreq: 4000, resonance: 0.8);
    Out.ar(0, output);
}.play;

Control-rate DirtyBitCrushingLPF

// Applies the DirtyBitCrushingLPF effect to a control input
{
    var input = LFO.kr(0.5).range(-1, 1);
    var output = DirtyBitCrushingLPF.kr(input, bitDepth: 6, sampleRateReduction: 2, cutoffFreq: 3000, resonance: 0.4);
    Out.ar(0, output);
}.play;

Additional Information

The DirtyBitCrushingLPF class combines bit crushing and low-pass filtering to create a distinct lo-fi effect. Experiment with different parameter values to achieve the desired level of distortion, degradation, and resonance. For more information and examples, refer to the SuperCollider documentation and community resources.

Anyway, it's pretty convincing in some places, but figuring out where it is wrong is actually quite hard, because it is consistently wrong. Meaning, once it's made up its mind, it is set in its ways. One example would be the arguments for GrainBuf, which are wrong but look convincing.
I do think the author is actually talking about rephrasing existing documentation rather than generating whole new documentation. Sometimes it does a good job, sometimes not — the commit is a good example of the latter, but I think the text was actually pretty good before it was edited, so what more could the AI have done?

In the future it might actually be useful as a writing aid, and for SuperCollider to ban the use of that would be a mistake; it might be akin to banning a spell checker.

If, however, users/contributors are submitting AI-generated content that is clearly and consistently wrong (i.e., one can tell it is generated), then they should be banned (if they refuse to stop). Mostly because this just creates more work, makes SuperCollider harder to learn, and confuses online resources, not because it inherently uses AI.

joslloand · May 22, 2023, 7:57pm

This could easily become a very long discussion… as a contributor and a user, my feeling is that documentation should be mindful rather than mindless.

With a large codebase and many contributors, there are of course some issues with consistency in both implementation and documentation. I.e., there isn’t a single editor whose job is to maintain consistent style. (Having Code style guidelines is a great start, and has made a good improvement!!)

I could see using some sort of LLM as an automatic proofreader, tagging inconsistencies in the documentation. BUT these would then need further review by a human user / editor.

julian · May 22, 2023, 9:32pm

two reasons why I would like to see some sort of limitation:

it creates a new source of errors that is extremely hard to track
I wrote open source code and documentation not for this purpose, and I wouldn’t have, had I known.

scztt · May 22, 2023, 10:20pm

I agree with this for ideological reasons, at least to some extent. It’s extremely easy to see that our code, our discussions, our content have already been incorporated into big closed-source language models. If you start asking deep questions about SuperCollider, it’s pretty easy to find things that feel intimately related to even conversations from this forum.

The word “ban” here reads a little bit overly harsh or maybe performative in this context? This wouldn’t be enforceable now - and in another 6-12 months even less so. For example, copilot’s auto-completion produces generally quite valid and useful code, and I doubt if anyone could tell whether it’s being used from a pull request. We have a very rigorous code review process for an open source project, and I doubt that use of language models would somehow cause bad code to slip through in some unique and unforeseen way. Words like “ban” feel punitive towards (potential-) contributors when the real enemy here is corporate pillaging of public community resources, something that’s far away from the pragmatic choices that an individual contributor makes about their tools.

This feels more appropriate as a request / position statement maybe? E.g. "Closed source language models are problematic for these reasons. We as an open source project are antagonistic to them and encourage you to avoid using them. Here are some open-source, ethical alternatives: ________. Pull requests with blatant auto-generated code will be closed immediately."

I have some skepticism about even this, though - I would be in favor of joining e.g. an organized, collective and public boycott - but purely in a tactical sense, a policy that’s local to our community or codebase and isn’t a part of a broader organization of solidarity around this topic will have little to no impact on the broader culture and could be actively detrimental to getting good contributions to our code (which is already hard enough as it is). This might be an overly cynical and pragmatic take - still thinking this through, and I’m happy to be argued in another direction on this.

I guess my tldr is: if we can find an organized network of other OS projects that have a stance or set of actions on this topic, let’s join it - but I think I’d prefer to not manifest this legitimate battle as unenforceable rules or punitive vibes directed at people who have the energy to put a PR up for review. (I think there will be - and have already been - frivolous uses of this tech to make useless PRs - this is a separate issue, I think we can shut those down in the same way that we shut down any PR that doesn’t pass muster).

scztt · May 22, 2023, 10:30pm

FWIW: the “usefulness” of the tech doesn’t make it ethical, and I think it’s important to firmly separate these considerations - but if the conversation is purely one about quality… A few weekends ago, I translated a small/middle-sized JavaScript audio project into a C++ based SuperCollider UGen. I used ChatGPT to translate almost all of the JS implementation code into C++20 equivalents. This was a heavily human guided project and took a lot of manual intervention, but I was able to get this project to probably 80% completion in about 3 hours, when doing it purely by hand would have taken at least 20 hours (pretty sure of this because I STARTED doing it by hand before abandoning it, and it was slow going…). I read the generated code carefully - IIRC it made no more than one or two mistakes for the whole codebase, which is probably better than I would have done myself. In this case, the value of the automated tool was extremely high - to the point that I simply wouldn’t have continued the project at ALL without it, and definitely would have ended up with worse code (since I was able to put my energy into making it well-factored and combing for errors, rather than re-typing thousands of lines of boilerplate).

dscheiba · May 22, 2023, 11:27pm

I think the discussion should not be about the quality of such contributions. I don’t doubt that these systems will become better and better over time and will change everyday programming. While this has some serious implications for our society, it is also out of scope for this discussion.

The post was made because of the exploitation of open source projects by companies through proprietary LLMs. These exploitations used to have some legal barriers, like a license (which often enough gets ignored), but due to the black box nature of ANNs the companies ignore this license and its obligations and just do as they please. The question is how the SC dev community should position itself towards this offense and my proposition is to condemn and ban such tools as long as the underlying system is not fully FOSS.

I like the idea by @scztt of forming an alliance, combined with a statement that strongly discourages these closed technologies. Of course it is impossible to truly verify if a PR was made with the help of an LLM or not (let alone distinguish between a FOSS and a proprietary LLM).
Maybe it is possible to contact the FSF, as they are probably the biggest institution in this regard and it would fit their agenda as well? I am baffled that they have not issued an updated license or something else that addresses these issues.

julian · May 23, 2023, 5:44am

A side remark on this (known, but worth repeating):

I think that this fact indicates that systems like Copilot currently break the GPL, because they apply modified open source code to function. Arguably, the code is not simply data, but its syntactic structure is an integral part of the operational model, and it is modified to fit needs. Our license is probably unenforceable currently, but our community should refrain from deliberately committing content into those systems without permission from the authors.

tremblap · May 23, 2023, 7:47am

I cannot agree more with this. Tools are nothing without a community of practice, which is fostered by mindful entry points and eventually mutual support.

but I also agree with this:

As a tool for convergent coding, these can be very useful for the error-prone boring bits.

now let’s not open that can of worms

Anyway, as you can see, as much as I am allergic to what @dscheiba was describing, if the SC codebase cannot benefit from a lighter, stronger maintenance tool for itself and its expansions, I am worried about its future, when these tools are mature enough (and unpoliceable anyway) to be everywhere, including in the compilers we use etc.

I’m not (yet) good enough to help with the said codebase, but these struggles are real. So the question is an interesting one, and I wonder how much bigger F(l)OSS projects are making a stand.

(sorry for the morning rambling)

muzikman · May 23, 2023, 9:23am

The original PR is for updating documentation. I’m all for it. It will be vetted by those in the know, and possibly all the nooks and crannies that aren’t reachable from an entry point like the tutorial might be linked to, though, I’d be surprised if an LLM can do an extensive coverage of the documentation.

smoge · May 23, 2023, 12:42pm

I think you got right to the point by separating the issue of technology (LLM and similar tools, that are out there already) from the issue of freedom.

It reminds me of something: Engels recorded after Marx’s death that he was deeply moved by every technological advance.

smoge · May 23, 2023, 1:01pm

I haven’t seen a solid statement by the FSF about it, or did I miss it?

julian · June 5, 2023, 8:52am

Here is an interesting development at Stack Exchange:
The moderators are going on strike because the company is forcing the forum to permit generated posts.

Here is the code of conduct referenced: Inauthentic usage policy - Meta Stack Exchange

julian · June 9, 2023, 8:22pm

Also there is a class action lawsuit concerning copilot: https://githubcopilotlitigation.com/

julian · March 20, 2024, 8:48pm

Some interesting thing here:

smoge · March 22, 2024, 7:44pm

Publication of the FSF-funded white papers on questions around Copilot

https://www.fsf.org/news/publication-of-the-fsf-funded-white-papers-on-questions-around-copilot

I think the problem is not the technology, but the privatization of commons, which is not a new thing.

A free software public archive would be very positive.