Done, thanks!
This is a good slide deck for articulating the benefits of AI in ideal contexts! What do you see as the trade-offs in this scenario?
In the real world, where "slop" and "rainbows" exist on a continuum, do you have a principled way of demarcating acceptable from unacceptable AI use in PR submissions?
If a maintainer cannot discern slop from everything else, that either means he is a bad reviewer, or the generated code is good enough to pass his standards and is therefore not "slop".
The trustworthiness and track record of the contributor, as well as the quality of the artifacts, should be the guiding factors.
Just like for non-LLM-assisted contributions, for that matter. The process by which the code is written should not matter (wrt quality concerns, that is).
+1 to that
I largely agree with this assessment of the two extremes one can experience with using an LLM for developing software.
One thing I will add, as a cautionary tale: co-coding with an LLM is a bit like being a "lead engineer" in a software development team. You can be extremely hands-on, regularly contributing code yourself and actually reviewing other people's code, but you can also be more hands-off.
It can be very tempting to hand off more and more tasks and give more and more trust to the Agent (especially as they get better), but then you are starting to gamble that it won't hallucinate or break something that might be really hard to fix.
So just remember to be a little bit paranoid. Or a lot paranoid.
Thanks for the reply to the second question! Just in case you missed the first: what do you see as the trade-offs to using AI co-editing?
As @thumper pointed out above, there is a risk of quality slips if one starts delegating too much, at least with current models that are still far from perfect. In this respect, though, things can only get better.
Not to be a bummer, but things could get worse if LLM training becomes recursive and the data begins to degrade, leading to a falloff in model performance.
But we don't have to think about that right now.
For other tasks, like creative writing, yes.
For tasks like coding, where you can easily implement extrinsic ways for the model to assess its own performance on virtually infinite new problems, this is hardly the case.
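To make "extrinsic ways" concrete, here is a minimal sketch in Python. The candidate code is hardcoded as a stand-in for actual model output (nothing model-specific is assumed), and the check is simply: run the candidate against unit tests in a subprocess and read the exit code.

```python
import subprocess
import sys
import tempfile
import textwrap

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Extrinsic check: execute candidate code plus its unit tests in a
    subprocess; a zero exit code means every assertion passed."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True)
    return result.returncode == 0

# Hypothetical model output for "write a function that reverses a string".
candidate = "def reverse(s):\n    return s[::-1]\n"
tests = textwrap.dedent("""
    assert reverse("abc") == "cba"
    assert reverse("") == ""
""")
print(passes_tests(candidate, tests))  # True: the candidate passes
```

Because new problems and new tests can be generated endlessly, a verifier like this gives the model a feedback signal that creative writing simply doesn't have.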
The difference is that I am not capable of remembering, verbatim, every piece of code I've ever read, and so I can't quote it all back to you without any contributions of my own.
I take abstract lessons away from looking at concrete examples of code and use them in my future work. If I copy outright, I can look to see whether I'm allowed to do so first.
This is not what AI does. I've heard stories from developers of entire proprietary codebases that were accidentally exposed to an AI, which could then be persuaded to quote them back in their entirety (just by asking for some code that can fulfil their public API contract). And because of how AI is abstracted, it often doesn't know the license terms of the code it's quoting.
I think there's a bit of a language barrier here, where we see the word "learning" and think we know what it means, as if AI learning and human learning are the same thing.
I completely agree. Which does mean the maintainer has to look at each and every line of each and every submission!
But you can copy-paste it and change variable names and constants. This is how, in my professional life, 99.9% of coding is done (both pre- and post-LLMs). You branch an existing file and you change as little as possible to implement the necessary functionality. Starting from scratch is very rare and extremely inefficient.
Yes I can do that, and I will do it knowingly and consciously, often from other code in the same codebase. If that code is licensed, it's up to me to decide whether to follow the terms of that license or not.
If all the code I use comes from AI, that could still be verbatim copies of licensed code, but the AI probably couldn't even tell me if it is or how it's licensed. So again, not the same thing at all.
If we're fine with accepting this future, just know that we're fine with doing away with software licenses, because according to the law that's developing around AI, anything that comes out of an LLM will end up being unlicensed. Which ultimately means that AI could just end up being a black-box license obfuscation tool.
My counter argument to this is that code is very local.
If I am working on the darktable codebase, chances are that the model will copy and paste stuff from the codebase, because it has access to it, it is in its context window (which takes precedence over its training data), and it's more related to what I am doing than anything else, both in subject matter and in style.
If you are starting a new project, the chances that the LLM will quote from memory are higher. BUT (big but), great effort is put into training models so that they never recite from memory unless they are explicitly asked to do so. Coding agents are no exception.
And again, models should NOT be trained with copyrighted material. If they are, it's a blunder.
And by the way… come on, all code looks the same if you look at it from a sufficient distance. It's if/for/while/do/case/switch constructs nested into each other with different constants and variable names. And the code that we all write is at best imitative, so I really find all the variants of the "copy"-related counterarguments a bit weak.
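To illustrate with toy functions I made up (not taken from any real codebase): two unrelated routines that are, structurally, the same loop-branch-counter skeleton. Rename the identifiers and swap the constants, and one becomes the other.

```python
# Two unrelated tasks, one structural skeleton: loop, branch, counter.
def count_over_limit(values, limit):
    hits = 0
    for v in values:
        if v > limit:
            hits += 1
    return hits

def count_vowels(text):
    hits = 0
    for ch in text:
        if ch in "aeiou":
            hits += 1
    return hits
```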
It's not so much the quality of the "slop", but the amount that can be generated. Even a very bad PR takes time to close, and will keep on polluting the logs and history.
Don't they always?
I agree with this, but it's a different argument and discussion. This has happened before, at varying levels of intensity of course, with certain individuals seeking collaborations to pad their CVs.
So the only trade-off an organization suffers is unrelated to the adoption of a bot? After all, an employee screwing up and halting a production line isn't a problem with production lines but with the employee.
I do think LLMs can generate decent enough code at this point given sufficient direction and parameters. If you outline a problem and you understand the problem you're trying to solve, it can generate a lot of boilerplate code and save time. If you have a vague idea of what you're trying to do, it might generate something, but the output is going to be vague as well.
Tools now are much more aware of the code within the project, and take that context into account. Contrast that with a year or two ago, when I experienced Copilot clearly giving me someone else's code. I work in academia, so some things aren't too uncommon, like working with grants and such. It was clearly giving me code that wasn't related to my project, but was ripped straight from someone else's code. Now I can't remember the last time I've seen this happen. It might be pulling knowledge from training data built on other people's code, but it's generating code that is consistent with my own.
That said, I do worry about all sorts of concerns unrelated to code quality: provenance of the training data, licensing issues, long-term effects on the brain, environmental concerns, etc. I try to keep usage minimal and limited to boilerplate stuff that I don't need to think about.
The other day I manually typed out a bunch of boilerplate: no Copilot, no autocomplete, no multiline editing or anything like that, just boring and mundane code, one character at a time. It was mentally liberating.
My 2 cents on this topic:
I agree that AI is a tool: it is not good or bad by itself, and it is possible to make good use of it.
However, capitalism corrupts this tool while seeking more profit.
As mentioned by others in this thread, under the current conditions it causes horrible environmental harm, it enables malicious men to create images of naked people without their consent, etc. And all of that is profitable, so any harm is just ignored by governments and companies.
Those are systemic problems that can only truly be solved by overcoming capitalism.
But that is not gonna happen any time soon.
I am skeptical that we can mitigate the issue at the individual level by withholding money, access, or views from AI companies, or through any other kind of boycott.