I was recently researching two folding bikes, and out of desperation I asked ChatGPT to compare them. I noticed that it confused multiple models from one manufacturer. I also asked it to say something about their comparative rolling resistance, which it got entirely wrong: the math was just plain wrong. When I called it out on that, it "corrected" itself with more wrong math.
Of course it all sounded plausible. I could only catch these errors because I happened to have researched these topics myself. But the problem is, I must assume it will generate similar garbage for topics I don't know well.
For now, therefore, I cannot trust ChatGPT.
But that assessment seems increasingly rare. I've reviewed job applications obviously written by LLMs. I've talked to applicants who evidently regurgitated LLMs in the interview. I've seen self-assessment forms in yearly job reviews written by LLMs. I've seen so. Many. Effing. People. Answer a question with "I don't know, but here's what ChatGPT said…".
As long as I'm physically able, I'll continue thinking for myself, thank you very much.
[Sorry for the tangent. Parts of this happened today. Feel free to delete if it's too far off base.]
Yes, I think people have to be very careful when contributing AI-generated code. In some cases, it won't be a concern. A good example, and a case I have used AI for, is converting a specification for a data structure into the actual data structure. There's only one right result, so you arrive at the same code regardless of whether you used AI or not. In other cases, AI has enough freedom to potentially replicate code it has seen elsewhere.
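To illustrate the kind of mechanical spec-to-code translation I mean (the record layout here is purely hypothetical, not from any real spec): given "a record is a 4-byte little-endian unsigned ID followed by a 2-byte little-endian flags field", there is essentially one faithful translation.

    import struct
    from dataclasses import dataclass

    @dataclass
    class Record:
        # Hypothetical spec: uint32 LE id, then uint16 LE flags.
        record_id: int
        flags: int

        _FORMAT = "<IH"  # little-endian: uint32, uint16

        @classmethod
        def from_bytes(cls, data: bytes) -> "Record":
            record_id, flags = struct.unpack(cls._FORMAT, data)
            return cls(record_id, flags)

    # Record.from_bytes(b"\x01\x00\x00\x00\x02\x00") -> Record(record_id=1, flags=2)

Any competent translation, human or LLM, ends up functionally identical to this, so provenance hardly matters.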
I feel like this will just become an argument about what constitutes copying and pasting. If an LLM sees code during training, it is possible that it will commit it to memory and later regurgitate it. I'm 99% sure I have seen this happen. This is what I mean by copying. Humans can do it too, but most people don't commit code snippets to memory. Even so, clean-room design exists to avoid potential legal issues.
To investigate AlphaEvolve's breadth, we applied the system to over 50 open problems in mathematical analysis, geometry, combinatorics and number theory. The system's flexibility enabled us to set up most experiments in a matter of hours. In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.
And in 20% of cases, AlphaEvolve improved the previously best known solutions, making progress on the corresponding open problems.
From the group that started out making very good StarCraft AIs, to AlphaFold, and now this. I wonder what they have cooking for the future. Even among all the LLM labs, DeepMind is probably the most impressive group out there working on these technologies.
Same here: even a simple request to review a formula for checking the strength of a bolted connection resulted in the safety factor ending up on the wrong side of the fraction line. You can imagine the consequences of that mistake…
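To make the failure mode concrete (a generic check with illustrative numbers, not my actual formula): the safety factor is supposed to divide the resistance, so putting it in the numerator silently inflates the apparent capacity.

    # Generic design check, illustrative only. gamma is a partial safety factor.
    def allowable_load(ultimate_load_kN: float, gamma: float = 1.25) -> float:
        # Correct: the safety factor divides the resistance.
        return ultimate_load_kN / gamma

    def allowable_load_wrong(ultimate_load_kN: float, gamma: float = 1.25) -> float:
        # The mistake described above: gamma on the wrong side of the
        # fraction line makes the connection look stronger than it is.
        return ultimate_load_kN * gamma

    print(allowable_load(100.0))        # 80.0 kN
    print(allowable_load_wrong(100.0))  # 125.0 kN, ~56% over the safe value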
But that's not the fault of the tool; it's the fault of the engineer who isn't able to select the proper tool. That's like taking a monochrome Leica to shoot astonishing color photos.
TBH, my reaction was more like "oh, so you wasted more energy to brute-force an algorithm by constantly rerunning the LLM". Especially after the short snippet from the DeepMind dev, who was like "oh yeah, then you can come back after a few weeks and see if any of those brute-force approaches netted you something faster/better".
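In the abstract, the loop being described is something like the sketch below. To be clear, this is a generic evolutionary-search skeleton, not DeepMind's actual pipeline; llm_propose_variant() and score() are hypothetical stand-ins for an LLM call and an automatic evaluator.

    import random

    def llm_propose_variant(program: str) -> str:
        # Stand-in for an expensive LLM call that mutates a candidate.
        return program + f" #tweak{random.randint(0, 9)}"

    def score(program: str) -> float:
        # Stand-in for an automatic evaluator (e.g. benchmark speed).
        return float(hash(program) % 1000)

    def evolve(seed: str, generations: int = 1000) -> str:
        population = [seed]
        for _ in range(generations):
            parent = random.choice(population)
            child = llm_propose_variant(parent)
            if score(child) > score(parent):  # keep only improvements
                population.append(child)
        return max(population, key=score)

The energy cost is the thousands of LLM calls; whether that's "waste" depends on whether any surviving candidate pays for itself later.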
To be fair, if that 4x4 matrix multiplication improvement spreads throughout the world, it might still be a net plus energy-wise, given that matrix multiplication is used everywhere. This is not to discredit the argument about wasted energy on AI; I believe a ton of it is wasted on the simplest tasks, but we can't always look at it that way.
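For context: AlphaEvolve's improvement is an algorithm that multiplies two 4x4 complex-valued matrices using 48 scalar multiplications, instead of the 49 you get from applying Strassen's scheme recursively. The snippet below is NOT that new algorithm; it's Strassen's classic 2x2 construction (7 multiplications instead of 8), just to show the flavor of the trade: fewer multiplications at the cost of more additions.

    # Strassen's 2x2 construction: 7 multiplications instead of 8.
    def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        return (m1 + m4 - m5 + m7,  # c11
                m3 + m5,            # c12
                m2 + m4,            # c21
                m1 - m2 + m3 + m6)  # c22

    # Sanity check against the naive 8-multiplication product:
    assert strassen_2x2(1, 2, 3, 4, 5, 6, 7, 8) == (19, 22, 43, 50)

Shaving even one multiplication matters because these schemes are applied recursively to huge matrices, billions of times a day.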
I recommend everyone interested read the rest of the section. You can also read every prompt Cloudflare used to do this. The code was heavily reviewed and edited by Cloudflare engineers, but it seems to have cut their time cost by a ton.
This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results). Check out the commit history to see how Claude was prompted and what code it produced.

"NOOOOOOOO!!! You can't just use an LLM to write an auth library!"

"haha gpus go brr"