I was recently researching two folding bikes, and out of desperation I asked ChatGPT to compare them. I noticed that it confused multiple models from one manufacturer. I also asked it to say something about their comparative rolling resistance, which it got entirely wrong: the math was just plain wrong. When I called it out on that, it "corrected" itself with more wrong math.
Of course it all sounded plausible. I could only catch these errors because I happened to have researched these topics myself. But the problem is, I must assume it will generate similar garbage for topics I don't know well.
For now, therefore, I cannot trust ChatGPT.
But that assessment seems increasingly rare. I've reviewed job applications obviously written by LLMs. I've talked to applicants who evidently regurgitated LLMs in the interview. I've seen self-assessment forms in yearly job reviews written by LLMs. I've seen so. Many. Effing. People. Answer a question with "I don't know, but here's what ChatGPT said…".
As long as I'm physically able, I'll continue thinking for myself, thank you very much.
[Sorry for the tangent. Parts of this happened today. Feel free to delete if it's too far off base.]
Yes, I think people have to be very careful when contributing AI-generated code. In some cases, it won't be a concern. A good example, and a case I have used AI for, is converting a specification for a data structure into the actual data structure. There's only one right result, so you arrive at the same code regardless of whether you used AI or not. In other cases, AI has enough freedom to potentially replicate code it has seen elsewhere.
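To illustrate the kind of mechanical spec-to-code translation I mean (the record layout here is purely hypothetical, not from any real spec): given "a record is a 4-byte little-endian unsigned ID followed by a 2-byte little-endian flags field", there is essentially one faithful translation.

    import struct
    from dataclasses import dataclass

    @dataclass
    class Record:
        # Hypothetical spec: uint32 LE id, then uint16 LE flags.
        record_id: int
        flags: int

        _FORMAT = "<IH"  # little-endian: uint32, uint16

        @classmethod
        def from_bytes(cls, data: bytes) -> "Record":
            record_id, flags = struct.unpack(cls._FORMAT, data)
            return cls(record_id, flags)

    # Record.from_bytes(b"\x01\x00\x00\x00\x02\x00") -> Record(record_id=1, flags=2)

Any competent translation, human or LLM, ends up functionally identical to this, so provenance hardly matters.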
I feel like this will just become an argument about what constitutes copying and pasting. If an LLM sees code during training, it is possible that it will commit it to memory and later regurgitate it. I'm 99% sure I have seen this happen. This is what I mean by copying. Humans can do it too, but most people don't commit code snippets to memory. Even so, clean-room design exists to avoid potential legal issues.
To investigate AlphaEvolve's breadth, we applied the system to over 50 open problems in mathematical analysis, geometry, combinatorics and number theory. The system's flexibility enabled us to set up most experiments in a matter of hours. In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.
And in 20% of cases, AlphaEvolve improved the previously best known solutions, making progress on the corresponding open problems.
From the group that started out making very good StarCraft AIs, to AlphaFold, and now this. I wonder what they have cooking for the future. Even among all the LLM labs, DeepMind is probably the most impressive group out there working on these technologies.
Same here: even a simple request to review a formula for checking the strength of a bolted connection resulted in the safety factor ending up on the wrong side of the fraction line. You can imagine the consequences of that mistake…
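To make the failure mode concrete (a generic check with illustrative numbers, not my actual formula): the safety factor is supposed to divide the resistance, so putting it in the numerator silently inflates the apparent capacity.

    # Generic design check, illustrative only. gamma is a partial safety factor.
    def allowable_load(ultimate_load_kN: float, gamma: float = 1.25) -> float:
        # Correct: the safety factor divides the resistance.
        return ultimate_load_kN / gamma

    def allowable_load_wrong(ultimate_load_kN: float, gamma: float = 1.25) -> float:
        # The mistake described above: gamma on the wrong side of the
        # fraction line makes the connection look stronger than it is.
        return ultimate_load_kN * gamma

    print(allowable_load(100.0))        # 80.0 kN
    print(allowable_load_wrong(100.0))  # 125.0 kN, ~56% over the safe value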
But that's not the fault of the tool; it's the fault of the engineer who isn't able to select the proper tool. That's like taking a monochrome Leica to shoot astonishing color photos.
TBH, my reaction was more like "oh, so you wasted more energy to brute-force an algorithm by constantly rerunning the LLM". Especially after the short snippet from the DeepMind dev, who was like "oh yeah, then you can come back after a few weeks and see if any of those brute-force approaches netted you something faster/better".
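In the abstract, the loop being described is something like the sketch below. To be clear, this is a generic evolutionary-search skeleton, not DeepMind's actual pipeline; llm_propose_variant() and score() are hypothetical stand-ins for an LLM call and an automatic evaluator.

    import random

    def llm_propose_variant(program: str) -> str:
        # Stand-in for an expensive LLM call that mutates a candidate.
        return program + f" #tweak{random.randint(0, 9)}"

    def score(program: str) -> float:
        # Stand-in for an automatic evaluator (e.g. benchmark speed).
        return float(hash(program) % 1000)

    def evolve(seed: str, generations: int = 1000) -> str:
        population = [seed]
        for _ in range(generations):
            parent = random.choice(population)
            child = llm_propose_variant(parent)
            if score(child) > score(parent):  # keep only improvements
                population.append(child)
        return max(population, key=score)

The energy cost is the thousands of LLM calls; whether that's "waste" depends on whether any surviving candidate pays for itself later.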
To be fair, if that 4x4 matrix multiplication improvement spreads throughout the world, it might still be a net plus energy-wise, given that matrix multiplication is used everywhere. This is not to discredit the argument about wasted energy on AI; I believe a ton of it is wasted on the simplest tasks, but we can't always look at it that way.
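For context: AlphaEvolve's improvement is an algorithm that multiplies two 4x4 complex-valued matrices using 48 scalar multiplications, instead of the 49 you get from applying Strassen's scheme recursively. The snippet below is NOT that new algorithm; it's Strassen's classic 2x2 construction (7 multiplications instead of 8), just to show the flavor of the trade: fewer multiplications at the cost of more additions.

    # Strassen's 2x2 construction: 7 multiplications instead of 8.
    def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        return (m1 + m4 - m5 + m7,  # c11
                m3 + m5,            # c12
                m2 + m4,            # c21
                m1 - m2 + m3 + m6)  # c22

    # Sanity check against the naive 8-multiplication product:
    assert strassen_2x2(1, 2, 3, 4, 5, 6, 7, 8) == (19, 22, 43, 50)

Shaving even one multiplication matters because these schemes are applied recursively to huge matrices, billions of times a day.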
I recommend everyone interested read the rest of the section. You can also read every prompt Cloudflare used to do this. The code was heavily reviewed and edited by Cloudflare engineers, but it seems to have cut their time cost by a ton.
This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results). Check out the commit history to see how Claude was prompted and what code it produced.

"NOOOOOOOO!!! You can't just use an LLM to write an auth library!"

"haha gpus go brr"