Long, IMV balanced, article published in Science. Maybe ChatGPT can summarise it… This is the prepub version, so it's freely available from one of the authors. Touches on many issues, including the use of creators’ work.
TBH, I found it a bit vapid. Given that it is to be published in “Science”, I would expect it to be rather more oriented towards reasoning models in science.
The one sentence I found telling was, “There is no reason, in principle, why an artificial system could not do so at some point in the future”, referring to artificial systems searching for truth. The implication being that they do not do so at the moment.
I guess that’s fair. I was just happy to read something that wasn’t either “superintelligence is coming next week” or “this is a Valley scam”.
I was at a Café Scientifique session on Tuesday night. While many are interesting, they tend to be more technological than scientific. This one was on integrity in research (which was more sociological than scientific).
In the questions, there was one on AI, which effectively asked how one could ensure integrity when AI was being used. The answer was that nobody has effectively got to grips with this as yet.
(There was a second question: how can you guarantee research integrity when you have a president who has called climate science “a hoax” and a secretary of health who still believes that the MMR vaccine causes autism?)
I think there’s more to the authors’ argument, tbh, some of which you point at in your latest comment. Farrell has summarised and extended the argument in a thread on Bluesky:
This is a screenshot of one of the key points:
I don’t know whether it is completely apparent, but I come at this from reading articles about how wonderful AI will be for science. I remain sceptical, for a number of reasons:
- My understanding is that reasoning models use Bayesian inference over a large solution space, which they sample statistically. However, this begs the question in that it assumes the solution space does actually contain a solution (there is a toy sketch of this after the list).
- If a solution does not exist in the solution space, then we get the phenomenon of hallucinations because, as the author says, these systems “have no conception of truth and falsity”. This is obviously less than ideal if you are searching for a hypothesis to explain a particular set of phenomena.
- One problem in the philosophy of science is that of the underdetermination of theories by data. If the solution space contains multiple hypotheses that simply save the appearances (σῴζειν τὰ φαινόμενα), then this is no better than the original Greek theories of astronomy, which simply provided a method of calculation, but no correspondence with reality.
- There is an article by Ernan McMullin called “The Virtues of a Good Theory” (available in this book, but not online), in which McMullin details the properties a good theory should have. For some of them (explanatory power, empirical fit, parsimony) one might be able to devise likelihoods, though I doubt that these exist at the moment. However, there are others that might be more difficult (consonance with other theories, for example), and the paper details some metaphysical principles for which I can’t see how you would produce a likelihood function. Finally, there are properties that I will call “guards” (such as effects not preceding causes, or hypotheses that are contrary to other, well-established laws and theories, such as the second law of thermodynamics), where calculating likelihoods would be difficult.
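To make the first couple of bullets concrete, here is a toy sketch of my own (nothing from the article, and a caricature of how real systems work): Bayesian updating over a fixed, closed hypothesis space. The posterior has to put all its weight somewhere among the candidates, so it confidently settles on a “best” hypothesis even when the true data-generating process isn’t in the space at all, and the winner merely saves the appearances.

```python
# Toy illustration (my own, not from the article): Bayesian updating over a
# *closed* hypothesis space. The posterior must sum to 1 over the candidates,
# so it confidently picks a "best" hypothesis even when the true
# data-generating process is not in the space at all.
import numpy as np

# Candidate hypotheses: "the coin has bias p" for a few fixed values of p.
candidate_biases = np.array([0.3, 0.5, 0.7])
prior = np.full(len(candidate_biases), 1 / len(candidate_biases))

# True process: flips alternate deterministically H, T, H, T, ...
# (not a Bernoulli coin at all, so no candidate is correct).
data = np.array([1, 0] * 50)

# Likelihood of the data under each candidate, then Bayes' rule.
log_lik = np.array([
    np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))
    for p in candidate_biases
])
posterior = prior * np.exp(log_lik - log_lik.max())
posterior /= posterior.sum()

for p, post in zip(candidate_biases, posterior):
    print(f"P(bias={p} | data) = {post:.3f}")
# The posterior piles up on bias=0.5, which "saves the appearances"
# (50% heads overall) while saying nothing true about the mechanism.
```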
All in all, scepticism (a prime scientific virtue) in the face of hype seems the best way forward.
Thanks. First, the philosophy and the science are mostly beyond me. My understanding doesn’t go beyond the odd popsci book, blog or podcast, or my own mental capacity.
I think I can make sense of your argument.
Roughly, I suppose the argument from Farrell is that these models are tools that can trawl existing data and perhaps resurface otherwise buried correlations or patterns, and perhaps then suggest probabilities for potential causal relationships.
But basically at this point I’m the proverbial chimp at the typewriter trying to write Shakespeare so should probably stop exposing my ignorance.
Sorry to come at you with some more philosophy. This is essentially the model that Francis Bacon suggested in his Novum Organum: trawling through data without any preconceptions, looking for “buried correlations or patterns”.
Needless to say, that isn’t the way science works (if it ever did).
One of its problems is that all observations are theory-laden. Take what we do: photographing stuff and presenting our results. The idea that what we present bears some relation to reality is based on subsidiary theories of, for example, how optics and sensors work.
There was a famous debate at the Royal Society, where Robert Boyle attempted to show that, contrary to Aristotle, a vacuum is possible. He said that he had got some good results using his air-pump, when it was working correctly. To which Thomas Hobbes asked, “How do you know that it is working correctly?”
Once again, if one is searching a large hypothesis space, how do you know that the hypotheses you come up with are consonant with the subsidiary hypotheses that the major hypothesis depends on?
Sure. Hence a tool, rather than an agent.
The authors are correct, and this is exactly part of the problem. Most people are misled by the abilities of current systems into believing that they are reasoning, being critical, and giving relatively reliable information.
The current systems are based on LM/LLM algorithms, and thus are only parroting whatever in their training data looks most similar to the preceding sequence of words (tokens rather than actual words, but that’s the idea). It’s a simple stimulus–response system that finds the most probable next “word” in a sentence, quite literally. They don’t “think”, in the sense that they don’t iterate on or verify the answer. They have no notion of truth or critical thinking.
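For what it’s worth, that “most probable next word” loop can be sketched in a few lines. This is a deliberately crude toy of my own (a hard-coded bigram table standing in for a trained model, nothing like a real LLM), just to show the stimulus–response shape: condition on what is already there, emit the likeliest continuation, repeat, with no check against anything outside the table.

```python
# Deliberately crude sketch of greedy next-token generation (a hard-coded
# bigram table standing in for a trained model). The loop only ever asks
# "what usually follows this?", never "is this true?".
counts = {
    "the":    {"cat": 3, "moon": 1},
    "cat":    {"sat": 4, "landed": 1},
    "moon":   {"landed": 2, "sat": 1},
    "sat":    {"quietly": 2},
    "landed": {"quietly": 1},
}

def most_probable_next(word):
    followers = counts.get(word, {})
    if not followers:
        return None
    return max(followers, key=followers.get)

def generate(prompt_word, max_len=5):
    out = [prompt_word]
    while len(out) < max_len:
        nxt = most_probable_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))   # "the cat sat quietly" - fluent, but truth never enters into it
print(generate("moon"))  # "moon landed quietly" - same mechanism, no model of the world
```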
People thus misled apply that “AI” technology to tasks like creative processes and problem solving. Typical examples are “generative” AI - a misnomer - used to create images or produce programming source code.
There can’t be any creative process, since those algorithms reproduce the best average resemblance to their training data, with the possibility of hallucinations when there wasn’t anything in the training data close enough to the input. At best the output will be average, and quality will likely decrease once they start to train on their own output. And there can’t be problem solving either, at least beyond basic pattern recognition, since there is no iterative and critical thinking.
That’s not to say they’re useless. They are good language models, so that’s an area in which they can bring some progress. Neural nets can also be used to classify, but there’s nothing new here, since they were already used for that in the 90s.
The proponents could argue that the systems are doing what Thomas Kuhn called “normal” or “puzzle” science, namely working within a particular paradigm, assumed to be correct until it is shown otherwise.
But inference to the average isn’t inference to the best explanation. I would also ask, given the size of the solution spaces, whether they even get to an average rather than to some local maximum.
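A toy picture of that local-maximum worry, again my own illustration and nothing to do with how any particular model is actually trained or decoded: greedy search on a bumpy objective stops at whichever peak happens to be nearest its starting point.

```python
# Toy illustration of the local-maximum worry (not how any particular model
# is trained or decoded): greedy hill-climbing on a bumpy 1-D "score"
# settles on whichever peak happens to be nearest its starting point.
import math

def score(x):
    # Two peaks: a small one near x=1 and a much better one near x=5.
    return math.exp(-(x - 1) ** 2) + 3 * math.exp(-(x - 5) ** 2)

def hill_climb(x, step=0.1, iters=1000):
    for _ in range(iters):
        best = max((x - step, x, x + step), key=score)
        if best == x:
            break
        x = best
    return x

for start in (0.0, 4.0):
    peak = hill_climb(start)
    print(f"start={start}: stops at x={peak:.2f}, score={score(peak):.2f}")
# Starting at 0.0 gets stuck on the minor peak near x=1;
# starting at 4.0 finds the much better peak near x=5.
```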
Bit tangential, but Vox Media’s Good Robot series on its Future Perfect podcast is very good. Just listened to the second episode, on actual harms from bad uses of AI, and am going back to listen to the first.
C’mon, WTAF
Anything that they can get their hands on seems to be the modus operandi of AI companies.
Sites are reporting that the majority of their hits are from AI crawlers - AI crawlers haven't learned to play nice with websites • The Register
It’s just that scraping the web for stuff people have posted vs scraping an illegal website that has stolen people’s work, including academic journal articles with unique information, seems another order of criminality. Meta specifically looked into obtaining the works from the owners but decided it was “too expensive” and too much hassle, according to the report.
Ah, so doing in actuality what Emo Philips said as a joke:
“When I was a kid I used to pray every night for a new bicycle. Then I realised that the Lord doesn’t work that way so I stole one and asked Him to forgive me.”
I agree with all this, but even if we just look at it from a practical/procedural angle, LLMs can generate scientific papers at a rate reviewers will not be able to keep up with. Yes, if you invest enough effort you can separate the wheat from the chaff, but there is only so much effort to go around, so AI-generated nonsense will slip through: it can be made to look very, very convincing, and the underlying process assumes basic honesty from the submitters.
We are grossly underprepared for this kind of misconduct. (And, TBH, all other kinds of scientific misconduct, but now we have a scale issue).
Well, it is nice and sunny here, but now I am depressed. I agree with you, and the person who made the same point at the meeting I went to.
It may be that we need to use an LLM on papers to ensure that they haven’t been written by an LLM.
Another thing I have come across is the decay of LLMs if they are trained on material produced by LLMs rather than on the original material they currently use.
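That decay is often discussed under the name “model collapse”. A toy simulation of my own (not from any of the sources in this thread) shows the flavour of it: repeatedly fit a simple model to samples drawn from the previous generation’s model, with no fresh original data, and the spread of what it can produce tends to shrink.

```python
# Toy sketch of the "training on your own output" decay (often called model
# collapse): each generation fits a Gaussian to samples drawn from the
# previous generation's Gaussian. Estimation error compounds, and with no
# return to the original data the spread tends to drift downward.
import numpy as np

rng = np.random.default_rng(42)

mu, sigma = 0.0, 1.0          # "original material"
n_samples = 200               # finite training set each generation

for generation in range(20):
    samples = rng.normal(mu, sigma, n_samples)   # train on previous model's output
    mu, sigma = samples.mean(), samples.std()    # refit the model
    print(f"generation {generation}: mean={mu:+.3f}, std={sigma:.3f}")
# The std tends to drift below 1.0 over the generations: each refit slightly
# underestimates the spread, and the error is never corrected against the
# original data, so variety is gradually lost.
```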
No doubt. Even at human-speed generation, plenty of nonsense slipped through.