Very interesting analysis. Some further thoughts:
- How does good consensus arise in the first place? By different people doing independent analyses to reach the same conclusion. But if the analyses are not independent checks, the consensus could end up being bad. LLMs are still new and they benefit from prior good consensuses in their training data. Going forward, easy access to LLMs may corrupt the consensus development process and amplify the anchoring bias problem. The growth in "AI slop" is a symptom of this.
- Expertise is a combination of knowledge and logic. My experience with LLMs is they don't understand mathematical logic; their version of logic in an argument is statistical, which is mostly right but can be crucially wrong sometimes—often only a domain expert can spot that. (LLMs don't understand that if we can show 1+1=3 then we can show that any two numbers added together equals any other number, leading to logical absurdity. Of course, once they read the previous sentence, they will "know" it, if it appears in many texts.)
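To spell out that parenthetical, here is the standard arithmetic derivation (nothing LLM-specific, just the usual argument that a single false equation collapses the whole system):

```latex
% A minimal sketch: from 1+1=3, any sum can be "proved" equal to anything.
\begin{align*}
1+1 &= 3 \quad\text{(the supposed result)}, & 1+1 &= 2 \quad\text{(true)} \\
\Rightarrow\quad 2 &= 3 \quad\Rightarrow\quad 0 = 1
    && \text{(subtract 2 from both sides)} \\
\Rightarrow\quad c-(a+b) &= \bigl(c-(a+b)\bigr)\cdot 1
    = \bigl(c-(a+b)\bigr)\cdot 0 = 0
    && \text{(for any } a, b, c\text{)} \\
\Rightarrow\quad a+b &= c
    && \text{for arbitrary } a, b, c.
\end{align*}
```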
Say the consensus in a domain is that X implies Y. Suppose in a related domain we can show that X implies Z and that Z implies NOT Y, so X implies NOT Y, contradicting the original consensus. Since natural language is ambiguous in the different terms used to describe X and Z, LLMs have no problem accepting the two contradictory positions and will present long-winded and very plausible-sounding arguments to reconcile the contradiction. (Perhaps neurosymbolic AI will address this contradiction issue, but I don't know much about it.)
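To make that concrete, here is a toy propositional check. X, Y, and Z are placeholder propositions, not claims from any particular field; the point is just that the three statements cannot all hold once X is the case, which is the kind of bookkeeping a purely statistical model can gloss over:

```python
from itertools import product

def implies(p, q):
    # Material implication: "p implies q" is false only when p is true and q is false.
    return (not p) or q

# Placeholder propositions for the scenario above (purely illustrative):
#   consensus:  X -> Y
#   related:    X -> Z
#   derived:    Z -> not Y
# Look for any truth assignment in which X holds and all three claims are true.
models = [
    (x, y, z)
    for x, y, z in product([True, False], repeat=3)
    if x and implies(x, y) and implies(x, z) and implies(z, not y)
]

print(models)  # [] : no consistent assignment exists, so the positions contradict each other
```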
I'm also a bit worried that AI could unintentionally ossify knowledge, reducing the incentive for humans to do the hard work of creating new information. We already see this with Stack Overflow, which has seen its contributions and traffic plummet after AI became the go-to source for coding help. Since AI is itself trained fairly extensively on Stack Overflow, this represents a bit of a problem for its potential future development. There is a push to try and have AI itself produce training data for future models, but this seems to be an ouroboros that will not end well...
You might want to check out my recent close reading of Grokipedia: https://caad.info/analysis/newsletters/cop-look-listen-issue-09-20-nov-25/ -- though it produces blandly correct information most of the time, it still works manipulation into entries fairly regularly.
Yep, I do worry about efforts to purposefully slant the training data that goes into LLMs (e.g. replacing Wikipedia with Grokipedia). But I also suspect the other major labs will shy away from giving much weight to the effort.
Thanks Zeke, this helps explain how ChatGPT reached the same conclusion Prof. Dessler did regarding "...the most embarrassing error in the DOE Climate Working Group Report", from the brief fragment I gave it (https://www.theclimatebrink.com/p/is-this-the-most-embarrassing-error/comment/162278086).
I worked with ChatGPT to craft this response:
Zeke’s piece makes a compelling case that large-language models increasingly function as “consensus machines,” surfacing broadly supported information and resisting attempts to push them toward fringe or denialist views. That’s encouraging, especially in climate science, where a shared factual baseline has often been hard to maintain.
But even with better tools, we still have to exercise judgment and critical thinking about what we read—whether it comes from experts, journalists, or AI systems. Part of that work is recognizing our own biases and the ways we filter information to fit what we already believe. No model can do that introspection for us.
And facts alone rarely change minds. Beliefs shift when information connects to meaning, identity, and lived experience. That’s the human side of communication—building narratives that help people see themselves in the story, not just the data. In that sense, AI may help with the factual foundation, but the deeper work of connection and understanding will still fall to us.
JD, thanks for the explicit disclosure. When you say you "worked with ChatGPT", could you expand on that?
When I ask ChatGPT or Gemini a question I would otherwise have to work to answer, I usually find the AI says what I would say, but better: "I couldn't have said it better myself!" The LLM is more concise, makes better word choices, uses better sentence structure, etc. I'm not deluded about my own literary competence, so I tend to follow its writing advice. At some point, I'll make a habit of getting LLM review before posting!
OTOH, I'm not yet at the point of trusting the content of an LLM's answer without verifying its sources. And there's the danger, as discussed here, of delegating intersubjective verification of new (to me) knowledge to the LLM. Due diligence requires at least following its citations. Then, if I'm commenting online, I'll paraphrase the LLM rather than copy-paste, and cite its source rather than the LLM itself.
Unsurprisingly, Wikipedia is often the LLM's first source (https://wikimediafoundation.org/news/2023/07/12/wikipedias-value-in-the-age-of-generative-ai/):
"To date, every LLM is trained on Wikipedia content, and it is almost always the largest source of training data in their data sets."
But why would I trust Wikipedia any more than I would the LLM? Well, for "settled" knowledge, Wikipedia is pretty much intersubjectively verified by human subject matter experts. For new knowledge, follow-up with the primary literature is needed. IOW, I may as well go straight to Wikipedia, and drill down to its sources as required.
And I agree it's honest to declare my collaboration with a GPT (heh: https://en.wikipedia.org/wiki/Generative_pre-trained_transformer). But if anyone can get the same answer from ChatGPT I did, doesn't that mean I personally have nothing to offer? So why am I commenting? That bears thinking about...
One thing I can say with confidence is that Wikipedia was the first quality-controlled, globally accessible, globally encyclopedic, continuously expanding knowledge base, and cannot be replaced by an AI until one develops expert judgement. So yesterday I gave Wikipedia money, for the first time!
I have come to rely on ChatGPT extensively for doing a better job of expressing points that I want to make. In those cases where I feel ownership of the content, I don't feel it is necessary to credit ChatGPT. In this particular case it was more a hybrid where I asked ChatGPT what it thought of the article. I thought about just posting that, i.e., have AI comment about AI consensus. If I had, I would have given ChatGPT all the credit. In this case, the first paragraph was solely ChatGPT. The second and third were crafted with my input. I had points that I wanted to make. If a response represents more than just opinion and includes facts or data, I always fact check or source check. In many cases, I will tell ChatGPT what sources to consider. I hope that helps.
Thanks Dean. Yesterday, for a non-profit project I'm volunteering on, I gave a list of botanical names to Grok and ChatGPT, and asked for a list of common names for each botanical name, with sources.
Holy crap: they both did thorough jobs, but made odd errors! ChatGPT actually made some common names up when it couldn't find any online, due to recent taxonomic re-organization. Grok, OTOH, prudently left those blank. Clearly, some human review is still required!
How is using X or xAI any different from buying lunch from the actual Soup Nazi? Seems immoral.
Because it's useful to provide accurate scientific information to those with whom you disagree. I also share content on Bluesky, Mastodon, Threads, LinkedIn, etc., as well as posting here.
"Because its useful to provide accurate scientific information to those whom you disagree with."
It may help to be a published climate specialist. As a non-professional, I provide scientific information mostly just for fun, to piss off deniers and doomers 8^D!
AHA!
Now we've seen that you've created your X account from Tuvalu!
Well, for one thing, the "Soup Nazi" was a caricature of an actual Iranian soup chef, who was exploited for television laughs. How's that?
Asking "how is using X or Xai any different from" that guy, is pushing a pop-culture trope way too far, you know.
Isn't the issue knowing the provenance of the data? Where and when did the data come from? Garbage in, garbage out?
I should have said for routine browsing, so it doesn't give priority to paid-for content.
Is there a search engine that is not funded by advertising?
I use kagi.com, which you have to pay for but there's no advertising and they're not selling your data. Highly recommend it.
Well, the AI system called Perplexity is supposed to cite reputable sources in response to questions. But it can end up citing non-expert sources. For example, see the prompt below on whether Dr. Matt Ridley is correct about climate change:
https://www.perplexity.ai/search/is-matt-ridley-correct-about-c-F3.Nf_BWS2mByz6k.Yd02g
Perplexity cites this video in which the science communicator Dave Farina interviews Professor Andrew Dessler:
https://x.com/AndrewDessler/status/1963782119383069016
Perplexity also cites other sources like Skeptical Science and experts writing in Carbon Brief. So Perplexity centers expertise in the sense that it cites sources that rely on experts and on the peer-reviewed literature. But that still leaves open the possibility that the AI could cite YouTube videos or other material from unreliable sources that conflicts with evidence-based expert views.
On Twitter/X I recently saw Grok defending long-debunked contrarian talking points from people like Dr. Ridley. Grok persisted in doing this even after being linked to peer-reviewed rebuttals written by experts. Grok is likely worse than Perplexity in centering evidence-based expert views.
There should only be two options for the structure of mass media businesses - either a private/commercial (shareholder-controlled) entity, or a member-controlled Co-operative-type entity in a (new) Commons 'public' sector, under the direct control of citizens' votes.
(E.g., the 'public' sector BBC Depts. could choose one or the other - no more corrupt Gov-appointee-run fake 'public' media.)
Mass media in Western societies is nearly all owned & delivered by a small group of wealthy elites, & significantly funded via advertising by a handful of large Corporations. That model can have its place in providing public discourse & entertainment non-critical to 'democracy', but it should not be the only model for media with power (& reach).
We can easily create a system where citizens directly control a similar-sized sector of the media, through non-profit, Commons/Common-Ownership-structured media publishers/providers, which exclude all private capital & revenue income. (Instead, they are controlled by members with equal voting rights, like Worker Co-ops or Community Businesses.)
In this sector, their only permitted income comes from our currency-issuing Govs (at zero cost), but not directly. Instead of Gov deciding which Commons Media enterprises get grant funding, citizens disburse the funds themselves, each with an equal share, via an annual voucher system whereby they sponsor their preferred Commons media provider(s).
This simple system ensures full democratic participation in a sector of mass media, & thus in the political discourse which elevates politics to power.
And we need it now, as our *first* priority, before humanity's path to its own self-destruction becomes irreversible.
I don't know what questions you ask about vaccines or which of the answers you think are scientifically accurate, but it all depends on how you phrase the question and how much you know about how AI works.
For example, if you ask 'is the US behind recent colour revolutions in, for example, Nepal?', the AI will unanimously say no. If you ask which organisations were behind them, the AI will give you a list of groups and names. If you then ask who funds these groups and names, the AI will tell you that it's the NED, USAID, or Open Society.
If you ask whether vaccines are 'effective', the AI will reply with relative rather than absolute efficacy, or with the effectiveness seen in observational studies, which is confounded by other factors.
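For anyone unfamiliar with that distinction, here is a rough illustration with made-up numbers (not real trial data) of how a large relative risk reduction can correspond to a small absolute one:

```python
# Hypothetical numbers, purely to illustrate the relative-vs-absolute distinction.
placebo_risk = 0.020   # 2.0% of an unvaccinated group gets infected over the study period
vaccine_risk = 0.001   # 0.1% of the vaccinated group gets infected

rrr = (placebo_risk - vaccine_risk) / placebo_risk  # relative risk reduction
arr = placebo_risk - vaccine_risk                   # absolute risk reduction
nnv = 1 / arr                                       # number needed to vaccinate to prevent one case

print(f"Relative risk reduction: {rrr:.0%}")     # 95%
print(f"Absolute risk reduction: {arr:.1%}")     # 1.9%
print(f"Number needed to vaccinate: {nnv:.0f}")  # about 53
```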
For how one must read against the grain of pharma-funded research as well as AI, here's a primer: https://jowaller.substack.com/p/vaccine-efficacy?utm_source=publication-search
Dr. Hausfather, I'd still love to interview you sometime in the upcoming weeks or months if you might be available for a video call!
Awesome article, Dr. Hausfather! I wrote something that touches on a similar positive case for LLMs that I published earlier today! https://sammatey.substack.com/p/mazu-and-mazu
The reason I would like an answer from a great climate scientist on which AI is better for determining how to increase surface reflection in a sane manner is that AIs are all data scrapers and their logic has a partiality. I want the best one for surface reflection.
Great AI synopsis Zeke! Do you have any recommendations for increasing albedo on the surface?