16 Comments
Barloc Bedlam

I am in this boat. I wanted to be a scientist growing up. I took engineering and computer science classes and eventually got a degree in Earth and Space science education. I am not a software developer, but I have played around with vibe coding, and now the possibility of actually creating something that contributes to my mission of Environmental Sustainability Education seems very doable. I have looked into ArcGIS and Esri and taught myself the basics of Python from my understanding of C++ and Visual Basic. I am teaching an intro Java programming class now, and AI makes me feel like I can build something bigger than what runs through the compiler and console. I think technology and AI are great TOOLS, but they cannot take the place of the problem finder or of sharing the findings to create change. You also mentioned that data centers take a lot of resources; to me, that makes the sustainability issues more pronounced. We need alternative renewable energy sources powering these new technologies. We must balance our advancements sustainably so we do not create a deficit in the distribution of energy resources. We have so many people on one extreme or the other. We need more people who want to work on the integration and balance of technological advancement and environmental sustainability. Agricultural technology is a great example of how automation increases efficiency, yet most farmers care about their land because it is their livelihood.

Michael J. Murphy

You write in regards to Deep Gemini:

"But these tools lack full access to the peer-reviewed academic literature (much of which remains behind journal paywalls)."

How much of a problem is this and how fixable is it? I imagine these journals are going to want money before they let you train on the good stuff.

Zeke Hausfather

It should be pretty solvable if AI companies pay for access to scientific journals (or offer a specialized product to universities and researchers that comes with that access). The current approach of scraping all the publicly available versions (or possibly training on not-very-legal and out-of-date archives from Sci-Hub) is far from ideal.

ram1500natureluvr

"AI tools have gotten quite good at cleaning, merging, and analyzing large datasets"

Have you found that their accuracy in this is good enough (aka perfect)? I've tested this over the past few years, though not recently, and have always found that it introduces an enormous number of hallucinations as the datasets grow in size. Are you having the LLMs output the cleaned data, or are you having the LLMs write scripts to clean it that you then manually execute and check?

Zeke Hausfather

I'm having LLMs write the scripts and associated tests, and then check their outputs. In my experience it works well when the problem is addressable by code. Less so when I try to have the LLM itself look through the entries and determine a pattern or classification.

I'd suggest trying with a recent model like Opus 4.6 and seeing how well it does, as things really are improving quickly here.
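The script-plus-tests workflow described above, where the model writes code that you run and check rather than emitting cleaned data itself, might look something like this minimal Python sketch. The record layout (`station`, `date`, `temp_c`), the `-9999` missing-value sentinel, and the function name `clean_records` are hypothetical assumptions for illustration, not Zeke's actual data or code. The point is that every cleaning rule is explicit and has a test a human can read.

```python
import math

# Hypothetical sentinel that some raw station files use for "no data".
MISSING_SENTINEL = -9999.0


def clean_records(rows):
    """Clean a list of hypothetical station records.

    Each row is assumed to be a dict with 'station', 'date', and 'temp_c' keys.
    Keeps only the first record for each (station, date) pair, converts
    temperatures to float, and maps blank strings and the sentinel to NaN.
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["station"], row["date"])
        if key in seen:
            continue  # drop later duplicates of the same station/date
        seen.add(key)
        raw = str(row["temp_c"]).strip()
        temp = float(raw) if raw else math.nan
        if temp == MISSING_SENTINEL:
            temp = math.nan
        cleaned.append({"station": row["station"], "date": row["date"], "temp_c": temp})
    return cleaned


def test_clean_records():
    raw = [
        {"station": "A", "date": "2024-01-01", "temp_c": "12.5"},
        {"station": "A", "date": "2024-01-01", "temp_c": "12.5"},   # duplicate
        {"station": "A", "date": "2024-01-02", "temp_c": "-9999"},  # sentinel
        {"station": "B", "date": "2024-01-01", "temp_c": ""},       # blank
    ]
    out = clean_records(raw)
    # Explicit invariants instead of eyeballing the cleaned output.
    assert len(out) == 3
    assert out[0]["temp_c"] == 12.5
    assert math.isnan(out[1]["temp_c"])
    assert math.isnan(out[2]["temp_c"])
    # No sentinel values survive cleaning.
    assert all(r["temp_c"] != MISSING_SENTINEL for r in out)


if __name__ == "__main__":
    test_clean_records()
    print("all tests passed")
```

Because the model's output is a script rather than the data, a wrong rule fails the same way on every row and shows up in the tests, instead of hiding as a scattered hallucination somewhere in a large file.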

Gregory Slater

...but you have checked (or spot-checked) the results yourself for large datasets, right?
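One way to make a spot check of a large dataset reproducible is to draw a fixed-seed random sample of row indices and review those rows by hand. This is a minimal sketch, assuming rows can be addressed by integer index; `spot_check_sample` is a hypothetical helper, not something described in the thread.

```python
import random


def spot_check_sample(n_rows, k, seed=0):
    """Return k distinct row indices in [0, n_rows), sorted, drawn uniformly at random.

    Hypothetical helper: a fixed seed makes the sample reproducible, so a
    reviewer can re-draw exactly the same rows later.
    """
    if not 0 <= k <= n_rows:
        raise ValueError("k must be between 0 and n_rows")
    rng = random.Random(seed)
    # random.sample over a range avoids materializing a huge list of indices.
    return sorted(rng.sample(range(n_rows), k))


if __name__ == "__main__":
    idx = spot_check_sample(1_000_000, 50, seed=42)
    print(len(idx), idx[:5])
```

A spot check cannot prove a dataset is perfect. By the statistical "rule of three", finding zero errors in k randomly sampled rows only bounds the error rate below roughly 3/k at 95% confidence, so a clean sample of 50 rows rules out error rates above about 6%, not every error.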

Jeff Suchon

Glad you are on the AI train, Dr. Hausfather! It looks like Clarke's magic. The only caveat we programmers have with all software is: beware of GIGO (garbage in, garbage out). Good Data is the golden juice.

Jeff Suchon

And Good Questions are the Golden throat. Deep Throat?

Jeff Suchon

With AI, as with any focused inquiry, you gotta get your question precise so it can recursively answer you, given your parameters. And, like Zeke does, always have solid data to feed it.

Mal Adapted

[Disclosure: this comment is wholly organic. No AI assistance whatever. -MA]

Thanks Zeke! I emailed your post to a retired scientist friend, who's been "skeptical" (i.e. negative) about the social cost of LLMs. I was too, until I started experimenting with them. I've gotten used to the hallucinations, and always check the human-generated sources it links. As you say, it's clearly improved in the last couple of years:

"Today the tools are much better than they were in 2023. Hallucinations still exist, but they are much less frequent. As someone who has used these tools more than most in the scientific community, I have a good sense of what they work for and what they do not do well today. The tools I primarily use now are Claude Code (Opus 4.6, via my terminal) and the web-app for Gemini (3.1) for projects where integration with my email, Drive, and other parts of the Google ecosystem is helpful."

I'm a retired systems administrator, who did a lot of ad-hoc coding in Perl but never got past procedural Python. I haven't coded anything in eight years, thankfully. I'm still not willing to pay more for *correct* facts, but my experience with Gemini 3 'fast' (i.e. "free") is otherwise similar to yours.

I've been using the free (i.e. no extra charge on top of my other Google services and my user eyeballs) Gemini 3 web app for advice on writing substack comments! I try to be scrupulous about disclosing how much of my comment is AI-assisted, however. I like to draft a reply to a decarbonization obstructionist's comment, then ask Gemini to evaluate both the comment and my reply for motivated cognition.

Heh. Even when I issue the 'less sycophantic' prompt, it tells me I'm a freakin' genius! It's my new best friend 8^D! This way lies madness, or at least Narcissistic Personality Disorder 8^(.

Jeff Suchon

find . -print | cpio -pdumv /wherever/overthere

AI has gotten freakingly humanish. Goal for me is surface albedo smarts. Really cool and super useful!

I'm used to the old canned UNIX stuff, back when that was the HAL of its day. Dang, old generals don't die... they just fade away...

Gregory Slater

Thanks for this. Could you list the exact Claude plan, version, and tools you are using?

Thanks.

Zeke Hausfather

I use Claude Code via my terminal, with Opus 4.6 as the model. I use planning mode initially to write a spec, then have it do the analysis once I'm satisfied with the plan.

Michael J. Murphy

Thx much. How much of what you describe doing could be squeezed out of a basic chatbot (Grok, ChatGPT, etc)? I remain disappointed in how they perform on, say, basic historical research.

Mal Adapted

Gemini 3 'fast' (i.e. "free") appears to follow links I give it, and almost always returns up-to-date links for searches I request. I sometimes have to ask it just the right questions, however. My advice is not to settle for the first answers it gives you.