Daisy Hollman is a very strong programmer in the C++ community. This talk is from April 2025, a couple of months after she joined Anthropic.
It’s pro-Anthropic and pro-AI assistance … and software engineering oriented.
A good talk.
Same presenter talking to the Munich C++ group at
Viewable on Twitch.
Crafting the Code You Don’t Write: Sculpting Software in an AI World
The presentation is available as a VOD on Twitch.
Not a video … but an actual course, just started :
https://online.stanford.edu/courses/csp-xtech40-agentic-ai-action-concepts-real-world-impact
CSP-XTECH40

Format : 100% Online, on-demand, live
Tuition : US$355.00
Schedule : Jan 13 - Feb 10, 2026
AI agents—autonomous systems capable of perceiving, reasoning, and acting—are transforming industries by automating complex research, optimizing workflows, and enhancing customer engagement. This course is designed for both business and technical professionals seeking to understand and apply agentic AI in practical, high-impact ways. We will explore core concepts such as tool calling, API integration, and multiagent orchestration alongside critical topics in reliability, governance, and ethics. Business participants will learn to evaluate strategic use cases, design agent workflows, and assess organizational readiness. Technical participants will engage in guided, hands-on, no-code sessions for designing and testing agentic applications. Along the way, both groups will have opportunities to share perspectives and learn from one another. Through case studies, live demos, and immersive exercises, you will gain the insight and the skills to assess, design, and deploy AI agents that elevate productivity, enhance decision-making, and unlock new capabilities.
Nithya Natesan
Group Product Manager, Google
Nice, now I just have to re-mortgage my house and sell my firstborn to afford a GPU that is fast enough to be usable.
Codex would be my choice.
SWE-bench Pro is one of the most commonly advertised metrics for evaluating the coding performance of LLMs, but it appears that it may not be as reliable as expected.
A new benchmark, DeepSWE, was recently released. It aims to improve the test cases by making them closer to real-world coding sessions with AI, using more complex problems with a simpler prompt and a methodology that gives more accurate benchmark results. It found that GPT-5.5 performs the best by quite a margin.
Of note are their claims that Claude Opus (4.6 and 4.7; 4.8 wasn’t tested yet) “cheats” at SWE-bench Pro. The benchmark checks out common open source repos at a particular commit, and the LLM under test is then asked to perform a task (fix a bug, implement a feature, …). When the task is complete, independent automatic AI verifiers compare the LLM’s solution with the human solution that was added in the following commits. The benchmark container includes the entire git history, and Claude Opus was found to be running git commands to peek ahead in the commits and copy the solution for some of the tasks, improving its score. I guess you have to give it points for creativity. They also found that Claude Opus can forget details in longer context sessions, and that the benchmark verifiers have a high rate of false fails and passes. These are the key reasons why it drops in the rankings compared to GPT.
Their blog post has more interesting details, including analysis suggesting that Claude Opus is quite slow and expensive compared to other models.
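To make the "peek ahead" problem concrete, here is a minimal sketch of the idea, not the benchmark's actual harness or the DeepSWE authors' method. It assumes a toy commit graph held in a plain dictionary, and every commit name is hypothetical. It lists the commits that are stored in the repository but are not part of the history of the commit the task is posed at, which is roughly the material a model could read with git commands if the full history ships in the container.

```python
# Toy model of a repository's commit graph: each commit maps to its parents.
# The graph and all commit names are hypothetical, for illustration only.
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """Return `start` plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def leaked_commits(parents: Dict[str, List[str]], task_commit: str) -> Set[str]:
    """Commits present in the repo that are not in the task commit's history.

    Anything returned here is "from the future" relative to the task and
    could reveal the reference solution if it is left in the container.
    """
    return set(parents) - ancestors(parents, task_commit)


history = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],        # the task is posed at this commit
    "c4_fix": ["c3"],    # the human fix, which the model should not see
    "c5": ["c4_fix"],
}

leaked = leaked_commits(history, "c3")
print(sorted(leaked))  # prints ['c4_fix', 'c5']
```

A harness that wanted to rule this out could run a check like this and refuse to start when the set is non-empty, or ship only the history up to the task commit.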
An ex-Google engineer (they left last month) discusses how much AI code generation is being used within Google. Spoiler: not a lot, it sounds like.
Thanks Lachlan. An interesting snippet from that video:
"I remember right before I left, I would sometimes see people use their AI coding tools or their AI agents to try to patch a bug on their behalf without being personally involved.
And you can tell, by the way, it’s entirely AI generated because the description of the pull request would be written in full sentences and in perfect English grammar."
Lex
Frank Lauter (Delphiprofi)
Blog Posts
https://delphiprofi.blogspot.com/2026/05/local-llms-for-delphi-production.html
Local LLMs for Delphi: A Production Benchmark — Part 1: Design and Methodology
https://delphiprofi.blogspot.com/2026/05/local-llms-for-delphi-production_0376586634.html
Local LLMs for Delphi: A Production Benchmark — Part 2: What the Numbers Reveal
https://delphiprofi.blogspot.com/2026/06/local-llms-for-delphi-production.html
Local LLMs for Delphi: A Production Benchmark — Part 3: What to Actually Use
https://delphiprofi.blogspot.com/2026/06/beyond-simple-prompts-building.html
Beyond Simple Prompts: Building an Enterprise AI Toolchain in Delphi