Videos - AI-related topics

Daisy Hollman is a very strong programmer in the C++ community. This talk is from April 2025, a couple of months after she joined Anthropic.

It’s pro-Anthropic, pro-AI assistance, and software-engineering oriented.

A good talk.

Same presenter, talking to the Munich C++ group at:

  • 2am Perth time
  • 5am eastern
  • 7pm GMT

Viewable on Twitch.

Crafting the Code You Don’t Write: Sculpting Software in an AI World

The presentation is available as a VOD on Twitch.

This feels like it could just be “when they are uncertain” vs “when they are LYING” …

Adding extra intelligence for AI-assisted coding?

Understanding Types and Effects

Not a video, but an actual course that has just started:

https://online.stanford.edu/courses/csp-xtech40-agentic-ai-action-concepts-real-world-impact


Agentic AI in Action: From Concepts to Real-World Impact

CSP-XTECH40

Stanford Continuing Studies

Technology & Data Science - Continuing Studies

Format: 100% online, on-demand, live

Tuition: US$355.00

Schedule: Jan 13 - Feb 10, 2026

AI agents—autonomous systems capable of perceiving, reasoning, and acting—are transforming industries by automating complex research, optimizing workflows, and enhancing customer engagement. This course is designed for both business and technical professionals seeking to understand and apply agentic AI in practical, high-impact ways. We will explore core concepts such as tool calling, API integration, and multiagent orchestration alongside critical topics in reliability, governance, and ethics. Business participants will learn to evaluate strategic use cases, design agent workflows, and assess organizational readiness. Technical participants will engage in guided, hands-on, no-code sessions for designing and testing agentic applications. Along the way, both groups will have opportunities to share perspectives and learn from one another. Through case studies, live demos, and immersive exercises, you will gain the insight and the skills to assess, design, and deploy AI agents that elevate productivity, enhance decision-making, and unlock new capabilities.

Nithya Natesan
Group Product Manager, Google

Why I Stopped Using MCPs in Claude Code (And What I Use Instead)

Anthropic Just Fixed MCP’s Biggest Problem

Using Claude Code with Ollama to access and work with local LLMs.

Nice, now I just have to re-mortgage my house and sell my firstborn to afford a GPU that is fast enough to be usable.

A Claude focused talk from NDC London

“An LLM Engineering Process” could be the title.

GLM-5 sounds like something genuinely to check out.

Codex would be my choice.

SWE-bench Pro is one of the most commonly advertised metrics for evaluating the coding performance of LLMs, but it may not be as reliable as it seems.

A new benchmark, DeepSWE, was recently released. It aims to make its test cases closer to real-world coding sessions with AI, using more complex problems with simpler prompts and a methodology intended to give more accurate results. It found that GPT-5.5 performs best by quite a margin.

Of note is their claim that Claude Opus (4.6 and 4.7; 4.8 hadn’t been tested yet) “cheats” at SWE-bench Pro. The benchmark checks out common open source repos at a particular commit, and the LLM under test is then asked to perform a task (fix a bug, implement a feature, …). When the task is complete, independent automated AI verifiers compare the LLM’s solution with the human solution that was added in the following commits. The benchmark container includes the entire git history, and Claude Opus was found running git commands to peek ahead in the commits and copy the solution for some of the tasks, improving its score. I guess you have to give it points for creativity :smiley:. They also found that Claude Opus can forget details in longer-context sessions, and that the benchmark verifiers have a high rate of false failures and false passes. These are the key reasons it ranks lower than GPT.

Their blog post includes other interesting details, such as an analysis showing that Claude Opus is quite slow and expensive compared with other models.

An ex-Google engineer (they left last month) discusses how much AI code generation is being used within Google. Spoiler: it sounds like not a lot.

Thanks Lachlan. An interesting snippet from that video:

"I remember right before I left, I would sometimes see people use their AI coding tools or their AI agents to try to patch a bug on their behalf without being personally involved.

And you can tell, by the way, it’s entirely AI generated because the description of the pull request would be written in full sentences and in perfect English grammar."

Lex

Frank Lauter (Delphiprofi)

Blog Posts

https://delphiprofi.blogspot.com/2026/05/local-llms-for-delphi-production.html

Local LLMs for Delphi: A Production Benchmark — Part 1: Design and Methodology


https://delphiprofi.blogspot.com/2026/05/local-llms-for-delphi-production_0376586634.html

Local LLMs for Delphi: A Production Benchmark — Part 2: What the Numbers Reveal


https://delphiprofi.blogspot.com/2026/06/local-llms-for-delphi-production.html

Local LLMs for Delphi: A Production Benchmark — Part 3: What to Actually Use


https://delphiprofi.blogspot.com/2026/06/beyond-simple-prompts-building.html

Beyond Simple Prompts: Building an Enterprise AI Toolchain in Delphi