Videos - ai-related topics

Paul_McGee · 22 November 2025 01:40

Daisy Hollman is a very strong programmer in the C++ community. This talk is from April (2025), after she had joined Anthropic for a couple months.

It’s pro-Anthropic, and pro-Ai assistance … and software engineering oriented.

A good talk.

Paul_McGee · 24 November 2025 11:18

Same presenter talkingto Munich C++ group at

2am Perth time
5am eastern
7pm GMT

Viewable on twitch.

Crafting the Code You Don’t Write: Sculpting Software in an AI World

Paul_McGee · 3 December 2025 05:49

The presentation as VOD on Twitch dot tv : Twitch

Paul_McGee · 2 January 2026 11:30

This feels like it could just be “when they are uncertain” vs “when they are LYING” …

Paul_McGee · 5 January 2026 04:04

Adding extra intelligence for AI assisted coding … ?

Understanding Types and Effects

Paul_McGee · 15 January 2026 12:14

Not a video … but an actual course, just started :

https://online.stanford.edu/courses/csp-xtech40-agentic-ai-action-concepts-real-world-impact

Agentic AI in Action: From Concepts to Real-World Impact

CSP-XTECH40

Stanford Continuing Studies

Technology & Data Science - Continuing Studies

Enroll Now

Format : 100% Online, on-demand, live Details

Tuition : US$355.00

Schedule : Jan 13 - Feb 10, 2026

AI agents—autonomous systems capable of perceiving, reasoning, and acting—are transforming industries by automating complex research, optimizing workflows, and enhancing customer engagement. This course is designed for both business and technical professionals seeking to understand and apply agentic AI in practical, high-impact ways. We will explore core concepts such as tool calling, API integration, and multiagent orchestration alongside critical topics in reliability, governance, and ethics. Business participants will learn to evaluate strategic use cases, design agent workflows, and assess organizational readiness. Technical participants will engage in guided, hands-on, no-code sessions for designing and testing agentic applications. Along the way, both groups will have opportunities to share perspectives and learn from one another. Through case studies, live demos, and immersive exercises, you will gain the insight and the skills to assess, design, and deploy AI agents that elevate productivity, enhance decision-making, and unlock new capabilities.

Nithya Natesan
Group Product Manager, Google

Paul_McGee · 15 January 2026 15:08

Why I Stopped Using MCPs in Claude Code (And What I Use Instead)

Paul_McGee · 16 January 2026 05:54

Anthropic Just Fixed MCP’s Biggest Problem

Paul_McGee · 27 January 2026 14:30

Using Claude Code with Ollama to access/work with local LLM models.

vincent · 27 January 2026 21:36

Nice, now I just have to re-mortgage my house and sell my firstborn to afford a gpu that is fast enough to be usable.

Paul_McGee · 13 February 2026 00:05

A Claude focused talk from NDC London

“An ĹLM Engineering Process” could be the title.

Paul_McGee · 13 February 2026 02:46

GLM-5 sounds like something genuinely to check out.

Paul_McGee · 31 May 2026 03:34

Delphi_programmer · 31 May 2026 14:21

Codex would be my choice.

Jarrod · 1 June 2026 03:59

SWE-bench Pro is one of the most commonly advertised metrics for evaluating the coding performance of LLMs but it appears that it may not be as reliable as expected.

A new benchmark DeepSWE was recently released that aims to improve the test cases by making them closer to real-world coding sessions with AI, using more complex problems with a simpler prompt and a methodology that gives more accurate benchmark results. It found that GPT-5.5 performs the best by quite a margin.

Of note is their claims that Claude Opus (4.6 and 4.7 - 4.8 wasn’t tested yet) “cheats” at SWE-bench Pro. The benchmark checks out common open source repos at a particular commit where the LLM under test is then asked to perform a task (fix a bug, implement a feature, …). When the task is complete independent automatic AI verifiers compare the LLM solution with the human solution that was added in the following commits. The benchmark container includes the entire git history and Claude Opus was found to be running git commands to peek ahead in the commits and copy the solution for some of the tasks, improving its score - I guess you have to give it points for creativity . They also found that Claude Opus can forget details in longer context sessions, and the benchmark verifiers have a high rate of false fails and passes. These are the key reasons why it drops in the rankings compared to GPT.

There are interesting details included in their blog post, including analysis that Claude Opus is quite slow and expensive compared to other models.

Lachlan · 4 June 2026 20:30

An ex-Google engineer (they left last month) discusses how much AI code generation is being used within Google. Spoiler: not a lot, it sounds like.

lexedmonds · 5 June 2026 01:07

Thanks Lachlan. An interesting snippet from that video:

"I remember right before I left, I would sometimes see people use their AI coding tools or their AI agents to try to patch a bug on their behalf without being personally involved.

And you can tell, by the way, it’s entirely AI generated because the description of the pull request would be written in full sentences and in perfect English grammar."

Lex

Paul_McGee · 17 June 2026 13:54

Frank Lauter (Delphiprofi)

Blog Posts

https://delphiprofi.blogspot.com/2026/05/local-llms-for-delphi-production.html

Local LLMs for Delphi: A Production Benchmark — Part 1: Design and Methodology

https://delphiprofi.blogspot.com/2026/05/local-llms-for-delphi-production_0376586634.html

Local LLMs for Delphi: A Production Benchmark — Part 2: What the Numbers Reveal

https://delphiprofi.blogspot.com/2026/06/local-llms-for-delphi-production.html

Local LLMs for Delphi: A Production Benchmark — Part 3: What to Actually Use

https://delphiprofi.blogspot.com/2026/06/beyond-simple-prompts-building.html

Beyond Simple Prompts: Building an Enterprise AI Toolchain in Delphi