Harry Stebbings
They build realistic RL environments, next-generation data quality systems built from real-world operational traces, and coding datasets that stress models under conditions where failures matter: date changes, workflow branching, brittle tool calls, and the coding errors that break RL agents but never appear in benchmark reports.
In reality, a model may demonstrate correct reasoning in your evaluation setup, yet still select the wrong parameter or mishandle a code update in a realistic interface.
Turing makes that failure visible and gives teams the signal they need to fix it.
For labs advancing agentic systems, Turing provides the structure required to understand why these failures occur.
To find out how, visit Turing.com forward slash 20VC.
That's T-U-R-I-N-G dot com forward slash 20VC.
You have now arrived at your destination.
And I am so looking forward to this, dude.
I have stalked the shit out of you for the last three or four days.
I spoke to Bing Gordon.
I had a catch up with Bing before this.
Very nice to speak to him.
So thank you so much for joining me today, dude.
Dude, I'm confused.
Help me out.
I had Demis on the show the other day from DeepMind.
He was like, yeah, I'm not sure if we're seeing scaling laws, but we are definitely seeing slightly diminishing returns in performance as we scale.
So potentially, are we getting to a stage where increased compute is no longer leading to increased performance?
When we look at the bottlenecks around performance and progression today, which bottlenecks stand out as most significant to you?