AI engineering tools are evolving fast. New coding assistants, debugging agents, and automation platforms emerge every month. Engineering leaders want to take advantage of these innovations while avoiding costly experiments that create more distraction than impact.

In this episode of the Engineering Enablement podcast, host Laura Tacho and Abi Noda outline a practical model for evaluating AI tools with data. They explain how to shortlist tools by use case, run trials that mirror real development work, select representative cohorts, and ensure consistent support and enablement. They also highlight why baselines and frameworks like DX’s Core 4 and the AI Measurement Framework are essential for measuring impact.

Where to find Laura Tacho:
• LinkedIn: https://www.linkedin.com/in/lauratacho/
• X: https://x.com/rhein_wein
• Website: https://lauratacho.com/
• Laura’s course (Measuring Engineering Performance and AI Impact): https://lauratacho.com/developer-productivity-metrics-course

Where to find Abi Noda:
• LinkedIn: https://www.linkedin.com/in/abinoda
• Substack: https://substack.com/@abinoda

In this episode, we cover:
(00:00) Intro: Running a data-driven evaluation of AI tools
(02:36) Challenges in evaluating AI tools
(06:11) How often to reevaluate AI tools
(07:02) Incumbent tools vs. challenger tools
(07:40) Why organizations need disciplined evaluations before rolling out tools
(09:28) How to size your tool shortlist based on developer population
(12:44) Why tools must be grouped by use case and interaction mode
(13:30) How to structure trials around a clear research question
(16:45) Best practices for selecting trial participants
(19:22) Why support and enablement are essential for success
(21:10) How to choose the right duration for evaluations
(22:52) How to measure impact using baselines and the AI Measurement Framework
(25:28) Key considerations for an AI tool evaluation
(28:52) Q&A: How reliable are self-reported time savings from AI tools?
(32:22) Q&A: Why not adopt multiple tools instead of choosing just one?
(33:27) Q&A: Tool performance differences and avoiding vendor lock-in

Referenced:
• Measuring AI code assistants and agents
• QCon conferences
• DX Core 4 engineering metrics
• DORA’s 2025 research on the impact of AI
• Unpacking METR’s findings: Does AI slow developers down?
• METR’s study on how AI affects developer productivity
• Claude Code
• Cursor
• Windsurf
• Do newer AI-native IDEs outperform other AI coding assistants?