
Tool Use Co-Host

11 appearances

Podcast Appearances

Absolutely. I've kind of tried to teach people to view it as like Wikipedia. Use it to get started, but it's not something you can put as a reference in your paper. In regards to the hallucinations, a lot of people try to solve this with evals or just build enough of a robust eval set that they're able to kind of mitigate against some of the risks of hallucinations.
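The robust eval set mentioned here can be sketched very simply: a list of questions paired with facts the answer must be grounded in. This is a minimal illustrative sketch, not any particular eval framework; `ask_model` is a hypothetical stand-in for a real LLM call.

```python
# Minimal hallucination-eval sketch: each case pairs a question with
# facts the answer must contain to count as grounded.
# `ask_model` is a hypothetical stub standing in for a real LLM API call.

def ask_model(question: str) -> str:
    # Stubbed model responses for demonstration purposes only.
    canned = {
        "What is 2 + 2?": "2 + 2 equals 4.",
        "What year did the Apollo 11 landing happen?": "Apollo 11 landed in 1969.",
    }
    return canned.get(question, "I'm not sure.")

def run_eval(cases):
    """Return (question, passed) pairs; passed means every required fact appears."""
    results = []
    for question, required_facts in cases:
        answer = ask_model(question).lower()
        grounded = all(fact.lower() in answer for fact in required_facts)
        results.append((question, grounded))
    return results

EVAL_SET = [
    ("What is 2 + 2?", ["4"]),
    ("What year did the Apollo 11 landing happen?", ["1969"]),
]

if __name__ == "__main__":
    for question, passed in run_eval(EVAL_SET):
        print(f"{'PASS' if passed else 'FAIL'}: {question}")
```

In practice the eval set would grow from real failure cases, and the grounding check would be something stronger than substring matching (e.g. an LLM judge), but the loop structure is the same.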

Do you find businesses are implementing any other types of strategies, or are they even following through with the evals, or just kind of YOLOing it? What's the vibe in the business community?

I wouldn't mind diving into the security aspect a little bit. We're familiar with this one project, CodeGate, which kind of acts as a local proxy that your LLM requests route through so it can redact PII and stuff like that. But it just seems to be getting started. Do you have any tools or advice for companies that are concerned about security when bringing LLMs into their workflow?
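The local-proxy idea described here can be sketched in a few lines: scrub common PII patterns from a prompt before it leaves the machine. The regex patterns and the `proxy_request` wrapper are illustrative assumptions for this sketch, not CodeGate's actual implementation.

```python
import re

# Sketch of a PII-redacting proxy step: replace detected PII with
# labeled placeholders before the prompt is forwarded to an LLM provider.
# These patterns are simplified examples, not production-grade detection.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII match with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text

def proxy_request(prompt: str) -> str:
    # A real proxy would forward the redacted prompt upstream;
    # here we just return it so the redaction is visible.
    return redact(prompt)

print(proxy_request("Email jane@example.com or call 555-123-4567."))
```

A real tool would also handle secrets, API keys, and structured data, and would sit transparently between the client and the provider's API endpoint.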

This week, we're joined by Nathaniel Whittlemore, also known as NLW, the founder and CEO of Superintelligent, as well as the host of my favorite daily AI podcast, the AI Daily Brief. NLW, welcome to Tool Use.

Yeah, I've found something similar, where people say, oh, the newest agent from OpenAI, Deep Research, which I've used and is great. And other people say, well, what about Code Interpreter? Is that an agent? And ultimately it doesn't matter whether it's a tool or a workflow, as long as it solves a certain task for you. Through your use of them, what type of use cases are you excited for?

What have you found to be actually helpful in the current state?

Yeah, absolutely. And I've even seen the progression where you have those chats with Claude or ChatGPT to get some input, help with the brainstorming, coming up with titles, to creating a Claude Project where you can upload a bunch of documents, a bunch of standards and best practices, so you can get more consistent results over time.

We've also experimented with the AI editors and we've yet to find success there. But it's interesting how the chasm between what works today and what is, you know, not quite working, what's a little ways off is just shrinking by the day.

Have you noticed any tools in your workflow that have really allowed you to completely offset a process or are you still a human in the loop a lot of the time for these type of things?

Yeah, absolutely. And as a longtime listener, I can tell you that the added personality, the added perspective always helps beyond just, you know, an information dump. I actually wouldn't mind double-clicking on Deep Research, because I've also used it and had positive results. But as you mentioned, the Twitter vibe test, a lot of people didn't seem to like it.

A lot of people did, but it was one of those right down the middle ones. What's your experience been like with it? Do you think it's a step in the right direction? And even just like long running AI processes in general? Do you think that's the future?