Steve Hsu
And the way we program this platform, we use a sort of old-style approach: we force the model to answer the factual part of a question using only that knowledge base.
And that extra constraint solves the hallucination problem and makes both the behavior and the knowledge base of the AI reliable.
So if you look at the architecture, a piece of it is actually RAG.
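The constraint described above can be sketched as a minimal retrieval-constrained loop: retrieve passages from the knowledge base, and refuse to answer when nothing relevant is found rather than letting the model improvise. This is a toy illustration, not the company's actual implementation; the keyword retriever, the knowledge base contents, and the function names are all hypothetical stand-ins (a real system would use embedding search and an actual LLM call).

```python
# Toy sketch of retrieval-constrained ("RAG-style") answering.
# All facts and names here are hypothetical examples.

KNOWLEDGE_BASE = [
    "The platform was founded in 2020.",
    "The assistant supports voice interaction.",
]

def retrieve(query: str, kb: list[str]) -> list[str]:
    """Toy retriever: return KB passages sharing any word with the query."""
    words = set(query.lower().split())
    return [p for p in kb if words & set(p.lower().split())]

def build_prompt(query: str, passages: list[str]) -> str:
    """Constrain the model: answer ONLY from the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the passages below. "
        "If they do not contain the answer, say 'I don't know.'\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )

def answer(query: str, kb: list[str]) -> str:
    passages = retrieve(query, kb)
    if not passages:
        return "I don't know."  # refuse rather than hallucinate
    prompt = build_prompt(query, passages)
    # A real system would call the LLM with `prompt` here;
    # this sketch just returns the top grounding passage.
    return passages[0]
```

The key design point is the empty-retrieval branch: with no supporting passage, the system declines instead of generating an unsupported answer.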
And interestingly, when we founded the startup, the company filed a patent on our architecture.
And that was before the term RAG was in wide usage.
So it's possible, who knows how the USPTO, the Patent and Trademark Office, operates, that we might be issued a patent on RAG.
Of course, it's not referred to as RAG in the patent filing, but it has a lot of similarity.
Another thing you might do is have multiple models involved in generating the response, where some models just error-check the big model's proposed response against what those little models can see in the knowledge base.
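That checker pattern can be sketched as follows: small verifier models vet each claim in the big model's draft against the knowledge base, and the draft is rejected if any claim is unsupported. This is a hedged illustration, not the actual architecture; the checker here is a trivial lookup standing in for a small verifier model, and the knowledge base entries are invented.

```python
# Sketch of the multi-model pattern: small "checker" models validate each
# claim in a draft response against the knowledge base. Hypothetical data.

KNOWLEDGE_BASE = {
    "the office opens at 9am",
    "support is available by phone",
}

def checker(claim: str, kb: set[str]) -> bool:
    """Stand-in for a small verifier model: is this claim supported?"""
    return claim.lower() in kb

def verify_response(draft_claims: list[str], kb: set[str]) -> bool:
    """Accept the big model's draft only if every claim checks out."""
    return all(checker(claim, kb) for claim in draft_claims)
```

In a real system each `checker` call would itself be a model invocation, which is why latency, discussed next, becomes a serious engineering constraint.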
And if you're doing voice, which we do, all of that has to happen with a latency of less than two seconds.
Because for humans, if I stop speaking and I'm waiting for you to respond, and it takes more than a couple of seconds, it feels strange.
So all of the stuff I just described is engineered down so that the latency is between one and two seconds, so it sounds natural.
Well, the situation is improved if you're using a model that has reasoning capabilities.
What's going on in the reasoning is that the model, as it talks to itself while generating a response to your query, has been taught to double-check facts or components of the reasoning.
However, if the model doesn't have access to the actual ground truth, it can still go off the rails, because it can think X is true when it isn't.
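The failure mode described above can be made concrete: a model that double-checks a fact against its own internal beliefs will happily confirm its own error, whereas checking against an external knowledge base catches it. The belief, the ground-truth entry, and the function names below are all invented for illustration.

```python
# Why self-checking without ground truth can fail: the model's "double-check"
# consults the same wrong belief that produced the claim. Hypothetical data.

MODEL_BELIEFS = {"capital_of_australia": "Sydney"}    # wrong internal belief
GROUND_TRUTH  = {"capital_of_australia": "Canberra"}  # external knowledge base

def self_check(key: str, value: str) -> bool:
    """The model 'verifies' the claim against its own internal belief."""
    return MODEL_BELIEFS.get(key) == value

def grounded_check(key: str, value: str) -> bool:
    """Verify the claim against the external knowledge base instead."""
    return GROUND_TRUTH.get(key) == value

claim = ("capital_of_australia", "Sydney")
# self_check confirms the error; grounded_check rejects it.
```

This is the sense in which reasoning helps but is not sufficient: the double-checking loop is only as good as the source of truth it consults.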