Navrina Singh
When DeepSeek R1 came out earlier this year, there was a lot of, I would say, excitement.
Wow, there's a highly capable, low-cost model that is open source that we can use within our business.
Now, on benchmarks, it did really well, especially these static benchmarks.
However, what it missed was the business context.
And as an example, the business context could be, I'm a financial services organization.
If I do plan to use DeepSeek R1 for a fraud model, it better meet my security thresholds.
It better meet my SR 11-7 compliance requirements.
And guess what?
DeepSeek R1 failed at all of those.
It did not do well not only on safety, but also on security and compliance, all of which are really critical, even non-negotiable, requirements. It failed on those.
So I think one of the core things we launched at Credo AI earlier this year is this notion of model trust scores, which is essentially: how do you augment your static benchmark information with industry-specific, use-case-specific evaluations that are fit for purpose, so that you can actually make a decision?
Do I want to use DeepSeek, or do I rather go with a very safe, possibly proprietary model from one of the foundation model providers?
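The idea described above, combining static benchmark results with fit-for-purpose evaluations while treating security and compliance as non-negotiable, can be sketched in code. This is a hypothetical illustration of the general pattern, not Credo AI's actual scoring method; the function name, weights, and check names are all assumptions.

```python
# Hypothetical sketch (not Credo AI's actual method): combine static
# benchmark scores with use-case-specific evaluation scores, treating
# security/compliance checks as hard gates rather than weighted inputs.

def model_trust_score(benchmarks, evaluations, gates):
    """benchmarks, evaluations: dicts of name -> score in [0, 1].
    gates: dict of name -> bool for non-negotiable pass/fail checks."""
    # Any failed non-negotiable requirement zeroes the overall score.
    if not all(gates.values()):
        return 0.0
    bench_avg = sum(benchmarks.values()) / len(benchmarks)
    eval_avg = sum(evaluations.values()) / len(evaluations)
    # Weight fit-for-purpose evaluations above static benchmarks
    # (the 0.3/0.7 split is an arbitrary illustrative choice).
    return 0.3 * bench_avg + 0.7 * eval_avg

# Example: a model that aces static benchmarks but fails a compliance gate.
score = model_trust_score(
    benchmarks={"mmlu": 0.90, "gsm8k": 0.95},
    evaluations={"fraud_detection_accuracy": 0.80},
    gates={"security_threshold": True, "sr_11_7_compliance": False},
)
print(score)  # 0.0: one failed non-negotiable gate sinks the model
```

The design choice this sketch highlights is that compliance and security are gates, not weighted factors: no amount of benchmark performance can compensate for failing them, which mirrors the DeepSeek R1 example in the discussion.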
Exactly, Corey.
And, you know, it's very exciting to see that reflected in the action plan that just came out 10 days ago.
Again, it is not the dearth of AI capabilities or models or application.
It is how are you going to build trust in these systems?
Are they secure?
Are they compliant?
Are they safe?