Dwarkesh Patel
π€ SpeakerVoice Profile Active
This person's voice can be automatically recognized across podcast episodes using AI voice matching.
Appearances Over Time
Podcast Appearances
I find it super interesting that the progress in different areas of AI is just based on not only the same techniques, but literally the same model that you can just use an open source LLM and then add a section expert on top.
It is notable that like you naively might think that, oh, there's a separate area of research which is robotics and there's a separate area of research called LLMs and natural language processing.
And no, it's like it's literally the same.
It's like the considerations are the same.
The architectures are the same.
Even the weights are the same.
I know you do more training on top of these open source models, but that I find super interesting.
Today, I'm here with Mark, who is a senior researcher at Hudson River Trading.
He has prepared for us a big data set of market prices and historical market data.
And we're going to try to figure out what's going on and whether we can predict future prices from historical market data.
Mark, let's dig in.
Happy to do it.
So it sounds like the first fun thing to do is probably to start looking at what an order book actually looks like.
If this sounds interesting to you, you should consider working at Hudson River Trading.
I was talking to this researcher, Sander, at GDM, and he works on video and audio models.
And he made the interesting point that the reason, in his view, we aren't seeing that much transfer learning between different modalities, that is to say, like training a language model on video and images, doesn't seem to necessarily make it that much better at textual learning.
questions and tasks, is that images are represented at a different semantic level than text.
And so his argument is that text has this high-level semantic representation within the model, whereas images and videos are just like compressed pixels.
There's not really a semantic... When they're embedded, they don't represent some high-level semantic information.
They're just like compressed pixels.