Marcus Hutter

๐Ÿ‘ค Speaker
See mentions of this person in podcasts
912 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 1
Confidence: Medium

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#75 – Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI

And first, I should say, how do we measure performance? So we measure performance by giving the agent reward. That's the so-called reinforcement learning framework. So every time step, you can give it a positive reward, a negative reward, or maybe no reward. It could be very scarce, right? Like if you play chess, just at the end of the game, you give plus one for winning or minus one for losing.
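
To make that sparse, end-of-game reward concrete, here is a minimal Python sketch of one episode; the environment interface (reset, step) is a hypothetical stand-in, not something from the episode:

    # Minimal sketch of a sparse-reward episode loop. The environment
    # interface (reset, step) is a hypothetical stand-in for any game
    # that pays out only at the terminal step.
    def play_episode(env, policy):
        observation = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = policy(observation)
            observation, reward, done = env.step(action)
            total_reward += reward  # 0 every move; +1 or -1 only at the end
        return total_reward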

So in the AIXI framework, that's completely sufficient. So occasionally you give a reward signal and you ask the agent to maximize reward, but not greedily, sort of, the next one, next one, because that's very bad in the long run if you're greedy, but over the lifetime of the agent. So let's assume the agent lives for m time steps, let's say dies in sort of 100 years sharp. That's just the simplest model to explain. So it looks at the future reward sum and asks, what is my action sequence, or actually more precisely my policy, which leads in expectation, because I don't know the world, to the maximum reward sum.
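
Spelled out as a formula, that objective is the expected reward sum over the agent's lifetime of m time steps; this is a standard rendering of what he describes, with the notation assumed rather than quoted:

    % The optimal policy maximizes the expected sum of rewards
    % r_1, ..., r_m over the agent's lifetime of m time steps.
    \pi^* = \arg\max_{\pi} \; \mathbb{E}\Big[ \sum_{t=1}^{m} r_t \;\Big|\; \pi \Big]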

Let me give you an analogy. In chess, for instance, we know how to play optimally in theory. It's just a minimax strategy. I play the move which seems best to me under the assumption that the opponent plays the move which is best for him, so worst for me, under the assumption that I play, again, the best move. Then you have this expectimax tree to the end of the game, you backpropagate, and you get the best possible move. So that is the optimal strategy, which von Neumann already figured out a long time ago, for playing adversarial games.
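
As a sketch of the minimax backup he describes, here is a generic Python version that searches to the end of the game and backs values up to the root; the game interface (legal_moves, play, is_over, value) is assumed for illustration:

    # Generic minimax: alternate max (my move) and min (opponent's move)
    # down to terminal positions, then back the values up. The `game`
    # interface is a hypothetical stand-in, not a real library API.
    def minimax(game, maximizing):
        if game.is_over():
            return game.value()  # +1 win, -1 loss, 0 draw, for the maximizer
        children = (minimax(game.play(m), not maximizing)
                    for m in game.legal_moves())
        return max(children) if maximizing else min(children)

    def best_move(game):
        # After I move, it is the opponent's turn, so they minimize.
        return max(game.legal_moves(),
                   key=lambda m: minimax(game.play(m), maximizing=False))

In full chess this tree is far too large to expand, which is why the strategy is optimal only "in theory."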

Luckily, or maybe unluckily for the theory, it becomes harder.