AI Engineering Now

#1: Building a Domain-Specific Evaluation Dataset from Chatbot Arena Data

08 Sep 2024

Description

We discussed the paper "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena", on the theme of building a domain-specific evaluation dataset from Chatbot Arena data.

The podcast transcription service "LISTEN" is available here.

Shownotes:
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Chat with Open Large Language Models
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline | LMSYS Org
Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge
https://x.com/karpathy/status/1737544497016578453
https://github.com/lm-sys/arena-hard-auto/tree/main/BenchBuilder

Speakers: seya (@sekikazu01), kagaya (@ry0_kaga)
