LlamaCast

Technology News Science

Episodes

Marco-o1

23 Nov 2024

Contributed by Lukas

🤖 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
The Alibaba MarcoPolo team presents Marco-o1, a large reasoning model designed to...

Scaling Laws for Precision

18 Nov 2024

Contributed by Lukas

⚖️ Scaling Laws for Precision
This research paper investigates the impact of precision in training and inference on the performance of large langua...

Test-Time Training

14 Nov 2024

Contributed by Lukas

⌛️ The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
This paper examines how test-time training (TTT) can enhance the abstr...

Qwen2.5-Coder

12 Nov 2024

Contributed by Lukas

🔷 Qwen2.5-Coder Technical Report
The report introduces the Qwen2.5-Coder series, which includes the Qwen2.5-Coder-1.5B and Qwen2.5-Coder-7B models. ...

Attacking Vision-Language Computer Agents via Pop-ups

09 Nov 2024

Contributed by Lukas

😈 Attacking Vision-Language Computer Agents via Pop-ups
This research paper examines vulnerabilities in vision-language models (VLMs) that power aut...

Number Cookbook

08 Nov 2024

Contributed by Lukas

📓 Number Cookbook: Number Understanding of Language Models and How to Improve It
This research paper examines the numerical understanding and proces...

Jigsaw Puzzles

07 Nov 2024

Contributed by Lukas

🧩 Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
This research paper investigates the vulnerabilities of large langu...

Multi-expert Prompting with LLMs

05 Nov 2024

Contributed by Lukas

🤝 Multi-expert Prompting with LLMs
The research paper presents Multi-expert Prompting, a novel method for improving the reliability, safety, and use...

Investigating the Role of Prompting and External Tools in Hallucination Rates of LLMs

03 Nov 2024

Contributed by Lukas

🔎 Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
This paper examines the effectiveness of di...

Mind Your Step (by Step)

02 Nov 2024

Contributed by Lukas

🌀 Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
This research paper examines how chai...

SimpleQA

31 Oct 2024

Contributed by Lukas

❓ Measuring short-form factuality in large language models
This document introduces SimpleQA, a new benchmark for evaluating the factuality of large l...

GPT-4o System Card

30 Oct 2024

Contributed by Lukas

📜 GPT-4o System Card
This technical document is the System Card for OpenAI's GPT-4o, a multimodal, autoregressive language model that can process an...

Mixture of Parrots

29 Oct 2024

Contributed by Lukas

🦜 Mixture of Parrots: Experts improve memorization more than reasoning
This research paper investigates the effectiveness of Mixture-of-Experts (MoE...

Improve Vision Language Model Chain-of-thought Reasoning

28 Oct 2024

Contributed by Lukas

🖼 Improve Vision Language Model Chain-of-thought Reasoning
This research paper investigates how to improve the chain-of-thought (CoT) reasoning capa...

Breaking the Memory Barrier

27 Oct 2024

Contributed by Lukas

🧠 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
This research paper introduces Inf-CL, a novel approach for con...

LLMs Reflect the Ideology of their Creators

26 Oct 2024

Contributed by Lukas

⚖️ Large Language Models Reflect the Ideology of their Creators
This study examines the ideological stances of large language models (LLMs) by anal...

LongRAG

25 Oct 2024

Contributed by Lukas

📜 LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
This research paper propos...

A Theoretical Understanding of Chain-of-Thought

24 Oct 2024

Contributed by Lukas

⛓️ A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
The paper explores Chain-of-Thought (CoT) prom...

A Survey on Data Synthesis and Augmentation for Large Language Models

23 Oct 2024

Contributed by Lukas

📚 A Survey on Data Synthesis and Augmentation for Large Language Models
This research paper examines the use of synthetic and augmented data to enha...

Revealing the Barriers of Language Agents in Planning

22 Oct 2024

Contributed by Lukas

🤔 Revealing the Barriers of Language Agents in Planning
This research paper examines the challenges faced by language agents in planning tasks. The ...

Intelligence at the Edge of Chaos

21 Oct 2024

Contributed by Lukas

🔀 Intelligence at the Edge of Chaos
This research investigates how intelligent behavior emerges in artificial systems by studying the connection bet...

Inference Scaling for Long-Context RAG

20 Oct 2024

Contributed by Lukas

🗓 Inference Scaling for Long-Context Retrieval Augmented Generation
This research paper explores the effectiveness of inference scaling for retrieva...

Model Swarms

19 Oct 2024

Contributed by Lukas

🤝 Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
This paper presents a new method called MODEL SWARMS, a collaborati...

Agent-as-a-Judge

18 Oct 2024

Contributed by Lukas

🤖 Agent-as-a-Judge: Evaluate Agents with Agents
The paper details a new framework for evaluating agentic systems called Agent-as-a-Judge, which uses ...

First-Person Fairness in Chatbots

18 Oct 2024

Contributed by Lukas

⚖️ First-Person Fairness in Chatbots
This paper from OpenAI examines potential bias in chatbot systems like ChatGPT, specifically focusing on how a...

Thinking LLMs

18 Oct 2024

Contributed by Lukas

🤔 Thinking LLMs: General Instruction Following with Thought Generation
This research paper explores the concept of "Thinking LLMs," or large languag...

Addition is All You Need

18 Oct 2024

Contributed by Lukas

🔋 Addition is All You Need for Energy-efficient Language Models
This research paper introduces a novel algorithm called Linear-Complexity Multiplica...

MLE-bench

18 Oct 2024

Contributed by Lukas

🤖 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
The paper introduces MLE-bench, a benchmark designed to evaluate AI ...

Long-Context LLMs Meet RAG

18 Oct 2024

Contributed by Lukas

📈 Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
This paper explores the challenges and opportunities of using long-contex...

GSM-Symbolic

18 Oct 2024

Contributed by Lukas

📊 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
The paper investigates the mathematical reasoning a...

Anti-Social LLM

18 Oct 2024

Contributed by Lukas

😶 Anti-Social Behavior and Persuasion Ability of LLMs
This study explores the behavior of Large Language Models (LLMs) in a simulated prison environ...

Differential Transformer

18 Oct 2024

Contributed by Lukas

🎧 Differential Transformer
The paper introduces the Differential Transformer, a new architecture for large language models (LLMs) that aims to impro...

ToolGen

18 Oct 2024

Contributed by Lukas

🛠 ToolGen: Unified Tool Retrieval and Calling via Generation
This research paper introduces ToolGen, a novel framework that enables LLMs to directly...

LangGPT

18 Oct 2024

Contributed by Lukas

👨‍🔧 Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts
This research proposes LangGPT, a structural promp...

Movie Gen

18 Oct 2024

Contributed by Lukas

🎞 Movie Gen: A Cast of Media Foundation Models
Meta AI researchers have introduced Movie Gen, a suite of foundation models capable of generating hig...

LLMs Know More Than They Show

18 Oct 2024

Contributed by Lukas

🕵️‍♀️ LLMs Know More Than They Show
This research examines the inner workings of large language models (LLMs) to understand and reduce their...

Were RNNs All We Needed?

18 Oct 2024

Contributed by Lukas

🔁 Were RNNs All We Needed?
The paper "Were RNNs All We Needed?" examines the efficiency of traditional recurrent neural networks (RNNs), specificall...

SLMs, A Survey

18 Oct 2024

Contributed by Lukas

📱 Small Language Models: Survey, Measurements, and Insights
This research paper reviews small language models (SLMs), which are optimized for use on...

o1 in Medicine

18 Oct 2024

Contributed by Lukas

💊 A Preliminary Study of o1 in Medicine
The research paper focuses on the performance of a new large language model (LLM) called o1 in the medical d...

RAG and Beyond

18 Oct 2024

Contributed by Lukas

📑 RAG and Beyond
This paper provides a comprehensive survey of the current state of data-augmented Large Language Models (LLMs), focusing on Retriev...

Molmo and PixMo

18 Oct 2024

Contributed by Lukas

🔓 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
This research paper introduces Molmo, a new family of vision-la...

Self-Taught Evaluators

18 Oct 2024

Contributed by Lukas

🔄 Self-Taught Evaluators
This research paper explores the development of self-taught language model evaluators. Instead of relying on costly human a...

Larger LLMs Become Less Reliable

18 Oct 2024

Contributed by Lukas

⚠️ Larger and more instructable language models become less reliable
This research paper from Nature explores the relationship between the size and...

Logic-of-Thought

18 Oct 2024

Contributed by Lukas

💭 Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in LLMs
This research paper introduces Logic-of-Thought (LoT), a novel promptin...

Moshi

18 Oct 2024

Contributed by Lukas

🟢 Moshi: a speech-text foundation model for real-time dialogue
The paper discusses a new multimodal foundation model called Moshi designed for real-...

Jailbreaking Large Language Models with Symbolic Mathematics

18 Oct 2024

Contributed by Lukas

🔑 Jailbreaking Large Language Models with Symbolic Mathematics
This research paper investigates a new vulnerability in AI safety mechanisms by intro...

LLMs Still Can't Plan; Can LRMs?

18 Oct 2024

Contributed by Lukas

📈 LLMs Still Can't Plan; Can LRMs?
The paper "LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench" investigates th...

A Comprehensive Evaluation of Quantized Instruction-Tuned LLMs

18 Oct 2024

Contributed by Lukas

📏 A Comprehensive Evaluation of Quantized Instruction-Tuned LLMs
This paper, titled "A Comprehensive Evaluation of Quantized Instruction-Tuned Large...

On the Diagram of Thought

17 Oct 2024

Contributed by Lukas

🧠 On the Diagram of Thought
This paper introduces a new framework called Diagram of Thought (DoT) that models how large language models (LLMs) reaso...