LlamaCast
Episodes
Marco-o1
23 Nov 2024
Contributed by Lukas
🤖 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
The Alibaba MarcoPolo team presents Marco-o1, a large reasoning model designed to...
Scaling Laws for Precision
18 Nov 2024
Contributed by Lukas
⚖️ Scaling Laws for Precision
This research paper investigates the impact of precision in training and inference on the performance of large langua...
Test-Time Training
14 Nov 2024
Contributed by Lukas
⌛️ The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
This paper examines how test-time training (TTT) can enhance the abstr...
Qwen2.5-Coder
12 Nov 2024
Contributed by Lukas
🔷 Qwen2.5-Coder Technical Report
The report introduces the Qwen2.5-Coder series, which includes the Qwen2.5-Coder-1.5B and Qwen2.5-Coder-7B models. ...
Attacking Vision-Language Computer Agents via Pop-ups
09 Nov 2024
Contributed by Lukas
😈 Attacking Vision-Language Computer Agents via Pop-ups
This research paper examines vulnerabilities in vision-language models (VLMs) that power aut...
Number Cookbook
08 Nov 2024
Contributed by Lukas
📓 Number Cookbook: Number Understanding of Language Models and How to Improve It
This research paper examines the numerical understanding and proces...
Jigsaw Puzzles
07 Nov 2024
Contributed by Lukas
🧩 Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
This research paper investigates the vulnerabilities of large langu...
Multi-expert Prompting with LLMs
05 Nov 2024
Contributed by Lukas
🤝 Multi-expert Prompting with LLMs
The research paper presents Multi-expert Prompting, a novel method for improving the reliability, safety, and use...
Investigating the Role of Prompting and External Tools in Hallucination Rates of LLMs
03 Nov 2024
Contributed by Lukas
🔎 Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
This paper examines the effectiveness of di...
Mind Your Step (by Step)
02 Nov 2024
Contributed by Lukas
🌀 Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
This research paper examines how chai...
SimpleQA
31 Oct 2024
Contributed by Lukas
❓ Measuring short-form factuality in large language models
This document introduces SimpleQA, a new benchmark for evaluating the factuality of large l...
GPT-4o System Card
30 Oct 2024
Contributed by Lukas
📜 GPT-4o System Card
This technical document is the System Card for OpenAI's GPT-4o, a multimodal, autoregressive language model that can process an...
Mixture of Parrots
29 Oct 2024
Contributed by Lukas
🦜 Mixture of Parrots: Experts improve memorization more than reasoning
This research paper investigates the effectiveness of Mixture-of-Experts (MoE...
Improve Vision Language Model Chain-of-thought Reasoning
28 Oct 2024
Contributed by Lukas
🖼 Improve Vision Language Model Chain-of-thought Reasoning
This research paper investigates how to improve the chain-of-thought (CoT) reasoning capa...
Breaking the Memory Barrier
27 Oct 2024
Contributed by Lukas
🧠 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
This research paper introduces Inf-CL, a novel approach for con...
LLMs Reflect the Ideology of their Creators
26 Oct 2024
Contributed by Lukas
⚖️ Large Language Models Reflect the Ideology of their Creators
This study examines the ideological stances of large language models (LLMs) by anal...
LongRAG
25 Oct 2024
Contributed by Lukas
📜 LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
The source is a research paper that propos...
A Theoretical Understanding of Chain-of-Thought
24 Oct 2024
Contributed by Lukas
⛓️ A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
The paper explores Chain-of-Thought (CoT) prom...
A Survey on Data Synthesis and Augmentation for Large Language Models
23 Oct 2024
Contributed by Lukas
📚 A Survey on Data Synthesis and Augmentation for Large Language Models
This research paper examines the use of synthetic and augmented data to enha...
Revealing the Barriers of Language Agents in Planning
22 Oct 2024
Contributed by Lukas
🤔 Revealing the Barriers of Language Agents in Planning
This research paper examines the challenges faced by language agents in planning tasks. The ...
Intelligence at the Edge of Chaos
21 Oct 2024
Contributed by Lukas
🔀 Intelligence at the Edge of Chaos
This research investigates how intelligent behavior emerges in artificial systems by studying the connection bet...
Inference Scaling for Long-Context RAG
20 Oct 2024
Contributed by Lukas
🗓 Inference Scaling for Long-Context Retrieval Augmented Generation
This research paper explores the effectiveness of inference scaling for retrieva...
Model Swarms
19 Oct 2024
Contributed by Lukas
🤝 Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
This paper presents a new method called MODEL SWARMS, a collaborati...
Agent-as-a-Judge
18 Oct 2024
Contributed by Lukas
🤖 Agent-as-a-Judge: Evaluate Agents with Agents
The paper details a new framework for evaluating agentic systems called Agent-as-a-Judge, which uses ...
First-Person Fairness in Chatbots
18 Oct 2024
Contributed by Lukas
⚖️ First-Person Fairness in Chatbots
This paper from OpenAI examines potential bias in chatbot systems like ChatGPT, specifically focusing on how a...
Thinking LLMs
18 Oct 2024
Contributed by Lukas
🤔 Thinking LLMs: General Instruction Following with Thought Generation
This research paper explores the concept of "Thinking LLMs," or large languag...
Addition is All You Need
18 Oct 2024
Contributed by Lukas
🔋 Addition is All You Need for Energy-efficient Language Models
This research paper introduces a novel algorithm called Linear-Complexity Multiplica...
MLE-bench
18 Oct 2024
Contributed by Lukas
🤖 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
The paper introduces MLE-bench, a benchmark designed to evaluate AI ...
Long-Context LLMs Meet RAG
18 Oct 2024
Contributed by Lukas
📈 Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
This paper explores the challenges and opportunities of using long-contex...
GSM-Symbolic
18 Oct 2024
Contributed by Lukas
📊 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
The paper investigates the mathematical reasoning a...
Anti-Social LLM
18 Oct 2024
Contributed by Lukas
😶 Anti-Social Behavior and Persuasion Ability of LLMs
This study explores the behavior of Large Language Models (LLMs) in a simulated prison environ...
Differential Transformer
18 Oct 2024
Contributed by Lukas
🎧 Differential Transformer
The paper introduces the Differential Transformer, a new architecture for large language models (LLMs) that aims to impro...
ToolGen
18 Oct 2024
Contributed by Lukas
🛠 ToolGen: Unified Tool Retrieval and Calling via Generation
This research paper introduces ToolGen, a novel framework that enables LLMs to directly...
LangGPT
18 Oct 2024
Contributed by Lukas
👨‍🔧 Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts
This research proposes LangGPT, a structural promp...
Movie Gen
18 Oct 2024
Contributed by Lukas
🎞 Movie Gen: A Cast of Media Foundation Models
Meta AI researchers have introduced Movie Gen, a suite of foundation models capable of generating hig...
LLMs Know More Than They Show
18 Oct 2024
Contributed by Lukas
🕵️‍♀️ LLMs Know More Than They Show
This research examines the inner workings of large language models (LLMs) to understand and reduce their...
Were RNNs All We Needed?
18 Oct 2024
Contributed by Lukas
🔁 Were RNNs All We Needed?
The paper "Were RNNs All We Needed?" examines the efficiency of traditional recurrent neural networks (RNNs), specificall...
SLMs, A Survey
18 Oct 2024
Contributed by Lukas
📱 Small Language Models: Survey, Measurements, and Insights
This research paper reviews small language models (SLMs), which are optimized for use on...
o1 in Medicine
18 Oct 2024
Contributed by Lukas
💊 A Preliminary Study of o1 in Medicine
The research paper focuses on the performance of a new large language model (LLM) called o1 in the medical d...
RAG and Beyond
18 Oct 2024
Contributed by Lukas
📑 RAG and Beyond
This paper provides a comprehensive survey of the current state of data-augmented Large Language Models (LLMs), focusing on Retriev...
Molmo and PixMo
18 Oct 2024
Contributed by Lukas
🔓 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
This research paper introduces Molmo, a new family of vision-la...
Self-Taught Evaluators
18 Oct 2024
Contributed by Lukas
🔄 Self-Taught Evaluators
This research paper explores the development of self-taught language model evaluators. Instead of relying on costly human a...
Larger LLMs Become Less Reliable
18 Oct 2024
Contributed by Lukas
⚠️ Larger and more instructable language models become less reliable
This research paper from Nature explores the relationship between the size and...
Logic-of-Thought
18 Oct 2024
Contributed by Lukas
💭 Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in LLMs
This research paper introduces Logic-of-Thought (LoT), a novel promptin...
Moshi
18 Oct 2024
Contributed by Lukas
🟢 Moshi: a speech-text foundation model for real-time dialogue
The paper discusses a new multimodal foundation model called Moshi designed for real-...
Jailbreaking Large Language Models with Symbolic Mathematics
18 Oct 2024
Contributed by Lukas
🔑 Jailbreaking Large Language Models with Symbolic Mathematics
This research paper investigates a new vulnerability in AI safety mechanisms by intro...
LLMs Still Can't Plan; Can LRMs?
18 Oct 2024
Contributed by Lukas
📈 LLMs Still Can't Plan; Can LRMs?
The paper "LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench" investigates th...
A Comprehensive Evaluation of Quantized Instruction-Tuned LLMs
18 Oct 2024
Contributed by Lukas
📏 A Comprehensive Evaluation of Quantized Instruction-Tuned LLMs
This paper, titled "A Comprehensive Evaluation of Quantized Instruction-Tuned Large...
On the Diagram of Thought
17 Oct 2024
Contributed by Lukas
🧠 On the Diagram of Thought
This paper introduces a new framework called Diagram of Thought (DoT) that models how large language models (LLMs) reaso...