AI: post transformers
Episodes
Parallel-R1: Reinforcement Learning for Parallel Thinking in LLMs
12 Sep 2025
Contributed by Lukas
This September 10, 2025 technical report from Tencent AI Lab introduces Parallel-R1, a novel reinforcement learning (RL) framework designed to en...
Explaining AI for Digital Advertising with LLMs
11 Sep 2025
Contributed by Lukas
This April 2025 paper introduces SODA, a novel framework designed to enhance digital advertising strategies by making opaque AI systems more understa...
AdLlama: Boosting Ad Performance with Reinforcement Learning
11 Sep 2025
Contributed by Lukas
This July 2025 paper introduces AdLlama, a new large language model (LLM) for generating Facebook ad text, trained using Reinforcement Learning with P...
ByteCheckpoint: A Unified LLM Checkpointing System
11 Sep 2025
Contributed by Lukas
This July 2024 paper introduces ByteCheckpoint, a novel PyTorch-native system designed for Large Language Model (LLM) development. This system addre...
Darling: Reinforcing Diversity and Quality in Language Models
10 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Diversity-Aware Reinforcement Learning (Darling), a novel framework designed to enhance both the quality and sema...
INF2: Near-Storage LLM Inference for High Throughput
10 Sep 2025
Contributed by Lukas
This February 2025 paper introduces INF2, a novel framework designed to enhance the generative inference throughput of large language models (LLMs) by...
K2-Think: A Parameter-Efficient Reasoning System
10 Sep 2025
Contributed by Lukas
The September 9, 2025 press release and paper announce and detail K2 Think, an advanced open-source AI reasoning system developed by the Mohamed bi...
AlphaEvolve: AI for Scientific and Algorithmic Discovery
10 Sep 2025
Contributed by Lukas
The May and June 2025 sources introduce AlphaEvolve, a novel AI coding agent developed by Google DeepMind in collaboration with mathematicians like Jav...
BLEU: Automatic Machine Translation Evaluation
10 Sep 2025
Contributed by Lukas
This July 2002 paper introduced BLEU (Bilingual Evaluation Understudy), an automatic and inexpensive method for evaluating machine translation (MT) ...
Mini-o3: Scaling Reasoning for Visual Search
10 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Mini-o3, a Vision-Language Model (VLM) designed to overcome the limitations of existing VLMs in handling complex ...
Masked Diffusion Models: Performance and Theory
10 Sep 2025
Contributed by Lukas
This September 2025 paper analyzes the theoretical benefits and limitations of Masked Diffusion Models (MDMs) for text generation, contrasting them w...
TraceRL: Reinforcement Learning for Diffusion Language Models
09 Sep 2025
Contributed by Lukas
This September 2025 paper introduces TraceRL, a novel reinforcement learning framework designed to enhance diffusion language models (DLMs) across ...
LLM Benchmark Robustness to Linguistic Variation
09 Sep 2025
Contributed by Lukas
This September 2025 paper investigates the reliability and robustness of Large Language Models (LLMs) when evaluated using traditional benchmarks. Th...
Behavioral Fingerprinting of Large Language Models
09 Sep 2025
Contributed by Lukas
This September 2025 paper introduces "Behavioral Fingerprinting," a novel framework designed to evaluate Large Language Models (LLMs) beyond traditi...
Offloading LLM Models and KV Caches to NVMe SSDs
08 Sep 2025
Contributed by Lukas
This March 2025 paper examines the input/output (I/O) characteristics of offloading large language model (LLM) components to NVMe SSDs during inferen...
GPT-NeoX: Large-Scale Autoregressive Language Modeling in PyTorch
07 Sep 2025
Contributed by Lukas
This describes EleutherAI's GPT-NeoX library, a robust open-source framework for training large-scale autoregressive language models on GPUs, buildi...
SGLang: Efficient Language Model Program Execution
07 Sep 2025
Contributed by Lukas
This June 2024 paper introduces SGLang, a framework designed to enhance the efficiency of Large Language Model (LLM) and Vision Language Model (VLM) ...
Eleuther: Evaluating LLMs
07 Sep 2025
Contributed by Lukas
These sources collectively explore various approaches to evaluating and improving Large Language Models (LLMs). Several papers introduce new benchmark...
OpenELM: Apple's Open Language Model Family
07 Sep 2025
Contributed by Lukas
The provided May 2024 sources center around CoreNet, an Apple-developed library for training deep neural networks, and OpenELM, an efficient language ...
FineVision: Open Data for Computer Vision
07 Sep 2025
Contributed by Lukas
These September 2025 posts describe HuggingFaceM4/FineVision, a large dataset designed for image and text modalities. It features a substantial size...
Evaluating Large Language Models Trained on Code
07 Sep 2025
Contributed by Lukas
This July 2021 paper documents the development and evaluation of OpenAI's Codex models, which are large language models specialized in code generation...
Democratizing AI Compute: The Modular Vision
07 Sep 2025
Contributed by Lukas
This blog post series from Chris Lattner extensively examines CUDA's pervasive dominance in AI compute, detailing its evolution from a graphics proces...
Limitations of Embedding-Based Retrieval
06 Sep 2025
Contributed by Lukas
This August 2025 paper from Google DeepMind, titled "On the Theoretical Limitations of Embedding-Based Retrieval," explores the fundamental constrain...
SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence
06 Sep 2025
Contributed by Lukas
This September 2025 paper describes SAIR, the Structurally Augmented IC50 Repository, a groundbreaking open-source dataset developed by SandboxAQ in c...
EmbeddingGemma: On-Device AI for High-Quality Embeddings
05 Sep 2025
Contributed by Lukas
This document announces EmbeddingGemma, a new open embedding model from Google, specifically designed for on-device artificial intelligence (AI). ...
MTEB & MMTEB: The Massive Text Embedding Benchmark
05 Sep 2025
Contributed by Lukas
These academic papers introduce and detail the Massive Multilingual Text Embedding Benchmark (MMTEB), a comprehensive evaluation framework for text em...
DeepResearch Arena: Benchmarking LLMs' Research Abilities
05 Sep 2025
Contributed by Lukas
This September 2025 paper introduces DeepResearch Arena, a novel benchmark designed to evaluate the research capabilities of large language models (LL...
Inverse IFEval: Unlearning LLM Cognitive Inertia
05 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Inverse IFEval, a novel benchmark designed to evaluate Large Language Models (LLMs) for their Counter-intuiti...
The Rise of Physical Neural Networks
04 Sep 2025
Contributed by Lukas
This June 2024 paper examines the current state and future potential of Physical Neural Networks (PNNs), which are AI systems implemented directly in ...
FastVLM: Efficient Vision Encoding for Language Models
04 Sep 2025
Contributed by Lukas
This May 2025 paper introduces FastVLM, an innovative approach designed to enhance the efficiency of Vision Language Models (VLMs). The authors explai...
Apertus Tech Report Overview
04 Sep 2025
Contributed by Lukas
This paper introduces Apertus, a large language model developed by the Swiss AI Initiative, a partnership between ETH Zurich and EPFL. The GitHub re...
Supervised Learning in DNA Neural Networks
04 Sep 2025
Contributed by Lukas
This September 2025 article from Nature, authored by Kevin M. Cherry and Lulu Qian, introduces a novel DNA-based neural network capable of supe...
FusionANNS: Billion-Scale ANNS with SSD and GPU
03 Sep 2025
Contributed by Lukas
This September 2024 paper introduces FusionANNS, a novel system designed to improve Approximate Nearest Neighbor Search (ANNS) for extremely large...
rStar2-Agent: Smarter Math Reasoning Through Agentic RL
03 Sep 2025
Contributed by Lukas
This August 2025 paper introduces rStar2-Agent, a 14B math reasoning model developed by Microsoft Research that achieves state-of-the-art performanc...
Scientific LLMs: A Data-Centric Survey and Roadmap
03 Sep 2025
Contributed by Lukas
This August 2025 paper offers an extensive overview of the evolution and application of Large Language Models (LLMs) and Multimodal Large Language Mo...
Pimba: Processing-in-Memory for LLM Serving
27 Aug 2025
Contributed by Lukas
This August 2025 paper introduces Pimba, a novel Processing-in-Memory (PIM) accelerator designed to enhance the efficiency of Large Language Model...
Oaken: Fast, Efficient LLM Serving with Hybrid KV Cache Quantization
27 Aug 2025
Contributed by Lukas
This August 2025 paper introduces Oaken, a novel acceleration solution for serving Large Language Models (LLMs) that addresses the significant challe...
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms
27 Aug 2025
Contributed by Lukas
This January 2019 academic paper addresses the common issue of poor generalization in adaptive gradient optimization methods like Adam, compared to t...
Training Recurrent Neural Networks: Vanishing and Exploding Gradients
27 Aug 2025
Contributed by Lukas
This academic paper addresses the inherent challenges in training Recurrent Neural Networks (RNNs), specifically the vanishing and exploding gradie...
Adafactor: Memory-Efficient Adaptive Learning Rates
27 Aug 2025
Contributed by Lukas
This April 2018 paper introduces Adafactor, a novel optimization method designed to reduce the memory footprint of adaptive learning rate algorithms ...
SPAM: Stabilizing LLM Training with Spike-Aware Optimization
27 Aug 2025
Contributed by Lukas
This February 2025 research addresses the critical issue of training instability in Large Language Models (LLMs), which often stems from sudden, mass...
Google: Measuring AI's Environmental Impact at Scale
26 Aug 2025
Contributed by Lukas
This August 2025 paper presents Google's comprehensive methodology for measuring the environmental impact of AI inference workloads in a large-sca...
ComoRAG: Cognitively Inspired Narrative Reasoning
26 Aug 2025
Contributed by Lukas
This August 2025 paper introduces ComoRAG, a novel framework designed to enhance long-context narrative comprehension in Large Language Models (LLM...
Quantizing Diffusion LLMs: A Systematic Study
26 Aug 2025
Contributed by Lukas
This August 2025 academic paper explores the application of post-training quantization (PTQ) to diffusion large language models (dLLMs), a promising a...
ODYSSEY: Unified Mobile Manipulation for Agile Quadruped Robots
26 Aug 2025
Contributed by Lukas
This August 2025 paper introduces ODYSSEY, a comprehensive framework for open-world mobile manipulation that integrates robotic mobility, manipulat...
GPT-5 Spatial Intelligence: An Empirical Study
24 Aug 2025
Contributed by Lukas
This August 2025 academic paper, titled "Has GPT-5 Achieved Spatial Intelligence? An Empirical Study," examines the spatial understanding and reasonin...
DeepSeek-V3.1: A Hybrid AI Model with Enhanced Reasoning
23 Aug 2025
Contributed by Lukas
This is a review of DeepSeek's latest release announced on Hugging Face on August 21, 2025. The source introduces DeepSeek-V3.1, a hybrid large langua...
Compressed Experts: Efficient MoE Model Editing
23 Aug 2025
Contributed by Lukas
This March 2025 paper introduces compressed experts, an innovative method to enhance the efficiency of Mixture-of-Experts (MoE) models by reducing ...
Genie 3: A New Frontier for World Models
22 Aug 2025
Contributed by Lukas
The source provides an overview of Google DeepMind's AI research and models, highlighting various applications across different scientific disciplines...
Los Alamos: Overcoming the Memory Wall in Sparse Memory Access
21 Aug 2025
Contributed by Lukas
We review Los Alamos National Laboratory's advancements in managing indirect memory accesses in high-performance computing and its relationship to over...
Switch Transformers: Trillion Parameter Models with Sparsity
20 Aug 2025
Contributed by Lukas
This June 2022 paper introduces Switch Transformers, a novel architecture designed to enhance the efficiency and scalability of large-scale language m...
Linear Transformers: Faster Than RNNs
20 Aug 2025
Contributed by Lukas
This August 2020 paper introduces linear transformers, a novel approach to addressing the computational and memory inefficiencies of traditional tr...
Speed Always Wins: Efficient Large Language Model Architectures
20 Aug 2025
Contributed by Lukas
This August 2025 survey paper explores efficient architectures for large language models (LLMs), addressing the computational challenges of models li...
Atom: Low-Bit Quantization for LLM Serving
18 Aug 2025
Contributed by Lukas
This April 2024 paper introduces Atom, a novel low-bit quantization method designed to enhance the efficiency and accuracy of Large Language Model (...
Continuous Batching for LLM Inference: Throughput and Latency Gains
18 Aug 2025
Contributed by Lukas
The source analyzes Large Language Model (LLM) inference, specifically focusing on how continuous batching significantly improves efficiency compar...
Self-Search Reinforcement Learning for LLMs
18 Aug 2025
Contributed by Lukas
This August 2025 paper introduces Self-Search Reinforcement Learning (SSRL), a novel method that enables Large Language Models (LLMs) to access and...
Diffusion Language Models: Principles, Techniques, and Applications
18 Aug 2025
Contributed by Lukas
This August 2025 paper offers a comprehensive overview of diffusion language models (DLMs), contrasting them with traditional autoregressive (AR) and ...
The Mapped Memory Mistake: Why DBMSs Should Avoid MMAP
13 Aug 2025
Contributed by Lukas
This 2022 paper is a reminder of the issues with mmap() for databases, yet many vector databases today rely on mmap(). This academic paper critically evalu...
NVIDIA GDS and BAM vs. ROCm Solutions
13 Aug 2025
Contributed by Lukas
This is an extensive review of 13 different sources on advancements in GPU-accelerated computing, focusing on data access, memory management, and performa...
pNFS Flex Files
13 Aug 2025
Contributed by Lukas
This reviews the IETF Parallel Network File System (pNFS), an extension to NFS that separates file metadata from data storage. Specifically, "RFC 843...
ELMo-Tune-V2: LLM-Assisted Auto-Tuning for Key-Value Stores
13 Aug 2025
Contributed by Lukas
This February 2025 paper introduces ELMo-Tune-V2, a novel framework that leverages Large Language Models (LLMs) to fully automate the optimization ...
fMoE: Fine-Grained Expert Offloading for MoE Serving
13 Aug 2025
Contributed by Lukas
This February 2025 paper introduces fMoE, a novel fine-grained expert offloading system designed to optimize the serving efficiency of Mixture-of-Expe...
AiSAQ: DRAM-free ANNS with Product Quantization
13 Aug 2025
Contributed by Lukas
This February 2025 paper introduces AiSAQ (All-in-Storage ANNS with Product Quantization), a novel method designed for Approximate Nearest Nei...
Scaling PostgreSQL at OpenAI: Read-Heavy Workloads and Optimizations - PGConf.dev 2025
13 Aug 2025
Contributed by Lukas
A video transcript is used to review the PostgreSQL Development Conference (PGConf.dev 2025) presentation titled "Scaling Postgres to the Next Level ...
NVMe Offload on Colossal AI: Breaking the GPU Memory Wall
13 Aug 2025
Contributed by Lukas
We review Colossal-AI's NVMe offload functionality, designed to overcome GPU memory limitations when training large-scale models by transferring optim...
Mem0: Scalable Long-Term Memory for AI Agents
12 Aug 2025
Contributed by Lukas
The provided source introduces Mem0 and Mem0g, two novel memory architectures designed to enhance Large Language Models (LLMs) by overcoming their inh...
Qwen-Image: Generation and Editing with Precision
12 Aug 2025
Contributed by Lukas
This academic paper introduces Qwen-Image, an open-source model designed for generating high-quality images from text. It details the multi-stage data...
Chain-of-Thought Reasoning: A Brittle Mirage?
11 Aug 2025
Contributed by Lukas
This August 2025 paper from Arizona State University's Data Mining and Machine Learning Lab investigates whether Chain-of-Thought (CoT) reasoning in ...
DroidSpeak: Cross-LLM KV Cache Sharing
08 Aug 2025
Contributed by Lukas
The provided text introduces DroidSpeak, a novel distributed Large Language Model (LLM) inference system designed to enhance the efficiency of compoun...
Dynamic Tanh: Transformers Without Normalization
08 Aug 2025
Contributed by Lukas
The paper introduces Dynamic Tanh (DyT), a novel element-wise operation designed to replace normalization layers in Transformer models. Traditionally,...
Movement Pruning: Adaptive Sparsity by Fine-Tuning
08 Aug 2025
Contributed by Lukas
This academic paper introduces movement pruning, a novel method for reducing the size of large pre-trained language models like BERT during fine-tunin...
Kaiming Initialization and PReLU
08 Aug 2025
Contributed by Lukas
This academic paper explores rectified activation units (rectifiers) in neural networks, which are crucial for advanced image classification. The auth...
Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions
08 Aug 2025
Contributed by Lukas
This document explores the challenges associated with training deep feedforward neural networks, specifically investigating why standard gradient desc...
MEGABYTE: Multiscale Transformers for Million-byte Sequences
08 Aug 2025
Contributed by Lukas
The research paper introduces MEGABYTE, a novel multi-scale transformer architecture designed to efficiently process exceptionally long sequences, exc...
Gemma: Google DeepMind's Open Language Models
08 Aug 2025
Contributed by Lukas
These sources collectively introduce and explain MedGemma and MedSigLIP, two collections of open-source AI models developed by Google Health for healt...
The Elements of Differentiable Programming
08 Aug 2025
Contributed by Lukas
This document provides a comprehensive overview of differentiable programming, a paradigm enabling gradient-based optimization of computer programs, e...
DiMSUM: Image Generation with Diffusion Mamba
08 Aug 2025
Contributed by Lukas
This academic paper introduces DiMSUM, a novel architecture for image generation that enhances diffusion models by integrating both spatial and freque...
LMCache: Supercharging LLM Performance with KV Cache Management
08 Aug 2025
Contributed by Lukas
The provided texts discuss LMCache, an open-source library designed to enhance the efficiency of large language models (LLMs) by optimizing Key-Value ...
AI and the Memory Wall: Overcoming Bottlenecks
08 Aug 2025
Contributed by Lukas
The provided text, titled "AI and Memory Wall," examines the growing disparity between computational power and memory bandwidth in AI, particularly fo...
DyNN-Offload: Efficient Memory for Dynamic Neural Networks
08 Aug 2025
Contributed by Lukas
This document introduces DyNN-Offload, a novel memory management system designed to overcome the GPU memory limitations faced when training large dyna...
TierTrain: Proactive Memory Tiering for DNN Training
08 Aug 2025
Contributed by Lukas
The provided text describes TierTrain: Proactive Memory Tiering for CPU-Based DNN Training, a paper presented at the International Symposium on Memory...
MoE Offloaded
08 Aug 2025
Contributed by Lukas
The sources discuss Mixture-of-Experts (MoE) models, a type of neural network that selectively activates different parameters for incoming data, offer...
CODEGEN: Open Language Model for Code Synthesis
08 Aug 2025
Contributed by Lukas
This source introduces CODEGEN, a family of large language models developed by Salesforce Research, designed for program synthesis. The models, varyin...
DeepSeekMoE: Scalable Mixture-of-Experts Language Models
08 Aug 2025
Contributed by Lukas
The provided text introduces DeepSeekMoE, an innovative Mixture-of-Experts (MoE) architecture designed to enhance expert specialization in large langu...
DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis
08 Aug 2025
Contributed by Lukas
This reviews a document dated January 27, 2025, from Daniel and Michael at Unsloth, detailing their work on quantizing DeepSeek-R1's 671B parameter mod...
DeepSeek Safety Concerns
08 Aug 2025
Contributed by Lukas
This research paper focuses on a safety evaluation of DeepSeek-R1 and DeepSeek-V3 models within Chinese language contexts, an area previously underexp...
DeepSeek-V3: A Technical Report
08 Aug 2025
Contributed by Lukas
This paper introduces DeepSeek-V3, a large Mixture-of-Experts (MoE) model designed to advance open-source language model capabilities with improve...
DeepSeek-R1: Incentivizing Reasoning in LLMs
08 Aug 2025
Contributed by Lukas
This paper introduces DeepSeek-R1, a new suite of large language models developed by DeepSeek-AI, focusing on enhancing reasoning capabilities through...
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
08 Aug 2025
Contributed by Lukas
Nine different sources on Mamba are reviewed, including the paper that introduced it. The provided sources explore Mamba, a linear recurrent neural net...
Demystifying Mamba: Architecture and Capabilities
08 Aug 2025
Contributed by Lukas
This document explores the Mamba architecture, a novel approach to sequence modeling that offers an efficient alternative to Transformers. It primaril...
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
08 Aug 2025
Contributed by Lukas
The source introduces MetaScale, a novel framework designed to enhance Large Language Models' (LLMs) complex reasoning capabilities during inference. ...
Test-Time Scaling
08 Aug 2025
Contributed by Lukas
The provided sources discuss advancements in large language models (LLMs), specifically focusing on test-time compute scaling to enhance reasoning per...
Chain of thought
08 Aug 2025
Contributed by Lukas
This reviews two papers on Chain of Thought: 1) https://arxiv.org/pdf/2201.11903 - Chain-of-Thought Prompting Elicits Reasoning in Large Language Model...
LoRA: Low-Rank Adaptation of Large Language Models
08 Aug 2025
Contributed by Lukas
This reviews the paper which introduces Low-Rank Adaptation (LoRA), a novel method designed to efficiently adapt large language models for specific...
Reinforcement Learning
08 Aug 2025
Contributed by Lukas
This reviews the public second edition of the book by Richard Sutton and Andrew Barto, "Reinforcement Learning". This document serves as an expanded second...
Concept Drift
08 Aug 2025
Contributed by Lukas
Five different sources are reviewed to understand Concept Drift in neural networks. 1) https://www.nature.com/articles/s41467-024-46142-w - Empirical d...
Multi Query Attention: PaLM: Scaling Language Modeling with Pathways
08 Aug 2025
Contributed by Lukas
67 authors were involved in this research! This source is an academic paper titled "PaLM: Scaling Language Modeling with Pathways," authored by Aakanks...
Reinforcement Pre-Training for Language Models
08 Aug 2025
Contributed by Lukas
The source introduces Reinforcement Pre-Training (RPT), a novel approach that redefines next-token prediction in large language models (LLMs) as a ver...
Multiagent Debate Improves Language Model Reasoning
08 Aug 2025
Contributed by Lukas
This paper introduces a multi-agent debate framework designed to enhance the factuality and reasoning capabilities of large language models (LLMs). Th...
KVQuant: LLM Inference with KV Cache Quantization
08 Aug 2025
Contributed by Lukas
Three research papers are reviewed: 1) https://arxiv.org/pdf/2401.18079 - KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quanti...