AI: post transformers
Episodes
Parallel-R1: Reinforcement Learning for Parallel Thinking in LLMs
12 Sep 2025
Contributed by Lukas
This September 10, 2025 technical report from Tencent AI Lab introduces Parallel-R1, a novel reinforcement learning (RL) framework designed to en...
Explaining AI for Digital Advertising with LLMs
11 Sep 2025
Contributed by Lukas
This April 2025 paper introduces SODA, a novel framework designed to enhance digital advertising strategies by making opaque AI systems more understa...
AdLlama: Boosting Ad Performance with Reinforcement Learning
11 Sep 2025
Contributed by Lukas
This July 2025 paper introduces AdLlama, a new large language model (LLM) for generating Facebook ad text, trained using Reinforcement Learning with P...
ByteCheckpoint: A Unified LLM Checkpointing System
11 Sep 2025
Contributed by Lukas
This July 2024 paper introduces ByteCheckpoint, a novel PyTorch-native system designed for Large Language Model (LLM) development. This system addre...
Darling: Reinforcing Diversity and Quality in Language Models
10 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Diversity-Aware Reinforcement Learning (Darling), a novel framework designed to enhance both the quality and sema...
INF2: Near-Storage LLM Inference for High Throughput
10 Sep 2025
Contributed by Lukas
This February 2025 paper introduces INF2, a novel framework designed to enhance the generative inference throughput of large language models (LLMs) by...
K2-Think: A Parameter-Efficient Reasoning System
10 Sep 2025
Contributed by Lukas
The September 9, 2025 press release and paper announce and detail K2 Think, an advanced open-source AI reasoning system developed by the Mohamed bi...
AlphaEvolve: AI for Scientific and Algorithmic Discovery
10 Sep 2025
Contributed by Lukas
The May and June 2025 sources introduce AlphaEvolve, a novel AI coding agent developed by Google DeepMind in collaboration with mathematicians like Jav...
BLEU: Automatic Machine Translation Evaluation
10 Sep 2025
Contributed by Lukas
This July 2002 paper introduced BLEU (Bilingual Evaluation Understudy), an automatic and inexpensive method for evaluating machine translation (MT) ...
Mini-o3: Scaling Reasoning for Visual Search
10 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Mini-o3, a Vision-Language Model (VLM) designed to overcome the limitations of existing VLMs in handling complex ...
Masked Diffusion Models: Performance and Theory
10 Sep 2025
Contributed by Lukas
This September 2025 paper analyzes the theoretical benefits and limitations of Masked Diffusion Models (MDMs) for text generation, contrasting them w...
TraceRL: Reinforcement Learning for Diffusion Language Models
09 Sep 2025
Contributed by Lukas
This September 2025 paper introduces TraceRL, a novel reinforcement learning framework designed to enhance diffusion language models (DLMs) across ...
LLM Benchmark Robustness to Linguistic Variation
09 Sep 2025
Contributed by Lukas
This September 2025 paper investigates the reliability and robustness of Large Language Models (LLMs) when evaluated using traditional benchmarks. Th...
Behavioral Fingerprinting of Large Language Models
09 Sep 2025
Contributed by Lukas
This September 2025 paper introduces "Behavioral Fingerprinting," a novel framework designed to evaluate Large Language Models (LLMs) beyond traditi...
Offloading LLM Models and KV Caches to NVMe SSDs
08 Sep 2025
Contributed by Lukas
This March 2025 paper examines the input/output (I/O) characteristics of offloading large language model (LLM) components to NVMe SSDs during inferen...
GPT-NeoX: Large-Scale Autoregressive Language Modeling in PyTorch
07 Sep 2025
Contributed by Lukas
This describes EleutherAI's GPT-NeoX library, a robust open-source framework for training large-scale autoregressive language models on GPUs, buildi...
SGLang: Efficient Language Model Program Execution
07 Sep 2025
Contributed by Lukas
This June 2024 paper introduces SGLang, a framework designed to enhance the efficiency of Large Language Model (LLM) and Vision Language Model (VLM) ...
Eleuther: Evaluating LLMs
07 Sep 2025
Contributed by Lukas
These sources collectively explore various approaches to evaluating and improving Large Language Models (LLMs). Several papers introduce new benchmark...
OpenELM: Apple's Open Language Model Family
07 Sep 2025
Contributed by Lukas
The provided May 2024 sources center around CoreNet, an Apple-developed library for training deep neural networks, and OpenELM, an efficient language ...
FineVision: Open Data for Computer Vision
07 Sep 2025
Contributed by Lukas
These September 2025 posts describe HuggingFaceM4/FineVision, a large dataset designed for image and text modalities. It features a substantial size...
Evaluating Large Language Models Trained on Code
07 Sep 2025
Contributed by Lukas
This July 2021 paper documents the development and evaluation of OpenAI's Codex models, which are large language models specialized in code generation...
Democratizing AI Compute: The Modular Vision
07 Sep 2025
Contributed by Lukas
This blog post series from Chris Lattner extensively examines CUDA's pervasive dominance in AI compute, detailing its evolution from a graphics proces...
Limitations of Embedding-Based Retrieval
06 Sep 2025
Contributed by Lukas
This August 2025 paper from Google DeepMind, titled "On the Theoretical Limitations of Embedding-Based Retrieval," explores the fundamental constrain...
SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence
06 Sep 2025
Contributed by Lukas
This September 2025 paper describes SAIR, the Structurally Augmented IC50 Repository, a groundbreaking open-source dataset developed by SandboxAQ in c...
EmbeddingGemma: On-Device AI for High-Quality Embeddings
05 Sep 2025
Contributed by Lukas
This document announces EmbeddingGemma, a new open embedding model from Google, specifically designed for on-device artificial intelligence (AI). ...
MTEB & MMTEB: The Massive Text Embedding Benchmark
05 Sep 2025
Contributed by Lukas
These academic papers introduce and detail the Massive Multilingual Text Embedding Benchmark (MMTEB), a comprehensive evaluation framework for text em...
DeepResearch Arena: Benchmarking LLMs' Research Abilities
05 Sep 2025
Contributed by Lukas
This September 2025 paper introduces DeepResearch Arena, a novel benchmark designed to evaluate the research capabilities of large language models (LL...
Inverse IFEval: Unlearning LLM Cognitive Inertia
05 Sep 2025
Contributed by Lukas
This September 2025 paper introduces Inverse IFEval, a novel benchmark designed to evaluate Large Language Models (LLMs) for their Counter-intuiti...
The Rise of Physical Neural Networks
04 Sep 2025
Contributed by Lukas
This June 2024 paper examines the current state and future potential of Physical Neural Networks (PNNs), which are AI systems implemented directly in ...
FastVLM: Efficient Vision Encoding for Language Models
04 Sep 2025
Contributed by Lukas
This May 2025 paper introduces FastVLM, an innovative approach designed to enhance the efficiency of Vision Language Models (VLMs). The authors explai...
Apertus Tech Report Overview
04 Sep 2025
Contributed by Lukas
This paper introduces Apertus, a large language model developed by the Swiss AI Initiative, a partnership between ETH Zurich and EPFL. The GitHub re...
Supervised Learning in DNA Neural Networks
04 Sep 2025
Contributed by Lukas
This September 2025 article from Nature, authored by Kevin M. Cherry and Lulu Qian, introduces a novel DNA-based neural network capable of supe...
FusionANNS: Billion-Scale ANNS with SSD and GPU
03 Sep 2025
Contributed by Lukas
This September 2024 paper introduces FusionANNS, a novel system designed to improve Approximate Nearest Neighbor Search (ANNS) for extremely large...
rStar2-Agent: Smarter Math Reasoning Through Agentic RL
03 Sep 2025
Contributed by Lukas
This August 2025 paper introduces rStar2-Agent, a 14B math reasoning model developed by Microsoft Research that achieves state-of-the-art performanc...
Scientific LLMs: A Data-Centric Survey and Roadmap
03 Sep 2025
Contributed by Lukas
This August 2025 paper offers an extensive overview of the evolution and application of Large Language Models (LLMs) and Multimodal Large Language Mo...
Pimba: Processing-in-Memory for LLM Serving
27 Aug 2025
Contributed by Lukas
This August 2025 paper introduces Pimba, a novel Processing-in-Memory (PIM) accelerator designed to enhance the efficiency of Large Language Model...
Oaken: Fast, Efficient LLM Serving with Hybrid KV Cache Quantization
27 Aug 2025
Contributed by Lukas
This August 2025 paper introduces Oaken, a novel acceleration solution for serving Large Language Models (LLMs) that addresses the significant challe...
AdamW: Decoupled Weight Decay Regularization for Adaptive Gradient Algorithms
27 Aug 2025
Contributed by Lukas
This January 2019 academic paper addresses the common issue of poor generalization in adaptive gradient optimization methods like Adam, compared to t...
Training Recurrent Neural Networks: Vanishing and Exploding Gradients
27 Aug 2025
Contributed by Lukas
This academic paper addresses the inherent challenges in training Recurrent Neural Networks (RNNs), specifically the vanishing and exploding gradie...
Adafactor: Memory-Efficient Adaptive Learning Rates
27 Aug 2025
Contributed by Lukas
This April 2018 paper introduces Adafactor, a novel optimization method designed to reduce the memory footprint of adaptive learning rate algorithms ...
SPAM: Stabilizing LLM Training with Spike-Aware Optimization
27 Aug 2025
Contributed by Lukas
This February 2025 research addresses the critical issue of training instability in Large Language Models (LLMs), which often stems from sudden, mass...
Google: Measuring AI's Environmental Impact at Scale
26 Aug 2025
Contributed by Lukas
This August 2025 paper presents Google's comprehensive methodology for measuring the environmental impact of AI inference workloads in a large-sca...
ComoRAG: Cognitively Inspired Narrative Reasoning
26 Aug 2025
Contributed by Lukas
This August 2025 paper introduces ComoRAG, a novel framework designed to enhance long-context narrative comprehension in Large Language Models (LLM...
Quantizing Diffusion LLMs: A Systematic Study
26 Aug 2025
Contributed by Lukas
This August 2025 academic paper explores the application of post-training quantization (PTQ) to diffusion large language models (dLLMs), a promising a...
ODYSSEY: Unified Mobile Manipulation for Agile Quadruped Robots
26 Aug 2025
Contributed by Lukas
This August 2025 paper introduces ODYSSEY, a comprehensive framework for open-world mobile manipulation that integrates robotic mobility, manipulat...
GPT-5 Spatial Intelligence: An Empirical Study
24 Aug 2025
Contributed by Lukas
This August 2025 academic paper, titled "Has GPT-5 Achieved Spatial Intelligence? An Empirical Study," examines the spatial understanding and reasonin...
DeepSeek-V3.1: A Hybrid AI Model with Enhanced Reasoning
23 Aug 2025
Contributed by Lukas
This is a review of DeepSeek's latest release announced on Hugging Face on August 21, 2025. The source introduces DeepSeek-V3.1, a hybrid large langua...
Compressed Experts: Efficient MoE Model Editing
23 Aug 2025
Contributed by Lukas
This March 2025 paper introduces compressed experts, an innovative method to enhance the efficiency of Mixture-of-Experts (MoE) models by reducing ...
Genie 3: A New Frontier for World Models
22 Aug 2025
Contributed by Lukas
The source provides an overview of Google DeepMind's AI research and models, highlighting various applications across different scientific disciplines...
Los Alamos: Overcoming the Memory Wall in Sparse Memory Access
21 Aug 2025
Contributed by Lukas
We review Los Alamos National Laboratory's advancements in managing indirect memory accesses in high-performance computing and its relationship to over...
Switch Transformers: Trillion Parameter Models with Sparsity
20 Aug 2025
Contributed by Lukas
This June 2022 paper introduces Switch Transformers, a novel architecture designed to enhance the efficiency and scalability of large-scale language m...
Linear Transformers: Faster Than RNNs
20 Aug 2025
Contributed by Lukas
This August 2020 paper introduces linear transformers, a novel approach to addressing the computational and memory inefficiencies of traditional tr...
Speed Always Wins: Efficient Large Language Model Architectures
20 Aug 2025
Contributed by Lukas
This August 2025 survey paper explores efficient architectures for large language models (LLMs), addressing the computational challenges of models li...
Atom: Low-Bit Quantization for LLM Serving
18 Aug 2025
Contributed by Lukas
This April 2024 paper introduces Atom, a novel low-bit quantization method designed to enhance the efficiency and accuracy of Large Language Model (...
Continuous Batching for LLM Inference: Throughput and Latency Gains
18 Aug 2025
Contributed by Lukas
The source analyzes Large Language Model (LLM) inference, specifically focusing on how continuous batching significantly improves efficiency compar...
Self-Search Reinforcement Learning for LLMs
18 Aug 2025
Contributed by Lukas
This August 2025 paper introduces Self-Search Reinforcement Learning (SSRL), a novel method that enables Large Language Models (LLMs) to access and...
Diffusion Language Models: Principles, Techniques, and Applications
18 Aug 2025
Contributed by Lukas
This August 2025 paper offers a comprehensive overview of diffusion language models (DLMs), contrasting them with traditional autoregressive (AR) and ...
The Mapped Memory Mistake: Why DBMSs Should Avoid MMAP
13 Aug 2025
Contributed by Lukas
This 2022 paper is a reminder of the issues with mmap() for databases, yet many vector databases today rely on mmap(). This academic paper critically evalu...
NVIDIA GDS and BAM vs. ROCm Solutions
13 Aug 2025
Contributed by Lukas
This is an extensive review of 13 different sources on advancements in GPU-accelerated computing, focusing on data access, memory management, and performa...
pNFS Flex Files
13 Aug 2025
Contributed by Lukas
This reviews the IETF Parallel Network File System (pNFS), an extension to NFS that separates file metadata from data storage. Specifically, "RFC 843...
ELMo-Tune-V2: LLM-Assisted Auto-Tuning for Key-Value Stores
13 Aug 2025
Contributed by Lukas
This February 2025 paper introduces ELMo-Tune-V2, a novel framework that leverages Large Language Models (LLMs) to fully automate the optimization ...
fMoE: Fine-Grained Expert Offloading for MoE Serving
13 Aug 2025
Contributed by Lukas
This February 2025 paper introduces fMoE, a novel fine-grained expert offloading system designed to optimize the serving efficiency of Mixture-of-Expe...
AiSAQ: DRAM-free ANNS with Product Quantization
13 Aug 2025
Contributed by Lukas
This February 2025 paper introduces AiSAQ (All-in-Storage ANNS with Product Quantization), a novel method designed for Approximate Nearest Nei...
Scaling PostgreSQL at OpenAI: Read-Heavy Workloads and Optimizations - PGConf.dev 2025
13 Aug 2025
Contributed by Lukas
A video transcript is used to review the PostgreSQL Development Conference (PGConf.dev 2025) presentation titled "Scaling Postgres to the Next Level ...
NVMe Offload on Colossal AI: Breaking the GPU Memory Wall
13 Aug 2025
Contributed by Lukas
We review Colossal-AI's NVMe offload functionality, designed to overcome GPU memory limitations when training large-scale models by transferring optim...
Mem0: Scalable Long-Term Memory for AI Agents
12 Aug 2025
Contributed by Lukas
The provided source introduces Mem0 and Mem0g, two novel memory architectures designed to enhance Large Language Models (LLMs) by overcoming their inh...
Qwen-Image: Generation and Editing with Precision
12 Aug 2025
Contributed by Lukas
This academic paper introduces Qwen-Image, an open-source model designed for generating high-quality images from text. It details the multi-stage data...
Chain-of-Thought Reasoning: A Brittle Mirage?
11 Aug 2025
Contributed by Lukas
This August 2025 paper from Arizona State University's Data Mining and Machine Learning Lab investigates whether Chain-of-Thought (CoT) reasoning in ...
DroidSpeak: Cross-LLM KV Cache Sharing
08 Aug 2025
Contributed by Lukas
The provided text introduces DroidSpeak, a novel distributed Large Language Model (LLM) inference system designed to enhance the efficiency of compoun...
Dynamic Tanh: Transformers Without Normalization
08 Aug 2025
Contributed by Lukas
The paper introduces Dynamic Tanh (DyT), a novel element-wise operation designed to replace normalization layers in Transformer models. Traditionally,...
Movement Pruning: Adaptive Sparsity by Fine-Tuning
08 Aug 2025
Contributed by Lukas
This academic paper introduces movement pruning, a novel method for reducing the size of large pre-trained language models like BERT during fine-tunin...
Kaiming Initialization and PReLU
08 Aug 2025
Contributed by Lukas
This academic paper explores rectified activation units (rectifiers) in neural networks, which are crucial for advanced image classification. The auth...
Xavier Initialization: Deep Feedforward Networks: Training Difficulties and Solutions
08 Aug 2025
Contributed by Lukas
This document explores the challenges associated with training deep feedforward neural networks, specifically investigating why standard gradient desc...
MEGABYTE: Multiscale Transformers for Million-byte Sequences
08 Aug 2025
Contributed by Lukas
The research paper introduces MEGABYTE, a novel multi-scale transformer architecture designed to efficiently process exceptionally long sequences, exc...
Gemma: Google DeepMind's Open Language Models
08 Aug 2025
Contributed by Lukas
These sources collectively introduce and explain MedGemma and MedSigLIP, two collections of open-source AI models developed by Google Health for healt...
The Elements of Differentiable Programming
08 Aug 2025
Contributed by Lukas
This document provides a comprehensive overview of differentiable programming, a paradigm enabling gradient-based optimization of computer programs, e...
DiMSUM: Image Generation with Diffusion Mamba
08 Aug 2025
Contributed by Lukas
This academic paper introduces DiMSUM, a novel architecture for image generation that enhances diffusion models by integrating both spatial and freque...
LMCache: Supercharging LLM Performance with KV Cache Management
08 Aug 2025
Contributed by Lukas
The provided texts discuss LMCache, an open-source library designed to enhance the efficiency of large language models (LLMs) by optimizing Key-Value ...
AI and the Memory Wall: Overcoming Bottlenecks
08 Aug 2025
Contributed by Lukas
The provided text, titled "AI and Memory Wall," examines the growing disparity between computational power and memory bandwidth in AI, particularly fo...
DyNN-Offload: Efficient Memory for Dynamic Neural Networks
08 Aug 2025
Contributed by Lukas
This document introduces DyNN-Offload, a novel memory management system designed to overcome the GPU memory limitations faced when training large dyna...
TierTrain: Proactive Memory Tiering for DNN Training
08 Aug 2025
Contributed by Lukas
The provided text describes TierTrain: Proactive Memory Tiering for CPU-Based DNN Training, a paper presented at the International Symposium on Memory...
MoE Offloaded
08 Aug 2025
Contributed by Lukas
The sources discuss Mixture-of-Experts (MoE) models, a type of neural network that selectively activates different parameters for incoming data, offer...
CODEGEN: Open Language Model for Code Synthesis
08 Aug 2025
Contributed by Lukas
This source introduces CODEGEN, a family of large language models developed by Salesforce Research, designed for program synthesis. The models, varyin...
DeepSeekMoE: Scalable Mixture-of-Experts Language Models
08 Aug 2025
Contributed by Lukas
The provided text introduces DeepSeekMoE, an innovative Mixture-of-Experts (MoE) architecture designed to enhance expert specialization in large langu...
DeepSeek-R1 Dynamic 1.58-bit Quantization: A Performance Analysis
08 Aug 2025
Contributed by Lukas
This reviews a document dated January 27, 2025, from Daniel and Michael at Unsloth, detailing their work on quantizing DeepSeek-R1's 671B parameter mod...
DeepSeek Safety Concerns
08 Aug 2025
Contributed by Lukas
This research paper focuses on a safety evaluation of DeepSeek-R1 and DeepSeek-V3 models within Chinese language contexts, an area previously underexp...
DeepSeek-V3: A Technical Report
08 Aug 2025
Contributed by Lukas
This paper introduces DeepSeek-V3, a large Mixture-of-Experts (MoE) model designed to advance open-source language model capabilities with improve...
DeepSeek-R1: Incentivizing Reasoning in LLMs
08 Aug 2025
Contributed by Lukas
This paper introduces DeepSeek-R1, a new suite of large language models developed by DeepSeek-AI, focusing on enhancing reasoning capabilities through...
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
08 Aug 2025
Contributed by Lukas
Nine different sources on Mamba are reviewed, including the paper that introduced it. The provided sources explore Mamba, a linear recurrent neural net...
Demystifying Mamba: Architecture and Capabilities
08 Aug 2025
Contributed by Lukas
This document explores the Mamba architecture, a novel approach to sequence modeling that offers an efficient alternative to Transformers. It primaril...
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
08 Aug 2025
Contributed by Lukas
The source introduces MetaScale, a novel framework designed to enhance Large Language Models' (LLMs) complex reasoning capabilities during inference. ...
Test-Time Scaling
08 Aug 2025
Contributed by Lukas
The provided sources discuss advancements in large language models (LLMs), specifically focusing on test-time compute scaling to enhance reasoning per...
Chain of thought
08 Aug 2025
Contributed by Lukas
This reviews two papers on Chain of Thought: 1) https://arxiv.org/pdf/2201.11903 - Chain-of-Thought Prompting Elicits Reasoning in Large Language Model...
LoRA: Low-Rank Adaptation of Large Language Models
08 Aug 2025
Contributed by Lukas
This reviews the paper which introduces Low-Rank Adaptation (LoRA), a novel method designed to efficiently adapt large language models for specific...
Reinforcement Learning
08 Aug 2025
Contributed by Lukas
This reviews the public second edition of the book by Richard Sutton and Andrew Barto, "Reinforcement Learning". This document serves as an expanded second...
Concept Drift
08 Aug 2025
Contributed by Lukas
Five different sources are reviewed to understand Concept Drift in neural networks. 1) https://www.nature.com/articles/s41467-024-46142-w - Empirical d...
Multi Query Attention: PaLM: Scaling Language Modeling with Pathways
08 Aug 2025
Contributed by Lukas
67 authors were involved in this research! This source is an academic paper titled "PaLM: Scaling Language Modeling with Pathways," authored by Aakanks...
Reinforcement Pre-Training for Language Models
08 Aug 2025
Contributed by Lukas
The source introduces Reinforcement Pre-Training (RPT), a novel approach that redefines next-token prediction in large language models (LLMs) as a ver...
Multiagent Debate Improves Language Model Reasoning
08 Aug 2025
Contributed by Lukas
This paper introduces a multi-agent debate framework designed to enhance the factuality and reasoning capabilities of large language models (LLMs). Th...
KVQuant: LLM Inference with KV Cache Quantization
08 Aug 2025
Contributed by Lukas
Three research papers are reviewed: 1) https://arxiv.org/pdf/2401.18079 - KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quanti...