Links.
- A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026. Note: local archival copy.
- From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design. Note: local archival copy, updated 2025-12-20.
- You're staring at perf top showing 60% CPU time in pthread_mutex_lock. Your latency is in the toilet. Someone suggests "just use a spinlock" and suddenly your 16-core server is pegged at 100% doing nothing useful. This is the synchronization primitive trap, and most engineers step right into it because nobody explains when each primitive actually makes sense (see the sketch after this list). Note: local archival copy.
- Understanding GRPO and New Insights from Reasoning Model Papers. Note: beyond standard LLMs.
- Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers. Note: beyond standard LLMs.
- And How They Stack Up Against Qwen3. Note: beyond standard LLMs.
- Understanding How DeepSeek's Flagship Open-Weight Models Evolved. Note: thorough article on the DeepSeek architecture.
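
The mutex-versus-spinlock trap from the entry above is easy to reproduce locally. The sketch below is not from the linked article; the thread count, iteration count, and shared counter are illustrative assumptions. It protects the same hot counter once with a pthread mutex and once with a pthread spinlock: contended mutex waiters sleep in the kernel, while spinlock waiters keep their cores busy doing nothing useful.

```c
/*
 * Minimal sketch (assumptions: 4 threads, 1M increments each, a single hot
 * counter). Build with: gcc -O2 -pthread lock_demo.c -o lock_demo
 */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    1000000L

static long counter;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_spinlock_t spin;

/* Mutex variant: threads that lose the race block and are woken later. */
static void *mutex_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&mtx);
        counter++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

/* Spinlock variant: threads that lose the race busy-wait, burning CPU. */
static void *spin_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        pthread_spin_lock(&spin);
        counter++;
        pthread_spin_unlock(&spin);
    }
    return NULL;
}

static void run(void *(*fn)(void *), const char *name)
{
    pthread_t t[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, fn, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("%s: counter = %ld\n", name, counter);
}

int main(void)
{
    pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);
    run(mutex_worker, "mutex");
    run(spin_worker, "spinlock");
    pthread_spin_destroy(&spin);
    return 0;
}
```

Running each variant under time, or watching perf top while it runs, should make the tradeoff visible: how much CPU the spinlock version burns depends on core count, oversubscription, and critical-section length, which is exactly the point of the linked article that no single primitive is the right default.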