Posts
- August 3, 2025 T5 - The Transformer That Zigged When Others Zagged - An Architectural Deep Dive
- August 2, 2025 A Deep Dive into PPO for Language Models
- July 2, 2025 Mixture-of-Experts (MoE) Models: Challenges & Solutions in Practice
- October 26, 2020 Some useful files