Exploring the gradient noise scale: A demonstration of critical batch size, and a tribute to open-access machine learning. Aug 28, 2025. Machine Learning
Multidimensional RoPE: Non-Axial Rotary Positional Embeddings in higher dimensions. Jul 31, 2025. Machine Learning, Transformers