b1172
35195689 · 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985) · Sep 04, 2023