b1795
8f900abf · CUDA: faster softmax via shared memory + fp16 math (#4742) · Jan 09, 2024