Tags

Tags give the ability to mark specific points in history as being important.

b2122

85910c5b · main : ctrl+C print timing in non-interactive mode (#3873) · Feb 11, 2024
b2121

139b62a8 · common : fix compile warning · Feb 11, 2024
b2119

a07d0fee · ggml : add mmla kernels for quantized GEMM (#4966) · Feb 11, 2024
b2118

e4640d8f · lookup: add print for drafting performance (#5450) · Feb 11, 2024
b2117

907e08c1 · server : add llama2 chat template (#5425) · Feb 11, 2024
b2116

f026f812 · metal : use autoreleasepool to avoid memory leaks (#5437) · Feb 10, 2024
b2114

43b65f5e · sync : ggml · Feb 10, 2024
b2110

7c777fcd · server : fix prompt caching for repeated prompts (#5420) · Feb 09, 2024
b2109

e5ca3937 · llama : do not cap thread count when MoE on CPU (#5419) · Feb 09, 2024
b2107

b2f87cb6 · ggml : fix `error C2078: too many initializers` for MSVC ARM64 (#5404) · Feb 09, 2024
b2106

44fbe343 · Fix Vulkan crash on APUs with very little device memory (#5424) · Feb 09, 2024
b2105

8e6a9d2d · CUDA: more warps for mmvq on NVIDIA (#5394) · Feb 08, 2024
b2104

41f308f5 · llama : do not print "offloading layers" message in CPU-only builds (#5416) · Feb 08, 2024
b2103

6e99f2a0 · Fix f16_sycl cpy call from Arc (#5411) · Feb 08, 2024
b2101

b7b74cef · fix trailing whitespace (#5407) · Feb 08, 2024
b2100

4aa43fab · llama : fix MiniCPM (#5392) · Feb 08, 2024
b2098

26d4efd1 · sampling: fix top_k <= 0 (#5388) · Feb 08, 2024
b2096

c4fbb671 · CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) · Feb 07, 2024
b2093

aa7ab99b · CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) · Feb 07, 2024
b2091

0ef46da6 · llava-cli : always tokenize special tokens (#5382) · Feb 07, 2024
