Tags
Tags give the ability to mark specific points in history as being important
b2122 · 85910c5b · main : ctrl+C print timing in non-interactive mode (#3873) · Feb 11, 2024
b2121 · 139b62a8 · common : fix compile warning · Feb 11, 2024
b2119 · a07d0fee · ggml : add mmla kernels for quantized GEMM (#4966) · Feb 11, 2024
b2118 · e4640d8f · lookup: add print for drafting performance (#5450) · Feb 11, 2024
b2117 · 907e08c1 · server : add llama2 chat template (#5425) · Feb 11, 2024
b2116 · f026f812 · metal : use autoreleasepool to avoid memory leaks (#5437) · Feb 10, 2024
b2114 · 43b65f5e · sync : ggml · Feb 10, 2024
b2110 · 7c777fcd · server : fix prompt caching for repeated prompts (#5420) · Feb 09, 2024
b2109 · e5ca3937 · llama : do not cap thread count when MoE on CPU (#5419) · Feb 09, 2024
b2107 · b2f87cb6 · ggml : fix `error C2078: too many initializers` for MSVC ARM64 (#5404) · Feb 09, 2024
b2106 · 44fbe343 · Fix Vulkan crash on APUs with very little device memory (#5424) · Feb 09, 2024
b2105 · 8e6a9d2d · CUDA: more warps for mmvq on NVIDIA (#5394) · Feb 08, 2024
b2104 · 41f308f5 · llama : do not print "offloading layers" message in CPU-only builds (#5416) · Feb 08, 2024
b2103 · 6e99f2a0 · Fix f16_sycl cpy call from Arc (#5411) · Feb 08, 2024
b2101 · b7b74cef · fix trailing whitespace (#5407) · Feb 08, 2024
b2100 · 4aa43fab · llama : fix MiniCPM (#5392) · Feb 08, 2024
b2098 · 26d4efd1 · sampling: fix top_k <= 0 (#5388) · Feb 08, 2024
b2096 · c4fbb671 · CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393) · Feb 07, 2024
b2093 · aa7ab99b · CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) · Feb 07, 2024
b2091 · 0ef46da6 · llava-cli : always tokenize special tokens (#5382) · Feb 07, 2024