Tags
Tags give the ability to mark specific points in history as being important.
b2282 · cb49e0f8 · Attempt to fix android build (#5752) · Feb 27, 2024
b2281 · 0becb22a · IQ4_XS: a 4.25 bpw quantization (#5747) · Feb 27, 2024
b2280 · c24a2a6e · cuda : replace remaining shfl_xor with calls to warp_reduce functions (#5744) · Feb 27, 2024
b2279 · 1f30b7a9 · ggml-quants : fix avx2 iq1_s vec_dot when compiled with gcc (#5742) · Feb 27, 2024
b2278 · 9d533a77 · llama : fix defrag bugs + add parameter (#5735) · Feb 27, 2024
b2277 · cbbd1efa · Makefile: use variables for cublas (#5689) · Feb 27, 2024
b2276 · b11a93df · fix server hangs on empty prompt (#5733) · Feb 26, 2024
b2275 · a33e6a0d · Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721) · Feb 26, 2024
b2274 · 47bb7b48 · CUDA: fix DEBUG_CUDA_MALLOC (#5729) · Feb 26, 2024
b2272 · e849078c · [SYCL] Add support for soft_max ALiBi (#5639) · Feb 26, 2024
b2271 · 67fd3313 · unicode : reuse iterator (#5726) · Feb 26, 2024
b2270 · 4804215c · server: CI fix trailing space (#5728) · Feb 26, 2024
b2269 · 8a533f0d · server: CI tests reduce build matrix (#5725) · Feb 26, 2024
b2268 · 269de86b · llama : fix Gemma rope type (#5691) · Feb 26, 2024
b2266 · e3965cf3 · server: tests - slow inference causes timeout on the CI (#5715) · Feb 25, 2024
b2264 · bf08e006 · llama : refactor k-shift implementation + KV defragmentation (#5691) · Feb 25, 2024
b2263 · f7625019 · server : fix crash when system prompt is bigger than batch size (#5714) · Feb 25, 2024
b2262 · abbabc5e · ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711) · Feb 25, 2024
b2261 · f1a98c52 · make : fix nvcc version is empty (#5713) · Feb 25, 2024
b2259 · 930b1780 · server: logs - unified format and --log-format option (#5700) · Feb 25, 2024