Tags
Tags give the ability to mark specific points in history as being important.
b2090 · ee1628bd · Basic Vulkan Multi-GPU implementation (#5321) · Feb 07, 2024
b2087 · 316c7faf · llama : add MiniCPM support (#5346) · Feb 07, 2024
b2086 · f3e2b4fa · server : update `/props` with "total_slots" value (#5373) · Feb 07, 2024
b2084 · 213d1439 · server : remove model.json endpoint (#5371) · Feb 06, 2024
b2083 · 17c97fb0 · CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) · Feb 06, 2024
b2081 · f57fadc0 · Slight quantization improvement for Q4_K and Q5_K (#5361) · Feb 06, 2024
b2079 · 2c516611 · CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) · Feb 06, 2024
b2078 · 8a79c591 · server : include total "num_slots" in props endpoint (#5349) · Feb 06, 2024
b2077 · 31e79032 · server : add `dynatemp_range` and `dynatemp_exponent` (#5352) · Feb 06, 2024
b2076 · 4ffc7a17 · server : various fixes for the prompt field in /completion (#5300) · Feb 06, 2024
b2074 · 098f6d73 · make: Use ccache for faster compilation (#5318) · Feb 05, 2024
b2072 · c6b39553 · ggml : make use of ggml-quants.h possible in C++ code (#5338) · Feb 05, 2024
b2071 · abb61944 · ggml : avoid duplicating function calls using MIN/MAX macros (#5325) · Feb 05, 2024
b2070 · 89503dcb · iq3_xxs: quards for the no-imatrix situation (#5334) · Feb 05, 2024
b2068 · 6fdfa2ec · iq2_xxs: tune quantization (#5320) · Feb 05, 2024
b2067 · a2d60c91 · server : allow to get default generation settings for completion (#5307) · Feb 05, 2024
b2066 · e6f81775 · common : add dynamic temperature parameters to main example cli (#5295) · Feb 05, 2024
b2062 · 4833ac20 · [SYCL] Fix cpy with dims of 3 (#5289) · Feb 05, 2024
b2060 · 5ed26e1f · Adding some imatrix tools (#5302) · Feb 04, 2024
b2059 · 277fad30 · cmake : use set() for LLAMA_WIN_VER (#5298) · Feb 03, 2024