Tags
Tags mark specific points in the repository's history as important. Each entry below gives the tag name, abbreviated commit hash, commit subject, and date.
b2258 · d52d7819 · server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708) · Feb 25, 2024
b2257 · 12894088 · cmake : fix compilation for Android armeabi-v7a (#5702) · Feb 25, 2024
b2256 · ab336a9d · code : normalize enum names (#5697) · Feb 25, 2024
b2254 · 9e359a4f · server: continue to update other slots on embedding concurrent request (#5699) · Feb 24, 2024
b2253 · 4c4cb307 · IQ3_S: a much better alternative to Q3_K (#5676) · Feb 24, 2024
b2252 · 525213d2 · server: init functional tests (#5566) · Feb 24, 2024
b2251 · fd43d66f · server : add KV cache quantization options (#5684) · Feb 23, 2024
b2249 · 15499eb9 · mpt : do not duplicate token_embd.weight on disk (#5670) · Feb 22, 2024
b2248 · 96633eec · gemma : use more bits for the token_embd.weight tensor (#5650) · Feb 22, 2024
b2247 · 847eedbd · py : add Gemma conversion from HF models (#5647) · Feb 22, 2024
b2246 · 7e4f339c · ggml : always define ggml_fp16_t as uint16_t (#5666) · Feb 22, 2024
b2245 · 334f76fa · sync : ggml · Feb 22, 2024
b2241 · 373ee3fb · Add Gemma chat template (#5665) · Feb 22, 2024
b2240 · 4cb4d8b2 · workflows: nix: hardcode cachix ids, build unconditionally (#5663) · Feb 22, 2024
b2239 · 3a03541c · minor : fix trailing whitespace (#5638) · Feb 22, 2024
b2237 · a46f5074 · server : fallback to chatml, add AlphaMonarch chat template (#5628) · Feb 22, 2024
b2235 · 4ef245a9 · mpt : add optional bias tensors (#5638) · Feb 22, 2024
b2234 · 973053d8 · llama : fix loading models with shared tok_embd and output (#5651) · Feb 22, 2024
b2233 · 7c8bcc11 · Add docs for llama_chat_apply_template (#5645) · Feb 22, 2024 (see the API sketch after this list)
b2232 · 7fe4678b · llama : fix session save/load with quantized KV (#5649) · Feb 21, 2024
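
Several tags above touch llama.cpp's chat-template support (#5645, #5628, #5665). As a rough illustration only, here is a minimal C sketch of formatting a short conversation with llama_chat_apply_template, assuming the Feb-2024 signature documented in #5645 and a build at least as new as b2237, whose #5628 added recognition of the literal template name "chatml"; the NULL model pointer and the buffer-size handling are assumptions of this sketch, not something these tags confirm.

#include <stdbool.h>
#include <stdio.h>
#include "llama.h"   // llama_chat_apply_template, llama_chat_message

int main(void) {
    // A two-message conversation to be rendered into a single prompt string.
    const struct llama_chat_message msgs[] = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!"                       },
    };
    char buf[1024];

    // Apply the built-in ChatML template (the "chatml" name is an
    // assumption, per #5628) and append the assistant turn header so the
    // model can start its reply (add_ass = true).
    const int32_t n = llama_chat_apply_template(
        NULL,      // model: assumed unused when an explicit template is passed
        "chatml",  // template name or template string
        msgs, sizeof(msgs) / sizeof(msgs[0]),
        true,      // add_ass
        buf, (int32_t) sizeof(buf));

    // The return value is the number of bytes needed: negative means the
    // template was not recognized; larger than the buffer means retry with
    // a bigger buffer.
    if (n < 0 || n > (int32_t) sizeof(buf)) {
        fprintf(stderr, "llama_chat_apply_template failed (n = %d)\n", n);
        return 1;
    }
    printf("%.*s\n", (int) n, buf);
    return 0;
}

To try it, compile against the library built from one of the tagged checkouts (for example, linking libllama.a produced by the project's own build).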