GitLab
Tags
Tags mark specific points in a repository's history as important, e.g. releases.
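Each entry below pairs a tag name with the commit it points to. As a minimal sketch, a release point like b2396 can be created and inspected with standard git commands (the throwaway repo, user identity, and tag message are placeholders; the tag name and commit subject are taken from the list):

```shell
# Sketch: mark a point in history with an annotated tag, as the b-series
# release tags below do. Uses a throwaway repo so it is self-contained.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity; commit subject copied from the b2396 entry below
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "llama : fix F16/F32 downcast + improve names (#5980)"
# An annotated tag stores a tagger, date, and message alongside the commit
git -c user.email=ci@example.com -c user.name=ci \
    tag -a b2396 -m "release b2396"
git tag --list                   # prints: b2396
git log -1 --format=%s b2396     # prints the tagged commit's subject
```

Lightweight tags (`git tag <name>` without `-a`) would also work, but annotated tags are the usual choice for releases because they carry their own metadata.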
b2397 · ecab1c75 · cmake : fix subdir for `LLAMA_METAL_EMBED_LIBRARY` (#5985) · Mar 11, 2024
b2396 · ee35600b · llama : fix F16/F32 downcast + improve names (#5980) · Mar 11, 2024
b2395 · be858f62 · Better 1.5 bit quantization (#5971) · Mar 11, 2024
b2394 · ef3ced26 · [SYCL] Add q3_s and q1_s (#5886) · Mar 11, 2024
b2393 · 3814a073 · [SYCL] Add support for SYCL Nvidia target (#5738) · Mar 11, 2024
b2392 · bb6d00bb · metal : move mm_id indices to shared mem (#5982) · Mar 10, 2024
b2391 · 7ab7b733 · android : fix utf8 decoding error (#5935) · Mar 10, 2024
b2389 · b838b53a · sync : ggml · Mar 10, 2024
b2387 · bf47a5ee · ggml : remove __constant__ specifier for CUDA tables (#5940) · Mar 10, 2024
b2386 · fa8a809a · server: ci: windows build and tests (#5968) · Mar 10, 2024
b2385 · bcebd7db · llama : add support for GritLM (#5959) · Mar 10, 2024
b2384 · 2960eae8 · grammar : verify parsed state (#5950) · Mar 10, 2024
b2382 · 621e86b3 · server: benchmark: chat/completions scenario and other llm servers comparison (#5941) · Mar 09, 2024
b2381 · 77d1ac7e · server : print chat template info · Mar 09, 2024
b2380 · d894f352 · perplexity : support using multiple sequences to allow larger batch sizes (#5946) · Mar 09, 2024
b2378 · 8380ecfb · ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) · Mar 09, 2024
b2377 · 58308a0e · server : fix metrics init (#5964) · Mar 09, 2024
b2376 · 5b097973 · ggml : remove old quantization functions (#5942) · Mar 09, 2024
b2374 · fb215c38 · server : normalize embeddings (#5956) · Mar 09, 2024
b2372 · 0db32bea · server : fix passing prompt as tokens (#5955) · Mar 09, 2024