Tags
Tags give the ability to mark specific points in history as being important
b1868 · a128c38d · Fix ffn_down quantization mix for MoE models (#4927) · Jan 14, 2024
b1867 · 5f5fe1bd · metal : correctly set SIMD support flags on iOS (#4923) · Jan 14, 2024
b1866 · ac32902a · llama : support WinXP build with MinGW 8.1.0 (#3419) · Jan 14, 2024
b1865 · 147b17ac · 2-bit quantizations (#4897) · Jan 14, 2024
b1864 · 807179ec · Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) · Jan 14, 2024
b1862 · c71d608c · ggml: cache sin/cos for RoPE (#4908) · Jan 13, 2024
b1861 · 4be5ef55 · metal : remove old API (#4919) · Jan 13, 2024
b1860 · 0ea069b8 · server : fix prompt caching with system prompt (#4914) · Jan 13, 2024
b1859 · f172de03 · llama : fix detokenization of non-special added-tokens (#4916) · Jan 13, 2024
b1858 · 2d57de52 · metal : disable log for loaded kernels (#4794) · Jan 13, 2024
b1857 · df845cc9 · llama : minimize size used for state save/load (#4820) · Jan 13, 2024
b1856 · 6b48ed08 · workflows: unbreak nix-build-aarch64, and split it out (#4915) · Jan 13, 2024
b1855 · 722d33f3 · main : add parameter --no-display-prompt (#4541) · Jan 13, 2024
b1854 · c30b1ef3 · gguf : fix potential infinite for-loop (#4600) · Jan 13, 2024
b1853 · b38b5e93 · metal : refactor kernel loading code (#4794) · Jan 13, 2024
b1851 · 356327fe · server : fix deadlock that occurs in multi-prompt scenarios (#4905) · Jan 13, 2024
b1850 · ee8243ad · server : fix crash with multimodal models without BOS token (#4904) · Jan 13, 2024
b1849 · 15ebe592 · convert : update phi-2 to latest HF repo (#4903) · Jan 13, 2024
b1848 · de473f5f · sync : ggml · Jan 12, 2024
b1844 · 3fe81781 · CUDA: faster q8_0 -> f16 dequantization (#4895) · Jan 12, 2024