Tags

Tags give the ability to mark specific points in history as being important.

b1868

a128c38d · Fix ffn_down quantization mix for MoE models (#4927) · Jan 14, 2024
b1867

5f5fe1bd · metal : correctly set SIMD support flags on iOS (#4923) · Jan 14, 2024
b1866

ac32902a · llama : support WinXP build with MinGW 8.1.0 (#3419) · Jan 14, 2024
b1865

147b17ac · 2-bit quantizations (#4897) · Jan 14, 2024
b1864

807179ec · Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) · Jan 14, 2024
b1862

c71d608c · ggml: cache sin/cos for RoPE (#4908) · Jan 13, 2024
b1861

4be5ef55 · metal : remove old API (#4919) · Jan 13, 2024
b1860

0ea069b8 · server : fix prompt caching with system prompt (#4914) · Jan 13, 2024
b1859

f172de03 · llama : fix detokenization of non-special added-tokens (#4916) · Jan 13, 2024
b1858

2d57de52 · metal : disable log for loaded kernels (#4794) · Jan 13, 2024
b1857

df845cc9 · llama : minimize size used for state save/load (#4820) · Jan 13, 2024
b1856

6b48ed08 · workflows: unbreak nix-build-aarch64, and split it out (#4915) · Jan 13, 2024
b1855

722d33f3 · main : add parameter --no-display-prompt (#4541) · Jan 13, 2024
b1854

c30b1ef3 · gguf : fix potential infinite for-loop (#4600) · Jan 13, 2024
b1853

b38b5e93 · metal : refactor kernel loading code (#4794) · Jan 13, 2024
b1851

356327fe · server : fix deadlock that occurs in multi-prompt scenarios (#4905) · Jan 13, 2024
b1850

ee8243ad · server : fix crash with multimodal models without BOS token (#4904) · Jan 13, 2024
b1849

15ebe592 · convert : update phi-2 to latest HF repo (#4903) · Jan 13, 2024
b1848

de473f5f · sync : ggml · Jan 12, 2024
b1844

3fe81781 · CUDA: faster q8_0 -> f16 dequantization (#4895) · Jan 12, 2024
