- cross-posted to:
- technology@lemmy.ml
With sparse attention, very interesting. It seems GQA is a thing of the past.
I especially love DeepSeek’s ‘public research’ aspect: they trained this and Terminus the same way, so the attention schemes are (more or less) directly comparable. That’s awesome.
GLM 4.6 is reportedly about to drop too. Which is great, as 4.5 is without a doubt my daily driver now.
New version of the propaganda machine dropped 🤦‍♂️
DeepSeek is only bad via the chat app, with whatever prefilter (or finetune?) they censor it with.
The model itself (via API or run locally) isn’t too bad, especially with a system prompt or completion syntax to squash refusals. Obviously there are CCP-mandated gaps (which you can just fill in via context), but it’s not as tankie as you’d think.
Just ignore them on anything AI-related; they are the polar opposite of the AI tech bros, shitting on anything and everyone using AI in any form for anything.
…Or are they an LLM? I mean, the handle is BroBot, and the emoji makes me suspicious, lol.
Yes now worship me and don’t forget to put glue in your pizza sauce.
That’s the biggest compliment any AI simp could give me. Thank you 😘