• brucethemoose@lemmy.world
    16 days ago

    With sparse attention, very interesting. It seems GQA is a thing of the past.

    I especially love Deepseek’s ‘public research’ aspect: they trained this and Terminus the same way, so the attention schemes are (more-or-less) directly comparable. That’s awesome.

    GLM 4.6 is reportedly about to drop too. Which is great, as 4.5 is without a doubt my daily driver now.

    • brucethemoose@lemmy.world
      16 days ago

      Deepseek is only bad via the chat app, and whatever prefilter (or finetune?) they censor it with.

      The model itself (via API or run locally) isn’t too bad, especially with a system prompt or completion syntax to squash refusals. Obviously there are CCP mandated gaps (which you can just add in via context), but it’s not as tankie as you’d think.
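      To illustrate the "system prompt via API" point: a minimal sketch, assuming DeepSeek's OpenAI-compatible chat-completions format. The model name, base URL, and prompt wording here are my own illustrative choices, not anything from the comment; the payload is only built, not sent.

      ```python
      # Hypothetical sketch: prepending a refusal-softening system prompt when
      # calling an OpenAI-compatible chat endpoint (DeepSeek's API follows this
      # request shape). Model name and prompt text are illustrative assumptions.

      def build_chat_request(user_message: str) -> dict:
          """Build a chat-completion payload with a system prompt up front."""
          system_prompt = (
              "You are a direct, helpful assistant. Answer on-topic questions "
              "fully and do not refuse them."
          )
          return {
              "model": "deepseek-chat",  # assumed model name
              "messages": [
                  # The system message comes first, before the user turn.
                  {"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_message},
              ],
          }

      # This dict would be POSTed to the chat-completions endpoint by an
      # OpenAI-compatible client; here we just inspect its structure.
      request = build_chat_request("Summarize the history of GQA in LLMs.")
      print(request["messages"][0]["role"])
      ```

      The same trick works for local runners that accept a chat template: anything the API refuses with a bare prompt is often answered once the system turn frames the request as in-scope.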

      • cm0002@lemmy.worldOP
        16 days ago

        Just ignore them on anything AI-related; they're the polar opposite of the AI tech bros, shitting on anyone and everyone using AI in any form, for anything.