troed

joined 3 years ago
[–] troed@fedia.io 2 points 6 days ago (1 children)

15t/s is workable IMHO. What's your system specs? I have 96GB DDR5 but never thought about going to an ever higher MoE.

 

Maybe it was just me, but in case others have done the same this post might help someone else too.

I have a workstation with plenty of CPU and system RAM, but I'm "GPU poor" in that I only have a 5060Ti with its 16GB of VRAM. Additionally, I need to use the GPU for regular system activities too which means I only have around ~14GB of VRAM available for the LLM.

I'm exclusively using this setup for development and system management tasks, and I've found Qwen 3.6 35B A3B to excel compared to other models. I don't have the VRAM to run the 27GB dense model, so I've spent time on getting the best usage out of the MoE.

Or so I thought. Since "everyone" says to use Unsloth UD-Q4_K_XL that's the quant I've been using, and I've gone a bit back'n'forth with MTP/no MTP, UB increase, mmproj since I've also started using a browser MCP etc.

Today I took another look at their quant chart and thought that since it's MoE maybe I could run Q5_K_S which would be a step up?

Well. Now I'm using Q6_K because it turns out I could run that with the exact same settings as I've optimized my Q4_K_XL setup for which means there are no drawbacks - just a better performing model. I've already noticed how it's able to get out of loops while before I had to interrupt it sometimes.

This is my setup. I get >1000 t/s prefill and >20 t/s inference. I'm not chasing faster inference since I actively read the thought process when working the LLM - but I've increased ub to get faster prefill since that's just waiting time otherwise.

./llama-server
    -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q6_K \
    -c 160000 \
    -n 32768 \
    -fa on \
    -ub 2048 \
    -ctk q8_0 \
    -ctv q8_0 \
    --no-mmap \
    --mlock \
    --no-warmup \
    --chat-template-kwargs '{"preserve_thinking": true}' \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --host 0.0.0.0

I also use Opencode with the DCP and Superpowers plugins, which make a tremendous difference both to context handling as well as planning. I have no need for a larger context - I even compact early quite often since the tasks get done before reaching the limit.

[–] troed@fedia.io 18 points 2 weeks ago (3 children)

It's similar to being an assembler coder when higher level languages with compilers came. No need for management purging, you'll simply be competing for a smaller segment of assigments.

I don't know of a single developer that has actually used LLM aids say there's no benefit to them. Those that refuse do so for some other convictions and don't really know the difference between LLM aiding in tasks and full on yolo vibe coding.

[–] troed@fedia.io -3 points 3 weeks ago

Which I've done, and not a single person has looked them up. The reason for that is that no one here is actually interested in the subject - they just cannot accept their feels about humans being special snowflakes not having any support in the science.

[–] troed@fedia.io -1 points 3 weeks ago

I've sourced two of the foremost specialists on the subject. Blackmore's "Consciousness: An Introduction" amounts to a full university semester on the subject. No, I don't really see it as my job to condense that down in a post here. Anyone who's actually interested can start with reading up summaries that are available freely online instead of posting bad takes at me.

[–] troed@fedia.io -2 points 3 weeks ago (1 children)

Oh I haven't seen a single person replying so far who has shown any interest in being "better informed".

[–] troed@fedia.io -2 points 3 weeks ago (3 children)

Yes, as I've described here: https://blog.troed.se/posts/the-delta-between-an-llm-and-consciousness/

I didn't say human brains function like LLMs

Today's LLMs are based on a Google research paper from 2017. Another published paper that would solve this was published by Google in december last year: https://aipapersacademy.com/nested-learning-hope/

[–] troed@fedia.io -1 points 3 weeks ago (2 children)

Not a single person who has commented is interested in an actual discussion regarding the science on consciousness. It's all this: https://blog.troed.se/posts/the-coming-cognitive-disbelief/

[–] troed@fedia.io -1 points 3 weeks ago (4 children)

I don't care. See how easy it is? Either you're interested in the subject and you would already know that what I wrote is completely uncontroversial, or you spend time making ignorant posts because a simple fact disagrees with your feels.

[–] troed@fedia.io -1 points 3 weeks ago (10 children)

The difference between you and me is that I've studied the subject. You have not. It's not on me to teach you the contents of the literature.

Go be annoying somewhere else.

[–] troed@fedia.io -1 points 3 weeks ago (13 children)

I don't really care much for what you think - I already sourced two well known experts on the subject in another post in this thread.

[–] troed@fedia.io -2 points 3 weeks ago (16 children)

Yes, the whole field.

[–] troed@fedia.io -2 points 3 weeks ago (6 children)

I recommend Susan Blackmore's "Consciousness: An Introduction", and of course Douglas Hofstadter's "Gödel, Escher and Bach" and the followup "I am a strange Loop".

I didn't say human brains function like LLMs. I said that everything we know about how human brains work indicates we're also just pattern matching machines in a loop.

The point is that the fact that LLMs are "next token predictors" doesn't in itself say anything about what the emergent effects of that can be.

 

74% of Ukrainians support fighting Russia even without U.S. assistance. A significant majority—59% of respondents—also believe that Ukraine can defeat Russia on the battlefield

only 6% of respondents said they were willing to make territorial concessions regarding areas occupied by Russia after the full-scale invasion in 2022

Additionally, 70% of respondents are against lowering the mobilization age,

Original article is paywalled, quotes from https://ukrainetoday.org/74-of-ukrainians-ready-to-resist-russia-without-u-s-aid-support-zelenskyys-actions/

 

We're consolidating our social media presence due to limited resources and no longer posting on Mastodon. Follow us on Reddit

Please tell us that you're not moving away from Lemmy/Mbin too. There's a gigantic tonedeafness to asking your supporters to use centralized social media at this specific time that's hard to accept you're not realizing.

(quote from Proton's mastodon.social account info - there wasn't even a post made about it)

 

Swedish author and famous pro-Ukraine blogger Lars Wilderäng (Cornucopia) reports today that the Swedish security expert Karl Emil Nikka has revealed that Kagi is using the Kremlin propaganda tool Yandex as a backend for searches.

Wilderäng speculates this might mean search terms are leaking to Russia, while others worry about how Kremlin thus can get their talking points into western search results.

Security expert Karl Emil Nikka tells us that the search engine Kagi, popular among tech geeks, uses Russian Yandex, which was introduced after the full-scale invasion. This, of course, gives Russia the opportunity to look at what is searched for via Kagi.

Link (in Swedish), see 11:22 update: https://cornucopia.se/2024/10/uppdateras-ryssland-medger-bruk-av-c-stridsmedel-mot-ukraina-rysk-pilot-som-mordade-68-ukrainare-ihjalslagen-med-hammare-bland-de-allra-storsta-ryska-forlusterna-under-kriget-igar/

view more: next ›