82
submitted 4 months ago* (last edited 4 months ago) by Ghostalmedia@lemmy.world to c/apple_enthusiast@lemmy.world
you are viewing a single comment's thread
view the rest of the comments
[-] chrash0@lemmy.world 10 points 4 months ago

you’d be surprised how fast a model can be if you narrow the scope, quantize, and target specific hardware, like the AI hardware features they’re announcing.

not a 1-1, but a quantized Mistral 7B runs at ~35 tokens/sec on my M2. that’s not even as optimized as it could be. it can write simple scripts and do some decent writing prompts.

they could get really narrow in scope (super simple RAG, limited responses, etc), quantize down to even something like 4 bit, and run it on custom accelerated hardware. it doesn’t have to reproduce Shakespeare, but i can imagine a PoC that runs circles around Siri in semantic understanding and generated responses. being able to reach out on Slack to the engineers that built the NPU stack ain’t bad neither.

this post was submitted on 16 Apr 2024
82 points (95.6% liked)

Apple

17108 readers
21 users here now

Welcome

to the largest Apple community on Lemmy. This is the place where we talk about everything Apple, from iOS to the exciting upcoming Apple Vision Pro. Feel free to join the discussion!

Rules:
  1. No NSFW Content
  2. No Hate Speech or Personal Attacks
  3. No Ads / Spamming
    Self promotion is only allowed in the pinned monthly thread

Lemmy Code of Conduct

Communities of Interest:

Apple Hardware
Apple TV
Apple Watch
iPad
iPhone
Mac
Vintage Apple

Apple Software
iOS
iPadOS
macOS
tvOS
watchOS
Shortcuts
Xcode

Community banner courtesy of u/Antsomnia.

founded 1 year ago
MODERATORS