submitted 10 months ago by hedge@beehaw.org to c/technology@beehaw.org
[-] AceFuzzLord@lemm.ee 18 points 10 months ago* (last edited 10 months ago)

Calling overglorified chatbots and LLMs like GPT or Claude AGI would, from my understanding of them, be like calling a preschool finger painting a masterclass work of art. I'm nowhere near an expert, though, so definitely take what I say with a major grain of salt.

What these AI chatbots and LLMs can do is sometimes impressive, but that's all I can say for them. Intelligence is definitely not their strong suit when, half the time, you ask for a summary of a well-known and well-loved TV show and it just makes up anything that sounds right.

[-] ConsciousCode@beehaw.org 5 points 10 months ago

LLMs are not chatbots; they're models. ChatGPT/Claude/Bard are chatbots which use LLMs as part of their implementation. I would argue in favor of the article because, while they aren't particularly intelligent, they are general-purpose and exhibit some level of intelligence, and thus qualify as "general intelligence". Compare this to the opposite, an expert system like a chess computer. You can't even begin to ask a chess computer to explain what a SQL statement does; the question doesn't even make sense. But LLMs are capable of being applied to virtually any task which can be transcribed. Even if they aren't particularly good, they at least attempt to complete the task and are often correct, unlike GPT-2, which read more like a Markov chain.

[-] jarfil@beehaw.org 4 points 10 months ago* (last edited 10 months ago)

"LLMs are capable of being applied to virtually any task which can be transcribed"

Where "transcribed" means using any set of tokens, be they extracted from human written languages, emojis, pieces of images, audio elements, spatial positions, or anything else in existence that can be divided up and represented by tokens.

PS: actually... why "in existence"? Why not throw some "customizable tokens" into an LLM, for it to come up with whatever meaning it fancies for them?

[-] ConsciousCode@beehaw.org 1 points 10 months ago* (last edited 10 months ago)

There are a lot of papers that propose adding new tokens to elicit some behavior or another, though I haven't seen them catch on for some reason. A new token means adding a new trainable static vector (one more row in the embedding table), which would initially be something nonsensical, and you would want to retrain it on a comparably sized corpus. This is a bit speculative, but I think introducing a token totally orthogonal to the original domain (e.g. smell, which has no textual analog) would require compressing some of the dimensions to make room for that subspace; otherwise the model would have a form of synesthesia, relating that token to the neighboring original subspaces. If it were just a new token within the original domain, though, you could get a good enough initial approximation from a linear combination of existing token embeddings. E.g. when a monkey-with-a-hat emoji comes out, you add up the embeddings for the monkey emoji and the hat emoji, then finetune.
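
To make the linear-combination idea concrete, here's a minimal sketch in plain Python. The embedding table, its values, and the helper name `init_new_token` are all made up for illustration; a real model would do this on its actual embedding matrix and then finetune, and whether an average or a straight sum works better is an open choice.

```python
# Toy embedding table: token -> vector. Values are invented for illustration.
embeddings = {
    "monkey": [0.9, 0.1, 0.0, 0.3],
    "hat":    [0.1, 0.8, 0.2, 0.0],
    "car":    [0.0, 0.2, 0.9, 0.1],
}

def init_new_token(table, parts, weights=None):
    """Build a starting vector for a new token as a weighted sum of existing rows.

    With no weights given, the parts are averaged (an assumption, not a rule).
    """
    if weights is None:
        weights = [1.0 / len(parts)] * len(parts)
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for token, w in zip(parts, weights):
        for i, x in enumerate(table[token]):
            vec[i] += w * x
    return vec

# The new row starts near its "ingredients" instead of at random noise;
# finetuning would then adjust it from there.
embeddings["monkey_with_hat"] = init_new_token(embeddings, ["monkey", "hat"])
```

The point is only that the new vector starts out inside the existing subspace, so the rest of the model already "sort of" knows what to do with it before any retraining.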

The most extreme option: you could increase the embedding dimensionality so the original subspaces are unaffected and the new tokens can take up those new dimensions. This is extreme because it means resizing every matrix in the model, which even for smaller models would be many millions of parameters, and the performance would tank until it got a lot more retraining.
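
For a rough sense of the scale of that option, here's a back-of-the-envelope sketch. It assumes a simplified transformer where each layer has about 4·d² attention weights and 8·d² MLP weights (biases and layer norms ignored), and the sizes plugged in (768 dimensions, 12 layers, a 50257-token vocabulary, 64 extra dimensions) are just an example configuration, not numbers from any paper. `approx_weight_count` and `pad_embedding` are hypothetical helper names.

```python
def approx_weight_count(d_model, n_layers, vocab_size):
    """Rough weight count: per layer 4*d^2 (attention) + 8*d^2 (MLP, 4x hidden),
    plus the token embedding table. Biases and norms are ignored."""
    per_layer = 4 * d_model ** 2 + 8 * d_model ** 2
    return n_layers * per_layer + vocab_size * d_model

def pad_embedding(vec, extra_dims):
    """Widen one embedding by appending zeros, leaving the old dimensions untouched."""
    return list(vec) + [0.0] * extra_dims

old_count = approx_weight_count(768, 12, 50257)
new_count = approx_weight_count(768 + 64, 12, 50257)
added = new_count - old_count  # brand-new, untrained weights

print(f"before: {old_count:,}  after: {new_count:,}  added: {added:,}")
print(pad_embedding([0.9, 0.1, 0.0, 0.3], 2))
```

Every one of those added weights starts out untrained, and every matrix now has a different shape, which is why performance would drop until the model was retrained.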

(Deleted the original because I got token embeddings and the embedding dimensions mixed up, essentially assuming a new token would use the "extreme option".)

this post was submitted on 12 Oct 2023
33 points (92.3% liked)
