Reddit is OpenAI’s Moat (www.cyberdemon.org)
submitted 1 year ago* (last edited 1 year ago) by southernwolf@pawb.social to c/tech@pawb.social

If this is true, then some things start to make a lot of sense. By making API calls so incredibly expensive, Reddit makes access prohibitively expensive for basically anyone else. Google could likely still afford it (but they would certainly pay a lot to do so), but an upstart that would be more likely to wreck OpenAI, like Midjourney did with DALL-E, becomes far less likely to be able to afford the cost.

It basically gives OpenAI padding between itself and upstart competitors where it matters most: training data. Further still, this might also explain some of the other changes to the API, like the blocking of NSFW material. Doing that makes it easier for OpenAI to train on the data without needing to worry as much about filtering. It explains the urgency too, as OpenAI is desperately seeking to keep upstarts, especially open-source ones, from being able to compete with them. It's why they are lobbying governments around the world to allow only them to be kings of AI. Rapidly closing off any new Reddit data, or access to old data for new upstarts, would explain why there was such short notice.

Now, does this completely stop the ability to train on Reddit data? No. Web scraping is always an option, but that's a lot more computationally expensive on the front and back end, the data will be very dirty (more computational work), and Reddit can combat this with de-indexing techniques. For the data sizes that OpenAI, or others seeking to make a GPT-like LLM, use for training their AI, web scraping likely isn't feasible for the whole of Reddit.

It should be mentioned too that this doesn't have to be the only reason for Reddit to make these changes. It still shuts down 3rd party apps and forces users (those that remain) onto their ad-ridden stock app, it gives them greater control over how people interact with the site, and now it seems it gives the Reddit admins reason to directly intervene in subreddits to control how they operate after the protests. Combine this with making OpenAI/Sam Altman happy, and things start to add up.

It kills multiple birds with one stone, if you will.

[-] awooo@pawb.social 1 points 1 year ago

OpenAI definitely owes a lot to Reddit threads. I've even been able to trace a GPT-4 hallucination to a single thread where the things it was talking about appeared, but it seemed to have merged two completely different names together.

It definitely could be a contributing factor. The biggest players are caught in a war among themselves while trying to fend off open models at the same time. That may explain why everyone values their publicly accessible data all of a sudden.

Maybe even the recent stuff with YouTube (Invidious and ad blockers) can be explained by this. Maybe they want to set the stage for restricting access to videos. Why? Videos have proven to be a good way of training open-ended agents that play Minecraft, for example. Google has PaLM-E (which is based on an LM and another transformer for performing physical movements) and is working on general-purpose robots. They also said they were training some kind of model that's built to be multimodal from the very start, which will probably be a successor to that.

[-] bersl2@furry.engineer 6 points 1 year ago

@awooo @southernwolf So instead of making training data opt-in, everyone's just going to enclose the Internet even further.

Wonderful.

[-] awooo@pawb.social 1 points 1 year ago

Yeah, pretty much...

tbh training robots on videos wouldn't even be bad copyright-wise. There's nothing copyrightable about the way people move and do things, and people mostly want boring manual jobs to be automated (at least if we get rid of capitalism first so we don't fucking starve). But of course Google wants to have an edge on its robots, and they can get that by siloing off the data from everyone else...

AI research should be public and the results made as accessible as possible. I hate the intersection of AI and capitalism.

[-] southernwolf@pawb.social 4 points 1 year ago

It's less purely capitalism than it is corporatism. Especially so with Altman running around demanding they, and they alone, be given the ability to make AIs. Emad and Stability AI prove you absolutely don't need that model whatsoever. Further still, the potential commercial projects born out of what Stability released are... many.

What can absolutely not be allowed is for OpenAI, or Google, to be given the sole right to create AIs, enforced by law. That's a scary world I 100% do not want to live in...

[-] awooo@pawb.social 2 points 1 year ago

Meh, that's the logical conclusion of capitalism.

I suspect these supposedly good companies will either rise and fall as they run out of VC money, or become another OpenAI or Google at some point, only using their initial investment to kick-start their tech.

But also we have to think about getting replaced by automation anyway (large corporations having exclusive access to it only exacerbates it). It's different from previous forms of technology, because it won't really create enough jobs for people.

And while we're at it, if we can get an abundance of labour, why not give people a bit more agency over everything, rather than just delegating it to some rich fucks who will turn around the moment they sniff out a way to make extra money and abuse it in a multitude of ways to keep their influence?

this post was submitted on 16 Jun 2023