[-] coolin@beehaw.org 5 points 8 months ago

Current LLMs are manifestly different from Cortana (🤢) because they are actually somewhat intelligent. Microsoft's Copilot can do web searches and perform basic tasks on the computer, and because of Microsoft's exclusive contract with OpenAI they're going to have access to more advanced versions of GPT that can do higher-level control and automation on the desktop. It will 100% be useful for users to have this available, and I expect even Linux desktops will eventually add local LLM support (once consumer compute and the tech mature). It is not just glorified autocomplete; its outputs actually correlate fairly well with real human language cognition.

The main issue for me is that they take all the data you input and mine it for better models without your explicit consent. This isn't an area where open source can catch up without significant capital behind it, so we have to hope Meta, Mistral, and government-funded projects give us what we need to have a competitor.

[-] coolin@beehaw.org 4 points 1 year ago

For the love of God, please stop posting the same story about AI model collapse. This paper has been out since May, has been discussed multiple times, and the scenario it presents is highly unrealistic.

Training on the whole internet is known to produce shit model output, requiring humans to curate their own high-quality datasets to feed to these models to yield high-quality results. That is why we have techniques like fine-tuning, LoRAs, and RLHF, as well as countless curated datasets to feed to models.
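For intuition on the LoRA part: the trick is to leave the huge pretrained weight matrix frozen and train only a small low-rank correction on top of it. Here's a minimal numpy sketch of that idea (toy dimensions, not any real library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                       # model dim and LoRA rank (r << d)
W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-init

def lora_forward(x, alpha=16):
    # Effective weight is W + (alpha / r) * B @ A, but we never
    # materialize it: only A and B (2*d*r params) get gradient updates.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
# With B zero-initialized, the adapter starts out as a no-op,
# so fine-tuning begins exactly at the pretrained behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

The payoff is that you train 2·d·r parameters per layer instead of d², which is why LoRA fine-tunes fit on consumer GPUs.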

Yes, if a model were for some reason retrained on the raw internet for several iterations, it would collapse and produce garbage. But the current frontier approach is to have strong LLMs (e.g. GPT-4) produce high-quality datasets for new LLMs to train on. This has been shown to work with Phi-1 (really good at writing Python code, trained on textbook-quality content generated with GPT-3.5) and Orca/OpenOrca (a GPT-3.5-level model trained on millions of examples from GPT-4 and GPT-3.5). Additionally, GPT-4 itself has likely been trained on synthetic data, and future iterations will train on more and more of it.

Notably, by selecting a narrow, high-quality slice of outputs instead of the whole range, we are able to avoid model collapse and in fact produce even better models.
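That "keep only the good slice" idea is basically filtered self-training: generate lots of candidates, score them, and keep only the top fraction for the training set. A toy sketch (the scorer here is a stand-in — in practice it would be unit tests for code, a reward model, or human review):

```python
import random

random.seed(42)

def generate_candidates(n):
    # Stand-in for sampling n outputs from a teacher LLM.
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def quality_score(sample):
    # Stand-in quality filter: here, closeness to a target value.
    return -abs(sample - 1.0)

def build_dataset(n_candidates, keep_fraction=0.1):
    # Keep only the top-scoring slice of the teacher's outputs.
    cands = generate_candidates(n_candidates)
    cands.sort(key=quality_score, reverse=True)
    return cands[: int(len(cands) * keep_fraction)]

dataset = build_dataset(10_000)
mean_kept = sum(dataset) / len(dataset)
# The kept slice clusters near the target, unlike the raw samples
# (whose mean is ~0): the filter shifts the training distribution.
assert abs(mean_kept - 1.0) < 0.1
```

The point is that the student never sees the teacher's average output, only its best, which is why distillation datasets like Orca's can punch above the teacher's typical quality.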

[-] coolin@beehaw.org 6 points 1 year ago

I've never used Manjaro, but the perception I get is that it is a noob-friendly distro with a good GUI and config tools (good) that then catastrophically fails when you monkey around with updates and the AUR. This is a pain for technical users and a back-to-Windows experience for the people it's targeted towards. Overall, significantly worse than EndeavourOS or plain ol' vanilla Arch Linux.

[-] coolin@beehaw.org 5 points 1 year ago

Shit, anyone packing boxes for less than $20 is getting scammed, 'cause I know for a fact several places offer more than that. It just goes to show the importance of having a union to bargain for higher wages.

[-] coolin@beehaw.org 6 points 1 year ago

This makes sense for any other company, but OpenAI is still technically a non-profit in control of the OpenAI corporation, the part that is actually a business and can raise capital. Considering Altman claims future GPT versions would generate literal trillions in wealth, I don't think the non-profit would ever sell the business arm for a measly few billion.

[-] coolin@beehaw.org 4 points 1 year ago

Lmao Twitter is not that hard to create. Literally look at the Mastodon code base and "transform" it and you're already most of the way there.

[-] coolin@beehaw.org 5 points 1 year ago

FediSearch I guess is similar to your idea, though I think the goal would be to make a new and open search index specifically containing fediverse websites instead of just using Google. I also feel like the formatting should be more like Lemmy, with the particular post title and short description showing instead of the generic search UI.

The idea of a fediverse search is really cool though. If things like news and academic papers ever got their own fediverse-connected service, I could see a FediSearch being a great alternative to the AI sludge of Google.

[-] coolin@beehaw.org 4 points 1 year ago

Basically he is pro-privacy, somewhere in the libertarian space, supports usage of monero, recommends you move to a rural area, etc.

[-] coolin@beehaw.org 4 points 1 year ago

I definitely agree. The vast majority of people still left on Reddit are either corporate bootlickers or people who don't care and just want to doomscroll.

Neither type adds anything to an online community.

[-] coolin@beehaw.org 4 points 1 year ago

The one SIMPLE trick crypto bros HATE:

Blockchain -> "Distributed Ledger"
NFT -> "Unique Identifier"

Like and share with your friends

[-] coolin@beehaw.org 7 points 1 year ago

There are some in the research community who agree with your take: "The Curse of Recursion: Training on Generated Data Makes Models Forget"

Basically, the long and short of that paper is that LLMs are inherently biased toward likely responses. The more their training set is LLM-generated, and thus carries that bias, the less the model can produce unlikely responses, degrading quality over successive generations.
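The mechanism the paper describes can be caricatured with a toy simulation: each "generation" fits itself to samples from the previous one, and because likely samples are favored (here crudely modeled by discarding tail samples, a bit like low-temperature decoding), the distribution's spread collapses over generations. This is an illustrative sketch, not the paper's actual experiment:

```python
import random
import statistics

random.seed(0)

def next_generation(mu, sigma, n_samples=200, truncate=1.5):
    # "Train" on the previous model's output, favoring likely
    # samples by discarding anything beyond `truncate` std devs.
    samples = [random.gauss(mu, sigma) for _ in range(n_samples)]
    kept = [s for s in samples if abs(s - mu) <= truncate * sigma]
    return statistics.mean(kept), statistics.stdev(kept)

mu, sigma = 0.0, 1.0
history = [sigma]
for _ in range(10):
    mu, sigma = next_generation(mu, sigma)
    history.append(sigma)

# Each generation forgets more of the tails: the spread collapses.
assert history[-1] < 0.5 * history[0]
```

Each pass shrinks the standard deviation by a roughly constant factor, so the "forgetting" compounds geometrically across generations, which is exactly the recursion the paper's title refers to.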

However, I tend to think this viewpoint is probably missing something important. Can you train a new LLM on today's internet? Probably not, at least without some heavy cleaning. Can you train a multimodal model on video, audio, the chat logs of people talking to it, and even other better LLMs? Yes, and you will get a much higher quality model and likely won't get the same model collapse implied by the paper.

This is more or less what OpenAI has done. Conversations with their 100M+ users are saved and used to further train the models. Their latest GPT-4 is also trained on video and images, and they have been exploring ways for LLMs to help train new models, especially to aid in aligning them.

Another recent example is Orca, a fine-tune of the open-source LLaMA model trained with GPT-3.5 and GPT-4 as teachers, which retains ~90% of GPT-3.5's performance despite using roughly a tenth as many parameters.

[-] coolin@beehaw.org 5 points 1 year ago

Lemmygrad is specifically problematic for being predominantly Marxist-Leninist (as the .ml suggests). I think you're probably right that people just reject them outright because of AH THE COMMUNISTS WANT TO END CAPITALISM red-scare-type stuff present in Western countries, but where I specifically find Lemmygrad (and other tankies) way too negative to interact with is when they get into defending communist regimes.

If you asked the average Lemmygrad user, they too would be enveloped in propaganda, though this time coming from communist regimes and praxis they've read. They have been deluded into believing Stalin and Mao were good leaders, that authoritarianism is okay if it advances their favorite political agenda (though for some reason also claim that these countries aren't authoritarian), and that these regimes should be implemented everywhere.

The worst of it all is their constant genocide denial. Yes, the USA and other Western countries have done a similar amount (maybe even more?) of really bad stuff in this area (e.g. Native Americans, apartheid, the Roma, etc. 💀), but I think broadly a well-educated Western citizen, especially a leftist one, should be able to understand and admit that what their country did was wrong and should never be done again. A Lemmygrad user instead defends things like the Uyghur genocide and the Holodomor, saying both that they don't exist and are "western propaganda" while at the same time entertaining the counterfactual and saying that if they did happen, they were justified because the West did it too and was being very mean to communism 😡.

When you get to that level of malevolent stupidity, you start to look more like a fascist who supports genocide and absolute state power and uses strategic ambiguity to express toxic beliefs than like a leftist. I don't think anyone suggests we stay federated with a fascist instance because fascists are misunderstood after "years of propaganda pushed by western countries" to discredit Hitler and Mussolini, but here you are doing the moral equivalent.
