submitted 3 weeks ago (last edited 3 weeks ago) by ericjmorey@programming.dev to c/machine_learning@programming.dev

Video description:

We reproduce the GPT-2 (124M) model from scratch.

This video covers the whole process:

First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run and come back the next morning to see our results and enjoy some amusing model generations.

Keep in mind that in some places this video builds on the knowledge from earlier videos in the Zero to Hero Playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
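For orientation, the "GPT-2 network" in question is the smallest model in the GPT-2 family. A minimal sketch of its configuration, following the field names nanoGPT uses (the dataclass layout is an assumption based on that repo, not a copy of it):

    from dataclasses import dataclass

    @dataclass
    class GPTConfig:
        # GPT-2 small (124M) hyperparameters
        block_size: int = 1024   # maximum context length
        vocab_size: int = 50257  # GPT-2 BPE vocabulary size
        n_layer: int = 12        # number of transformer blocks
        n_head: int = 12         # attention heads per block
        n_embd: int = 768        # embedding / hidden dimension

With these values the token embeddings alone account for roughly 50257 x 768 ≈ 38.6M of the 124M parameters.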

Bayman, Emine Ozgur; Dexter, Franklin. Multicollinearity in Logistic Regression Models. Anesthesia & Analgesia 133(2): 362-365, August 2021. DOI: 10.1213/ANE.0000000000005593
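Multicollinearity checks of the kind the paper discusses are often run via variance inflation factors before fitting the logistic model. A minimal sketch with statsmodels (the toy predictors are made up for illustration):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    X = pd.DataFrame({
        "x1": x1,
        "x2": x1 + rng.normal(scale=0.05, size=200),  # nearly a copy of x1
        "x3": rng.normal(size=200),                   # independent predictor
    })
    X = sm.add_constant(X)  # compute VIFs with the intercept present

    # Rule of thumb: VIF > 10 signals problematic collinearity
    for i, col in enumerate(X.columns):
        if col != "const":
            print(col, variance_inflation_factor(X.values, i))

Here x1 and x2 will show very large VIFs while x3 stays near 1.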

cross-posted from: https://lemmy.one/post/13942290

Abstract: We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
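To give a flavor of the Datalog-style interface the abstract describes, here is a rough sketch using Scallop's Python bindings (scallopy). The exact API names and rule syntax are assumptions pieced together from the project's documentation, not verified against a specific release:

    import scallopy

    # "unit" provenance = plain logic; differentiable provenances such as
    # "difftopkproofs" are what make the reasoning trainable end to end.
    ctx = scallopy.ScallopContext(provenance="unit")

    ctx.add_relation("edge", (int, int))
    ctx.add_facts("edge", [(0, 1), (1, 2), (2, 3)])

    # Recursion: transitive closure, the classic Datalog example.
    ctx.add_rule("path(a, b) = edge(a, b)")
    ctx.add_rule("path(a, c) = edge(a, b) and path(b, c)")

    ctx.run()
    print(list(ctx.relation("path")))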

Original post on r/learnmachinelearning

Apr 18, 2022 | Tarique Anwar writes:

The main reason ReLU is used is that it is simple, fast, and empirically seems to work well.

But with the emergence of Transformer-based models, different variants of activation functions and GLU have been experimented with, and they do seem to perform better. Some of them are:

  • GeLU²
  • Swish¹
  • GLU³
  • GEGLU⁴
  • SwiGLU⁴

We will go over some of these in detail, but before that let’s see where exactly these activations are utilized in a Transformer architecture.
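As a concrete reference before diving in, here are minimal PyTorch sketches of the gated variants, following the formulas in Shazeer's "GLU Variants Improve Transformer" paper (the module is an illustrative stand-in for a Transformer feed-forward layer, not the article's code):

    import torch
    import torch.nn as nn

    def swish(x, beta=1.0):
        # Swish: x * sigmoid(beta * x); beta = 1 is also known as SiLU
        return x * torch.sigmoid(beta * x)

    class SwiGLUFFN(nn.Module):
        # Feed-forward block with a SwiGLU gate. Swapping the activation on
        # the gate branch gives the other variants:
        # torch.sigmoid -> GLU, torch.nn.functional.gelu -> GEGLU.
        def __init__(self, d_model, d_ff):
            super().__init__()
            self.w = nn.Linear(d_model, d_ff, bias=False)   # gate branch
            self.v = nn.Linear(d_model, d_ff, bias=False)   # value branch
            self.w2 = nn.Linear(d_ff, d_model, bias=False)  # projection back

        def forward(self, x):
            return self.w2(swish(self.w(x)) * self.v(x))

    y = SwiGLUFFN(d_model=512, d_ff=2048)(torch.randn(4, 10, 512))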

Read Activation function and GLU variants for Transformer models

Summary

Activation functions are crucial in neural networks, introducing non-linearity and enabling the modeling of complex patterns across varied tasks. This guide delves into the evolution, characteristics, and applications of state-of-the-art activation functions, illustrating their role in enhancing neural network performance. It discusses the transition from classic functions like sigmoid and tanh to advanced ones such as ReLU and its variants, addressing challenges like the vanishing gradient problem and the dying ReLU issue. Concluding with practical heuristics for selecting activation functions, the article emphasizes the importance of considering network architecture and task specifics, highlighting the rich diversity of activation functions available for optimizing neural network designs.
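As a small illustration of the dying ReLU point: a unit whose pre-activations are all negative outputs zero and receives zero gradient, which leaky variants avoid. A quick PyTorch check:

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
    print(F.relu(x))                             # negatives clamp to 0 (zero gradient)
    print(F.leaky_relu(x, negative_slope=0.01))  # small slope keeps gradients alive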

Dawn Wages writes:

Python Data Science Day is a full day of 25-minute and 5-minute community-contributed sessions, streaming March 14th, 2024 on the VS Code YouTube channel.

Start 2024 with a new goal: become an expert with Python in the cloud. Join us this quarter as we challenge ourselves with Python, Machine Learning and Data Science.

7 hr 1 min | 10 Modules

cross-posted from: https://lemmy.ml/post/13088944

The Annotated S4 (srush.github.io)

June 21, 2023 | Fabrizio Musacchio writes:

In this post, we will explore the potential of PCA [Principal Component Analysis], denoising autoencoders and Convolutional Neural Networks (CNN) for restoring noisy images using Python. We will examine their performance, advantages, and disadvantages to determine the most effective method for image denoising.
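For the PCA route specifically, the idea is to project the noisy images onto their top principal components and reconstruct: the discarded low-variance directions carry mostly noise. A minimal sketch with scikit-learn, using the digits dataset as a stand-in for the post's images:

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X = load_digits().data / 16.0  # 1797 8x8 grayscale digits, scaled to [0, 1]
    rng = np.random.default_rng(0)
    X_noisy = X + rng.normal(scale=0.25, size=X.shape)

    # Reconstruct from the top 16 of 64 components; this acts as a denoiser.
    pca = PCA(n_components=16)
    X_denoised = pca.inverse_transform(pca.fit_transform(X_noisy))

    print("MSE noisy:   ", np.mean((X - X_noisy) ** 2))
    print("MSE denoised:", np.mean((X - X_denoised) ** 2))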

cross-posted from: https://programming.dev/post/11034601

There's a lot in the 1.5 release of Opus - the free and open audio codec - and specifically a lot of machine learning talk and features.

Audible and continuous (albeit jittery) speech at 90% packet loss is crazy.

The Samples section under WebRTC Integration has an example where you can test out the 90% packet loss audio.

2024-02-29 | Christopher Gadzinski writes:

Physics likes optimization! Subject to its boundary conditions, the time evolution of a physical system is a critical point for a quantity called an action. This point of view sets the stage for Noether's principle, a remarkable correspondence between continuous invariances of the action and conservation laws of the system.

In machine learning, we often deal with discrete "processes" whose control parameters are chosen to minimize some quantity. For example, we can see a deep residual network as a process where the role of "time" is played by depth. We may ask:

  1. Does Noether's theorem apply to these processes?
  2. Can we find meaningful conserved quantities?

Our answers: "yes," and "not sure!"
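For reference, a standard textbook form of the correspondence (notation mine, not the post's): if the action $S[q] = \int L(q, \dot{q}, t)\,dt$ is invariant under a continuous family of transformations $q \mapsto q + \varepsilon\,\psi(q)$, then the quantity

$$Q = \frac{\partial L}{\partial \dot{q}} \cdot \psi(q)$$

is constant along any trajectory satisfying the Euler-Lagrange equations. Time-translation symmetry yields conservation of energy; spatial translation yields conservation of momentum.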

XAI = Explainable Artificial Intelligence

Dec 14, 2023 | Alessio Malizia and Fabio Paternò write:

Numerous papers in the literature argue for the use of XAI methods, and many brand-new families of XAI approaches have been proposed. Nevertheless, finding instances of practical XAI implementations that have enhanced business in industry, societal, or other real-world applications is more challenging, even if some interesting work in this area has been put forward, for example in the health domain.

Read Why Is the Current XAI Not Meeting the Expectations?

Apr 17, 2017 | Matt Brems writes:

Principal component analysis (PCA) is an important technique to understand in the fields of statistics and data science… but when putting a lesson together for my General Assembly students, I found that the resources online were too technical, didn’t fully address our needs, and/or provided conflicting information. It’s safe to say that I’m not “entirely satisfied with the available texts” here.

As a result, I wanted to put together the “What,” “When,” “How,” and “Why” of PCA as well as links to some of the resources that can help to further explain this topic. Specifically, I want to present the rationale for this method, the math under the hood, some best practices, and potential drawbacks to the method.

While I want to make PCA as accessible as possible, the algorithm we’ll cover is pretty technical. Being familiar with some or all of the following will make this article and PCA as a method easier to understand: matrix operations/linear algebra (matrix multiplication, matrix transposition, matrix inverses, matrix decomposition, eigenvectors/eigenvalues) and statistics/machine learning (standardization, variance, covariance, independence, linear regression, feature selection). I’ve embedded links to illustrations of these topics throughout the article, but hopefully these will serve as a reminder rather than required reading to get through the article.
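The "math under the hood" the article walks through condenses to a few NumPy lines: standardize, form the covariance matrix, eigendecompose, project. A sketch of those steps (variable names are mine, not the article's):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))  # 200 samples, 5 features

    # 1. Standardize each feature (zero mean, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized data.
    C = np.cov(Z, rowvar=False)

    # 3. Eigendecomposition: eigenvectors are the principal axes,
    #    eigenvalues the variance each axis explains.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 4. Project onto the top-k components and report explained variance.
    k = 2
    scores = Z @ eigvecs[:, :k]
    print(eigvals[:k] / eigvals.sum())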

Read A One-Stop Shop for Principal Component Analysis
