kogasa

joined 2 years ago
[–] kogasa@programming.dev 1 points 12 hours ago (2 children)

I know, I'm just saying it's not theoretically impossible to have a phone number as a token. It's just probably not what happened here.

> the choice of the next token is really random

It's not random in the sense of a uniform distribution, which is what's implied by "generate a random [phone] number".

[–] kogasa@programming.dev 2 points 13 hours ago (4 children)

A full phone number could be in the tokenizer vocabulary, but any given one probably isn't in there.

[–] kogasa@programming.dev 9 points 21 hours ago* (last edited 21 hours ago) (6 children)

I mean the latter statement is not true at all. I'm not sure why you think this. A basic GPT model reads a sequence of tokens and predicts the next one. Any sequence of tokens is possible, and each digit 0-9 is likely its own token, as is the case in the GPT2 tokenizer.

An LLM can't generate random numbers in the sense of a proper PRNG simulating draws from a uniform distribution, the output will probably have some kind of statistical bias. But it doesn't have to produce sequences contained in the training data.
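
For what it's worth, you can poke at the GPT2 vocabulary directly. A quick sketch, assuming the tiktoken library is installed; the phone number is a made-up example:

```python
# Inspect the GPT2 BPE tokenizer. Assumes tiktoken is installed
# (pip install tiktoken); the phone number is a made-up example.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# Each single digit 0-9 encodes to exactly one token.
print([enc.encode(str(d)) for d in range(10)])

# An arbitrary digit string is still representable (as one or more
# tokens), whether or not it ever appeared in the training data.
ids = enc.encode("555-0123")
print(ids, "->", [enc.decode([i]) for i in ids])
```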

[–] kogasa@programming.dev 6 points 2 days ago

It's a number, and complexity refers to functions. The natural inclusion of numbers into functions maps pi to the constant function x -> pi, which is O(1).

If you want the time complexity of an algorithm that produces the nth digit of pi, the best known ones are something like O(n log n) with O(1) being impossible.
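
Digit extraction without computing the earlier digits is a real thing, at least in base 16: the Bailey-Borwein-Plouffe formula. A minimal Python sketch (plain floats, so only reliable for modest n):

```python
# nth hexadecimal digit of pi (0-indexed after the point) via the
# Bailey-Borwein-Plouffe formula. Uses plain floats, so it's only
# reliable for modest n; arbitrary precision needs more care.
def pi_hex_digit(n):
    def series(j):
        # fractional part of the sum over k of 16^(n-k) / (8k + j)
        s = 0.0
        for k in range(n + 1):
            s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
        for k in range(n + 1, n + 16):  # rapidly vanishing tail
            s = (s + 16.0 ** (n - k) / (8 * k + j)) % 1.0
        return s

    frac = (4 * series(1) - 2 * series(4) - series(5) - series(6)) % 1.0
    return "0123456789ABCDEF"[int(frac * 16)]

print("".join(pi_hex_digit(n) for n in range(8)))  # pi = 3.243F6A88... in hex
```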

[–] kogasa@programming.dev 2 points 3 days ago (1 children)

The direct connection is cool; I just wonder if a P2P connection is actually any better than going through a data center. There's gonna be intermediate servers, right?

Do you need to have Tailscale set up on any network you want to use this on? Because I'm a fan of being able to just throw my domain or IP into any TV and log in.

[–] kogasa@programming.dev 3 points 3 days ago (3 children)

I just use nginx on a tiny Hetzner VPS acting as a reverse proxy for my home server. I dunno what the point of Tailscale is here. Maybe better latency and fewer network hops in some cases, if a P2P connection is possible? But I've never had any bandwidth or latency issues doing this.
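
The VPS side is just a vanilla reverse-proxy site config. A minimal sketch, assuming the home server is reachable from the VPS at 10.0.0.2:8080 (e.g. over a WireGuard tunnel); the hostname, paths, and addresses are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name media.example.com;

    ssl_certificate     /etc/letsencrypt/live/media.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/media.example.com/privkey.pem;

    location / {
        proxy_pass http://10.0.0.2:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # websocket upgrade headers, needed by many self-hosted web apps
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```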

[–] kogasa@programming.dev 12 points 3 days ago (2 children)

It gets around port forwarding/firewall issues that most people don't know how to deal with. But putting it behind a paywall kinda kills any chance of it being a benevolent feature.

[–] kogasa@programming.dev 14 points 4 days ago (1 children)

Possible reasons include:

  • fun

  • inflicting needless suffering on fish [applies if you hate fish]

[–] kogasa@programming.dev 7 points 4 days ago

It's got a very high barrier to entry. You kinda have to suffer through it for a while before you get it. And then you unlock a totally different kind of suffering.

[–] kogasa@programming.dev 27 points 5 days ago (1 children)

The last time I had fun with LLMs was back when GPT2 was cutting-edge. I fine-tuned GPT2-Medium on Twitch chat logs, and it alternates between emote spam, complete incoherence, blatantly unhinged comments, and suspiciously normal ones. The bot is still in use as a toy, specifically because it's deranged and unpredictable. It's like a kaleidoscope for the slice of internet subculture it was trained on, much more fun than a plain flawless mirror.
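
A rough sketch of that kind of fine-tune with the Hugging Face transformers and datasets libraries; the file name and hyperparameters are illustrative placeholders, not the original setup:

```python
# Illustrative GPT2-Medium fine-tune on a plain-text chat log.
# "twitch_chat.txt" and the hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2-medium")
tok.pad_token = tok.eos_token  # GPT2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

ds = load_dataset("text", data_files={"train": "twitch_chat.txt"})["train"]
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=128),
            batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-twitch", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```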

[–] kogasa@programming.dev 1 points 5 days ago* (last edited 5 days ago)

> 1000 or so moves

Ok, this nerd-snipes me a bit. The real point of the Towers of Hanoi problem, at least if you're doing it in a discrete math class, is to count exactly how many moves are needed for n discs. Call this number f(n).

To make this post easier to read, we will label the discs from 1 to n, with 1 being the bottom (largest) disc and n being the top (smallest). We will also label the pegs A, B, and C, with A being the starting peg and C being the target peg. This lets us describe moves in a compact notation: 3: A -> C means disc 3 moves from A to C.

For n=0, we vacuously need 0 moves to win.

For n=1, we need just 1 move: 1:A->C.

For n=2, we can do it in 3 moves, so f(2) = 3. The sequence is:

  • 2: A->B # unblock the next move
  • 1: A->C # move 1 to its final location
  • 2: B->C # move 2 to its final location

Now suppose we know f(n) for n between 1 and N, with N >= 2. For n=N+1, we can move N+1: A -> C and attempt to use our strategy for moving the remaining N discs from A to C, shuffling N+1 around as needed. Since it's the smallest disc, we can always move it to any peg, and any time we want to move another disc to the peg it's on, it must be moved out of the way first. Let's see how that plays out for N=2:

  • 3: A->C # move the N+1 disc
  • 2: A->B # start the N-disc strategy
  • 3: C->B # unblock the next step
  • 1: A->C # next step of N-disc strategy
  • 3: B->A # unblock the next step
  • 2: B->C # last step of N-disc strategy
  • 3: A->C # finish the N+1-disc problem

We hazard a guess that every step of the N-disc strategy is preceded by one move of disc N+1, with one final move of disc N+1 at the end. In other words, f(N+1) = 2f(N) + 1. Some careful analysis will justify this guess; the sketch below also bears it out.
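
If you'd rather outsource the careful analysis, here's a quick sketch that generates the optimal move sequence via the standard recursion (disc and peg labels as above) and checks the recurrence empirically:

```python
# Generate the optimal move sequence. Discs are labeled as in the
# text: 1 is the bottom (largest) disc, higher labels sit on top.
def hanoi(n, src="A", dst="C", spare="B", bottom=1):
    """Yield (disc, from_peg, to_peg) moves solving n discs src -> dst."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, spare, dst, bottom + 1)  # clear the way
    yield (bottom, src, dst)  # move the largest remaining disc
    yield from hanoi(n - 1, spare, dst, src, bottom + 1)  # restack on top

print(list(hanoi(3)))  # reproduces the 7-move sequence above
for n in range(1, 12):  # f(N+1) = 2 f(N) + 1 empirically
    assert len(list(hanoi(n + 1))) == 2 * len(list(hanoi(n))) + 1
```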

Now we have a recurrence relation for f(n), but how to solve it? A classical technique is "guess the formula magically, then prove it by induction." It's certainly doable here if you compute a few values by hand:

f(2) = 3

f(3) = 2(3) + 1 = 7

f(4) = 2(7) + 1 = 15

f(5) = 2(15) + 1 = 31

f(6) = 2(31) + 1 = 63

You'll probably see it right away: f(n) = 2^n - 1. Indeed, we can prove this by induction: the base case n=2 holds as f(2) = 3 = 2^(2) - 1, and if f(n) = 2^n - 1 then f(n+1) = 2f(n) + 1 = 2(2^n - 1) + 1 = (2^(n+1) - 2) + 1 = 2^(n+1) - 1 as desired.

We may conclude f(12) = 2^(12) - 1 = 4095. In other words, with 12 discs, exactly 4095 moves are required.
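
As a quick arithmetic cross-check of the closed form against the recurrence:

```python
# Closed form vs. recurrence for small n.
f = {0: 0, 1: 1}
for n in range(2, 16):
    f[n] = 2 * f[n - 1] + 1
assert all(f[n] == 2 ** n - 1 for n in f)
print(f[12])  # 4095
```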

--

Bonus: an alternative to the "guess and prove" technique that is generalizable to a broad class of recurrence relations. The technique is called "generating functions." Given the sequence f(n), we consider the formal power series

F(x) = f(0) + f(1)x + f(2)x^(2) + ... + f(n)x^(n) + ...

This F(x) is the "ordinary generating function" for the sequence f(n). Strictly speaking, it may not be a well-defined function of x since we haven't made any assumptions about convergence, but we will assume for now, without proof, that such formal power series behave algebraically much like we expect. Namely, given the recurrence relation above, we can write:

F(x) - f(0) - f(1)x = f(2)x^(2) + f(3)x^(3) + f(4)x^(4) + ... + f(n)x^n + ...

= (2f(1) + 1)x^(2) + (2f(2) + 1)x^(3) + ... + (2f(n-1) + 1)x^(n) + ...

= 2x(f(1)x + f(2)x^(2) + ... + f(n)x^(n) + ...) + (x^(2) + x^(3) + ... + x^(n) + ...)

= 2x(F(x) - f(0)) + x^(2)/(1-x)

In our case, we have f(0) = 0, f(1) = 1, f(2) = 3 so we can write more succinctly:

F(x) - x = 2xF(x) + x^(2)/(1-x)

Solving for F,

F(x)(1 - 2x) = x + x^(2)/(1-x)

= x(1 + x/(1-x))

F(x) = x(1 + x/(1-x))/(1 - 2x)

= x/(2x^(2) - 3x + 1)

Ok, great. We've found that our generating function, convergence notwithstanding, is that rational function. We can use partial fraction decomposition to write it as

F(x) = 1/(1 - 2x) - 1/(1-x)

which has the advantage of telling us exactly how to compute the coefficients of the Taylor series for F(x). Namely,

1/(1-x) = 1 + x + x^(2) + ... + x^(n) + ...

1/(1 - 2x) = 1 + 2x + 4x^(2) + ... + 2^(n) x^(n) + ...

So F(x) = (1-1) + (2-1)x + (4-1)x^(2) + ... + (2^(n)-1)x^(n) + ...

The nth coefficient of the Taylor series for F about 0 is 2^(n)-1, and by the definition of F as the ordinary generating function for f(n), we have f(n) = 2^(n) - 1. (The rigorous justification for ignoring convergence here still needs to be done; for now, this can be seen as a useful magic trick.)
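
If you trust a computer algebra system more than formal-power-series handwaving, sympy (assumed installed) confirms both the partial fraction decomposition and the coefficients:

```python
import sympy as sp

x = sp.symbols("x")
F = x / (2 * x**2 - 3 * x + 1)
print(sp.apart(F))        # 1/(x - 1) - 1/(2*x - 1), i.e. 1/(1-2x) - 1/(1-x)
print(F.series(x, 0, 8))  # x + 3*x**2 + 7*x**3 + 15*x**4 + ...
```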

[–] kogasa@programming.dev 3 points 5 days ago* (last edited 5 days ago)

Not necessarily do logic, but mimic it, the way it can mimic coherent writing and basic conversation despite only being a statistical token muncher. The hope is that there's sufficient information in the syntax to model the semantics, in which case a sufficiently complex and well-trained model of the syntax is also an effective model of the semantics. This apparently holds up well for general language tasks, meaning "what we mean" is well-modeled by "how we say it." It's plausible, at face value, that rigorous argumentation is also a good candidate, which would give language models some way of mimicking logic by talking through a problem. It's just not very good in practice right now. Maybe a better language model could do better; maybe it can't be done at a reasonable cost.
