
OpenAI will unveil something new today (Monday) at 10 am PT (not GPT5). Altman: "feels like magic to me"

hyperbertha

Member
Absolutely not. The only way to have that absurd confusion is if you define “distribution” in such a way that it’s meaningless. It abstracts from what it has seen to a higher order prior that it can leverage, just as humans do when we make analogies or combine ideas in novel situations.



I don’t agree, because it isn’t memorization; it is abstractive compression, much like the way a great deal of our own language knowledge works in our brains. Calling it mere memorization is a total conceptual misunderstanding of the model, its architecture, its capabilities, and even the math.


You could say this of human neurons too.
The architecture is designed to output the set of tokens most likely to follow the previous set of tokens. How does such a thing ever generalize out of distribution?

And your last sentence is absurd to me. Humans output based on statistical prediction?
 

Tams

Member
I just went to GPT and got the following:



[Image: doge-meme-and-tech.jpg]

OpenAI's ChatGPT-4o:

[Image: ChatGPT-4o's response]


Anthropic's Claude 3 Sonnet:

[Image: Claude 3 Sonnet's response]
 

SJRB

Gold Member
Either the Sky voice is voiced by someone else like OpenAI claims and this thing will be over in no time, or they actually did train a model on her which would be the dumbest fucking blunder imaginable.

Either way this is handled poorly by Altman.
 

navii

My fantasy is that my girlfriend was actually a young high school girl.
There is irony somewhere: the whole world hated it when she played a robot, and now they are dying for her to voice one.
 
Either the Sky voice is voiced by someone else like OpenAI claims and this thing will be over in no time, or they actually did train a model on her which would be the dumbest fucking blunder imaginable.

Either way this is handled poorly by Altman.
Altman is fascinating. I can't work out if his humbleness is genuine or a tool to disarm people. I've never heard somebody say 'I don't know' so often in interviews.
 
Last edited:

ResurrectedContrarian

Suffers with mild autism
The architecture is designed to output the set of tokens most likely to follow the previous set of tokens. How does such a thing ever generalize out of distribution?
It sounds like you're describing "distribution" as the language itself at this point. As in, "my friend never replies out of distribution even in complex debates... every turn of phrase he uses already exists, every rhetorical technique is well known, he just assembles these pieces in new sequences to build out his answers!" Well yeah, that's language.

When a transformer outputs a series of tokens, it's not like autocomplete in the familiar sense, where you just guess at a purely surface level.

Transformer networks use causal attention across the body of text, which means that the full in-context meaning and impact of each word in a given piece of text is understood by tracing its situational context back across the whole preceding body of words, a bit like a lot of intertwined threads running through the text.

Those threads of meaning and intention are "loaded" and pointing somewhere... so if you give it a long passage and ask it to reply, it doesn't simply look at the surface of which words are here and which come next; no, it contextually grasps the arguments, tones, intentions, contradictions, etc., and then unrolls where that text is heading, as an extremely high-level abstraction over language that is very much like thought. Change even one subtle part of the argument several paragraphs back, and its entire understanding of the text correctly shifts with it. It's not a "bag of words" or an n-gram-like approach where words simply predict words on a flat plane; it learns successively higher abstractions over the language.

And just to state the obvious: these networks are highly nonlinear. They don't "sum up" probabilities or anything of the sort. They have nonlinear activations everywhere, which act like decision boundaries at every layer, as well as countless attention heads, which are like information-routing mechanisms. Combined, these make it an entirely different ballgame from the statistical prediction you seem to think it's doing. It's not at all like "this word now adds to the probability that you'll use that word next," but more "ah, with the addition of this new word, a whole new direction opens up on where the prior words are going and what might be thinkable or logical next."
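
To make that concrete, here is a toy sketch of a single causal self-attention head (NumPy, random weights, tiny made-up sizes; nothing like any real model's code, just the mechanism): each token's query is scored against the keys of the tokens before it, the causal mask blocks anything in the future, and the output is a weighted mix of the earlier tokens' values.

```python
# Toy sketch of one causal self-attention head (random weights, tiny sizes).
# Not any real model's implementation, just the mechanism described above.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8

x = rng.normal(size=(seq_len, d_model))    # one vector per token
W_q = rng.normal(size=(d_model, d_head))   # query projection
W_k = rng.normal(size=(d_model, d_head))   # key projection
W_v = rng.normal(size=(d_model, d_head))   # value projection

q, k, v = x @ W_q, x @ W_k, x @ W_v

# Each position attends to itself and everything before it;
# the causal mask forbids looking at future tokens.
scores = q @ k.T / np.sqrt(d_head)
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[future] = -np.inf

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the allowed positions

out = weights @ v            # each token's output mixes the values of earlier tokens
print(weights.round(2))      # row i: how strongly token i attends to tokens 0..i
```

In a real transformer there are dozens of these heads per layer, stacked many layers deep, with nonlinear MLP blocks in between; that stack is where the "decision boundary" behavior comes from.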

How can pure language use go "out of distribution," as you ask? We do it all the time. Go to a philosophy class and watch how people are taught to reason: by following the threads of the argument up to the present moment in the conversation, grasping where it is headed (including possible contradictions, intentions, etc., along the way), and then using logical language and form to verbally unroll the next logical step. Teaching students to "talk out" a problem via Socratic discourse or other methods is teaching them to solve novel problems. No empirically new data is needed at all for this exercise in reasoning, just learning to talk well through problems.

And your last sentence is absurd to me. Humans output based on statistical prediction?
The idea of neurons in machine learning is of course an attempt to mimic neuron activations in the brain. If you're this reductionist about neural networks, why do you so sharply distinguish human or animal brains, which are also neurons firing and activating each other across nonlinear activation boundaries at large scale?
 

SJRB

Gold Member


I mean it's clearly not the same, but I still get why she did this.

Edit: Altman statement:

[Image: Sam Altman's statement regarding Scarlett Johansson]
 
Last edited:

hyperbertha

Member
It sounds like you're describing "distribution" as the language itself at this point. As in, "my friend never replies out of distribution even in complex debates... every turn of phrase he uses already exists, every rhetorical technique is well known, he just assembles these pieces in new sequences to build out his answers!" Well yeah, that's language.

When a transformer outputs a series of tokens, it's not like autocomplete in the familiar sense, where you just guess at a purely surface level.

Transformer networks use causal attention across the body of text, which means that the full in-context meaning and impact of each word in a given piece of text is understood by tracing its situational context back across the whole preceding body of words, a bit like a lot of intertwined threads running through the text.

Those threads of meaning and intention are "loaded" and pointing somewhere... so if you give it a long passage and ask it to reply, it doesn't simply look at the surface of which words are here and which come next; no, it contextually grasps the arguments, tones, intentions, contradictions, etc., and then unrolls where that text is heading, as an extremely high-level abstraction over language that is very much like thought. Change even one subtle part of the argument several paragraphs back, and its entire understanding of the text correctly shifts with it. It's not a "bag of words" or an n-gram-like approach where words simply predict words on a flat plane; it learns successively higher abstractions over the language.

And just to state the obvious: these networks are highly nonlinear. They don't "sum up" probabilities or anything of the sort. They have nonlinear activations everywhere, which act like decision boundaries at every layer, as well as countless attention heads, which are like information-routing mechanisms. Combined, these make it an entirely different ballgame from the statistical prediction you seem to think it's doing. It's not at all like "this word now adds to the probability that you'll use that word next," but more "ah, with the addition of this new word, a whole new direction opens up on where the prior words are going and what might be thinkable or logical next."

How can pure language use go "out of distribution," as you ask? We do it all the time. Go to a philosophy class and watch how people are taught to reason: by following the threads of the argument up to the present moment in the conversation, grasping where it is headed (including possible contradictions, intentions, etc., along the way), and then using logical language and form to verbally unroll the next logical step. Teaching students to "talk out" a problem via Socratic discourse or other methods is teaching them to solve novel problems. No empirically new data is needed at all for this exercise in reasoning, just learning to talk well through problems.


The idea of neurons in machine learning is of course an attempt to mimic neuron activations in the brain. If you're this reductionist about neural networks, why do you so sharply distinguish human or animal brains, which are also neurons firing and activating each other across nonlinear activation boundaries at large scale?
Humans and animals learn based on causal modelling, not statistical modelling. While you may claim that the abstractions formed from training on so much text are deeper than surface level, the EVIDENCE shows otherwise. These models can produce text and theory that sound plausible, yet they can't solve the vast majority of common-sense problems they are faced with.



Anything that is only learning statistical relations between two concepts is not 'causally reasoning'. That is absurd.

And just because LLMs may be neural nets doesn't mean those neurons are anywhere near as robust as biological neurons. This is a fallacy. The architecture is demonstrably different.
 
Last edited:

MrRenegade

Report me if I continue to troll
Altman is fascinating. I can't work out if his humbleness is genuine or a tool to disarm people. I've never heard somebody say 'I don't know' so often in interviews.
Oh, get back to the ground ffs. You don't know ANYTHING about these faces. Look how humble he is, look how sincere, blablabla... and then you get Puff Daddy going pat bateman on his girlfriend with a towel around his colonel...

Fuck, I don't know what people need, maybe a fucking big asteroid or a war to finally wake the fuck up from their rainbow dream.

If they don't have some way to make their models learn by themselves they will be in big fucking trouble sooooon.
 

ResurrectedContrarian

Suffers with mild autism
Humans and animals learn based on causal modelling, not statistical modelling. While you may claim that the abstractions formed from training on so much text are deeper than surface level, the EVIDENCE shows otherwise. These models can produce text and theory that sound plausible, yet they can't solve the vast majority of common-sense problems they are faced with.



Anything that is only learning statistical relations between two concepts is not 'causally reasoning'. That is absurd.

And just because LLMs may be neural nets doesn't mean those neurons are anywhere near as robust as biological neurons. This is a fallacy. The architecture is demonstrably different.

Calling it "statistical association" is nonsense, and a total mischaracterization -- as is suggesting that they don't learn algorithms.

Even a small multilayer perceptron can learn sequential logical algorithms, including steps like XOR, because that's the whole point of having nonlinear activations. I mention XOR because that's exactly the kind of logic that showed, decades ago, that these networks learn discontinuous / nonlinear algorithmic relationships and logical sequences, not merely statistical associations.
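
To make the XOR point concrete, here is a minimal sketch (plain NumPy, nothing to do with LLMs) of a two-layer perceptron learning XOR by gradient descent; a purely linear model provably cannot represent this function, and the nonlinear hidden layer is what makes it possible.

```python
# Minimal sketch: a two-layer perceptron with nonlinear (sigmoid) activations
# learns XOR, which no purely linear model can represent.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR truth table

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)          # nonlinear hidden units: the decision boundaries
    p = sigmoid(h @ W2 + b2)          # predicted probability of "1"
    # backpropagate the squared error through both sigmoids
    dp = (p - y) * p * (1 - p)
    dh = (dp @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ dp); b2 -= lr * dp.sum(axis=0)
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

print(p.round(2))   # should end up close to [[0], [1], [1], [0]]
```

Strip the nonlinearity out and the whole thing collapses to a single linear map that cannot do XOR at all, which is exactly the historical point.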

The nonlinearities across its vast number of layers, plus the innovative routing capabilities of the attention mechanisms in transformers (which link tokens by a query-key interface, let the model internally query across the text, and were the breakthrough that led to GPT), are at the heart of how these networks work.

They do not simply model surface statistical relationships. And the notion that they cannot understand algorithms is comic if you've ever given them novel problems to solve in code, then asked them to rewrite the solution in another paradigm, and so on, over and over, with incredible results.
 
Last edited:

Trunx81

Member
I just tried it and Sky is still available?
Anyway, is there any cheaper plan for ChatGPT-4? Any offers? I really want to try it for longer.
 

ResurrectedContrarian

Suffers with mild autism
And related to my discussions with hyperbertha above, there's a fantastic new research paper from the brilliant team over at Anthropic today.

It examines the mental space of a large language model (Claude, their own competitor to GPT-4, etc.), and is able to find features (similar to regions or patterns by which clusters of neurons activate) for all kinds of concepts, from things in the world (e.g. the Golden Gate Bridge) to concepts like sarcasm.

It's an incredible read: https://www.anthropic.com/research/mapping-mind-language-model
(after that intro, reading the full paper is even better: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html)

Look at their analysis of the Golden Gate Bridge concept. They were able to locate this feature inside the model and then measure when it fires, a bit like looking at in-depth brain scans while having a patient talk or look at things, but much more detailed, since we can freely take this kind of brain apart to see all its active connections.

[Image: Golden Gate Bridge feature activations, from the Anthropic paper]


As they note in the paper, the concept of this bridge can be seen activating when the landmark is discussed in any language, and even when you feed new images of the bridge into the model.

What's even more fascinating is that they can basically boost that sector of its "brain" to alter its behavior. Normally, if you ask it "what is your physical form?" it comes back with a generic answer about being an AI without a physical body. But if you pump some extra activation (kind of like zapping the brain) into the Golden Gate Bridge feature when asking the question, it responds as if it understands itself to be this bridge, and coherently.
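
For a flavor of what "pumping extra activation into a feature" means mechanically, here is my own toy sketch: a made-up little network, a random unit vector standing in for a feature direction, and a hook that adds a scaled copy of that direction to the hidden activations on every forward pass. This is not Anthropic's actual setup (they clamp sparse-autoencoder features inside Claude), just the shape of the intervention.

```python
# Toy illustration of "feature steering": nudging hidden activations along a
# fixed direction at inference time. The model, the feature vector, and the
# strength here are all invented for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 64
model = nn.Sequential(
    nn.Linear(32, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 10),
)

# Pretend this unit vector is a learned feature direction (e.g. "Golden Gate Bridge").
feature_direction = torch.randn(hidden_dim)
feature_direction /= feature_direction.norm()
strength = 5.0   # arbitrary boost, analogous to clamping a feature to a high value

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + strength * feature_direction

handle = model[1].register_forward_hook(steer)   # hook the hidden activations

x = torch.randn(1, 32)
print(model(x))    # output with the "feature" artificially boosted
handle.remove()
print(model(x))    # same input without steering, for comparison
```

The striking part in the paper is that doing this inside a real LLM changes its behavior coherently (it starts talking as if it were the bridge) rather than just degrading into noise.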


One of the other fascinating cases is that they were able to isolate a feature that activates on code errors, even across different programming languages and across a large range of possible errors, from typos to divide-by-zero to many other bugs; it activates only when the code is actually incorrect. Manipulating this feature has fascinating effects that prove this concept to be a way the model understands errors in general.

Read this section: https://transformer-circuits.pub/20...index.html#assessing-sophisticated-code-error

There's a lot of content in this report, but the basic finding is of course what most of us who examine the connective logic of transformer networks already intuit:
  • they represent complex abstractions (even things like sarcasm or sycophancy), and they do so across languages and genres, genuinely abstracting from language to higher-order concepts
  • they have some kind of implicit model of the world of possible objects and places, which shows up if you examine the activations of something like a specific proper name, landmark, etc., and watch how it affects everything the model does
  • they also have features which demonstrate that they aren't merely repeating code when giving programming samples, but even "know" when they are making an error, even errors of subtle types that require reading the entire context of the line of code to see why it's wrong

and so on
 
Last edited:

ResurrectedContrarian

Suffers with mild autism
The Sky voice was the only good one... I find the other female voices to sound like annoying HR women or something, no thanks. It's rather weak that they let ScarJo shut that down when they had perfectly solid legal backing to keep it, since the voice wasn't based on her or even given instructions to imitate her; sure, they chose a woman with a similar sound after she turned it down, but so what? If you can't hire a certain actress, and you decide to hire another woman who looks or sounds similar, that's perfectly acceptable in film / music / etc.
 

hyperbertha

Member
And related to my discussions with hyperbertha above, there's a fantastic new research paper from the brilliant team over at Anthropic today.

It examines the mental space of a large language model (Claude, their own competitor to GPT-4, etc.), and is able to find features (similar to regions or patterns by which clusters of neurons activate) for all kinds of concepts, from things in the world (e.g. the Golden Gate Bridge) to concepts like sarcasm.

It's an incredible read: https://www.anthropic.com/research/mapping-mind-language-model
(after that intro, reading the full paper is even better: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html)

Look at their analysis of the Golden Gate Bridge concept. They were able to locate this feature inside the model and then measure when it fires, a bit like looking at in-depth brain scans while having a patient talk or look at things, but much more detailed, since we can freely take this kind of brain apart to see all its active connections.

[Image: Golden Gate Bridge feature activations, from the Anthropic paper]


As they note in the paper, the concept of this bridge can be seen activating when the landmark is discussed in any language, and even when you feed new images of the bridge into the model.

What's even more fascinating is that they can basically boost that sector of its "brain" to alter its behavior. Normally, if you ask it "what is your physical form?" it comes back with a generic answer about being an AI without a physical body. But if you pump some extra activation (kind of like zapping the brain) into the Golden Gate Bridge feature when asking the question, it responds as if it understands itself to be this bridge, and coherently.


One of the other fascinating cases is that they were able to isolate a feature that activates on code errors, even across different programming languages and across a large range of possible errors, from typos to divide-by-zero to many other bugs; it activates only when the code is actually incorrect. Manipulating this feature has fascinating effects that prove this concept to be a way the model understands errors in general.

Read this section: https://transformer-circuits.pub/20...index.html#assessing-sophisticated-code-error

There's a lot of content in this report, but the basic finding is of course what most of us who examine the connective logic of transformer networks already intuit:
  • they represent complex abstractions (even things like sarcasm or sycophancy), and they do so across languages and genres, genuinely abstracting from language to higher-order concepts
  • they have some kind of implicit model of the world of possible objects and places, which shows up if you examine the activations of something like a specific proper name, landmark, etc., and watch how it affects everything the model does
  • they also have features which demonstrate that they aren't merely repeating code when giving programming samples, but even "know" when they are making an error, even errors of subtle types that require reading the entire context of the line of code to see why it's wrong

and so on

How do we know it's not arranging concepts in the space based on how commonly similar concepts show up close to each other in the training data? This doesn't seem to disprove statistical association to me...
 

ResurrectedContrarian

Suffers with mild autism
How do we know it's not arranging concepts in the space based on how commonly similar concepts show up close to each other in the training data? This doesn't seem to disprove statistical association to me...


There are even feature activations for abstract concepts like empathy, sycophancy, sarcasm... the model clearly has a high-level grasp of language, tone, communication, intention, even deception.

[Image: feature activations for abstract concepts, from the Anthropic paper]

To distinguish deceptive or sarcastic praise from genuine praise is a complex reasoning task, requiring a contextual grasp of intention; the same words and phrases often belong to both. Many humans struggle to correctly parse these things.

And the "code error" feature I linked, if you read that section of the report, is far beyond mere association.
 
Last edited:

Hugare

Member
lol, true. But after this gig they can go anywhere with that on their CV.

I don't get it, it doesn't sound like her at all, imo. Altman was a gentleman for letting it go anyway.

I don't think they would be dumb enough to train the voice on Scarlett's.
 

Evolved1

make sure the pudding isn't too soggy but that just ruins everything
they shouldn't have caved. her position is absurd... and that was the best voice. by a lot.
 

ResurrectedContrarian

Suffers with mild autism
they shouldn't have caved. her position is absurd... and that was the best voice. by a lot.
100% agree, as I said above

The other female voices are absolutely insufferable… like the worst HR condescension I can imagine.

The male ones are a bit annoying too. Hers was a total breakthrough, the first AI voice which felt natural to me across all home assistants etc so far.
 

SJRB

Gold Member
The cool thing about the Sky voice is that it’s basically one big thought experiment. People mention Scarlett Johansson, so you hear Scarlett Johansson.

Now listen to Sky and think of Rashida Jones. Now suddenly Sky sounds like Rashida Jones. You’re just “tricking” your brain.

I have no doubt this won't end in some legal battle, since Sky is clearly not ScarJo, but the way this is being handled by OpenAI and Altman is weak.
 