
OpenAI will unveil something new today (Monday) at 10 am PT (not GPT5). Altman: "feels like magic to me"

hyperbertha

Member
Absolutely not. The only way to have that absurd confusion is if you define “distribution” in such a way that it’s meaningless. It abstracts from what it has seen to a higher order prior that it can leverage, just as humans do when we make analogies or combine ideas in novel situations.



I don’t agree, because it isn’t memorization; it is abstractive compression, much like the way a great deal of our own language knowledge works in our brains. Calling it mere memorization is a total conceptual misunderstanding of the model, its architecture, its capabilities, and even the math.


You could say this of human neurons too.
The architecture is designed to output the set of tokens most likely to follow the previous set of tokens. How does such a thing ever generalize out of distribution?

And your last sentence is absurd to me. Humans output based on statistical prediction?
 

Tams

Member
I just went to GPT and got the following:



[Image: doge-meme-and-tech.jpg]

OpenAI's ChatGPT-4o:

[Image: ChatGPT-4o's response]


Anthropic's Claude 3 Sonnet:

[Image: Claude 3 Sonnet's response]
 

SJRB

Gold Member
Either the Sky voice is voiced by someone else like OpenAI claims and this thing will be over in no time, or they actually did train a model on her which would be the dumbest fucking blunder imaginable.

Either way this is handled poorly by Altman.
 

navii

My fantasy is that my girlfriend was actually a young high school girl.
There is irony somewhere: the whole world hated it when she played a robot, and now they are dying for her to voice one.
 
Either the Sky voice is voiced by someone else like OpenAI claims and this thing will be over in no time, or they actually did train a model on her which would be the dumbest fucking blunder imaginable.

Either way this is handled poorly by Altman.
Altman is fascinating. I can't work out if his humbleness is genuine or a tool to disarm people. I've never heard somebody say 'I don't know' so often in interviews.
 
Last edited:

ResurrectedContrarian

Suffers with mild autism
The architecture is designed to output the set of tokens most likely to follow the previous set of tokens. How does such a thing ever generalize out of distribution?
It sounds like you're describing "distribution" as the language itself at this point. As in, "my friend never replies out of distribution even in complex debates... every turn of phrase he uses already exists, every rhetorical technique is well known, he just assembles these pieces in new sequences to build out his answers!" Well yeah, that's language.

When a transformer outputs a series of tokens, it's not like autocomplete in the familiar sense, where you just guess at a purely surface level.

Transformer networks use causal attention across the body of text, which means that the full in-context meaning and impact of each word in a given piece of text is understood by tracing its situational context back across the whole preceding body of words, a bit like a lot of intertwined threads running through the text.

Those threads of meaning and intention are "loaded" and pointing somewhere... so if you give it a long passage and ask it to reply, it doesn't simply look at the surface of which words are here and which come next; no, it contextually grasps the arguments, tones, intentions, contradictions, etc., and then unrolls where that text is heading, as an extremely high-level abstraction over language that is very much like thought. Change even one subtle part of the argument several paragraphs back, and its entire understanding of the text correctly shifts with it. It's not a "bag of words" or an n-gram-like approach where words simply predict words on a flat plane; it learns successively higher abstractions over the language.

And just to state the obvious: these networks are highly nonlinear. They don't "sum up" probabilities or anything of the sort. They have nonlinear activations everywhere, which act like decision boundaries at every layer, as well as countless attention heads, which are like information-routing mechanisms. Combined, these make it an entirely different ballgame from the statistical prediction you seem to think it's doing. It's not at all like "this word now adds to the probability that you'll use that word next," but more "ah, with the addition of this new word, a whole new direction opens up on where the prior words are going and what might be thinkable or logical next."
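
To make that concrete, here is a toy sketch of a single causal self-attention head (NumPy, random weights, tiny made-up sizes; nothing like any real model's code, just the mechanism): each token's query is scored against the keys of the tokens before it, the causal mask blocks anything in the future, and the output is a weighted mix of the earlier tokens' values.

```python
# Toy sketch of one causal self-attention head (random weights, tiny sizes).
# Not any real model's implementation, just the mechanism described above.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8

x = rng.normal(size=(seq_len, d_model))    # one vector per token
W_q = rng.normal(size=(d_model, d_head))   # query projection
W_k = rng.normal(size=(d_model, d_head))   # key projection
W_v = rng.normal(size=(d_model, d_head))   # value projection

q, k, v = x @ W_q, x @ W_k, x @ W_v

# Each position attends to itself and everything before it;
# the causal mask forbids looking at future tokens.
scores = q @ k.T / np.sqrt(d_head)
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[future] = -np.inf

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the allowed positions

out = weights @ v            # each token's output mixes the values of earlier tokens
print(weights.round(2))      # row i: how strongly token i attends to tokens 0..i
```

In a real transformer there are dozens of these heads per layer, stacked many layers deep, with nonlinear MLP blocks in between; that stack is where the "decision boundary" behavior comes from.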

How can pure language use go "out of distribution," as you ask? We do it all the time. Go to a philosophy class and watch how people are taught to reason: by following the threads of the argument up to the present moment in the conversation, grasping where it is headed (including possible contradictions, intentions, etc., along the way), and then using logical language and form to verbally unroll the next logical step. Teaching students to "talk out" a problem via Socratic discourse or other methods is teaching them to solve novel problems. No empirically new data is needed at all for this exercise in reasoning, just learning to talk well through problems.

And your last sentence is absurd to me. Humans output based on statistical prediction?
The idea of neurons in machine learning is of course an attempt to mimic neuron activations in the brain. If you're this reductionist about neural networks, why do you so sharply distinguish human or animal brains, which are also neurons firing and activating each other across nonlinear activation boundaries at large scale?
 

SJRB

Gold Member


I mean it's clearly not the same, but I still get why she did this.

Edit: Altman statement:

[Image: Sam Altman's statement regarding Scarlett Johansson]
 
Last edited:

hyperbertha

Member
It sounds like you're describing "distribution" as the language itself at this point. As in, "my friend never replies out of distribution even in complex debates... every turn of phrase he uses already exists, every rhetorical technique is well known, he just assembles these pieces in new sequences to build out his answers!" Well yeah, that's language.

When a transformer outputs a series of tokens, it's not like autocomplete in the familiar sense, where you just guess at a purely surface level.

Transformer networks use causal attention across the body of text, which means that the full in-context meaning and impact of each word in a given piece of text is understood by tracing its situational context back across the whole preceding body of words, a bit like a lot of intertwined threads running through the text.

Those threads of meaning and intention are "loaded" and pointing somewhere... so if you give it a long passage and ask it to reply, it doesn't simply look at the surface of which words are here and which come next; no, it contextually grasps the arguments, tones, intentions, contradictions, etc., and then unrolls where that text is heading, as an extremely high-level abstraction over language that is very much like thought. Change even one subtle part of the argument several paragraphs back, and its entire understanding of the text correctly shifts with it. It's not a "bag of words" or an n-gram-like approach where words simply predict words on a flat plane; it learns successively higher abstractions over the language.

And just to state the obvious: these networks are highly nonlinear. They don't "sum up" probabilities or anything of the sort. They have nonlinear activations everywhere, which act like decision boundaries at every layer, as well as countless attention heads, which are like information-routing mechanisms. Combined, these make it an entirely different ballgame from the statistical prediction you seem to think it's doing. It's not at all like "this word now adds to the probability that you'll use that word next," but more "ah, with the addition of this new word, a whole new direction opens up on where the prior words are going and what might be thinkable or logical next."

How can pure language use go "out of distribution," as you ask? We do it all the time. Go to a philosophy class and watch how people are taught to reason: by following the threads of the argument up to the present moment in the conversation, grasping where it is headed (including possible contradictions, intentions, etc., along the way), and then using logical language and form to verbally unroll the next logical step. Teaching students to "talk out" a problem via Socratic discourse or other methods is teaching them to solve novel problems. No empirically new data is needed at all for this exercise in reasoning, just learning to talk well through problems.


The idea of neurons in machine learning is of course an attempt to mimic neuron activations in the brain. If you're this reductionist about neural networks, why do you so sharply distinguish human or animal brains, which are also neurons firing and activating each other across nonlinear activation boundaries at large scale?
Humans and animals learn based on causal modelling, not statistical modelling. While you may claim that the abstractions formed from training on so much text are deeper than surface level, the EVIDENCE shows otherwise. These models can produce text and theory that sound plausible, yet they can't solve the vast majority of common-sense problems they are faced with.



Anything that is only learning statistical relations between two concepts is not 'causally reasoning'. That is absurd.

And just because LLMs may be neural nets doesn't mean those neurons are anywhere near as robust as biological neurons. This is a fallacy. The architecture is demonstrably different.
 
Last edited:

MrRenegade

Report me if I continue to troll
Altman is fascinating. I can't work out if his humbleness is genuine or a tool to disarm people. I've never heard somebody say 'I don't know' so often in interviews.
Oh, get back to the ground ffs. You don't know ANYTHING about these faces. Look how humble he is, look how sincere, blablabla... and then you get Puff Daddy going pat bateman on his girlfriend with a towel around his colonel...

Fuck, I don't know what people need, maybe a fucking big asteroid or a war to finally wake the fuck up from their rainbow dream.

If they don't have some way to make their models learn by themselves they will be in big fucking trouble sooooon.
 

ResurrectedContrarian

Suffers with mild autism
Humans and animals learn based on causal modelling, not statistical modelling. While you may claim that the abstractions formed from training on so much text are deeper than surface level, the EVIDENCE shows otherwise. These models can produce text and theory that sound plausible, yet they can't solve the vast majority of common-sense problems they are faced with.



Anything that is only learning statistical relations between two concepts is not 'causally reasoning'. That is absurd.

And just because LLMs may be neural nets doesn't mean those neurons are anywhere near as robust as biological neurons. This is a fallacy. The architecture is demonstrably different.

Calling it "statistical association" is nonsense, and a total mischaracterization -- as is suggesting that they don't learn algorithms.

Even a small multilayer perceptron can learn sequential logical algorithms, including steps like XOR, because that's the whole point of having nonlinear activations. I mention XOR because that's exactly the kind of logic that showed, decades ago, that these networks learn discontinuous / nonlinear algorithmic relationships and logical sequences, not merely statistical associations.
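
To make the XOR point concrete, here is a minimal sketch (plain NumPy, nothing to do with LLMs) of a two-layer perceptron learning XOR by gradient descent; a purely linear model provably cannot represent this function, and the nonlinear hidden layer is what makes it possible.

```python
# Minimal sketch: a two-layer perceptron with nonlinear (sigmoid) activations
# learns XOR, which no purely linear model can represent.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR truth table

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)          # nonlinear hidden units: the decision boundaries
    p = sigmoid(h @ W2 + b2)          # predicted probability of "1"
    # backpropagate the squared error through both sigmoids
    dp = (p - y) * p * (1 - p)
    dh = (dp @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ dp); b2 -= lr * dp.sum(axis=0)
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

print(p.round(2))   # should end up close to [[0], [1], [1], [0]]
```

Strip the nonlinearity out and the whole thing collapses to a single linear map that cannot do XOR at all, which is exactly the historical point.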

The nonlinearities across its vast number of layers, plus the innovative routing capabilities of the attention mechanisms in transformers (which link tokens by a query-key interface, let the model internally query across the text, and were the breakthrough that led to GPT), are at the heart of how these networks work.

They do not simply model surface statistical relationships. And the notion that they cannot understand algorithms is comic if you've ever given them novel problems to solve in code, then asked them to rewrite the solution in another paradigm, and so on, over and over, with incredible results.
 
Last edited:

Trunx81

Member
I just tried it and Sky is still available?
Anyway, is there any cheaper plan for ChatGPT-4? Any offers? I really want to try it for longer.
 

ResurrectedContrarian

Suffers with mild autism
And related to my discussions with hyperbertha above, there's a fantastic new research paper from the brilliant team over at Anthropic today.

It examines the mental space of a large language model (Claude, their own competitor to GPT-4, etc.), and is able to find features (similar to regions or patterns by which clusters of neurons activate) for all kinds of concepts, from things in the world (e.g. the Golden Gate Bridge) to concepts like sarcasm.

It's an incredible read: https://www.anthropic.com/research/mapping-mind-language-model
(after that intro, reading the full paper is even better: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html)

Look at their analysis of the Golden Gate Bridge concept. They were able to locate this feature inside the model and then measure when it fires, a bit like looking at in-depth brain scans while having a patient talk or look at things, but much more detailed, since we can freely take this kind of brain apart to see all its active connections.

[Image: Golden Gate Bridge feature activations, from the Anthropic paper]


As they note in the paper, the concept of this bridge can be seen activating when the landmark is discussed in any language, and even when you feed new images of the bridge into the model.

What's even more fascinating is that they can basically boost that sector of its "brain" to alter its behavior. Normally, if you ask it "what is your physical form?" it comes back with a generic answer about being an AI without a physical body. But if you pump some extra activation (kind of like zapping the brain) into the Golden Gate Bridge feature when asking the question, it responds as if it understands itself to be this bridge, and coherently.
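
For a flavor of what "pumping extra activation into a feature" means mechanically, here is my own toy sketch: a made-up little network, a random unit vector standing in for a feature direction, and a hook that adds a scaled copy of that direction to the hidden activations on every forward pass. This is not Anthropic's actual setup (they clamp sparse-autoencoder features inside Claude), just the shape of the intervention.

```python
# Toy illustration of "feature steering": nudging hidden activations along a
# fixed direction at inference time. The model, the feature vector, and the
# strength here are all invented for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 64
model = nn.Sequential(
    nn.Linear(32, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 10),
)

# Pretend this unit vector is a learned feature direction (e.g. "Golden Gate Bridge").
feature_direction = torch.randn(hidden_dim)
feature_direction /= feature_direction.norm()
strength = 5.0   # arbitrary boost, analogous to clamping a feature to a high value

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + strength * feature_direction

handle = model[1].register_forward_hook(steer)   # hook the hidden activations

x = torch.randn(1, 32)
print(model(x))    # output with the "feature" artificially boosted
handle.remove()
print(model(x))    # same input without steering, for comparison
```

The striking part in the paper is that doing this inside a real LLM changes its behavior coherently (it starts talking as if it were the bridge) rather than just degrading into noise.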


One of the other fascinating cases is that they were able to isolate a feature that activates on code errors, even across different programming languages and across a large range of possible errors, from typos to divide-by-zero to many other bugs; it activates only when the code is actually incorrect. Manipulating this feature has fascinating effects that prove this concept to be a way the model understands errors in general.

Read this section: https://transformer-circuits.pub/20...index.html#assessing-sophisticated-code-error

There's a lot of content in this report, but the basic finding is of course what most of us who examine the connective logic of transformer networks already intuit:
  • they represent complex abstractions (even things like sarcasm or sycophancy), and they do so across languages and genres, genuinely abstracting from language to higher-order concepts
  • they have some kind of implicit model of the world of possible objects and places, which shows up if you examine the activations of something like a specific proper name, landmark, etc., and watch how it affects everything the model does
  • they also have features which demonstrate that they aren't merely repeating code when giving programming samples, but even "know" when they are making an error, even errors of subtle types that require reading the entire context of the line of code to see why it's wrong

and so on
 
Last edited:

ResurrectedContrarian

Suffers with mild autism
The Sky voice was the only good one... I find the other female voices to sound like annoying HR women or something, no thanks. It's rather weak that they let ScarJo shut that down when they had perfectly solid legal backing to keep it, since the voice wasn't based on her or even given instructions to imitate her; sure, they chose a woman with a similar sound after she turned it down, but so what? If you can't hire a certain actress, and you decide to hire another woman who looks or sounds similar, that's perfectly acceptable in film / music / etc.
 

hyperbertha

Member
And related to my discussions with hyperbertha above, there's a fantastic new research paper from the brilliant team over at Anthropic today.

It examines the mental space of a large language model (Claude, their own competitor to GPT-4, etc.), and is able to find features (similar to regions or patterns by which clusters of neurons activate) for all kinds of concepts, from things in the world (e.g. the Golden Gate Bridge) to concepts like sarcasm.

It's an incredible read: https://www.anthropic.com/research/mapping-mind-language-model
(after that intro, reading the full paper is even better: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html)

Look at their analysis of the Golden Gate Bridge concept. They were able to locate this feature inside the model and then measure when it fires, a bit like looking at in-depth brain scans while having a patient talk or look at things, but much more detailed, since we can freely take this kind of brain apart to see all its active connections.

[Image: Golden Gate Bridge feature activations, from the Anthropic paper]


As they note in the paper, the concept of this bridge can be seen activating when the landmark is discussed in any language, and even when you feed new images of the bridge into the model.

What's even more fascinating is that they can basically boost that sector of its "brain" to alter its behavior. Normally, if you ask it "what is your physical form?" it comes back with a generic answer about being an AI without a physical body. But if you pump some extra activation (kind of like zapping the brain) into the Golden Gate Bridge feature when asking the question, it responds as if it understands itself to be this bridge, and coherently.


One of the other fascinating cases is that they were able to isolate a feature that activates on code errors, even across different programming languages and across a large range of possible errors, from typos to divide-by-zero to many other bugs; it activates only when the code is actually incorrect. Manipulating this feature has fascinating effects that prove this concept to be a way the model understands errors in general.

Read this section: https://transformer-circuits.pub/20...index.html#assessing-sophisticated-code-error

There's a lot of content in this report, but the basic finding is of course what most of us who examine the connective logic of transformer networks already intuit:
  • they represent complex abstractions (even things like sarcasm or sycophancy), and they do so across languages and genres, genuinely abstracting from language to higher-order concepts
  • they have some kind of implicit model of the world of possible objects and places, which shows up if you examine the activations of something like a specific proper name, landmark, etc., and watch how it affects everything the model does
  • they also have features which demonstrate that they aren't merely repeating code when giving programming samples, but even "know" when they are making an error, even errors of subtle types that require reading the entire context of the line of code to see why it's wrong

and so on

How do we know it's not arranging concepts in the space based on how commonly similar concepts show up close to each other in the training data? This doesn't seem to disprove statistical association to me...
 

ResurrectedContrarian

Suffers with mild autism
How do we know it's not arranging concepts in the space based on how commonly similar concepts show up close to each other in the training data? This doesn't seem to disprove statistical association to me...


There are even feature activations for abstract concepts like empathy, sycophancy, sarcasm... the model clearly has a high-level grasp of language, tone, communication, intention, even deception.

[Image: feature activations for abstract concepts, from the Anthropic paper]

To distinguish deceptive or sarcastic praise from genuine praise is a complex reasoning task, requiring a contextual grasp of intention; the same words and phrases often belong to both. Many humans struggle to correctly parse these things.

And the "code error" feature I linked, if you read that section of the report, is far beyond mere association.
 
Last edited:

Hugare

Member
lol, true. But after this gig they can go anywhere with that on their CV.

I don't get it, it doesn't sound like her at all, imo. Altman was a gentleman for letting it go anyway.

I don't think they would be dumb enough to train the voice on Scarlett's.
 

Evolved1

make sure the pudding isn't too soggy but that just ruins everything
they shouldn't have caved. her position is absurd... and that was the best voice. by a lot.
 

ResurrectedContrarian

Suffers with mild autism
they shouldn't have caved. her position is absurd... and that was the best voice. by a lot.
100% agree, as I said above

The other female voices are absolutely insufferable… like the worst HR condescension I can imagine.

The male ones are a bit annoying too. Hers was a total breakthrough, the first AI voice which felt natural to me across all home assistants etc so far.
 

SJRB

Gold Member
The cool thing about the Sky voice is that it’s basically one big thought experiment. People mention Scarlett Johansson, so you hear Scarlett Johansson.

Now listen to Sky and think of Rashida Jones. Now suddenly Sky sounds like Rashida Jones. You’re just “tricking” your brain.

I have no doubt this won't end in some legal battle, since Sky is clearly not ScarJo, but the way this is being handled by OpenAI and Altman is weak.
 