
Nvidia and Remedy use neural networks for eerily good facial animation

Tagyhag

Member
This is fantastic to see. Yes, LA Noire was great, but the amount of work and the limitations were insane; it was clearly not feasible tech.

plenty of studios have large budgets, lots of time, and far larger staffs, and can't hold a candle to Naughty Dog's work. cough Ubisoft cough

if you think all developers would produce Naughty Dog-level work given the same budget/time, you are out of your mind

Specific game example? Please don't say an open world game lol.
 
This is fantastic to see. Yes, LA Noire was great, but the amount of work and the limitations were insane; it was clearly not feasible tech.



Specific game example? Please don't say an open world game lol.

oh the typical "but it's open world" crutch

fine, replace Naughty Dog with Zero Dawn. not that it really matters much for the facial animation of the select few main protagonists anyway
 

Shpeshal Nick

aka Collingwood
The facial animations in Quantum Break were already pretty good in cut scenes. Some of the best out there.

If this can improve further on those then their next game should look nice.
 
They need to be clearer about what they're using the neural networks for.

What are they trying to predict, and is it regression or classification?

It's certainly not transforming words and sounds into facial animation, because it's copying his nonverbal facial animation. So what's the NN used for, and why is it helpful when we already have good motion capture tech?
 

Tagyhag

Member
oh the typical "but it's open world" crutch

fine, replace Naughty Dog with Zero Dawn. not that it really matters much for the facial animation of the select few main protagonists anyway

Zero Dawn's facial animations were fine, but nothing compared to UC4.

I think you fail to realize that different developers have different goals, and if someone is making an open-world game without a heavy focus on its characters and narrative, the facial animation isn't going to look as good as in something like Uncharted, where the focus is clearly the characters.

Then you add the extra time that first-party devs get (not to mention only having to work with one version) compared to multiplatform games that get only one or two years of dev time for multiple versions of a game (Horizon had a six-year development cycle), and we come to the conclusion that "it just boils down to talent and nothing else" is the wrong idea.

That's not to say devs like Naughty Dog and Guerrilla aren't talented; they obviously are. But give them the same limitations and time constraints that other devs get and you'll see that their work suffers as well.
 
Zero Dawn's facial animations were fine, but nothing compared to UC4.

I think you fail to realize that different developers have different goals, and if someone is making an open-world game without a heavy focus on its characters and narrative, the facial animation isn't going to look as good as in something like Uncharted, where the focus is clearly the characters.

Then you add the extra time that first-party devs get (not to mention only having to work with one version) compared to multiplatform games that get only one or two years of dev time for multiple versions of a game (Horizon had a six-year development cycle), and we come to the conclusion that "it just boils down to talent and nothing else" is the wrong idea.

That's not to say devs like Naughty Dog and Guerrilla aren't talented; they obviously are. But give them the same limitations and time constraints that other devs get and you'll see that their work suffers as well.

check my original post, i never said it was only a talent issue. i did, however, dismiss the notion that talent isn't a large part of animation quality
 

Nev

Banned
Good thing we already have Injustice 2 and don't have to wait 5 years for another middling Remedy game to see facial animation on this level.
 
Y'all never gonna touch Medal Of Honor!

[gif: Medal of Honor's uncanny facial animation]
 

Adnor

Banned
Which is not a problem, since once it's been trained you don't need to do it again. And the beauty of it is that it can improve itself on its own even after it's been trained.
As someone who knows nothing about programming, the fact that they can train programs to improve themselves is so alien and so cool to me.
 

Crossing Eden

Hello, my name is Yves Guillemot, Vivendi S.A.'s Employee of the Month!
oh the typical "but it's open world" crutch

fine, replace Naughty Dog with Zero Dawn. not that it really matters much for the facial animation of the select few main protagonists anyway
Are you seriously insinuating that there won't inherently be challenges in getting top-tier facial quality out of a third-party multiplatform open-world game compared to a really linear first-party game? Zero Dawn doesn't even have relatively good facial animation, as the majority of scenes have that dead-eyed look due to being procedurally generated (not to mention camera work that's subpar for this day and age).
It's straight up a time and budget issue for the most part, and then there's stuff like asset quality. Unity was the longest an AC game has spent in development, with the largest budget, and they pushed for this level of asset quality. Hell, there's straight-up crossover between the studios in terms of employees if you look at animator reels, so saying "it's all talent" is pretty inaccurate. Pretty much every large studio working on a triple-A game is hiring the most talented animators in the industry, so it's really about how that team is used and how much time and money they're given. To drive the point home: one of the animators who worked on the facial animation of this scene also did the facial animation for this. Why do you think the facial animation looks a decent margin worse in Lost Legacy than it does in UC4?
 

Shpeshal Nick

aka Collingwood
oh the typical "but it's open world" crutch

fine, replace Naughty Dog with Zero Dawn. not that it really matters much for the facial animation of the select few main protagonists anyway

Don't confuse visual quality with facial animation. Facial animation definitely isn't one of Horizon Zero Dawn's strengths. Uncharted has great facial animation.
 
From those videos, it seems like they have to stand extremely still. ND, if I'm not mistaken, uses the same mocap take for both the actor's movement and speech (the same shot). That seems far superior to me, since you capture the whole actor in the moment and don't have to rely on combining two different takes.
 

JaseC

gave away the keys to the kingdom.
From those videos, it seems like they have to stand extremely still. ND, if I'm not mistaken, uses the same mocap take for both the actor's movement and speech (the same shot). That seems far superior to me, since you capture the whole actor in the moment and don't have to rely on combining two different takes.

Much of Naughty Dog's facial animation is done by hand, though. The idea behind this technology is that it makes that laborious process significantly easier if not unnecessary. Even if the actors had to do mocap for a given cutscene and then work through the script again for facial animation, that'd still be preferable to animators being tasked with the heavy lifting.
 

Agent_4Seven

Tears of Nintendo
It looks cool and all, but why are they focusing only on facial animation (hell, even the movement of the neck isn't captured there)? I mean, what's the point of realistic facial animation if the rest of the body isn't even close to being as realistic as the face, with barebones textures, a shitload of clipping, etc.? L.A. Noire is the prime example of how bad it can look when a realistic head is attached to a body with barebones animations (along with other problems).
 
Much of Naughty Dog's facial animation is done by hand, though. The idea behind this technology is that it makes that laborious process significantly easier if not unnecessary. Even if the actors had to do mocap for a given cutscene and then work through the script again for facial animation, that'd still be preferable to animators being tasked with the heavy lifting.

The point I was trying to make was that having a single shot is better than combining two, not from a technical perspective but for the acting itself. Capturing the person in the moment and how they use their entire body (head movement, arms, legs, facial expression), combined with the talent of the manual animation, will, I believe, provide a far better result than combining the body movement with some facial tech where, from what it seems, the actor can't even move their head, which could produce some really uncanny results IMO.
 

Kinyou

Member
oh the typical "but it's open world" crutch

fine, replace Naughty Dog with Zero Dawn. not that it really matters much for the facial animation of the select few main protagonists anyway
Doesn't Horizon Zero Dawn perfectly show the problem with open world? Compare the face animations from Horizon with Killzone: Shadow Fall. Same developer, but Shadow Fall has much more accurate face animations because they obviously had to do so much less.

 

JaseC

gave away the keys to the kingdom.
The point I was trying to make was that having a single shot is better than combining two, not from a technical perspective but for the acting itself. Capturing the person in the moment and how they use their entire body (head movement, arms, legs, facial expression), combined with the talent of the manual animation, will, I believe, provide a far better result than combining the body movement with some facial tech where, from what it seems, the actor can't even move their head, which could produce some really uncanny results IMO.

Yeah, I understand that. What I was getting at with my closing sentence is that the technology, even in its current state, could have its place in a mocap setup like Naughty Dog's. The cutscene is played out and mocapped as usual, then reread for nuanced facial animation generation, and finally, if need be, the animators touch up the end result.
 

tuxfool

Banned
I know ND used facial capture this time around, but it's my understanding that there was still a fair amount of manual work involved in achieving the final result.

This is true for any performance capture pipeline. It still takes a lot of work to get performance capture looking good.
 

JaseC

gave away the keys to the kingdom.
This is true for any performance capture pipeline. It still takes a lot of work to get performance capture looking good.

Right. Which circles this discussion back to what I said before about the current iteration of the technology being aimed at streamlining the process of creating nuanced facial animation. ;)
 
Apart from the lack of eye contact, that ain't bad at all. Reminds me a bit of how good RE7's faces looked.

[gifs: RE7 character faces]


Shame about the hair :\

The uncanniness in RE's faces is fine for a horror game; her looking creepy is totally fine. I'm impressed by how good they look in-game. RE7 is a gorgeous game.
 
They need to be clearer about what they're using the neural networks for.

What are they trying to predict, and is it regression or classification?

It's certainly not transforming words and sounds into facial animation, because it's copying his nonverbal facial animation. So what's the NN used for, and why is it helpful when we already have good motion capture tech?

It uses an RNN, so it could use both video and audio and leverage an LSTM to successfully predict the desired outcome.
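To make that concrete, here's a minimal sketch of the kind of recurrent regressor being described, assuming PyTorch, made-up dimensions, and blendshape weights as the target (not necessarily what the paper actually does):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: an LSTM that maps a sequence of per-frame
# features (derived from video and/or audio) to per-frame facial
# rig parameters such as blendshape weights. All sizes are made up.
class FaceAnimRNN(nn.Module):
    def __init__(self, in_dim=512, hidden_dim=256, out_dim=100):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)  # regression head

    def forward(self, x):            # x: (batch, frames, in_dim)
        h, _ = self.lstm(x)          # hidden state carries temporal context
        return self.head(h)          # (batch, frames, out_dim)

model = FaceAnimRNN()
clip = torch.randn(1, 120, 512)      # 120 frames of input features
pred = model(clip)                   # predicted rig parameters per frame
loss = nn.MSELoss()(pred, torch.randn_like(pred))  # capture data as target
```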
 

Wiseblood

Member
Not only was LA Noire's facial capture a technological dead end, but I don't think the results have aged particularly well. The faces are still nicely animated, but when playing on a modern PC with a high-res display they actually look blurry.
 

tuxfool

Banned
Right. Which circles this discussion back to what I said before about the current iteration of the technology being aimed at streamlining the process of creating nuanced facial animation. ;)

Yeah, it definitely should be a goal. However, as I speculated earlier (aside from the raw nature of this technique), you're still going to have to reduce animations to fit simpler facial models with fewer bones. You're almost certainly also going to have to compress animation data.

For the foreseeable future that is still going to require a lot of manual labour to get the best results.
 

Crossing Eden

Hello, my name is Yves Guillemot, Vivendi S.A.'s Employee of the Month!
Much of Naughty Dog's facial animation is done by hand, though. The idea behind this technology is that it makes that laborious process significantly easier if not unnecessary. Even if the actors had to do mocap for a given cutscene and then work through the script again for facial animation, that'd still be preferable to animators being tasked with the heavy lifting.
ND no longer solely keyframes facial animation for cutscenes; their models are now too complicated for that to be feasible. Actually, the worst-looking cutscenes in UC4 are those that were made before they started using facial capture.
 

strafer

member
Give me Alan Wake 2 with those facials and you have my money.

The animations in the first game were the only thing that kept it from being a perfect game.
 

spiderferi

Member
They need to be clearer about what they're using the neural networks for.

What are they trying to predict, and is it regression or classification?

It's certainly not transforming words and sounds into facial animation, because it's copying his nonverbal facial animation. So what's the NN used for, and why is it helpful when we already have good motion capture tech?

I think it would be regression; it doesn't seem that classification would be the case here. It would really lower costs if the system worked with good accuracy. Just capture the scene and then feed it to the NN.
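For what it's worth, the practical difference between the two is just the output head and the loss. A purely illustrative toy comparison (PyTorch, every dimension invented):

```python
import torch
import torch.nn as nn

feats = torch.randn(8, 256)  # 8 frames of made-up face features

# Regression: continuous rig values per frame, scored against capture data.
reg_head = nn.Linear(256, 100)                   # e.g. 100 blendshape weights
reg_loss = nn.MSELoss()(reg_head(feats), torch.randn(8, 100))

# Classification: one discrete label per frame (say, 40 viseme classes).
cls_head = nn.Linear(256, 40)
cls_loss = nn.CrossEntropyLoss()(cls_head(feats), torch.randint(0, 40, (8,)))
```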

The only problem I have with this is that it feels a little late. Not that it's a bad thing or anything, but given how technology and AI have progressed, and with the rapid advancements in fields like deep learning, I expected people to go for things like this sooner.
 

Crossing Eden

Hello, my name is Yves Guillemot, Vivendi S.A.'s Employee of the Month!
Apart from the lack of eye contact, that ain't bad at all. Reminds me a bit of how good RE7's faces looked.

[gifs: RE7 character faces]


Shame about the hair :\
Medal of Honor is a straight-up case study in the uncanny valley. They missed the mark by such an absurd degree.
 

nOoblet16

Member
plenty of studios have large budgets, lots of time, and far larger staffs, and can't hold a candle to Naughty Dog's work. cough Ubisoft cough

if you think all developers would produce Naughty Dog-level work given the same budget/time, you are out of your mind

On the contrary, Unity has amazing facial animation.
ND no longer solely keyframes facial animation for cutscenes; their models are now too complicated for that to be feasible. Actually, the worst-looking cutscenes in UC4 are those that were made before they started using facial capture.

He said "much of the facial animations in Uncharted 4," not all of the facial animations. Basically, even though they use facial capture, they still do a lot of handwork on it afterwards. The game's pseudo-realistic art style sort of demands that the facial animations be exaggerated as well, and hence, even though they have the mocap reference, they do a lot of fine-tuning, probably more than other games do, in order to achieve their look.
 

Mutant

Member
"Okay the wireframe looks nice but why do they have the recorded actor in a frame and not the final work?

...

OH GOD THAT IS THE FINAL WORK ISN'T IT?"
 

Crossing Eden

Hello, my name is Yves Guillemot, Vivendi S.A.'s Employee of the Month!
He said "much of the facial animations in Uncharted 4," not all of the facial animations. Basically, even though they use facial capture, they still do a lot of handwork on it afterwards. The game's pseudo-realistic art style sort of demands that the facial animations be exaggerated as well, and hence, even though they have the mocap reference, they do a lot of fine-tuning, probably more than other games do, in order to achieve their look.
That's pretty much every studio ever, though, since the inception of performance capture in the industry. You can't just drop facial capture data (or any mocap data, really) on a model and call it a day. Another note: judging by all the animation reels (that I may have spent a VERY large amount of time watching on Vimeo and YouTube), the majority of scenes in UC4 use facial capture. It seems the scenes where they solely used keyframing are most prominent in the Italy chapter.
 

nOoblet16

Member
That's pretty much every studio ever, though, since the inception of performance capture in the industry. You can't just drop facial capture data (or any mocap data, really) on a model and call it a day.
I've already acknowledged that all other games do it and that's how it works. What I meant to say is that the balance between capture and hand animation probably skews more towards animating by hand for Uncharted 4 than for other games, which would mean their amazing facial animation is primarily down to the artists rather than their capture technology.
 

Crossing Eden

Hello, my name is Yves Guillemot, Vivendi S.A.'s Employee of the Month!
I've already acknowledged that all other games do it and that's how it works. What I meant to say is that the balance between capture and hand animation probably skews more towards animating by hand for Uncharted 4 than for other games, which would mean their amazing facial animation is primarily down to the artists rather than their capture technology.
Yeah, I would still argue that that really isn't true.
 

tuxfool

Banned
I've already acknowledged that all other games do it and that's how it works. What I meant to say is that the balance between capture and hand animation probably skews more towards animating by hand for Uncharted 4 than for other games, which would mean their amazing facial animation is primarily down to the artists rather than their capture technology.

it really isn't.
 

JaggedSac

Member
Guess this is as good a place as any to say that I just finished Quantum Break, and that was a damn fine game. Gorgeous graphical effects and a neat story. Shame it didn't do better. Gunplay was merely decent.

Hopefully this new tech can help them finish their story-driven games quicker and cheaper. They really are a unique dev house.
 

JaseC

gave away the keys to the kingdom.
Yeah, it definitely should be a goal. However, as I speculated earlier (aside from the raw nature of this technique), you're still going to have to reduce animations to fit simpler facial models with fewer bones. You're almost certainly also going to have to compress animation data.

For the foreseeable future that is still going to require a lot of manual labour to get the best results.

Oh, sure, I'm not arguing that the technology as it currently exists could be plopped into a production pipeline and leave animators twiddling their thumbs. I mean, theoretically, that could eventually happen, but obviously we're not there yet, not entirely unlike how machine learning-powered Twitter bots tend to trip over language rules and spit out messy sentences if not gibberish.

ND no longer solely keyframes facial animation for cutscenes; their models are now too complicated for that to be feasible. Actually, the worst-looking cutscenes in UC4 are those that were made before they started using facial capture.

Yeah, I'm aware. I did imply by way of "much" that animators weren't the only source of facial animation and clarified as much just several posts later.
 
Using neural networks to teach a machine how to perfectly mimic human expression and emotion? Can't imagine what could possibly go wrong.

[image: Skynet]
 
Sort of. I was curious and found the paper that this article is about. It's really cool, and neural nets are going to play a major role in this type of automation in the future, but this early implementation turns out to have some big limitations at the moment (basically, all the same limitations as L.A. Noire's facial capture):

1. Every single user needs to be calibrated independently for the system to work:

(That 5-10 minute dataset is the labor-intensive part, and obviously takes more than 5-10 minutes to produce.)

Similarly, while I'm sure later implementations of neural networks for this type of animation automation will have self-improvement, this one doesn't seem to. Not only does it lack self-improvement, but part of the hand-tweaked training footage used by the system outlined in this paper specifically calls for a performance from the actor in character. While that helps with error correction for that singular character's performance, it's not robust enough yet to even error-correct a single user with a fully neutral training dataset.

2. It turns out that while their intention is to eventually get this system up and running with the head-mounted face cams typically used for simultaneous facial and motion capture (like the one Ashley Johnson is wearing with the face markers in the Naughty Dog video linked above in this thread), their current implementation doesn't work with that. Instead, all input footage provided to the neural net after it's finished learning from the training dataset still comes from the same controlled environment and lighting setup used in training.

Again, I'm sure they'll get there, but it's not a trivial task that has already been handled. In fact, I'm guessing they're going to need to rework how they feed the neural net and tell it how to interpret arbitrary input footage, as right now all input footage is converted to grayscale and cropped to 320x240, so the neural net looks at each frame as a set of 76.8k scalar values (I'm assuming a 0-255 range). Introducing a moving background and variable lighting as an actor shifts and moves through a scene during simultaneous body and facial motion capture would disrupt a lot of what the current neural net assumes it will receive from the incoming footage.
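(For reference, 320 × 240 = 76,800 pixels, hence the 76.8k figure. A rough sketch of that preprocessing, using OpenCV; the exact crop placement is my guess:)

```python
import cv2
import numpy as np

def frame_to_input(frame, w=320, h=240):
    # Convert a BGR video frame into the flat grayscale vector the
    # net is described as consuming; crop/resize placement is assumed.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (w, h))         # the paper's stated input size
    flat = gray.astype(np.float32).ravel()  # 320 * 240 = 76,800 values
    return flat / 255.0                     # assuming 0-255 intensities
```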

It's basically using the fact that the actor, background, and lighting setup are all constants to let the system treat each pixel as an advanced face-tracking dot; if any of those aspects change, it's like wiping the face-tracking dots off an actor's face, or moving them to incorrect locations, and that screws with the automation.

So the system, as it exists right now, has almost all of the same constraints as the L.A. Noire facial capture system, requiring that the actor, environment, and lighting all be identical, but it produces higher-quality 3D animations and doesn't require as elaborate a camera rig as LAN used. It'll get better, but it's not there yet, and this implementation certainly isn't learning and improving its error correction over time beyond the initial training phase. Neural nets are definitely the future, though, and I stumbled across another video of a character animation system that used a neural net to achieve real-time results for the equivalent of IK-blended animation, which looked much more natural than typical IK animation corrections in games.

Probably the coolest thing I learned from the article is that this system doesn't use any temporal smoothing; it processes each frame completely independently of all the others, and yet its results are high-quality enough that the animation appears perfectly smooth when played back. That's quite a feat in its own right, and something hand-tweaking never achieves on its own.
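In other words, inference is presumably just one independent forward pass per frame, with no state carried between frames; something like this hypothetical loop (net, video_frames, and frame_to_input are all placeholders, the last as sketched above):

```python
import torch

animation = []
with torch.no_grad():                      # inference only, no training
    for frame in video_frames:             # assumed iterable of video frames
        x = torch.from_numpy(frame_to_input(frame)).unsqueeze(0)
        pose = net(x)                      # one independent forward pass
        animation.append(pose.squeeze(0))  # no smoothing across frames
```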

This doesn't sound too great at the moment. This current technique has a lot of limitations, it seems.
 
"Okay the wireframe looks nice but why do they have the recorded actor in a frame and not the final work?

...

OH GOD THAT IS THE FINAL WORK ISN'T IT?"

Ehhh... no, it's not. It's the input video for the neural network.

... wait, when people say it looks amazing, they couldn't possibly be thinking the first frame is the generated CG, right?
 
Ehhh... no, it's not. It's the input video for the neural network.

... wait, when people say it looks amazing, they couldn't possibly be thinking the first frame is the generated CG, right?

No, most of us can see the obvious: frame 1 is the actor and frame 3 (right side) is their output/final.

The audio-driven one is pretty cool as well.

Yep. The audio-driven stuff excites me far more. Animating hundreds or thousands of lines of dialogue based on 5-10 minutes of training captures, with the rest driven strictly by recorded audio, could drastically improve NPCs in RPGs and other open-world games. And they could be changed on the fly with new audio, theoretically. The implications could be huge, but it will take years to come to fruition if the tech gets adopted by other devs.
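As a hedged sketch of what that audio-only path might look like once such a model is trained (librosa for audio features; audio_net, the file name, and every dimension here are placeholders, not anything from the paper):

```python
import librosa
import torch

# Hypothetical: generate facial animation for a new voice line from
# recorded audio alone, after the 5-10 minute training capture.
audio, sr = librosa.load("npc_line_042.wav", sr=16000)   # made-up file
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=32)   # (32, frames)
feats = torch.from_numpy(mfcc.T).float().unsqueeze(0)    # (1, frames, 32)

with torch.no_grad():
    rig_params = audio_net(feats)   # assumed trained audio-to-face model
# rig_params: per-frame facial pose, ready for an animator to touch up
```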
 