
Nvidia and Remedy use neural networks for eerily good facial animation

dex3108

Member
Interesting use of technology.

Remedy, the developer behind the likes of Alan Wake and Quantum Break, has teamed up with GPU-maker Nvidia to streamline one of the more costly parts of modern games development: motion capture and animation. As showcased at Siggraph, by using a deep learning neural network—run on Nvidia's costly eight-GPU DGX-1 server, naturally—Remedy was able to feed in videos of actors performing lines, from which the network generated surprisingly sophisticated 3D facial animation. This, according to Remedy and Nvidia, removes the hours of "labour-intensive data conversion and touch-ups" that are typically associated with traditional motion capture animation.

Aside from cost, facial animation, even when motion captured, rarely reaches the same level of fidelity as other animation. That odd, lifeless look seen in even the biggest of blockbuster games is often down to the limits of facial animation. Nvidia and Remedy believe their neural network solution is capable of producing results as good as, if not better than, those produced by traditional techniques. It's even possible to skip the video altogether and feed the neural network a mere audio clip, from which it's able to produce an animation based on its prior training.
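At a high level, the pipeline described here maps per-frame inputs (video frames, or audio features in the audio-only mode) to facial animation parameters such as blendshape weights. The toy NumPy sketch below illustrates only that input/output shape; the feature sizes, parameter counts, and the single linear layer are all assumptions, not Remedy's actual architecture:

```python
import numpy as np

# Toy sketch of the inference step: per-frame input features
# (e.g. a face crop, or audio spectrogram slices) are mapped to
# facial animation parameters such as blendshape weights.
# The real system is a deep network; one linear layer stands in here.
rng = np.random.default_rng(0)

N_FEATURES = 64      # hypothetical per-frame feature size
N_BLENDSHAPES = 50   # hypothetical rig parameter count

W = rng.standard_normal((N_FEATURES, N_BLENDSHAPES)) * 0.1
b = np.zeros(N_BLENDSHAPES)

def animate(frames: np.ndarray) -> np.ndarray:
    """Map (T, N_FEATURES) input frames to (T, N_BLENDSHAPES) weights in [0, 1]."""
    raw = frames @ W + b
    return 1.0 / (1.0 + np.exp(-raw))   # squash into a valid blendshape range

clip = rng.standard_normal((30, N_FEATURES))  # one second of "footage" at 30 fps
weights = animate(clip)
print(weights.shape)  # (30, 50)
```

The point of the sketch is just the contract: a sequence of frames in, a sequence of rig parameters out, with the network learned from data rather than hand-animated.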

https://www.youtube.com/watch?v=VtttfrmfMZw

https://arstechnica.com/gaming/2017/08/nvidia-remedy-neural-network-facial-animation/?amp=1
 

Ushay

Member
Oh boy, that was incredible. Cannot wait to see what Remedy produce next; those guys have always been cinematic wizards, and it's sad how underrated their games are.
 

faridmon

Member
Wow, I was pretty meh on it until he moved around his eyebrows and forehead.

Mind blowing stuff

LA Noire looked amazing, why don't they use that technology?

That wasn't a very efficient use of performance capture. It was laborious, expensive and time consuming. On top of the facial performance capture, animators had to apply external and post-production effects, such as adding polygons and textures, to achieve those facial expressions.
 

vivekTO

Member
LA Noire looked amazing, why don't they use that technology?

Not saying this looks bad

Because that tech is not feasible to use alongside motion capture of the performers: they have to sit in front of an array of cameras to capture the facial video, which is then superimposed on the face mesh. Unless they figure out a way to capture the body performance as well, I don't think that tech can be widely used.
 
Holy shit this looks incredible!



Cannot wait to see what Remedy have cooking :)
 

nOoblet16

Member
LA Noire looked amazing, why don't they use that technology?

Not saying this looks bad
Because that tech was a dead end.
It wasn't really facial animation so much as a video slapped onto where the face of the character should be.

The limitations of that technique:
1) The ingame model looks exactly like the actor; you cannot use a different model
2) The actors had to sit completely still
3) Couldn't mo-cap the body and capture facial animations at the same time
4) They can't retarget the eyes of the model because it's not actually an eye, just part of the video, meaning there can be disparity in cutscenes
5) The faces can't really be lit properly
6) It also limits the quality of the shaders

Basically too many limitations, and it was a one-off case where it managed to work well. For a game with similar-style cutscenes, Mafia 3 has amazing facial animations.
 

laxu

Member
This looks great but doesn't it have the same issue as the LA Noire tech where the subject would have to emote without moving their head? Or could this be implemented in a motion capture setup so that actors can use their whole body? Or would this be something to use for a canned set of responses where animators would compose a cutscene or discussion from the responses and maybe body pose change animations?
 

peppers

Member
This plus their lighting technology will probably make Remedy's next game absolutely mind blowing in the graphics department.
 

Magypsy

Member
Because that tech was a dead end.
It wasn't really facial animation so much as a video slapped onto where the face of the character should be.

The limitations of that technique:
1) The ingame model looks exactly like the actor; you cannot use a different model
2) The actors had to sit completely still
3) Couldn't mo-cap the body and capture facial animations at the same time
4) They can't retarget the eyes of the model because it's not actually an eye, just part of the video, meaning there can be disparity in cutscenes
5) The faces can't really be lit properly
6) It also limits the quality of the shaders

Basically too many limitations, and it was a one-off case where it managed to work well. For a game with similar-style cutscenes, Mafia 3 has amazing facial animations.

7) Lots of cameras needed; very expensive.

This new neural network tech can be done veeerry cheaply. Hell, train it more and you could feasibly use video of any shape, size, and angle.
 

Maxey

Member
LA Noire looked amazing, why don't they use that technology?

Not saying this looks bad
Yeah, the lack of depth inside the mouths of the characters sure looked amazing.

It looked good at points but it was a very limited technology.

The new techniques are much better.
 

SomTervo

Member
LA Noire looked amazing, why don't they use that technology?

Not saying this looks bad

As well as all the already-mentioned reasons, I'm pretty sure the LA Noire method was also vastly inefficient in terms of space, which is partly why the game's install size was so massive and it shipped across 3 DVDs. Could be wrong about that though.
 

tuxfool

Banned
7) Lots of cameras needed; very expensive.

This new neural network tech can be done veeerry cheaply. Hell, train it more and you could feasibly use video of any shape size and angle.

You still need cameras to capture the performance; this method only works for facial capture. In terms of facial capture (with today's methods), you're basically talking about reducing from two cameras to one.

This neural net doesn't change the practical setup needed for performance capture, only the procedural ones such as data conversion.
 

EvB

Member
7) Lots of cameras needed; very expensive.

This new neural network tech can be done veeerry cheaply. Hell, train it more and you could feasibly use video of any shape size and angle.

Yep, that's the cool thing. It reduces cost and time in cleanup
 

Magypsy

Member
You still need cameras to capture the performance, this method only works for facial capture. In terms of facial capture (on today's methods) you're basically talking about reducing from 2 cameras to one.

This neural net doesn't change the practical setup needed for performance capture, only the procedural ones such as data conversion.

You're right about standard facial capture, but I was talking about LA Noire's facial capture which required a whole array of cameras.

 

EvB

Member
Yeah, I'm a bit confused here. This is pretty good, but it's not leagues ahead of the competition: https://www.youtube.com/watch?v=IA4bmiXNMoo
It's not about it being better or worse than that other super expensive method of motion capture. Of course you are going to get good results if you are walking around an entire state-of-the-art mo-cap facility.
It's about the high-quality results that you can achieve super easily.
People complained about Mass Effect's facial animation. Well, this means there is nothing to stop a developer simply recording the actors as they read their lines and getting more than usable lip-sync info.


This shows the footage they captured from (the input frame).
Target shows the results using an expensive multi-camera setup / conventional mo-cap setup that big productions use.
Output shows the neural network's attempt using just a single standard video feed with no 3D data.

Basically mo-cap from a smartphone would be possible.

This image shows another 3 attempts by different teams to achieve the same thing. As you can see, they don't even come close to the gold-standard method with all the expensive kit, or to the neural network method from a single camera feed.
 

tuxfool

Banned
You're right about standard facial capture, but I was talking about LA Noire's facial capture which required a whole array of cameras.

Oh, right. As others have noted that process was at a dead end, and it wasn't even about the requirement of multiple cameras.
 

tuxfool

Banned
Watch both videos again, keep in mind what the actors are having to deal with equipment wise.

In performance capture you're still going to need a head camera attached to a helmet.

They also presented audio driven animation tech.
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion

Should be great for games with a huge number of NPCs.

I do question the true practicality of this method for games. The real trick in applying mocap to games is the ability to use low poly and low bone/blendshape count models and still have it look right. There is also the question that raw animation data like this is often in the hundreds of megabytes in size, which is really impractical for use in games.

The methods here may deal with it, but they don't touch upon these considerations.
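The storage concern raised above can be sanity-checked with back-of-envelope arithmetic. All figures below are assumptions for illustration, not numbers from the article:

```python
# Back-of-envelope check of the storage argument: raw per-vertex
# animation stores every vertex position every frame, while a rigged
# approach stores only blendshape weights. All sizes are assumed.
VERTICES = 5000          # face mesh vertex count (assumption)
FLOATS_PER_VERTEX = 3    # x, y, z
BYTES_PER_FLOAT = 4
FPS = 30
SECONDS = 600            # ten minutes of cutscenes

raw_bytes = VERTICES * FLOATS_PER_VERTEX * BYTES_PER_FLOAT * FPS * SECONDS

BLENDSHAPES = 50         # rig parameter count (assumption)
rig_bytes = BLENDSHAPES * BYTES_PER_FLOAT * FPS * SECONDS

print(f"raw: {raw_bytes / 1e6:.0f} MB")   # 1080 MB
print(f"rig: {rig_bytes / 1e6:.1f} MB")   # 3.6 MB
```

Even with modest assumed mesh sizes, raw vertex animation lands in the hundreds-of-megabytes-to-gigabyte range, which is why games retarget captures onto compact rigs rather than shipping raw data.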
 

vivekTO

Member
maybe a way to get the less talented and funded studios to produce animation work on par with titles like Uncharted 4

Ninja Theory is aiming for the same with Hellblade: less expensive, better results with their face tech, and I think that looks beautiful.
 
Wonder when we're going to see Remedy's Max Payne successor, I mean, "Project 7". I'm hyped after that "shoot dodge" reference Sam Lake gave when talking about it.
 

Dryk

Member
This sort of technology has the potential to save thousands of hours of labour per game. It really is incredible.
 
It has far more to do with budget and time than talent.

Plenty of studios have large budgets, lots of time, and far larger staffs, and can't hold a candle to Naughty Dog's work. *cough* Ubisoft *cough*

If you think all developers would produce Naughty Dog-level work given the same budget/time, you are out of your mind.
 

Popeck

Member
Next step: apply neural networks to design THE PERFECT GAME. Dem neural networks, is there anything they cannot do?
 

nOoblet16

Member
That's just the standard mo-cap tech.

Yeah, I'm a bit confused here. This is pretty good, but it's not leagues ahead of the competition: https://www.youtube.com/watch?v=IA4bmiXNMoo
Watch the video again.
Do you see the actor wearing any dots on his face for capture purposes? That should tell you it's not standard mo-cap.

Standard mocap just uses reference points from an actor's face and modifies the model accordingly, but at the end of the day the animation is still incomplete. There's only so much data you can gather from standard mo-cap with 20-30 reference points. You still have to add in the additional detail by hand to make the expression really work.

This provides a much faster way by reducing the amount of hand work required. Lastly, it's doing all of that without using any dots as reference points. That's what's truly impressive about it.
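What "using reference points to modify the model" typically amounts to can be sketched as a least-squares fit: each frame, solve for the blendshape weights that best reproduce the tracked marker positions. The sizes and data below are made up purely for illustration:

```python
import numpy as np

# Sketch of marker-based facial capture: ~30 tracked dots on the
# actor's face drive a rig by solving for blendshape weights that
# best reproduce the observed marker offsets (linear least squares).
rng = np.random.default_rng(1)

N_MARKERS = 30       # tracked dots (assumption)
N_BLENDSHAPES = 20   # rig controls (assumption)

# Each column: how all marker coordinates move when one blendshape fires.
basis = rng.standard_normal((N_MARKERS * 3, N_BLENDSHAPES))

true_w = rng.uniform(0, 1, N_BLENDSHAPES)   # the hidden "performance"
markers = basis @ true_w                    # observed marker offsets

# Recover the rig weights from the markers alone.
w, *_ = np.linalg.lstsq(basis, markers, rcond=None)
print(np.allclose(w, true_w))  # True
```

With only 20-30 markers the solve is well-posed but coarse, which matches the point above: fine detail between the dots still has to be added by hand.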


It's not meant to deliver results more realistic than what we have, it's meant to deliver equally realistic results at a faster rate.
 

cm osi

Member
The technology is amazing, but the closer we get to photorealism, the more I shift toward fantasy.
 

nOoblet16

Member
They've already done one better. How about no video at all?

It's like the phoneme-driven animation Valve used for HL2 but on steroids.

The main trick with this entire process is that it lives and dies based on the initial samples provided to the neural net. The article mentions that it needs about 10 minutes of really high quality, hand tuned 3D facial mocap first that it can learn from.
Which is not a problem, since once it's been trained you don't need to do it again. And the beauty of it is that it can improve itself on its own even after it's been trained.
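The training setup described (a small amount of high-quality, hand-tuned mocap as ground truth) is ordinary supervised learning. A minimal sketch, with a linear model and plain gradient descent standing in for the deep network, and every number below an assumption:

```python
import numpy as np

# Minimal sketch of "train on ~10 minutes of hand-tuned mocap":
# fit a model so model(features) ≈ artist-quality animation targets.
rng = np.random.default_rng(2)

T = 18000                # ~10 minutes at 30 fps
N_FEAT, N_OUT = 16, 8    # toy feature / rig-parameter sizes

X = rng.standard_normal((T, N_FEAT))    # input features per frame
W_true = rng.standard_normal((N_FEAT, N_OUT))
Y = X @ W_true                          # the "hand-tuned" targets

W = np.zeros((N_FEAT, N_OUT))
lr = 0.1
for _ in range(200):                    # gradient descent on mean squared error
    grad = X.T @ (X @ W - Y) / T
    W -= lr * grad

err = float(np.mean((X @ W - Y) ** 2))
print(err < 1e-3)  # True: the model has learned the mapping
```

Once trained, the model generalises to new footage of the same kind, which is why the expensive hand-tuned capture only has to be produced once.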
 
Because that tech was a dead end.
It wasn't really facial animation so much as a video slapped onto where the face of the character should be.

The limitations of that technique:
1) The ingame model looks exactly like the actor; you cannot use a different model
2) The actors had to sit completely still
3) Couldn't mo-cap the body and capture facial animations at the same time
4) They can't retarget the eyes of the model because it's not actually an eye, just part of the video, meaning there can be disparity in cutscenes
5) The faces can't really be lit properly
6) It also limits the quality of the shaders

Basically too many limitations, and it was a one-off case where it managed to work well. For a game with similar-style cutscenes, Mafia 3 has amazing facial animations.

What about the animations being stuck at 30fps?
 
This is incredible. As someone who had to do minor rigging work in animation at one point (just in school; I switched majors since) and who has also witnessed how much work goes into mocap, I can really appreciate how big this has the potential to be. Thanks, everyone, for sharing the additional info, pictures, etc. that help put it into even better perspective.
 