Face2face github

4/5/2023

This kind of demo was really refreshing for me, as I'm usually not exposed to these kinds of projects at my job. After the workshop, I decided to create my own project similar to what Gene did with the face tracker, but with a different person.

The first thing I had to do was create the dataset. For this, I used Dlib's pose estimator, which can detect 68 landmarks (mouth, eyebrows, eyes, etc.) on a face, along with OpenCV to process the video file. Detecting the facial landmarks is a two-stage process: first, a face detector is used to detect the faces, and then the pose estimator is applied to each detected face. The pose estimator is an implementation of the paper "One Millisecond Face Alignment with an Ensemble of Regression Trees" by Vahid Kazemi and Josephine Sullivan, CVPR 2014.

One of the problems I had was that, in my first implementation, the face landmark detector was extremely laggy (very low frames per second, fps). I figured out that the input frame was just too big. Reducing the size of the frame by a factor of four improved the fps a lot. In another blog article, Satya Mallick also recommends skipping frames, but I didn't do this as the fps was decent enough by then. However, this is something that I can try out later to improve performance even more.

I looked up several potential videos on YouTube that I could use to create the data, ranging from interviews to speeches by prominent people. In the end, I decided to go with Angela Merkel's (the German chancellor) New Year's speech in 2017. This video was especially suitable because the camera position was more or less static, so I could get a lot of images with the same position of her face and background.

Luckily, at the workshop Gene also pointed out some existing codebases for generative models like pix2pix. For training the model, I used Christopher Hesse's amazing pix2pix TensorFlow (TF) implementation, which is really well documented.
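
As a rough illustration of the landmark-detection step described earlier (two-stage detection, run on a frame reduced by a factor of four), here is a minimal Python sketch. It is not the actual script from this project: the helper names `downscaled_size`, `rescale_points` and `detect_landmarks`, and the default factor of 4 as a parameter, are assumptions made for illustration. The `detector` and `predictor` arguments are expected to be a Dlib face detector and a Dlib shape predictor.

```python
def downscaled_size(width, height, factor=4):
    """Return the (width, height) of a frame shrunk by `factor` (hypothetical helper)."""
    return max(1, width // factor), max(1, height // factor)


def rescale_points(points, factor=4):
    """Map (x, y) landmarks found on the small frame back to full-size coordinates."""
    return [(x * factor, y * factor) for x, y in points]


def detect_landmarks(frame, detector, predictor, factor=4):
    """Run two-stage landmark detection on a downscaled copy of `frame`.

    `frame` is a BGR image array as returned by OpenCV. The result is one list
    of (x, y) landmark points per detected face, in full-size coordinates.
    """
    import cv2  # imported lazily so the pure helpers above work without OpenCV

    height, width = frame.shape[:2]
    small = cv2.resize(frame, downscaled_size(width, height, factor))
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

    faces = []
    for rect in detector(gray, 1):      # stage 1: find face rectangles
        shape = predictor(gray, rect)   # stage 2: estimate landmarks on that face
        points = [(shape.part(i).x, shape.part(i).y)
                  for i in range(shape.num_parts)]
        faces.append(rescale_points(points, factor))
    return faces
```

With Dlib installed, `detector` would typically come from `dlib.get_frontal_face_detector()` and `predictor` from `dlib.shape_predictor(...)` pointed at a downloaded 68-landmark model file. Because the small frame's size is rounded down, the rescaled coordinates are only approximate when the frame dimensions are not multiples of the factor.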
