Pix2pixHD is a method for synthesizing high-resolution, photo-realistic images from semantic label maps using conditional generative adversarial networks (cGANs). It achieves high-resolution results with a novel adversarial loss, along with new multi-scale generator and discriminator architectures.
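The multi-scale discriminator design can be illustrated with a short sketch: the real or synthesized image is repeatedly downsampled into a pyramid, and a separate discriminator operates at each scale (the coarsest sees global structure, the finest sees texture detail). The snippet below is a minimal, framework-free illustration of that pyramid, assuming 2x average pooling as the downsampling operator and three scales, as in the pix2pixHD paper; the function names are my own, not from the official code.

```python
import numpy as np

def downsample(img):
    """Halve spatial resolution with 2x2 average pooling (NCHW layout)."""
    n, c, h, w = img.shape
    return img.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

def image_pyramid(img, num_scales=3):
    """Build the inputs for the multi-scale discriminators: the first
    discriminator sees the full-resolution image, the second a 2x-downsampled
    copy, the third a 4x-downsampled copy."""
    pyramid = [img]
    for _ in range(num_scales - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# A dummy 1-image batch at 1024x2048, the resolution pix2pixHD targets.
x = np.zeros((1, 3, 1024, 2048), dtype=np.float32)
for level in image_pyramid(x):
    print(level.shape)  # (1, 3, 1024, 2048), then (1, 3, 512, 1024), then (1, 3, 256, 512)
```

Running identically-structured discriminators on these three scales is what lets the method judge both global consistency and fine detail without a single discriminator needing an enormous receptive field.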
In less technical terms, pix2pixHD is a straightforward way to generate high-resolution images, with nearly endless options for changing small and large details in them. This is done by drawing on a label map (Fig. 1) and then translating the drawing into an HD image output using GANs (Fig. 2).
In Fig. 2, you can see examples where significant changes were made. On the left, some of the cars have different colors, the shadow has been removed from the sidewalk, and the ground has been changed from asphalt to bricks. On the right, trees have been added across the top of the image, and the sidewalk on the right is lighter with a green tint. More small details have been changed in each image, which you can spot on closer inspection.
The pix2pixHD methodology was originally introduced in 2017 in an academic paper by Ph.D. researchers from the University of California, Berkeley in coordination with NVIDIA Corporation. A revised version of the paper was published in August 2018. The code is available on GitHub.
A follow-up paper, Everybody Dance Now, modified the adversarial training setup of pix2pixHD to produce temporally coherent video frames: the moves of a dancer in a source video are transferred onto a target person, who appears to perform the same dance moves in a second video that is in fact entirely generated.