An image processing apparatus and method comprise: generating a video frame that includes a patch obtained by projecting, onto a two-dimensional plane, a point cloud that represents an object having a three-dimensional shape as a group of points; generating a thumbnail two-dimensional image independently of the patch; embedding the thumbnail two-dimensional image into the video frame; and encoding the video frame to generate a bitstream.
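
The four steps can be illustrated with a minimal sketch. It is not the claimed implementation: the function names (`project`, `paste`, `build_frame`, `encode_frame`), the side-by-side frame layout, the 8-bit attribute values, and the use of `zlib` as a stand-in for a real video encoder are all assumptions made for illustration. The thumbnail is rendered directly from the point cloud along a different axis and at a coarser scale, so it does not depend on the patch.

```python
import zlib

def project(points, size, axes, scale=1):
    """Orthographically project (x, y, z, value) points onto a size x size image.

    axes = (u, v, depth) selects which coordinates act as column, row, and depth.
    For each pixel, the point with the smallest depth wins.
    """
    u_i, v_i, d_i = axes
    image = [[0] * size for _ in range(size)]
    nearest = {}
    for p in points:
        u, v, d = p[u_i] // scale, p[v_i] // scale, p[d_i]
        if not (0 <= u < size and 0 <= v < size):
            continue  # point falls outside the image
        if (u, v) not in nearest or d < nearest[(u, v)]:
            nearest[(u, v)] = d
            image[v][u] = p[3]
    return image

def paste(frame, image, top, left):
    """Copy image into frame with its upper-left corner at (top, left)."""
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value

def build_frame(points, patch_size=4, thumb_size=2):
    """Build one frame: patch on the left, thumbnail to its right (assumed layout)."""
    frame = [[0] * (patch_size + thumb_size) for _ in range(patch_size)]
    # Step 1: patch obtained by projecting the point cloud onto the XY plane.
    patch = project(points, patch_size, axes=(0, 1, 2))
    paste(frame, patch, 0, 0)
    # Step 2: thumbnail rendered from the point cloud itself (XZ view, half scale),
    # independently of the patch.
    thumb = project(points, thumb_size, axes=(0, 2, 1), scale=2)
    # Step 3: embed the thumbnail into the same frame.
    paste(frame, thumb, 0, patch_size)
    return frame

def encode_frame(frame):
    """Step 4: stand-in encoder. A real system would use a video codec here."""
    return zlib.compress(bytes(v for row in frame for v in row))

# Hypothetical input: (x, y, z, value) with integer coordinates, values in 0..255.
points = [(0, 0, 5, 100), (0, 0, 2, 200), (3, 1, 0, 50)]
bitstream = encode_frame(build_frame(points))
```

Because the thumbnail occupies its own region of the frame, a decoder that knows the layout could extract it without reconstructing the point cloud from the patch.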