Introduction

When designing fashion catalogs, it is often desirable to control model features such as ethnicity, pose, color, build, etc. Traditionally, this has been done by shooting catalogs with different models to suit the purpose. However, current state-of-the-art research in generative image modelling shows promising results in related tasks such as generating high-resolution human faces. Fig. 2 shows a few generated model faces of multiple ethnicities.

Fig. 2: A glimpse of the variations possible in model faces generated with Flixstock Model Faces.

At Flixstock, we’re working towards generating complete human bodies (Fig. 1). While this is very relevant to us and to the fashion industry in general, it involves several challenges.

Fig. 1: Models come in a variety of different poses. Notice the fluidity in the relative positions of various body parts, especially the hands and legs.

Challenges

  • One, the spatial relationship between the parts of a human body is more fluid than that between the parts of a human face. While the nose is always situated below the eyes, the hands and legs of a model can take many different spatial configurations. This spatial fluidity makes it difficult for generative models to produce such images with high fidelity.
  • Two, there is a lot of interdependence between bodily features, which makes it difficult for generative models to learn disentangled representations. For example, if dark-skinned models in the data are typically seen with black, curly hair, it becomes difficult to generate a dark-skinned model with blonde hair. This also highlights the more general problem of data bias, which affects almost every machine learning model.
  • Three, regardless of the rest of the body, faces need to be modelled with utmost precision. The face is the most distinguishing feature of the human body, and the mind is quite sensitive to distortions in faces. Generating high-quality faces as part of a full body is difficult for the reasons mentioned above.
  • Four, it is desirable to have control over model characteristics such as expression, hairstyle, pose, etc. (Fig. 3, Fig. 4, Fig. 5). This means that we need to generate multiple images showing a diverse variety of these characteristics while maintaining the identity of the model. Controlling these features requires control over the latent space of generative models, which is still an active area of research.
  • Lastly, all data collection during the process has to honor the applicable intellectual property rights and agreements.
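
To make the fourth challenge more concrete, here is a minimal sketch of one commonly explored idea for latent-space control: estimate a direction in latent space associated with an attribute (say, a smile), then shift a model's latent code along it. This is an illustration only and not a description of Flixstock's method. The function names (`attribute_direction`, `edit_latent`, `projection`) are hypothetical, latent codes are plain Python lists, and the generator that would turn a latent code into an image is assumed to exist and is not shown.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def attribute_direction(latents_with, latents_without):
    """Unit vector pointing from the mean latent code of images WITHOUT
    the attribute to the mean latent code of images WITH it.

    Assumption: the attribute is roughly linear in latent space. This is
    a simplification and does not hold for every model or attribute.
    """
    mu_with = mean_vector(latents_with)
    mu_without = mean_vector(latents_without)
    diff = [a - b for a, b in zip(mu_with, mu_without)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("the two groups have identical mean latent codes")
    return [d / norm for d in diff]


def edit_latent(latent, direction, strength):
    """Shift a latent code along a direction; strength sets how far."""
    return [z + strength * d for z, d in zip(latent, direction)]


def projection(latent, direction):
    """Scalar amount of the attribute direction present in a latent code."""
    return sum(z * d for z, d in zip(latent, direction))


# Toy 2-D latent codes (hypothetical data, for illustration only).
smiling = [[1.0, 0.0], [3.0, 0.0]]
neutral = [[0.0, 0.0], [0.0, 0.0]]

smile_direction = attribute_direction(smiling, neutral)
source = [0.5, 0.5]
edited = edit_latent(source, smile_direction, strength=2.0)
# `edited` would then be passed to the (not shown) generator.
```

In this toy setup the edit changes only the component along the estimated direction. In a real generative model, attributes are often entangled (the second challenge above), so shifting along one direction can also alter other features or the identity of the model.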

Fig. 5: Controlling the pose of models is a much-desired feature. On the left is a source image whose pose we want to change. In the middle is a reference image whose pose we want to copy. On the right is a generated image that has the pose of the reference image while preserving the identity of the source image.

Fig. 4: An example strip showing control over the hair color of a model. Each pair of images contains the same model with either a dark or a light hair color.

Fig. 3: These examples represent the level of control that we aim to achieve over the smile of a model, which is a very distinct characteristic of the image. Each pair of images contains the same model with a different degree of smile.