Let's talk about neural network training, using Virton's virtual fitting room as an example


Introduction
Hi everyone, this is the Neurosell team. In today's article we would like to share our experience in training neural networks, using our product, the Virton virtual fitting room, as an example. We will cover the complexities of the algorithm, the user experience, and the opportunities for training.
First things first, let's review what our goals were:
Maximum preservation of context, so the fitting result in the image is accurate (for example, when the user wants a T-shirt that matches the style of their shorts);
Accurate transfer of the product image, preserving the pose and adding details such as fabric folds;
Relatively economical algorithms.
What do we use for training and inference?
UNet-based generative and reference models;
PyTorch under the hood, plus the transformers and diffusers libraries;
SCHP (Self-Correction Human Parsing) for segmenting the person and correcting the parse;
DensePose for dense pixel-to-body-surface mapping;
NVIDIA CUDA-based GPU servers (a minimal pipeline sketch follows this list).
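To make the stack concrete, here is a minimal sketch of the general idea using the off-the-shelf diffusers inpainting pipeline: take the segmented garment region and repaint it conditioned on a description. Our production models are custom UNet-based generative and reference networks, so the checkpoint name, prompt, and file paths below are purely illustrative.

```python
# Minimal try-on-style sketch with a stock diffusers inpainting pipeline.
# Checkpoint, prompt and file paths are illustrative, not our production setup.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # publicly available checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # CUDA-based server, as in our stack

person = Image.open("person.jpg").convert("RGB").resize((512, 512))
# Mask of the garment region to repaint; in production this would come
# from the SCHP/DensePose segmentation step.
garment_mask = Image.open("tshirt_mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="a person wearing a plain white t-shirt",
    image=person,
    mask_image=garment_mask,
    num_inference_steps=30,
).images[0]
result.save("fitting_result.png")
```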
Now that we have had a quick look at the goals and technologies, let's break down how we train the algorithms, the main challenges, and the current results.
Training the algorithms and current metrics
Initial training of the algorithms was performed on the large DeepFashion database, which offers many variations of fashion items; other sources were also used for pose detection. However, such databases have a drawback: they do not reflect real user behavior and train the models under ideal conditions.
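For reference, a dataset of this kind can be wrapped in a few lines of PyTorch. The directory layout and pairs.csv format below are hypothetical (DeepFashion ships several benchmarks, each with its own annotation format); this is only a sketch of how person photos can be paired with garment photos for training.

```python
# Hedged sketch of a (person photo, garment photo) pair dataset.
# Directory layout and pairs.csv columns are hypothetical.
import csv
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class TryOnPairs(Dataset):
    def __init__(self, root: str, pairs_csv: str = "pairs.csv"):
        self.root = Path(root)
        with open(self.root / pairs_csv) as f:
            # each row: person_image,garment_image
            self.pairs = list(csv.reader(f))
        self.tf = transforms.Compose([
            transforms.Resize((512, 384)),
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),  # map to [-1, 1] for diffusion
        ])

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        person_path, garment_path = self.pairs[i]
        person = self.tf(Image.open(self.root / person_path).convert("RGB"))
        garment = self.tf(Image.open(self.root / garment_path).convert("RGB"))
        return person, garment

loader = DataLoader(TryOnPairs("deepfashion/"), batch_size=8, shuffle=True)
```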
What to do in such a case? Run beta versions of the algorithms!
As a result, real users start trying on clothes, producing new generations that we use to fine-tune the algorithm. And in the case of pilot products on online stores, we can make the algorithm even better by collecting new styles and different users, each with their own understanding of how the service works.
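In case it helps to see the shape of such fine-tuning, below is a sketch of a single training step with the standard diffusion noise-prediction objective, as commonly used with diffusers. The unet, vae, and cond_embeddings names are placeholders for our actual models and garment conditioning, not a description of our exact training code.

```python
# One hedged fine-tuning step with the standard noise-prediction objective.
# `unet`, `vae` and `cond_embeddings` are placeholders for the real models.
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def training_step(unet, vae, images, cond_embeddings, optimizer):
    with torch.no_grad():
        # encode user photos into the latent space
        latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, timesteps)

    # predict the added noise, conditioned on the garment/reference features
    pred = unet(noisy, timesteps, encoder_hidden_states=cond_embeddings).sample
    loss = F.mse_loss(pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```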
So, here is what we achieved in two weeks of training the algorithms:
At launch and during the first week of operation, generation accuracy was in the neighborhood of 30%;
In the second week, after some corrections to the UX and the internal algorithms, accuracy increased to 60%.
Thus, by fine-tuning the algorithms on real users we bring the results up to commercially viable values, and that is after only two weeks of operation.
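To make the metric concrete: one plausible reading of the accuracy figures above is the share of generations that users accept. A toy calculation, with a hypothetical feedback-log format:

```python
# "Generation accuracy" as the share of accepted generations.
# The feedback-log format is hypothetical.
def generation_accuracy(feedback: list[dict]) -> float:
    if not feedback:
        return 0.0
    accepted = sum(1 for event in feedback if event["accepted"])
    return accepted / len(feedback)

# e.g. 30 accepted out of 100 generations in week one
week_one = [{"accepted": True}] * 30 + [{"accepted": False}] * 70
print(f"{generation_accuracy(week_one):.0%}")  # -> 30%
```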
What affects the accuracy of an algorithm?
Under ideal conditions we naturally get great results, but the real world does not work that way. Our users are all different, and they often take photos that are not quite suitable for the neural network; all of this has to be taken into account.
In our case, accuracy is affected by:
The quality of the original photos;
Color issues (the person in the photo merging with the background, or monochromatic clothes that are difficult to segment);
The presence of other people in the frame (see the detector sketch after this list);
Photos that are not full-length;
Posing (crossed legs, crossed arms, etc.).
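As promised above, here is a hedged sketch of a pre-flight check for the "other people in the frame" case, using an off-the-shelf torchvision detector (COCO label 1 corresponds to "person"). The score threshold and the user-facing message are illustrative, not our production values.

```python
# Hedged pre-flight check: count people with a stock torchvision detector.
# Threshold and message are illustrative.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def count_people(path: str, score_threshold: float = 0.8) -> int:
    image = transforms.ToTensor()(Image.open(path).convert("RGB"))
    with torch.no_grad():
        detections = detector([image])[0]
    # COCO label 1 = person; keep only confident detections
    keep = (detections["labels"] == 1) & (detections["scores"] > score_threshold)
    return int(keep.sum())

if count_people("upload.jpg") != 1:
    print("Please upload a photo with exactly one person in the frame.")
```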
The more of these factors are combined, the lower the generation quality becomes. What to do in this case:
First, work on the UX and explain to the user in clear language under what conditions they might get a bad result;
Fine-tune the segmentation models on poses that are non-ideal for the algorithm;
Implement Multi-View Pose Transfer (for sideways photos and other body rotations);
Implement detection of intersecting body parts, e.g. using pose detection algorithms (there are good examples in TensorFlow; see the sketch below), and alert the user to the problem.
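For the last point, a hedged sketch using MoveNet from TensorFlow Hub, one of those TensorFlow examples. The crossed-arms heuristic is illustrative only: for a subject facing the camera, the left wrist normally appears to the right of the right wrist in image coordinates, so an inverted order hints at crossed arms.

```python
# Hedged pose pre-check with MoveNet (TensorFlow Hub).
# The crossed-arms heuristic and messages are illustrative.
import tensorflow as tf
import tensorflow_hub as hub

movenet = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")

LEFT_WRIST, RIGHT_WRIST = 9, 10  # MoveNet keypoint indices

def arms_look_crossed(image_path: str, min_score: float = 0.3) -> bool:
    image = tf.io.decode_jpeg(tf.io.read_file(image_path))
    image = tf.expand_dims(image, axis=0)
    image = tf.cast(tf.image.resize_with_pad(image, 192, 192), tf.int32)
    # output shape: [1, 1, 17, 3] with (y, x, score) per keypoint
    keypoints = movenet.signatures["serving_default"](image)["output_0"][0, 0]
    lw, rw = keypoints[LEFT_WRIST], keypoints[RIGHT_WRIST]
    if lw[2] < min_score or rw[2] < min_score:
        return False  # wrists not visible enough to judge
    return bool(lw[1] < rw[1])  # left wrist left of right wrist -> crossed

if arms_look_crossed("upload.jpg"):
    print("Arms look crossed; please retake the photo with arms at your sides.")
```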
Generation examples - from successes to epic fails
And as the final block of this article, we decided to share some generation results with you: some are excellent, while others are quite funny or even strange.
Conclusion
What I would like to summarize in the end: neural networks trained on ideal data and various open databases, as a rule, do not take the user factor into account, and it definitely affects the result. It is therefore worth investing seriously in fine-tuning models on your real users, working on the UX, and looking for new ways to improve the algorithms (for example, by introducing additional tools).
And as always, discussions and your questions are welcome!