HumanPartSegmentation : A Machine Learning Model for Segmenting Human Parts

This is an introduction to "HumanPartSegmentation", a machine learning model that can be used with ailia SDK. With ailia SDK, you can easily use this model, as well as many other ready-to-use ailia MODELS, to create AI applications.

Overview

Self-Correction for Human Parsing is a machine learning model released by Baidu Research in October 2019 that segments a person into individual body parts and clothing items.

Self-Correction for Human Parsing: Labeling pixel-level masks for fine-grained semantic segmentation tasks, e.g. human parsing, remains a challenging… (arxiv.org)

PeikeLi/Self-Correction-Human-Parsing: An out-of-box human parsing representation extractor. Our solution ranks 1st for all human parsing tracks (including… (github.com)

The following parts are supported.

CATEGORY = (
    'Background', 'Hat', 'Hair', 'Glove', 'Sunglasses', 'Upper-clothes', 'Dress', 'Coat',
    'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face', 'Left-arm', 'Right-arm',
    'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe'
)
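As a minimal sketch of how this list can be used, the snippet below counts how many pixels fall into each part. It assumes the model output has already been reduced to a 2D grid of class indices that follow the order of CATEGORY; the helper `count_parts` and the toy label map are hypothetical and are not part of ailia SDK.

```python
from collections import Counter

# Class names in index order, as listed in the article.
CATEGORY = (
    'Background', 'Hat', 'Hair', 'Glove', 'Sunglasses', 'Upper-clothes', 'Dress', 'Coat',
    'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face', 'Left-arm', 'Right-arm',
    'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe',
)


def count_parts(label_map):
    """Count pixels per part name in a 2D grid of class indices."""
    counts = Counter(idx for row in label_map for idx in row)
    return {CATEGORY[idx]: n for idx, n in sorted(counts.items())}


# Hypothetical 3x4 label map (assumption: each value indexes into CATEGORY).
toy_label_map = [
    [0, 2, 2, 0],
    [0, 13, 13, 0],
    [14, 5, 5, 15],
]

part_counts = count_parts(toy_label_map)
print(part_counts)
```

The same index-to-name lookup could drive a color palette for visualizing the result.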

Below is a result on an input image.

Source: https://github.com/PeikeLi/Self-Correction-Human-Parsing/blob/master/demo/demo.jpg

Inference result

Architecture

HumanPartSegmentation was trained on 50,000 images from the LIP dataset, but this dataset presents some challenges. In ordinary segmentation, all the pixels belonging to one instance share the same semantic label. In human part segmentation, however, the ambiguous boundaries between different semantic parts make annotation more costly and often result in noise and mislabeling in the ground truth (GT) data.

Example of noise in the GT data (Source: https://arxiv.org/pdf/1910.09777.pdf)

In Self-Correction for Human Parsing (SCHP), it is assumed that the dataset contains noise, and a specific loss function is applied to edges to generate class-agnostic boundaries, combined with a self-correction method used to refine GT label data to achieve more accurate segmentation.
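To make "class-agnostic boundaries" concrete, here is a small sketch that derives a binary edge map from a part label map: a pixel is marked as an edge when a horizontal or vertical neighbor carries a different label, regardless of which two classes meet. This is an illustrative assumption about how such edge targets can be built, not the code used by SCHP, and `edge_map` is a hypothetical helper.

```python
def edge_map(label_map):
    """Return a binary grid with 1 where a pixel borders a different label."""
    height, width = len(label_map), len(label_map[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Check only the right and bottom neighbors; marking both pixels
            # of each differing pair covers all four directions.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width and label_map[ny][nx] != label_map[y][x]:
                    edges[y][x] = 1
                    edges[ny][nx] = 1
    return edges


# Toy label map: Hair (2) on the left, Face (13) on the right.
toy_labels = [
    [2, 2, 13, 13],
    [2, 2, 13, 13],
    [2, 2, 13, 13],
]

edges = edge_map(toy_labels)
for row in edges:
    print(row)
```

The edge map records only where parts meet, not which parts, which is what makes it class-agnostic. A loss applied to such a map would push the network to place boundaries precisely.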

The network architecture uses ResNet-101 as the backbone and is known as Context Embedding with Edge Perceiving (CE2P). CE2P was first introduced in Devil in the Details: Towards Accurate Single and Multiple Human Parsing, published in September 2018, which improves accuracy by applying a specific loss function to the edges between parts of the segmentation. Traditionally, learning is based on the assumption that the GT data is correct, but SCHP assumes that the GT segmentation contains noise and handles the data accordingly.

Devil in the Details: Towards Accurate Single and Multiple Human Parsing: Human parsing has received considerable interest due to its wide application potentials. Nevertheless, it is still… (arxiv.org)

Source: https://arxiv.org/pdf/1910.09777.pdf

One characteristic of SCHP is its self-correcting learning cycle, which modifies the labels of the ground truth data as the model learns. As shown in Distilling the Knowledge in a Neural Network, published in March 2015, multiclass labels are known to contain dark knowledge, that is, information carried by the relative probabilities assigned to the incorrect classes. By using pseudo-masks, you can generate soft-target labels that contain dark knowledge, as opposed to one-hot labels that contain only the correct answer labels.

Distilling the Knowledge in a Neural Network: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models… (arxiv.org)
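The difference between one-hot and soft-target labels can be illustrated with a short sketch. It uses a temperature-scaled softmax as described in the distillation paper; the logits, the class order, and the temperature value are made up for the example, and SCHP does not necessarily generate its pseudo-masks this way.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """Hard label: 1.0 for the chosen class, 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical logits for one pixel over three classes: Coat, Upper-clothes, Hat.
logits = [4.0, 3.0, -2.0]

hard = one_hot(logits.index(max(logits)), len(logits))
soft = softmax(logits, temperature=2.0)

print("one-hot:", hard)
print("soft   :", [round(p, 3) for p in soft])
```

The one-hot label only says "Coat", while the soft target also shows that "Upper-clothes" is a far more plausible alternative than "Hat". That extra ranking among the wrong classes is the dark knowledge.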

From the perspective of distillation, SCHP produces less noisy teacher labels and a more accurate model by repeating a cycle: train on the GT labels, re-label the data with the trained model, then train again on the new labels.
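The label side of this cycle can be sketched as follows. The snippet blends the current soft label of a single pixel with the model's new prediction as a running average over cycles. The weighting scheme is an assumption chosen for illustration, the predictions are hard-coded stand-ins for real model output, and `refine_labels` is a hypothetical helper rather than the authors' implementation.

```python
def refine_labels(current, predicted, cycle):
    """Blend current soft labels with new predictions as a running average.

    Assumption: after `cycle` cycles the old label keeps weight
    cycle / (cycle + 1) and the new prediction gets 1 / (cycle + 1).
    """
    if cycle < 1:
        raise ValueError("cycle must be >= 1")
    keep = cycle / (cycle + 1)
    return [keep * c + (1 - keep) * p for c, p in zip(current, predicted)]


# Possibly noisy one-hot GT label for one pixel over three classes.
labels = [0.0, 1.0, 0.0]

# Stand-ins for what the retrained model predicts in each cycle.
predictions_per_cycle = [
    [0.2, 0.6, 0.2],
    [0.5, 0.4, 0.1],
]

for cycle, predicted in enumerate(predictions_per_cycle, start=1):
    labels = refine_labels(labels, predicted, cycle)
    print(cycle, [round(p, 3) for p in labels])
```

Each pass moves the label away from a hard one-hot vector toward a soft target that reflects what the model has learned, while the averaging keeps any single noisy prediction from overwriting the label.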

The resulting model ranked 1st in the human parsing tracks of the CVPR 2019 LIP Challenge.

Usage

You can use the following command to run HumanPartSegmentation on a webcam video stream with ailia SDK.

$ python3 human_part_segmentation.py -v 0

Here are some results.

ailia-ai/ailia-models: (Image from https://github.com/PeikeLi/Self-Correction-Human-Parsing/blob/master/demo/demo.jpg) Shape : (1, 3, 473… (github.com)
