pocketpose.models.body#

Submodules#

Package Contents#

Classes#

TFLiteModel

Interface for all TensorFlow Lite models.

EfficientPose

Base class for EfficientPose models.

EfficientPoseRTLite

EfficientPose-RT Lite model.

EfficientPoseILite

EfficientPose-I Lite model.

EfficientPoseIILite

EfficientPose-II Lite model.

EfficientPoseRT

EfficientPose-RT model.

EfficientPoseI

EfficientPose-I model.

EfficientPoseII

EfficientPose-II model.

EfficientPoseIII

EfficientPose-III model.

EfficientPoseIV

EfficientPose-IV model.

MoveNet

Base class for the MoveNet models.

MoveNetLightning

MoveNet Lightning model.

MoveNetLightningFP16

MoveNet Lightning model with float16 quantization.

MoveNetLightningINT8

MoveNet Lightning model with int8 quantization.

MoveNetThunder

MoveNet Thunder model.

MoveNetThunderFP16

MoveNet Thunder model with float16 quantization.

MoveNetThunderINT8

MoveNet Thunder model with int8 quantization.

HeatmapDeocder

Base class for all decoders.

PoseNetDecoder

PoseNet

Base class for PoseNet models.

PoseNetSinglePerson

PoseNet single-person model.

PoseNetMultiPerson

PoseNet multi-person model.

SimCCDecoder

Base class for all decoders.

ONNXModel

Interface for all ONNX models.

RTMPose

Base class for RTMPose models.

RTMPoseM

RTMPose-M model.

Functions#

get_skeleton(name) → pocketpose.datasets.skeletons.Skeleton

Attributes#

pocketpose.models.body.get_skeleton(name) → pocketpose.datasets.skeletons.Skeleton#
class pocketpose.models.body.TFLiteModel(model_path: str, model_url: str, **kwargs)#

Bases: pocketpose.models.interfaces.imodel.IModel

Interface for all TensorFlow Lite models.

We assume that the model has a single input, but it can have multiple outputs.

process_image(image)#

Default implementation of process_image() for models that don’t need preprocessing.

This method can be overridden by subclasses to implement model-specific preprocessing.

Args:

image (np.ndarray): The image to prepare for prediction. The image is a numpy array with shape (1, height, width, channels) and dtype uint8 (range [0, 255]).

get_output(output_idx: int) → numpy.ndarray#

Returns the output tensor of the model.

Args:

output_idx (int): The index of the output tensor to return.

Returns:

The output tensor as a numpy array.

predict(image: numpy.ndarray) → Any#

Predicts the pose of the image.

Args:

image (np.ndarray): The image to predict the pose of. The image has the shape and dtype expected by the model.

Returns:

The prediction returned by the model. This can be a single tensor or a tuple of tensors, depending on the model.

pocketpose.models.body.model_registry#
class pocketpose.models.body.EfficientPose(model_path: str, model_url: str, input_size: tuple, real_time: bool = False, lite: bool = False)#

Bases: pocketpose.models.interfaces.TFLiteModel

Base class for EfficientPose models.

process_image(image)#

Default implementation of process_image() for models that don’t need preprocessing.

This method can be overridden by subclasses to implement model-specific preprocessing.

Args:

image (np.ndarray): The image to prepare for prediction. The image is a numpy array with shape (1, height, width, channels) and dtype uint8 (range [0, 255]).

reorder_keypoints(keypoints: numpy.ndarray) → numpy.ndarray#

Reorders the keypoints to match the expected order.

EfficientPose outputs the keypoints in a different order than the expected order, so we need to reorder them. This function takes the predicted keypoints, maps them to the expected order and returns the reordered keypoints.
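The reordering described above can be sketched as a single index lookup. This is a minimal illustration, not the library's implementation: the `EXPECTED_ORDER` table below is a hypothetical four-keypoint mapping chosen for the example, and the real mapping used by EfficientPose is not given in this reference.

```python
import numpy as np

# Hypothetical mapping (an assumption for illustration): EXPECTED_ORDER[i] is
# the row of the model output that holds the keypoint expected at position i.
EXPECTED_ORDER = [2, 0, 1, 3]


def reorder_keypoints(keypoints: np.ndarray, order=EXPECTED_ORDER) -> np.ndarray:
    """Return a copy of keypoints whose row i is the i-th expected keypoint."""
    keypoints = np.asarray(keypoints)
    if keypoints.shape[0] != len(order):
        raise ValueError("keypoint count does not match the mapping length")
    # Integer-array indexing copies the rows in the requested order.
    return keypoints[order]


predicted = np.array([[10, 11, 0.9], [20, 21, 0.8], [30, 31, 0.7], [40, 41, 0.6]])
reordered = reorder_keypoints(predicted)
```

Because the lookup returns a copy, the model's raw output stays untouched and can still be inspected after reordering.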

postprocess_prediction(prediction, original_size)#

Postprocesses the prediction to get the keypoints.

Args:

prediction (Any): The raw prediction returned by the model. This can be a single tensor or a tuple of tensors, depending on the model.

original_size (tuple): The original size of the input image as (height, width).

Returns:

The predicted keypoints as a list of (x, y, score) tuples.

decode_heatmaps(heatmaps, original_size)#

Decode the heatmaps to keypoints coordinates.

Args:

heatmaps (np.ndarray): Numpy array of shape (1, H, W, K)

Returns:

List of predicted coordinates of shape (K, 3) as (x, y, score)
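One common way to turn heatmaps of shape (1, H, W, K) into (x, y, score) rows is to take the peak of each channel and scale its cell position to the original image. The sketch below assumes plain argmax decoding and a simple proportional rescale; the library's method may refine the peak or handle padding differently.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray, original_size: tuple) -> np.ndarray:
    """Argmax decoding: (1, H, W, K) heatmaps -> (K, 3) array of (x, y, score)."""
    _, h, w, k = heatmaps.shape
    flat = heatmaps[0].reshape(h * w, k)       # one column per keypoint
    idx = flat.argmax(axis=0)                  # flat index of each channel's peak
    scores = flat[idx, np.arange(k)]           # peak value is used as the score
    ys, xs = np.unravel_index(idx, (h, w))     # back to (row, column) cells
    orig_h, orig_w = original_size             # documented as (height, width)
    # Assumption: heatmap cells map proportionally onto the original image.
    x = xs * (orig_w / w)
    y = ys * (orig_h / h)
    return np.stack([x, y, scores], axis=1)


heatmaps = np.zeros((1, 4, 4, 2), dtype=np.float32)
heatmaps[0, 1, 2, 0] = 0.9   # keypoint 0 peaks at row 1, column 2
heatmaps[0, 3, 0, 1] = 0.5   # keypoint 1 peaks at row 3, column 0
keypoints = decode_heatmaps(heatmaps, original_size=(8, 16))
```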

class pocketpose.models.body.EfficientPoseRTLite#

Bases: EfficientPose

EfficientPose-RT Lite model.

class pocketpose.models.body.EfficientPoseILite#

Bases: EfficientPose

EfficientPose-I Lite model.

class pocketpose.models.body.EfficientPoseIILite#

Bases: EfficientPose

EfficientPose-II Lite model.

class pocketpose.models.body.EfficientPoseRT#

Bases: EfficientPose

EfficientPose-RT model.

class pocketpose.models.body.EfficientPoseI#

Bases: EfficientPose

EfficientPose-I model.

class pocketpose.models.body.EfficientPoseII#

Bases: EfficientPose

EfficientPose-II model.

class pocketpose.models.body.EfficientPoseIII#

Bases: EfficientPose

EfficientPose-III model.

class pocketpose.models.body.EfficientPoseIV#

Bases: EfficientPose

EfficientPose-IV model.

class pocketpose.models.body.MoveNet(model_path: str, model_url: str, input_size: tuple)#

Bases: pocketpose.models.interfaces.TFLiteModel

Base class for the MoveNet models.

MoveNet is a lightweight pose estimation model developed by Google Research that runs on mobile devices. It uses a lightweight MobileNetV2 backbone and a Feature Pyramid Network (FPN) decoder together with CenterNet-style keypoint prediction heads. The model is trained on the COCO dataset and can detect 17 keypoints.

For more information, see the following links:

- https://www.tensorflow.org/hub/tutorials/movenet
- https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html

postprocess_prediction(prediction, original_size)#

Postprocesses the prediction to get the keypoints.

Args:

prediction (Any): The raw prediction returned by the model. This can be a single tensor or a tuple of tensors, depending on the model.

original_size (tuple): The original size of the input image as (height, width).

Returns:

The predicted keypoints as a list of (x, y, score) tuples.
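As a rough illustration of this step: the published single-pose MoveNet models return a tensor of shape (1, 1, 17, 3) whose rows are (y, x, score) with coordinates normalized to [0, 1]. The sketch below converts such a tensor to pixel-space (x, y, score) tuples. The function name `postprocess_movenet` is hypothetical, and the sketch assumes the image was resized without padding, which may not match the library's preprocessing.

```python
import numpy as np


def postprocess_movenet(prediction: np.ndarray, original_size: tuple) -> list:
    """Convert a (1, 1, 17, 3) tensor of normalized (y, x, score) rows to pixels."""
    orig_h, orig_w = original_size                 # (height, width)
    rows = np.asarray(prediction).reshape(-1, 3)   # (17, 3)
    # Swap to (x, y) order and scale each axis by the matching image dimension.
    return [(float(x * orig_w), float(y * orig_h), float(s)) for y, x, s in rows]


prediction = np.zeros((1, 1, 17, 3), dtype=np.float32)
prediction[0, 0, 0] = [0.5, 0.25, 0.8]   # first keypoint: y=0.5, x=0.25
keypoints = postprocess_movenet(prediction, original_size=(480, 640))
```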

class pocketpose.models.body.MoveNetLightning#

Bases: MoveNet

MoveNet Lightning model.

The Lightning model is the smallest MoveNet model and is intended for latency-critical applications.

class pocketpose.models.body.MoveNetLightningFP16#

Bases: MoveNet

MoveNet Lightning model with float16 quantization.

class pocketpose.models.body.MoveNetLightningINT8#

Bases: MoveNet

MoveNet Lightning model with int8 quantization.

class pocketpose.models.body.MoveNetThunder#

Bases: MoveNet

MoveNet Thunder model.

The Thunder model is the largest MoveNet model and is intended for high-accuracy applications. It gives better predictions than the Lightning variants, but is also slower.

class pocketpose.models.body.MoveNetThunderFP16#

Bases: MoveNet

MoveNet Thunder model with float16 quantization.

class pocketpose.models.body.MoveNetThunderINT8#

Bases: MoveNet

MoveNet Thunder model with int8 quantization.

class pocketpose.models.body.HeatmapDeocder#

Bases: pocketpose.models.decoder.base_decoder.Decoder

Base class for all decoders.

Decoders are used to decode the prediction of pose models into a keypoint list in the image coordinate system. The keypoint list is a list of tuples (x, y, score) where x and y are the coordinates and score is the prediction confidence.

All decoders must implement the decode method. Each model has a corresponding decoder, and the decode method is automatically called when the model is used for prediction.

decode(prediction, image_shape)#
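The decoder contract described above (every decoder implements `decode` and returns (x, y, score) tuples in image coordinates) can be sketched with an abstract base class. The `Decoder` class here is a stand-in written for this example, not the library's `base_decoder.Decoder`, and `NormalizedDecoder` is a hypothetical subclass.

```python
from abc import ABC, abstractmethod


class Decoder(ABC):
    """Stand-in for a decoder base class (an assumption, not the library's code)."""

    @abstractmethod
    def decode(self, prediction, image_shape):
        """Return a list of (x, y, score) tuples in image coordinates."""


class NormalizedDecoder(Decoder):
    """Hypothetical decoder for predictions given as normalized (x, y, score) rows."""

    def decode(self, prediction, image_shape):
        height, width = image_shape[:2]
        # Scale normalized coordinates to pixels; the score passes through.
        return [(x * width, y * height, score) for x, y, score in prediction]


decoder = NormalizedDecoder()
keypoints = decoder.decode([(0.5, 0.5, 0.9)], image_shape=(100, 200))
```

Declaring `decode` abstract means a subclass that forgets to implement it fails at instantiation rather than at prediction time.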
class pocketpose.models.body.PoseNetDecoder(output_stride=32, local_maximum_radius=1, threshold=0.5)#
decode_multi_pose(heatmaps, offsets, displacement_fwd, displacement_bwd)#
build_part_with_score_queue(scores)#
score_is_maximum_in_local_window(keypointId, score, heatmapY, heatmapX, scores)#
traverse_to_target_keypoint(keypoints, displacements, direction, scores, offsets)#
get_edge_keypoints(edge_id, direction)#
estimate_target_keypoint_position(edge_id, source_keypoint_id, target_keypoint_id, keypoints, displacements, scores, offsets)#
get_displacement(edge_id, keypoint, displacements)#
get_strided_index_near_point(point, output_stride, displacements)#
get_instance_score(keypoints)#
sigmoid(x)#
class pocketpose.models.body.PoseNet(model_path: str, model_url: str, input_size: tuple)#

Bases: pocketpose.models.interfaces.TFLiteModel

Base class for PoseNet models.

process_image(image)#

Default implementation of process_image() for models that don’t need preprocessing.

This method can be overridden by subclasses to implement model-specific preprocessing.

Args:

image (np.ndarray): The image to prepare for prediction. The image is a numpy array with shape (1, height, width, channels) and dtype uint8 (range [0, 255]).

flip_keypoints(keypoints, image_width)#

Flip the keypoints horizontally.
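A horizontal flip mirrors each x coordinate across the vertical centre line of the image. The sketch below assumes pixel-index coordinates, so column 0 maps to column `image_width - 1`; the library may instead use `image_width - x`, and this example does not swap left and right keypoint labels.

```python
def flip_keypoints(keypoints, image_width):
    """Mirror (x, y, score) keypoints horizontally; y and score are unchanged."""
    # Assumption: x is a pixel index in [0, image_width - 1].
    return [(image_width - 1 - x, y, score) for x, y, score in keypoints]


flipped = flip_keypoints([(0, 5, 0.9), (9, 2, 0.4)], image_width=10)
```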

postprocess_prediction(prediction, original_size) → List[List[float]]#

Postprocesses the prediction to get the keypoints.

Args:

prediction (Any): The raw prediction returned by the model. This can be a single tensor or a tuple of tensors, depending on the model.

original_size (tuple): The original size of the input image as (height, width).

Returns:

The predicted keypoints as a list of (x, y, score) tuples.

extract_keypoints_from_heatmaps(heatmaps)#

Extract the keypoints from the heatmaps.

Args:

heatmaps: The heatmaps to extract the keypoints from. Shape: (height, width, num_keypoints)

Returns:

A tuple containing the keypoints and their confidences.

apply_offsets(keypoints, offsets, output_stride=32)#
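The two steps above can be sketched together. In the usual PoseNet formulation, each keypoint's heatmap peak gives a coarse cell, and an offset tensor of shape (height, width, 2 * num_keypoints) refines it: position = cell * output_stride + offset, with the y offsets stored in the first num_keypoints channels and the x offsets in the rest. That channel layout and the return types below are assumptions for this example; the library's methods may differ.

```python
import numpy as np


def extract_keypoints_from_heatmaps(heatmaps):
    """Return ((K, 2) peak cells as (row, col), (K,) peak scores) for (H, W, K) input."""
    h, w, k = heatmaps.shape
    flat = heatmaps.reshape(h * w, k)
    idx = flat.argmax(axis=0)
    rows, cols = np.unravel_index(idx, (h, w))
    return np.stack([rows, cols], axis=1), flat[idx, np.arange(k)]


def apply_offsets(coords, offsets, output_stride=32):
    """Refine heatmap cells to input-image pixels; returns a (K, 2) array of (x, y)."""
    k = coords.shape[0]
    rows, cols = coords[:, 0], coords[:, 1]
    ks = np.arange(k)
    # Assumed layout: channels [0, K) hold y offsets, channels [K, 2K) hold x offsets.
    y = rows * output_stride + offsets[rows, cols, ks]
    x = cols * output_stride + offsets[rows, cols, ks + k]
    return np.stack([x, y], axis=1)


heatmaps = np.zeros((3, 3, 2), dtype=np.float32)
heatmaps[1, 2, 0] = 0.9      # keypoint 0 peaks at row 1, column 2
heatmaps[0, 1, 1] = 0.6      # keypoint 1 peaks at row 0, column 1
offsets = np.zeros((3, 3, 4), dtype=np.float32)
offsets[1, 2, 0] = 4.0       # y offset for keypoint 0
offsets[1, 2, 2] = -3.0      # x offset for keypoint 0
coords, scores = extract_keypoints_from_heatmaps(heatmaps)
positions = apply_offsets(coords, offsets)
```

Raw PoseNet heatmaps are typically logits, so a sigmoid is usually applied before the peak values are read as confidences; the example skips that step.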
class pocketpose.models.body.PoseNetSinglePerson#

Bases: PoseNet

PoseNet single-person model.

class pocketpose.models.body.PoseNetMultiPerson#

Bases: PoseNet

PoseNet multi-person model.

class pocketpose.models.body.SimCCDecoder#

Bases: pocketpose.models.decoder.base_decoder.Decoder

Base class for all decoders.

Decoders are used to decode the prediction of pose models into a keypoint list in the image coordinate system. The keypoint list is a list of tuples (x, y, score) where x and y are the coordinates and score is the prediction confidence.

All decoders must implement the decode method. Each model has a corresponding decoder, and the decode method is automatically called when the model is used for prediction.

decode(prediction, image_shape)#
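For context on what a SimCC-style decoder does: SimCC represents each keypoint as two 1-D classification vectors, one over x bins and one over y bins, where the number of bins is the input size multiplied by a split ratio. A minimal decoding sketch follows. The function name `decode_simcc`, the default `split_ratio`, and the choice of the smaller of the two peak values as the score are assumptions, and the result is in model-input coordinates, so mapping back to the original image is still required.

```python
import numpy as np


def decode_simcc(simcc_x, simcc_y, split_ratio=2.0):
    """Decode (K, Wx) and (K, Wy) SimCC vectors to a (K, 3) array of (x, y, score)."""
    x_bins = simcc_x.argmax(axis=1)
    y_bins = simcc_y.argmax(axis=1)
    # Assumed scoring rule: a keypoint is only as confident as its weaker axis.
    scores = np.minimum(simcc_x.max(axis=1), simcc_y.max(axis=1))
    # Each bin covers 1 / split_ratio pixels of the model input.
    return np.stack([x_bins / split_ratio, y_bins / split_ratio, scores], axis=1)


simcc_x = np.zeros((1, 8))
simcc_x[0, 6] = 0.9          # x peak in bin 6
simcc_y = np.zeros((1, 4))
simcc_y[0, 1] = 0.7          # y peak in bin 1
keypoints = decode_simcc(simcc_x, simcc_y)
```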
class pocketpose.models.body.ONNXModel(model_path: str, model_url: str, **kwargs)#

Bases: pocketpose.models.interfaces.imodel.IModel

Interface for all ONNX models.

We assume that the model has a single input, but it can have multiple outputs.

process_image(image)#

Default implementation of process_image() for models that don’t need preprocessing.

This method can be overridden by subclasses to implement model-specific preprocessing.

Args:

image (np.ndarray): The image to prepare for prediction. The image is a numpy array with shape (1, height, width, channels) and dtype uint8 (range [0, 255]).

predict(image: numpy.ndarray) → Any#

Predicts the pose of the image.

Args:

image (np.ndarray): The image to predict the pose of. The image has the shape and dtype expected by the model.

Returns:

The prediction returned by the model. This can be a single tensor or a tuple of tensors, depending on the model.

class pocketpose.models.body.RTMPose(model_path: str, model_url: str, input_size: tuple)#

Bases: pocketpose.models.interfaces.ONNXModel

Base class for RTMPose models.

process_image(image)#

Default implementation of process_image() for models that don’t need preprocessing.

This method can be overridden by subclasses to implement model-specific preprocessing.

Args:

image (np.ndarray): The image to prepare for prediction. The image is a numpy array with shape (1, height, width, channels) and dtype uint8 (range [0, 255]).

postprocess_prediction(prediction, original_size) → List[List[float]]#

Postprocesses the prediction to get the keypoints.

Args:

prediction (Any): The raw prediction returned by the model. This can be a single tensor or a tuple of tensors, depending on the model.

original_size (tuple): The original size of the input image as (height, width).

Returns:

The predicted keypoints as a list of (x, y, score) tuples.

class pocketpose.models.body.RTMPoseM#

Bases: RTMPose

RTMPose-M model.