In a recent paper published in the journal Scientific Reports, researchers introduced an innovative approach to deducing the 3D posture of mice, including their limbs and feet, from monocular video. The study also unveils the Mouse Pose Analysis Dataset, an extensive video dataset of labeled poses and behaviors of laboratory mice in their home cages. The dataset is supplemented by high-resolution computed tomography (CT) scans of mice, which are crucial for shape modeling in 3D pose reconstruction.
Background
The study introduces a groundbreaking computer vision technique for continuous 3D motion tracking of laboratory mice, offering an affordable and non-intrusive solution. This technique addresses the challenge of investigating irregular motion in animal models relevant to human clinical conditions. It replaces the need for manual measurements that are not only expensive but also distressing for the animals.
The method adapts well-established 3D human pose estimation techniques to mice: 2D keypoints are predicted first and then optimized to recover the 3D pose, accounting for potential occlusions. The approach is scalable enough to monitor multiple cages simultaneously and remains robust to occlusions thanks to strategically chosen, anatomically significant keypoints.
Advancements in pose estimation
2D Pose Estimation: Animal pose estimation based on deep learning draws heavily on human pose algorithms, an influence documented in recent surveys. DeepLabCut, a deep learning model, employs transfer learning to reach human-level accuracy even with few labeled samples, inspiring further advances. Another model, LEAP, accelerates annotation via iterative fine-tuning, while DeepPoseKit enhances robustness. Despite their strong open-field performance, the efficacy of these models on home-cage images remains untested. Enforcing spatio-temporal consistency across frames improves accuracy: the OptiFlex model computes optical flow from keypoint heat maps, while OpenPifPaf and DeepGraphPose incorporate composite fields and graphical models, respectively, to help infer occluded keypoints.
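The models above share a common final step: turning per-keypoint heat maps into coordinates. A minimal sketch of that decoding step (using a simple argmax, whereas real models often add sub-pixel refinement) might look like:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Decode (x, y) keypoints and confidences from per-joint heat maps.

    heatmaps: (J, H, W) array, one map per keypoint.
    """
    J, H, W = heatmaps.shape
    flat = heatmaps.reshape(J, -1)
    idx = flat.argmax(axis=1)            # flat index of each map's peak
    ys, xs = np.divmod(idx, W)           # recover row/column from flat index
    conf = flat[np.arange(J), idx]       # peak value doubles as confidence
    return np.stack([xs, ys], axis=1), conf

# Toy heat map with a single keypoint peaking at (x=2, y=5).
hm = np.zeros((1, 8, 8))
hm[0, 5, 2] = 0.9
coords, conf = keypoints_from_heatmaps(hm)
print(coords, conf)  # → [[2 5]] [0.9]
```

Low peak confidences are what downstream 3D optimization can use to down-weight likely-occluded keypoints.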
3D Pose Estimation: Although 2D pose suffices for many inquiries, understanding 3D movement requires integrating kinematics. 3D poses can be obtained by triangulating 2D keypoints across multiple cameras or by using depth sensors; one multi-view 3D rig proposed by researchers, which includes a Kinect depth camera, has been used to evaluate single-view 3D reconstruction. Recent machine learning advances such as LiftPose3D predict 3D joint locations from single views, while Dunn et al. regress a volumetric representation.
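The multi-camera baseline mentioned here rests on standard linear triangulation. A self-contained sketch with two toy projection matrices (the camera parameters below are illustrative, not from the paper):

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) pixel coordinates of the same keypoint in each view.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: a reference view and one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(np.round(triangulate_point(P1, P2, x1, x2), 3))  # recovers X_true
```

With noise-free projections the DLT solution is exact; in practice, noisy 2D detections make the recovered point a least-squares compromise between the rays.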
In contrast, the proposed approach frames 3D pose estimation as an optimization task, leveraging a mouse skeleton model for interpretability and robustness against occlusions. This model incorporates 3D joint angles, addressing challenges related to missing data and over-parameterization through a robust prior.
Comprehensive mouse pose analysis
The Mouse Pose Analysis Dataset encompasses 455 video clips of common inbred strains of laboratory mice (C57BL/6N) and diversity-outbred mice, coupled with 80 CT images of C57BL/6N mice. The aim is to provide a comprehensive resource for diverse research concerns in animal physiology and behavior. All CT scans adhered to animal care guidelines. The dataset represents a broad spectrum of lab mice genotypes, sexes, weights, and activities within their home cages.
The dataset includes manually labeled keypoints and behavior annotations from diverse behaviors, providing researchers with substantial insights. Although specific experimental data for evaluation has not been released, the dataset, accessible through a provided link, contains 5460 annotated frames for 2D pose evaluation and 80 CT scans. It is the first large-scale dataset with comprehensive keypoint and behavior annotations for mice in home cage settings, distinguishing it from similar datasets.
Mouse pose prediction encompasses three stages: bounding box detection, 2D pose prediction, and 3D pose optimization. For 2D detection, the pipeline uses a single-shot detector to localize the mouse and a stacked hourglass network to estimate keypoints. Both models are pre-trained on the Common Objects in Context (COCO) dataset, which contains labeled human keypoint positions, and are then adapted to mice by substituting mouse keypoints for the human ones. The object keypoint similarity (OKS) score is used to evaluate the pose model.
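The OKS metric itself is the standard one from COCO keypoint evaluation: a Gaussian similarity in keypoint distance, normalized by object scale, averaged over labeled keypoints. A minimal sketch (the per-keypoint falloff constants below are illustrative; for mice they would need dataset-specific calibration):

```python
import numpy as np

def oks(pred, gt, visible, scale, k):
    """Object keypoint similarity between predicted and ground-truth keypoints.

    pred, gt: (N, 2) arrays of keypoint coordinates.
    visible:  (N,) boolean mask of labeled keypoints.
    scale:    object scale (square root of the object's area).
    k:        (N,) per-keypoint falloff constants (dataset-specific).
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)           # squared pixel distances
    sim = np.exp(-d2 / (2.0 * scale**2 * k**2))     # per-keypoint similarity
    return sim[visible].mean()                       # average over labeled keypoints

gt = np.array([[10.0, 10.0], [20.0, 30.0], [40.0, 40.0]])
pred = gt + np.array([[0.5, 0.0], [0.0, 0.5], [5.0, 5.0]])  # last keypoint is off
visible = np.array([True, True, True])
k = np.full(3, 0.1)                                  # hypothetical constants
score = oks(pred, gt, visible, scale=50.0, k=k)
print(round(score, 3))  # → 0.786
```

A perfect prediction scores 1.0; large errors on even one keypoint pull the score down, which is why OKS is a stricter summary than raw pixel distance.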
The 3D pose prediction is framed as an optimization problem over a kinematic chain with 18 joints, with shape and pose priors improving stability and convergence. A proposed custom 3D capture rig facilitates multiview 3D pose reconstruction. Biological attributes of the mice are predicted using a neural network model trained on continuous video data, and gait measurements are obtained both directly and via the pose estimation method.
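To make the pose-as-optimization idea concrete, here is a heavily simplified sketch, not the paper's model: a planar two-segment limb parameterized by joint angles is fit to observed keypoints, with a quadratic penalty pulling the angles toward a rest pose, mirroring the role of the paper's pose prior (segment lengths, rest angles, and the prior weight are all invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

BONE_LENGTHS = np.array([1.0, 0.8])   # assumed segment lengths (the "shape")
REST_ANGLES = np.array([0.4, 0.2])    # assumed prior mean over joint angles
PRIOR_WEIGHT = 0.1                    # trade-off between data fit and prior

def forward_kinematics(angles):
    """Joint positions of a planar chain rooted at the origin."""
    pts, pos, theta = [np.zeros(2)], np.zeros(2), 0.0
    for length, a in zip(BONE_LENGTHS, angles):
        theta += a
        pos = pos + length * np.array([np.cos(theta), np.sin(theta)])
        pts.append(pos)
    return np.array(pts)

def objective(angles, observed):
    # Data term: squared distance between modeled and observed keypoints.
    reproj = np.sum((forward_kinematics(angles) - observed) ** 2)
    # Prior term: keep angles near the rest pose (regularizes missing data).
    prior = PRIOR_WEIGHT * np.sum((angles - REST_ANGLES) ** 2)
    return reproj + prior

# Synthetic "detections" generated from a known pose.
observed = forward_kinematics(np.array([0.5, 0.3]))
fit = minimize(objective, x0=REST_ANGLES, args=(observed,))
print(np.round(fit.x, 2))
```

Because pose is recovered through a skeleton model rather than per-keypoint regression, an occluded keypoint simply drops out of the data term while the prior keeps the joint angle plausible, which is the robustness property the paper emphasizes.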
Results demonstrate the accuracy of the inferred 3D poses on the multiview video dataset. The root mean square error (RMSE) of joint position estimates averages under 10 mm, corresponding to less than 10% relative error. 3D joint angles outperform other representations in predicting biological attributes, achieving perfect classification on a test set. Gait measurements estimated from 3D poses match direct measurements.
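For reference, a per-joint RMSE over a video sequence is typically computed as below (the paper's exact averaging may differ; the array shapes here are assumptions):

```python
import numpy as np

def joint_rmse(pred, gt):
    """Per-joint RMSE in mm over a sequence.

    pred, gt: (frames, joints, 3) arrays of 3D joint positions.
    """
    # Squared Euclidean distance per frame and joint, averaged over frames.
    return np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=-1), axis=0))

gt = np.zeros((10, 2, 3))
pred = gt + np.array([3.0, 4.0, 0.0])  # constant 5 mm offset on every joint
print(joint_rmse(pred, gt))  # → [5. 5.]
```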
For behavior inference, the method's bounding-box outputs compare favorably with deep convolutional networks: they offer finer granularity and greater efficiency than more intricate neural architectures while still capturing subtle positional changes.
Conclusion
In summary, this research introduces a method for deducing 3D poses of mice from single-view videos. The evaluation focused on keypoint accuracy and biological relevance: outputs from various pipeline stages predicted biological attributes linked to gait disruptions. The method enables health prediction and offers the potential for non-invasive monitoring and gait analysis, serving as an alternative to direct gait parameter measurement. Future work includes accuracy enhancement and expansion to animal social interactions.