Advances in computational techniques, particularly machine learning, have expanded opportunities to analyse early infant motor repertoires, especially in naturalistic settings. The aim of this study was to evaluate the strengths, limitations, and performance of state-of-the-art pose estimation algorithms in challenging, home-based video conditions. We analysed 22 videos recorded by parents using mobile phones from eight newborns in the Baby Grow study, at 2, 4, and 8 weeks of age. The videos varied in clothing (common onesie, babygrow, vest), background (grey, black, coloured), lighting (with/without shadows), and camera angles (top, front, bottom). From these, 2,640 frames were extracted and manually annotated to serve as ground truth. We tested demo versions of MediaPipe, OpenPose, PCT, RTMpose, Sapiens, and VitPose, and evaluated performance using object keypoint similarity (OKS), percentage of correct keypoints (PCKh), speed, and accuracy. RTMpose showed the highest overall accuracy, while MediaPipe had the fastest processing speed. However, when balancing speed and accuracy at ratios of 70:30, 50:50, and 30:70, MediaPipe's speed compensated for its lower accuracy, making it a strong candidate for practical applications. Model performance varied under different environmental conditions, with RTMpose, Sapiens, and VitPose being the most robust. As infant movement research increasingly shifts to real-world environments, selecting appropriate models and ensuring video quality are essential. Our findings show that (1) new models outperform legacy tools like OpenPose, and (2) video context and model selection significantly affect pose estimation accuracy.
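For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of OKS, PCKh, and a weighted speed-accuracy score. It follows the common COCO-style OKS and MPII-style PCKh definitions; the per-keypoint falloff constants, the PCKh threshold of 0.5, and the `weighted_score` normalisation are illustrative assumptions, not values taken from this study.

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """COCO-style Object Keypoint Similarity.

    pred, gt : (N, 2) arrays of predicted / ground-truth keypoints.
    visible  : (N,) visibility flags (>0 means annotated).
    area     : object scale squared (e.g. bounding-box area).
    k        : (N,) per-keypoint falloff constants (assumed values).
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)          # squared distances
    e = d2 / (2.0 * area * k ** 2 + np.finfo(float).eps)
    mask = visible > 0
    return float(np.exp(-e)[mask].mean())

def pckh(pred, gt, head_size, alpha=0.5):
    """Percentage of correct keypoints, normalised by head size.

    A keypoint counts as correct if its error is within
    alpha * head_size (alpha = 0.5 is the usual PCKh@0.5).
    """
    d = np.linalg.norm(pred - gt, axis=1)
    return float((d <= alpha * head_size).mean())

def weighted_score(norm_speed, accuracy, w_speed):
    """Combine pre-normalised speed and accuracy (both in [0, 1])
    at a given ratio, e.g. w_speed = 0.7 for a 70:30 weighting."""
    return w_speed * norm_speed + (1.0 - w_speed) * accuracy
```

A perfect prediction yields an OKS and PCKh of 1.0; the weighted score makes explicit how a fast but less accurate model can still rank first once speed is weighted heavily, as reported for MediaPipe.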