Gesture recognition technology is a popular area of research, with applications in many fields, including behaviour research, human–computer interaction (HCI), medical research, and surveillance, among others. However, the large quantity of data needed to train a recognition algorithm is not always available, and differences between the training set and one’s own research data, in factors such as recording conditions and participant characteristics, may hinder transferability. To address these issues, we propose training and testing recognition algorithms on virtual agents, a tool that has not yet been used for this purpose in multimodal communication research. We provide an example use case with step-by-step instructions, using motion capture (mocap) data to animate a virtual agent and to customise lighting conditions, backgrounds, and camera angles, thereby creating a virtual-agent-only dataset on which to train and test a gesture recognition algorithm. This approach also allows us to assess the impact of particular features, such as background and lighting. Under optimal background and lighting conditions, our best-performing model achieved an accuracy of 85.9%. When background clutter and reduced lighting were introduced, accuracy dropped to 71.6%. When the virtual-agent-trained model was tested on images of humans, the accuracy of target handshape classification ranged from 72% to 95%. These results suggest that training an algorithm on artificial data (1) is a resource-efficient, convenient, and effective way to customise algorithms, (2) potentially addresses issues of data sparsity, and (3) can be used to assess the impact of many contextual and environmental factors that would not be feasible to assess systematically with human data.
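
To illustrate the kind of rendering step described above (varying lighting and background around a mocap-animated agent), the following is a minimal sketch using Blender's Python API (bpy); it is not the pipeline used in the study, and the object name "KeyLight", the condition values, and the output paths are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the authors' pipeline): render stills of a
# mocap-animated virtual agent under different lighting and background settings.
import bpy

scene = bpy.context.scene
light = bpy.data.objects["KeyLight"].data                      # assumes a light object named "KeyLight"
world_bg = scene.world.node_tree.nodes["Background"].inputs["Color"]

# Hypothetical rendering conditions: lamp strength (W) and world background RGBA.
conditions = [
    {"energy": 1000.0, "bg": (1.0, 1.0, 1.0, 1.0),  "label": "bright_plain"},
    {"energy": 200.0,  "bg": (0.2, 0.2, 0.25, 1.0), "label": "dim_dark"},
]

for cond in conditions:
    light.energy = cond["energy"]                              # set lighting intensity
    world_bg.default_value = cond["bg"]                        # set background colour
    for frame in range(scene.frame_start, scene.frame_end + 1, 10):
        scene.frame_set(frame)                                 # step through the mocap animation
        scene.render.filepath = f"//renders/{cond['label']}_{frame:04d}.png"
        bpy.ops.render.render(write_still=True)                # write the rendered still to disk
```

The resulting images, labelled by condition and frame, could then serve as training or test material for a handshape classifier under controlled environmental variation.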