Generative artificial intelligence (AI) poses a significant threat to data integrity on crowdsourcing platforms, such as Prolific, on which behavioral scientists widely rely for data collection. Large language models (LLMs) allow users to generate fluent and relevant responses to open-ended questions, which can mask inattention and compromise experimental validity. To estimate the prevalence of this behavior empirically, we analyzed keystroke data from three studies (N = 928) conducted on Prolific between May and July 2025. Using an embedded JavaScript tool, we flagged participants who pasted text or whose keystroke count was anomalously low compared with their response length. For each flagged participant, we manually compared the detected keystrokes with the final response to determine whether the text could have been typed. This review confirmed that, despite deterrence measures, approximately 9% of participants submitted responses consistent with AI assistance or other forms of outsourced responding. These participants outperformed noncheaters (by up to 1.5 SD), were more than twice as likely to share geolocations with other participants (suggesting possible proxy use), and exhibited lower internal consistency on questionnaire scales. Simulated power analyses indicate that this level of undetected cheating can diminish observed effect sizes by 10% and inflate required sample sizes by up to 30%. These findings highlight the urgent need for new detection methods, such as keystroke logging, which offers verifiable evidence of cheating that is difficult to obtain from manual review of LLM-generated text alone. As AI continues to evolve, maintaining data quality in crowdsourced research will require active monitoring, methodological adaptation, and ongoing communication between researchers and platforms.
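To make the flagging approach concrete, the sketch below shows how an embedded script of the kind described above might count keystrokes and detect pasting in a browser. The element IDs, the flagResponse helper, and the MIN_KEYS_PER_CHAR threshold are illustrative assumptions, not the study's actual instrument.

```javascript
// Minimal sketch of an embedded keystroke monitor (assumed element IDs).
const textarea = document.querySelector('#open-ended-response');

let keystrokes = 0;
let pasted = false;

textarea.addEventListener('keydown', () => { keystrokes += 1; });
textarea.addEventListener('paste', () => { pasted = true; });

// Flag a submission when text was pasted, or when the recorded keystroke
// count is anomalously low relative to the final response length.
const MIN_KEYS_PER_CHAR = 0.8; // assumed threshold; tune on pilot data

function flagResponse() {
  const length = textarea.value.length;
  const suspicious =
    pasted || (length > 0 && keystrokes / length < MIN_KEYS_PER_CHAR);
  return { keystrokes, pasted, length, suspicious };
}

// Attach the flag to the form payload at submission time.
document.querySelector('#survey-form').addEventListener('submit', () => {
  document.querySelector('#keystroke-flag').value =
    JSON.stringify(flagResponse());
});
```

Flags produced this way are only a screening signal; as in the study, each flagged case still warrants a manual comparison of the keystroke log against the submitted text.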
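As a rough sanity check on the reported power figures, the back-of-envelope calculation below (an assumption of this sketch, not the paper's simulation procedure) treats cheaters as contributing no true effect, so a contaminated fraction p dilutes a standardized effect d to roughly (1 - p) * d, and the required sample size, which scales with 1/d^2, inflates accordingly.

```javascript
// Back-of-envelope dilution model (illustrative; the paper's simulated
// power analyses are more detailed). Assumes cheaters add zero true effect.
function dilution(p, d) {
  const dObserved = (1 - p) * d;           // attenuated effect size
  const nInflation = (d / dObserved) ** 2; // multiplier on required N (N ~ 1/d^2)
  return { dObserved, nInflation };
}

// With 9% contamination and a hypothetical true effect of d = 0.5:
const { dObserved, nInflation } = dilution(0.09, 0.5);
console.log(dObserved.toFixed(3));  // 0.455 -> roughly 9-10% smaller effect
console.log(nInflation.toFixed(2)); // 1.21  -> roughly 21% more participants
// The reported inflation of up to 30% plausibly reflects outsourced responses
// that also add noise rather than simply averaging toward zero effect.
```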