Author(s)
James McLellan, Svetlana Ikonomova, Shwetha Sreenivasan, Alan Amin, Catherine Baranowski, Amanda Reider Apel, Peter Kelly, David Ross, Aviv Spinner
Abstract
High-quality datasets that span broad sequence diversity are essential for understanding protein sequence-function relationships beyond local mutational landscapes. Here, we applied Growth-based Quantitative Sequencing (GROQ-seq) to measure function across an 11,722-member protease library composed of natural homologs and AI-shrunken variants. This library spans vast sequence diversity, with Levenshtein distances of up to 245 and a mean pairwise sequence identity of 41% to TEV protease S219V. We identified sequence-divergent TEV protease homologs that preserve function against the native TEV protease substrate. These findings reveal the robustness of protease activity across highly diverse sequences. They also demonstrate the suitability of the GROQ-seq assay for screening large, diverse protein libraries for function, enabling efficient data generation at scale for training machine learning models across broad sequence landscapes.
Citation
https://www.biorxiv.org/
Keywords
AI-Ready Biological Data
Citation
McLellan, J., Ikonomova, S., Sreenivasan, S., Amin, A., Baranowski, C., Reider Apel, A., Kelly, P., Ross, D. and Spinner, A. (2026), Functional Profiling of Thousands of Sequence-Diverse Protease Homologs with GROQ-seq, https://www.biorxiv.org/, [online], https://www.biorxiv.org/ (Accessed May 7, 2026)