Our innovative Sequence-Activity Relationship methodology, called innov’SAR, identifies high-fitness mutants in mutant libraries by combining physico-chemical properties of the amino acids, digital signal processing and regression techniques. innov’SAR correlates the variations that mutations cause in protein spectra with biological activity or fitness. It takes into account the impact of each mutation on the whole spectrum rather than on local fitness alone.
The novelty of innov’SAR is that it uses the Fast Fourier Transform (FFT) to numerically encode the sequences of a library of protein or enzyme variants with known activities as a set of protein spectra.
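The encoding step described above can be sketched as follows. This is a minimal illustration, not the innov’SAR implementation: it assumes the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) as the residue encoding, uses a naive discrete Fourier transform (an FFT computes the same coefficients, only faster), and keeps the amplitude of each frequency. The function names and the two example sequences are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
# Assumption: chosen here only as one example of a physico-chemical scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale):
    """Turn a protein sequence into a numerical signal, one value per residue."""
    return [scale[aa] for aa in sequence]


def dft(signal):
    """Naive discrete Fourier transform; an FFT yields the same coefficients."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def protein_spectrum(sequence, scale):
    """Amplitude spectrum of the encoded sequence (non-redundant half only)."""
    coefficients = dft(encode(sequence, scale))
    return [abs(c) for c in coefficients[: len(sequence) // 2 + 1]]


# Hypothetical wild type and a single-point mutant (last residue K -> D).
wild_type = "MKTAYIAK"
mutant = "MKTAYIAD"

spec_wt = protein_spectrum(wild_type, KYTE_DOOLITTLE)
spec_mut = protein_spectrum(mutant, KYTE_DOOLITTLE)

# One substitution shifts several frequency components, not a single position.
changed = [k for k, (a, b) in enumerate(zip(spec_wt, spec_mut)) if abs(a - b) > 1e-9]
print("frequencies affected by the mutation:", changed)
```

Because every Fourier coefficient sums over all positions, a single substitution changes amplitudes across the spectrum, which is why the spectrum captures more than local effects.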
innov’SAR then finds the numerical patterns in the sequences of these protein variants that best correlate with changes in protein activity upon residue substitution. To achieve this, the method searches among all the encoding possibilities provided by the AAindex database and uses partial least squares regression (PLSR) to model the relationship between the sequence information and protein activity. The method assumes that the determinants of protein activity are not purely local, but globally distributed over the linear sequence of the protein.
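The search over encodings combined with PLSR can be sketched as below, under several loudly stated assumptions: the two residue scales are made-up stand-ins for AAindex entries, the sequences and activities are synthetic toy data, the model is a single-component PLS1 regression, and encodings are ranked by leave-one-out squared error. The source does not specify the number of components or the validation scheme used by innov’SAR; in practice a library implementation such as scikit-learn's `PLSRegression` with several components would be a natural choice.

```python
import cmath

# Hypothetical residue scales standing in for AAindex entries (values made up).
TOY_SCALES = {
    "toy_scale_a": {"A": 1.0, "G": 0.0, "K": -1.0, "L": 2.0},
    "toy_scale_b": {"A": 0.2, "G": 0.1, "K": 0.9, "L": 0.3},
}


def spectrum(sequence, scale):
    """Amplitude spectrum of the encoded sequence (naive DFT, half spectrum)."""
    signal = [scale[aa] for aa in sequence]
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def fit_pls1(X, y):
    """One-component PLS1: weight vector along the covariance of X with y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:  # no covariance between X and y: fall back to the mean
        return x_mean, y_mean, [0.0] * p, 0.0
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # latent scores
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict activity for one spectrum from a fitted PLS1 model."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score


def loo_error(X, y):
    """Mean squared leave-one-out prediction error."""
    total = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (predict(model, X[i]) - y[i]) ** 2
    return total / len(X)


def best_encoding(sequences, activities, scales):
    """Rank candidate encodings by leave-one-out error; return the best one."""
    errors = {
        name: loo_error([spectrum(s, scale) for s in sequences], activities)
        for name, scale in scales.items()
    }
    return min(errors, key=errors.get), errors


# Synthetic variant library: same-length sequences with made-up activities.
sequences = ["AGKLAGKL", "AGKLAGKA", "AGKLLGKL", "KGKLAGKL", "AGGLAGKL", "AGKLAGAL"]
activities = [1.0, 0.7, 1.4, 0.5, 0.9, 1.2]

best, errors = best_encoding(sequences, activities, TOY_SCALES)
print("selected encoding:", best)
print("leave-one-out errors:", errors)
```

Ranking by cross-validated error rather than by fit on the training set guards against picking an encoding that merely memorises a small library.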