Our innovative Sequence-Activity Relationship (iSAR) methodology identifies high-fitness mutants in mutant libraries using the physico-chemical properties of amino acids, digital signal processing, and regression techniques. iSAR correlates the spectral variations caused by mutations with biological activity/fitness. It accounts for the impact of mutations on the whole spectrum rather than focusing on local fitness alone.
The novelty of iSAR is that it applies a Fast Fourier Transform (FFT) to the numerically encoded protein sequences of a library of protein/enzyme variants with known activities, converting them into a set of protein spectra.
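The encoding step can be illustrated with a minimal sketch. It is not the iSAR implementation: the function name `protein_spectrum`, the example sequences, the choice of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) and the use of the amplitude of the FFT are all assumptions made for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101).
# Chosen only as an example; any amino acid index could be used instead.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def protein_spectrum(sequence, index=HYDROPATHY):
    """Encode a sequence with one amino acid index and return the
    amplitude of its real-input FFT (an assumed definition of 'spectrum')."""
    signal = np.array([index[aa] for aa in sequence], dtype=float)
    return np.abs(np.fft.rfft(signal))

# Hypothetical wild type and a single-point variant (R -> L at the last position).
wild_type = "MKTAYIAKQR"
mutant = "MKTAYIAKQL"

wt_spec = protein_spectrum(wild_type)
mut_spec = protein_spectrum(mutant)

# Per-frequency change in amplitude caused by the single substitution.
delta = mut_spec - wt_spec
```

In this sketch a single substitution contributes to every Fourier coefficient, so `delta` is generally non-zero across the spectrum. This is one way to see why a spectral representation captures more than the local effect of a mutation.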
iSAR then finds the numerical patterns in the sequences of these protein variants that best correlate with changes in protein activity upon residue substitution. To achieve this, the method searches among all encoding possibilities provided by the AAindex database and uses partial least squares regression (PLSR) to model the relationship between the sequence information and protein activity. The method assumes that the determinants of protein activity are not purely local, but globally distributed over the linear sequence of the protein.
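The search over encodings combined with PLSR can be sketched as follows. This is a toy illustration under stated assumptions, not the iSAR code: the three "indices" are random stand-ins for AAindex entries, the variants and activities are synthetic, the PLS1 (NIPALS) routine is a generic textbook version, and selecting the encoding by leave-one-out RMSE is an assumed criterion. All function and variable names are hypothetical.

```python
import numpy as np

def spectrum(seq, index):
    # Encode the sequence with one index, then take the FFT amplitude.
    return np.abs(np.fft.rfft([index[aa] for aa in seq]))

def pls1_fit(X, y, n_components):
    """Generic PLS1 (NIPALS) for a single response; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                # scores
        tt = t @ t
        p = Xk.T @ t / tt         # X loadings
        q = yk @ t / tt           # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - q * t           # deflate y
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return coef, y_mean - x_mean @ coef

def loo_rmse(X, y, n_components):
    # Leave-one-out cross-validated root mean squared error.
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        coef, intercept = pls1_fit(X[train], y[train], n_components)
        errors.append(y[i] - (X[i] @ coef + intercept))
    return float(np.sqrt(np.mean(np.square(errors))))

def best_encoding(sequences, activities, indices, n_components=2):
    # Try every candidate index and keep the one with the lowest LOO error.
    y = np.asarray(activities, dtype=float)
    scores = {}
    for name, index in indices.items():
        X = np.array([spectrum(s, index) for s in sequences])
        scores[name] = loo_rmse(X, y, n_components)
    return min(scores, key=scores.get), scores

# --- Synthetic demo data: no biological meaning ---
rng = np.random.default_rng(0)
AMINO = "ACDEFGHIKLMNPQRSTVWY"
# Random placeholder scales standing in for real AAindex entries.
indices = {f"toy_index_{i}": dict(zip(AMINO, rng.normal(size=20)))
           for i in range(3)}
wild_type = "MKTAYIAKQRLG"  # hypothetical sequence
variants = []
for _ in range(12):
    pos = rng.integers(len(wild_type))
    variants.append(wild_type[:pos] + AMINO[rng.integers(20)] + wild_type[pos + 1:])
activities = rng.normal(size=len(variants))  # made-up activity values

best, scores = best_encoding(variants, activities, indices)
```

With real data, `indices` would hold the amino acid indices taken from AAindex and `activities` the measured values for each variant. The number of PLS components would typically be tuned as well rather than fixed at 2.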