Reconstruction of a Set of Points from the Noise Multiset of Pairwise Distances in n^2 Steps for the Cyclic Sequencing Problem
Motivation and Aim: An important fraction of the peptidoma of bacteria is non-ribosomal peptides (NRP), representing a class of secondary peptide metabolites, usually produced by bacteria and fungi, and having an extremely wide range of biological activity and pharmacological properties. In the overwhelming majority of cases (64%), NRPs have a cyclic structure . In connection with their biosynthesis from the non-rybosomal path, the identification of NRPs by classical methods of bioinformatics and genomics is impossible, and is carried out only on the basis of mass spectrometry.
Mathematically, the sequencing of a cyclic chain from mass spectra is reduced to the problem known to mathematicians for long: the recovery of the coordinates of a set of points X from the multiset of pairwise distances between them ∆X (so-called the beltway problem, which having no polynomial-time algorithm in the general case). The computational complexity of the best algorithm developed by now is O(n^n log n) . Despite the many approaches used (the brute force method, graph models, dynamic programming, the divide and conquer method, hidden Markov models, spectral convolution, etc.), attempts to design a polynomial algorithm for the beltway problem failed. So at present, the possibilities of de novo reconstruction of the structure of cyclic NRPs are limited. Thus, the development of new bioinformatic methods for the reconstruction of bacterial non-ribosomal peptides is very relevant.
Methods and Algorithms: We proposed a new method to solve the problem [3,4]. It is based on sequential removal of redundancy from the inputs. For the error-free inputs that simulate mass spectra with high accuracy (~ 10^-3 Da), the size of inputs decreases from O(n^2) to O(n). In this way, exhaustive search can be almost completely removed from the algorithms, and the number of steps to reconstruct a sequence is n^2, where n is is the cardinality of the set X, n=|X|.
Results: Now in  we generalized this method through the use of integral transforms. It is shown that the generalized approach can be successfully used for reconstructing the set X not only from a complete and error-free set of pairwise distances ∆X, but also for a set ∆X + f containing a large number of redundant and missing data f (noise), |f| > |∆X|. The high efficiency of the proposed method was shown. The computational complexity of the our algorithm is O(n^2), where n is the cardinality of the input set ∆X + f, n=|∆X + f|.