Two equal length sequences of nucleic acids (A,C,T,or G) are judged based on
a 4x4 score
matrix. The homology score for the sequences is just the sum of the scores
at each position in the sequence, looked up in the matrix.
For example, if the two sequences were ACTA and GATC then their homology
score would be score(A,G) + score(C,A) + score(T,T) + score(A,C).
Our problem is to construct the score matrix. It has the following restrictions:
-
1) All entries must be integers between -10 and 10 inclusive
-
2) It must be symmetric ( score(x,y) = score(y,x) )
-
3) Diagonal entries must be positive ( score(x,x)>0 )
-
4) The sum of the 16 entries must be 0.
We are given a set of equal length sequences that are known to be homologous.
We want to fill in the score matrix so that the average homology score for all
the pairs from the set is maximized. If there are n sequences in the set, then
there are n*(n-1)/2 pairs to consider.
Create a class BioScore that contains a method maxAvg that is given knownSet,
the list of sequences known to be homologous. The method computes the optimum
scoring matrix and returns the resulting maximal average homology score for the
pairs from knownSet.
|