Discussion:
Looping inside Needleman-Wunsch algorithm & good values for the Similairity Matrix
(too old to reply)
a***@altavista.com
2008-10-09 10:35:28 UTC
Permalink
Hi everyone,


Concerning the Needleman-Wunsch algorithm (cf.
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm) I have
noticed a possible loop.

Inside the algorithm there is an important decision making mechanism.
Its a "if, else if, else if" structure like:

if(ScoreValue == DiagonalValue + SimilarityValue(i, j)
{
blah;
}
else if(Score == Left + d)
{
blah;
}
else if(Score == Up + d)
{
blah;
}


I've been playing with value of the similarity matrix and for certain
values I can get the algorithms to stick as there is no else clause to
offer an out if none of the above conditions are met. Now I know that
I could introuduce an else statement and avoid the loop but it does
not address the fundamental probalm of constructing a useful
Similarity matrix.

It's on this point issue that I want to ask for help. The similaity
Matrix that I built for my task is ultimately causing this problem. So
i need to make it better. My question is: what properties should a
good Similarity matrix have to avoid this loop? How does one go about
constructing an effective Similarity matrix? What properties should it
have? I'm not referring to the simple structure that score 1 for a
match and 0 for a mismatch. i'm referring to more complex
structures...
Any hints/advice/user-experiences/websites/papers much appreciated.

Thanking you,
Al.
Philipp Pagel
2008-10-10 08:10:53 UTC
Permalink
Post by a***@altavista.com
Concerning the Needleman-Wunsch algorithm (cf.
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm) I have
noticed a possible loop.
Inside the algorithm there is an important decision making mechanism.
if(ScoreValue == DiagonalValue + SimilarityValue(i, j)
{
blah;
}
else if(Score == Left + d)
{
blah;
}
else if(Score == Up + d)
{
blah;
}
You mean the traceback part - right?
Post by a***@altavista.com
I've been playing with value of the similarity matrix and for certain
values I can get the algorithms to stick as there is no else clause to
offer an out if none of the above conditions are met.
By definition, one of these three cases must be true. The reason for
that is of course that the dynamic programming matrix was constructed
using these rules in the first place. I see two possible reasons for the
problem you observe:

1) There is a bug in your program. Either in the construction phase or
in the traceback

2) You are experiencing a floating point precision problem.
Testing floating point numbers for equality can be problematic...
Post by a***@altavista.com
It's on this point issue that I want to ask for help. The similaity
Matrix that I built for my task is ultimately causing this problem. So
i need to make it better.
My question is: what properties should a
good Similarity matrix have to avoid this loop? How does one go about
constructing an effective Similarity matrix? What properties should it
have? I'm not referring to the simple structure that score 1 for a
match and 0 for a mismatch. i'm referring to more complex
structures...
The above seems to indiate that you mean the scoring matrix when you say
"similarity matrix" - right?. From the perspective of the algorithm it
does not matter at all - you can use whatever scoring matrix you want.
Of course from the biological point of view you want to choose a matrix
that best matches your problem, question or data. But that is beside
the point because no scoring matrix will cause the traceback to fail.

cu
Philipp
--
Dr. Philipp Pagel
Lehrstuhl f. Genomorientierte Bioinformatik
Technische Universität München
http://mips.gsf.de/staff/pagel
Loading...