Chris Hoffmann
2007-06-27 15:44:37 UTC
Hi everybody,
I was wondering about DNADIST, from the PHYLIP package.
I am conducting a big sequencing project and there will be several phases. I
would like to construct a distance matrix using DNADIST with an initial
dataset and later on only add more sequences to the set, but I don't want
to have to re-run the program with all the sequences again. Is there a way
to insert only the new data into the matrix?
For example:
initially I want to calculate the distances between the sequences in
group A;
then, when I get group B, calculate the distances between the sequences
within group B;
and calculate the distances between the sequences in group A and those in
group B, without having to re-calculate the distances within group A.
This is a simple example; I am actually likely to have 5 or more sets of
sequences, ranging from 5000 to 20000 sequences per group (perhaps more).
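To make the idea concrete, here is a minimal Python sketch of the block-wise update I have in mind. It is not DNADIST code: the function names (jc69_distance, extend_matrix) are hypothetical, and I have swapped in the simpler Jukes-Cantor distance in place of F84 just to keep it short. It assumes all sequences are already aligned to the same columns, and that the distance for a pair depends only on that pair plus parameters that stay fixed between phases. Whether that second assumption holds for DNADIST's F84 settings is part of what I am asking, so the sketch does not settle it.

```python
import math

def jc69_distance(a, b):
    """Jukes-Cantor distance between two aligned sequences (stand-in for F84)."""
    # Compare only sites where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # distance is undefined (saturated) at this divergence
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def extend_matrix(old_seqs, old_matrix, new_seqs, distance=jc69_distance):
    """Return the matrix for old_seqs + new_seqs, reusing old_matrix as-is."""
    n_old = len(old_seqs)
    all_seqs = list(old_seqs) + list(new_seqs)
    n = len(all_seqs)
    matrix = [[0.0] * n for _ in range(n)]
    # Copy the A-versus-A block without recomputing anything.
    for i in range(n_old):
        for j in range(n_old):
            matrix[i][j] = old_matrix[i][j]
    # Compute only the rows for the new sequences: B-versus-A and B-versus-B.
    for i in range(n_old, n):
        for j in range(i):
            d = distance(all_seqs[i], all_seqs[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Toy usage: phase 1 builds the matrix for group A, phase 2 adds group B.
group_a = ["ACGTACGT", "ACGTACGA"]
group_b = ["ACGAACGA", "TCGTACGT"]
matrix_a = extend_matrix([], [], group_a)
matrix_ab = extend_matrix(group_a, matrix_a, group_b)
```

With n sequences already in the matrix and m new ones, this computes n*m + m*(m-1)/2 new distances instead of all (n+m)*(n+m-1)/2, which is the saving I am after.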
I realize I may have to adapt the code (another issue entirely), but what I
am concerned about is whether the methods used by DNADIST give reliable
results if I calculate the distances in this fashion.
I wanted to use the F84 model, the default, but I am open to suggestions.
Any help on this would be great.
Thanks
Chris