GENEALOGY-DNA-L Archives



From:
Subject: Re: [DNA] Variance Assessment of R:U106 DYS425Null Cluster
Date: Thu, 11 Feb 2010 22:24:25 +0000 (UTC)
In-Reply-To: <mailman.4269.1265922105.2099.genealogy-dna@rootsweb.com>


>From: "Lancaster-Boon" <
>Dear Anatole,

>Thanks once more. It does seem you have seen my question more clearly.



Dear Andrew,



Regrettably, you continue with your pattern. Why wouldn't you write: "It does seem that now I ask my questions more specifically"?



In fact, it is close to impossible to see unclear questions "more clearly". It takes a lot of effort to understand what you actually mean in your questions, which often sound like statements on how complicated things are. Then, you constantly use terms without defining them. Then, you take MY terms and apparently second-guess my definitions, changing them to fit your "way of thinking".



Here is an example:

    

>>Andrew: "You need knowledge of not only one common ancestor for whole
uncontroversial clades, but also the common ancestor for each sub-branch and
sub-sub-branch, etc."

>Anatole: Absolutely not. If a tree is a non-complicated one (meaning, has only one common ancestor), just one figure, that is, an average number of
mutations per marker (or per haplotype) gives you an immediate answer in
terms of TMRCA. [...] In other words, the most important job there is to
determine, is it an uncomplicated (first-order) system or a complicated one.


As you see, I was avoiding the undefined terms "each sub-branch" and "each sub-sub-branch" in a haplotype tree. What I said is that you do not even need to use those fuzzy words when you start to analyze a haplotype tree. All you need to determine is whether the tree is a first-order one (meaning it has, technically, only one common ancestor) or a more complicated one. In the second case, you need to separate the branches and calculate them separately. As you see, I did not use the terms "each sub-branch" and "each sub-sub-branch", because they are poorly defined and in fact irrelevant.



How did you distort my words? Here is how:



>(At this point Anatole, you are to some extent talking past me. I asked if
you need to be able to define all sub-branching and you said "no". But you
ALSO say that you do not need to do it IF there ARE NO sub-branches (not a
complicated case with multiple common ancestors).)


Mind you, I did not say that I "do not need to do it if there are no sub-branches". I did not use the term "sub-branches" at all, for the reason I explained above. You put it in my mouth and are now saying that I am talking past you. You have confused yourself.



I repeat: forget about those "sub-sub-branches". All you need to do is determine whether your dataset has one common ancestor or more. There are simple criteria for that. I have described them earlier. Look at the example David Ewing sent in, which I have analyzed. It is in a parallel thread. Notice that I did not use "sub-sub-branches" in it. There was no need for it.



I composed the tree, separated four different branches which were quite visible, and applied the logarithmic-linear criterion to the largest one. Both methods gave the same TMRCA. It means that the separation of the branches and their calculations were correct.



You apparently refuse to consider a concrete example and prefer to talk in generalities. It led you into a trap.



O.K., let's continue:



>>>Andrew: "How do you see two clades within one dataset?"

>>Anatole: I compose a haplotype tree. It shows separate, distinct branches,
if they do exist. Sometimes you can see them directly, by eye, but it takes
an experience to notice them. I would not recommend it. ... Shallow branches
are separate branches. They come from a recent common ancestor. They should
be tested separately.



> (Andrew)...So, I think I can summarize as follows:

>1. SOMETIMES, in second order or complicated cases, you need to separate out sub-branches and handle them separately.


(Anatole) I do not know why you put an emphasis on SOMETIMES. Of course it is sometimes, how else? Then, there is no "second order" here. "Second order" has a completely different meaning when compared with "first order". It is not applicable to these mutations. The rest is correct.



>2. To do this, you use, I am now reasonably confident, the various normal
techniques for this, either by eye or using one of the software solutions,
or may be your own? I don't hear of any new technique here?


I repeat that I would not recommend doing it "by eye". There are too many markers in extended haplotypes to get a full picture that way. Let the computer do it. A good example was in David's dataset. He rightly picked a series with DYS391=10 as a different "clade" (a branch), but added to them two haplotypes from another branch, apparently only because they also had DYS391=10. For those two, it was just a random mutation in quite another branch. The computer put those two in the right branch (colored differently).



>But in practice I still find this somewhat confusing.



O.K., it happens.



>If I understand correctly, it is easy to define an extreme case "first order" example of data as per your description. It is a comb shaped tree, or star shaped network...



(Anatole) So far, so good... But you forgot again the principal criterion - the logarithmic method. "A comb shaped tree" is hard to find. Some not-so-comb-shaped trees can still have one common ancestor. There are VERY MANY transient cases. 



>and could be given by data something like this:-
10-10-10-10-10
10-10-10-11-10
10-10-10-10-10
10-10-10-10-11




(Anatole) Wrong. You cannot tell anything from such a bastardized - again - example. But certainly it does not fit - formally - a first-order case. The number of mutations and the number of base haplotypes do not match in terms of the logarithmic-linear dual method. I will show it for illustrative purposes only. Since there are two "base" haplotypes among four, the logarithmic figure is ln(4/2) = 0.693, while two mutations per four haplotypes give 2/4 = 0.500. The figures do not match each other. It means that this dataset will give a phantom common ancestor.
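
For readers who want to reproduce the two figures compared above, here is a minimal sketch in Python. It only repeats the arithmetic of this toy example and is not a full TMRCA calculation. The function name log_linear_figures is hypothetical, the base haplotype is assumed to be the most frequent one in the set, and mutations are assumed to be counted as step differences from that base haplotype. No tolerance for "matching" is stated in this post, so the sketch reports both figures and leaves the comparison to the reader.

```python
import math
from collections import Counter


def log_linear_figures(haplotypes):
    """Return (logarithmic figure, linear figure) for a list of haplotypes.

    Assumptions (not from the original post): the base haplotype is the
    most frequent one, and mutations are counted as absolute step
    differences from it, summed over all markers and all haplotypes.
    """
    rows = [tuple(int(x) for x in h.split("-")) for h in haplotypes]
    base, n_base = Counter(rows).most_common(1)[0]
    n_total = len(rows)
    mutations = sum(abs(a - b) for row in rows for a, b in zip(row, base))
    log_figure = math.log(n_total / n_base)   # ln(N / number of base haplotypes)
    linear_figure = mutations / n_total       # mutations per haplotype
    return log_figure, linear_figure


# The four 5-marker haplotypes from the quoted example
example = [
    "10-10-10-10-10",
    "10-10-10-11-10",
    "10-10-10-10-10",
    "10-10-10-10-11",
]
log_figure, linear_figure = log_linear_figures(example)
print(f"{log_figure:.3f} {linear_figure:.3f}")  # prints "0.693 0.500"
```

With only four haplotypes the figures are, of course, purely illustrative.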



>In other words, no obvious proposals for secondary common ancestors are
prominent at all. This is what you mean right?




(Anatole) You do it again. You distort my approach and present it as if it were mine. And then you ask a rhetorical question, meaningless in this situation.



Since the rest of your comment is essentially based on that distorted view, there is no reason to continue with it. I suggest you reformulate your "questions" and come again. Meanwhile, take a close look at John's and David's examples. It will help you.



However, to save you time with some simple things, I will make brief comments:


>So (my new question NB) how do you decide when the branching is important
for the analysis and when not?



I have explained it above, probably for the fifth or sixth time. I apply the dual criterion. I compose a tree. There are branchings and branchings. Some of them are distinct and clearly have a separate base haplotype each. Some of them look like fish skin; they are parts of the same branch.



>Maybe this is something where a simple example might help.



MORE??



>Personally I thought my example was a good one.



No. Though, in a way, it was a good one to show that it was a wrong one.





>To describe a similar case in words:
a. let's say you have 12 x 67 marker haplotypes, very closely matching,
connected by surname and region of origin, and no other close matches
b. half of them have 18-23 and half have 19-23 on YCAII. So far so good
c. Let's say a third of them have 11 instead of 12 on DYS439, but this
includes some with 18-23 and some with 19-23.
One possible answer is maybe to say this is just an unlucky example.



(Anatole) It would be a bad answer in this context. Twelve haplotypes is not much for composing a tree; however, the tree will immediately show where the separate branches are and which are parts of the same branch. Sorting them out "by eye" would be a lost cause here.



>I guess one possible tree they'll come up with will group the set into 4
possible combinations (18 and 11, 18 and 12, 19 and 11, 19 and 12)...



Do not guess. This is how people get confused.



>In the real world my response to the above situation is to say we need more
data...



Oh, yes. Always. However, sometimes you do not have such a luxury. For example, with excavated haplotypes. So, you try to minimize the error and attach a properly calculated margin of error. Often it still provides you with useful information.



Regards,



Anatole Klyosov


