GENEALOGY-DNA-L Archives


From: "Lancaster-Boon"
Subject: [DNA] Variance Assessment of R:U106 DYS425Null Cluster
Date: Thu, 11 Feb 2010 08:42:27 +0100


Dear Anatole

Thanks once more. It does seem you have seen my question more clearly.
Cutting down to where I think the core of the discussion is now:-

Andrew: "In order to count mutations I think you need a full 'family tree'."

Anatole: Absolutely not. You will never get a "full family tree" for "old"
TMRCAs. You need just to figure out was there one nearest common ancestor
for your dataset, or two or several of them. I have explained how.

Andrew: "You need knowledge of not only one common ancestor for whole
uncontroversial clades, but also the common ancestor for each sub-branch and
sub-sub-branch, etc."

Anatole: Absolutely not. If a tree is a non-complicated one (meaning, has only
one common ancestor), just one figure, that is an average number of
mutations per marker (or per haplotype) gives you an immediate answer in
terms of TMRCA. [...] In other words, the most important job there is to
determine, is it an uncomplicated (first-order) system or a complicated one.

(At this point Anatole, you are to some extent talking past me. I asked if
you need to be able to define all sub-branching and you said "no". But you
ALSO say that you do not need to do it IF there ARE NO sub-branches (not a
complicated case with multiple common ancestors).)

Andrew: "How do you see two clades within one dataset?"

Anatole: I compose a haplotype tree. It shows separate, distinct branches,
if they do exist. Sometimes you can see them directly, by eye, but it takes
experience to notice them. I would not recommend it. ... Shallow branches
are separate branches. They come from a recent common ancestor. They should
be tested separately.

...So, I think I can summarize as follows:

1. SOMETIMES, in second order or complicated cases, you need to separate out
sub-branches and handle them separately.

2. To do this, you use, I am now reasonably confident, the various normal
techniques for this, either by eye or using one of the software solutions,
or maybe your own? I don't hear of any new technique here?

But in practice I still find this somewhat confusing. If I understand
correctly, it is easy to define an extreme case "first order" example of
data as per your description. It is a comb shaped tree, or star shaped
network, and could be given by data something like this:-

10-10-10-10-10
10-10-10-11-10
10-10-10-10-10
10-10-10-10-11

In other words, no obvious candidates for secondary common ancestors are
prominent at all. This is what you mean, right?
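To make the "one figure" idea concrete for the four haplotypes above, here is a minimal Python sketch. It assumes that mutations are counted as step differences from the modal (most common) value at each marker, which is one common convention and not necessarily exactly what Anatole does. The function names are my own, for illustration only. Turning the result into a TMRCA would also need a per-marker mutation rate, which is not given in this discussion, so the sketch stops at the average.

```python
from collections import Counter

# The four 5-marker haplotypes from the example above.
haplotypes = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 11, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 11],
]

def modal_haplotype(haps):
    """Most common value at each marker position (hypothetical helper)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haps)]

def mutations_from_modal(haps):
    """Sum of absolute step differences from the modal haplotype.

    Assumption: each one-step difference from the modal counts as one mutation.
    """
    modal = modal_haplotype(haps)
    return sum(abs(value - m) for hap in haps for value, m in zip(hap, modal))

modal = modal_haplotype(haplotypes)
total = mutations_from_modal(haplotypes)
per_haplotype = total / len(haplotypes)
per_marker = total / (len(haplotypes) * len(haplotypes[0]))

print("modal haplotype:", modal)
print("total mutations:", total)
print("average per haplotype:", per_haplotype)
print("average per marker:", per_marker)
```

Here the two off-modal haplotypes differ at different markers, so nothing in the data suggests a shared intermediate ancestor. That is the star-shaped case.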

But most examples are not quite this simple and a computer programme, or
your own eye, will be able to propose at least a few "second order" common
ancestors. A computer programme will indeed insist on proposing lots of
them. Furthermore no method I know of will give any obvious way of
distinguishing "shallow" branching from "deep" branching?

So (NB: my new question), how do you decide when the branching is important
for the analysis and when not? Maybe this is something where a simple
example might help. Personally I thought my example was a good one. To
describe a similar case in words:

a. let's say you have 12 x 67 marker haplotypes, very closely matching,
connected by surname and region of origin, and no other close matches

b. half of them have 18-23 and half have 19-23 on YCAII. So far so good

c. Let's say a third of them have 11 instead of 12 on DYS439, but this
includes some with 18-23 and some with 19-23.

One possible answer is maybe to say this is just an unlucky example. I will
not necessarily agree that it is very unusual, but it would be an answer.

If I give the example to computer programmes or humans they'll all give
different answers depending on the fine tuning of how they weight things,
and indeed if I add or subtract just one haplotype or marker the whole story
will change.

I guess one possible tree they'll come up with will group the set into 4
possible combinations (18 and 11, 18 and 12, 19 and 11, 19 and 12) and say
that each of these 4 forms one separate clade with its own second order
common ancestor. But in reality this seems VERY unlikely. The tree almost
certainly has a second order and a third order of common ancestry, and they
seem to be pretty important ones.
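The difficulty with this example can be sketched in a few lines of Python. The exact split of the 12 haplotypes below is hypothetical: it is chosen only to fit the description above (half 18 and half 19, a third with 11 on DYS439, and the 11s spread across both YCAII groups). The check itself is the standard observation that when two two-state markers show all four combinations on a non-recombining lineage, no tree can explain them with a single mutation on each marker, so at least one marker must have mutated more than once.

```python
from collections import Counter

# Hypothetical split of the 12 haplotypes, as (first YCAII value, DYS439).
# Only the proportions come from the description; the counts are assumed.
sample = (
    [(18, 11)] * 2
    + [(18, 12)] * 4
    + [(19, 11)] * 2
    + [(19, 12)] * 4
)

counts = Counter(sample)

def needs_repeat_mutation(pairs):
    """True if two two-state markers occur in all four combinations.

    In that case any tree for the data needs a parallel or back mutation
    on at least one of the two markers.
    """
    first_values = {a for a, _ in pairs}
    second_values = {b for _, b in pairs}
    return (
        len(first_values) == 2
        and len(second_values) == 2
        and len(set(pairs)) == 4
    )

for combo in sorted(counts):
    print(combo, counts[combo])
print("repeat mutation required:", needs_repeat_mutation(sample))
```

The check says nothing about which marker mutated twice, or where. That is the part that changes as programmes weight things differently or as a haplotype is added or removed.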

In the real world my response to the above situation is to say we need more
data, and TMRCA estimates of any type would be fairly unhelpful for the time
being. Maybe this is too negative? I am interested to hear how other people
including yourself see it.

Best Regards
Andrew

