Archiver > GENEALOGY-DNA > 2010-03 > 1267449990

From: "Diana Gale Matthiesen" <>
Subject: Re: [DNA] Handling data in bigger projects.
Date: Mon, 1 Mar 2010 08:26:30 -0500
References: <1AAA3001708D48B0BDEDDABAE833DB52@PC>
In-Reply-To: <1AAA3001708D48B0BDEDDABAE833DB52@PC>

> -----Original Message-----
> From: On Behalf Of Lancaster-Boon
> Sent: Sunday, February 28, 2010 5:52 PM
> To:
> Subject: [DNA] Handling data in bigger projects.
> Hi Diana
> What if one major aim of the project is to work out how
> relatively common each haplogroup is in a set of related regions?

The frequencies of the haplogroups aren't going to change because you split up a
project by haplogroup.

> Once you split them into different projects the sampling for
> each haplogroup will work differently, or let's say more
> differently than otherwise.

Setting aside the fact that there is no controlled "sampling" going on here in
the first place, what mystical force would alter the haplogroup frequencies of
the people being tested just because the projects are organized differently?

> This could be caused by
> something as simple as one project having more active
> admins.

Your sample is too crude to begin with for the effect of different admins to
matter. The fact is, most people are surname-tested before they join a
regional project, so the number of people joining a regional project is already
determined by factors that have nothing to do with the regional project itself,
whether there is one large regional project to join or several
haplogroup-specific regional projects to join.

I'm making the assumption here that the original, non-haplogroup-specific
surname project will continue to exist, with its original admin, for the simple
reason that people don't know their haplogroup until after they've been tested.
Also, the less common haplogroups won't need to be spun off because there aren't
enough of them to prove unmanageable. Test subjects need a central project to
join, initially. Then, after they find out their haplogroup, they can move to
the sub-project. I would also assume that there would be a high level of
cooperation between the original project and the spun-off projects.

The bottom line is: You're going to sort the data by haplogroup, anyway. You
can go the bulky route and keep the sorted data in one table -- for no good
reason because, genetically, the test subjects can have had no connection for
thousands of years -- or sort them into separate tables, making them much easier
to work with.
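
To make the point concrete, here is a rough sketch, not anyone's actual project
tooling. The kit IDs and haplogroup labels are invented for illustration, and
the function names are hypothetical. It sorts one combined table into
per-haplogroup tables and shows that the frequencies computed from the split
tables match those computed from the single table, assuming the same people
are in both.

```python
from collections import Counter, defaultdict

# Hypothetical project records: (kit id, haplogroup). Invented for illustration.
records = [
    ("K1", "R1b"), ("K2", "I1"), ("K3", "R1b"),
    ("K4", "J2"), ("K5", "R1b"), ("K6", "I1"),
]

def split_by_haplogroup(rows):
    """Sort one combined table into a separate table per haplogroup."""
    tables = defaultdict(list)
    for kit, hap in rows:
        tables[hap].append(kit)
    return dict(tables)

def frequencies(rows):
    """Haplogroup frequencies computed from the single combined table."""
    counts = Counter(hap for _, hap in rows)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

def frequencies_from_tables(tables):
    """The same frequencies, recovered by pooling the per-haplogroup tables."""
    total = sum(len(kits) for kits in tables.values())
    return {hap: len(kits) / total for hap, kits in tables.items()}

tables = split_by_haplogroup(records)
combined = frequencies(records)
split = frequencies_from_tables(tables)
print(combined == split)
```

The only thing the split changes is which table you open to look at a given
set of results; the counts, and so the frequencies, stay what they were.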

Getting back to what started this discussion, not all projects are providing as
much information as they could (or should) for their members to get the most out
of their testing. If the fact that a project has gotten too large to handle is
the reason, one solution would be to split up (downsize) the project in some
way, and doing it by spinning off the major haplogroups seems to me to be the
most logical way to do it.


> Best Regards
> Andrew
> ---
> From: "Diana Gale Matthiesen" <>
> Subject: Re: [DNA] Handling data in bigger projects.
> Date: Sun, 28 Feb 2010 11:46:45 -0500
> One obvious way to deal with large projects -- surname or
> regional -- is to break them down by haplogroup. People
> in different haplogroups haven't had a common ancestor
> in thousands of years, so I see no earthly reason to keep
> them in the same project, unless the project is small and
> easily managed as it is.
