
From: Doug McDonald <>
Subject: Re: [DNA] Handling data in bigger projects. Was RED. Call to participants
Date: Sun, 28 Feb 2010 11:33:22 -0600
References: <0B37177B19B64203B534711185F452B2@PC>
In-Reply-To: <0B37177B19B64203B534711185F452B2@PC>

Lancaster-Boon wrote:
> Kirsten,
> I agree. We will indeed be keeping an eye out for anyone who has ideas about
> this aspect of the project. Just to give an example of one problem we face:
> we have not even begun to try to work out how we would absorb data from
> other testing companies.
> Diana,
> I think it would be very interesting to hear more ideas about how to handle
> data. Certainly I have always used plain old fashioned spreadsheets whenever
> I needed to go beyond test company supply tables (which is increasingly
> important).
> I think there are two reasons for considering going beyond spreadsheets:
> 1. Is sheer size of database when you look at a case like the British Isles
> Project.

In the Clan Donald I keep the data in a spreadsheet.

With about 1000 people it is not too large.

It's easy to use.

But I don't use it to do searches. For that, I actually use our webpage.

The spreadsheet data is converted to JavaScript, specifically to one large JavaScript array.
The JavaScript on the web page can search that array by ID or haplotype. This works well.
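
As a rough illustration of that arrangement (not the actual Clan Donald code), the sketch below parses a CSV export into an array of records and then searches it by ID or by haplotype. The column layout, the sample rows and marker values, and the names csvToRecords, findById and findByHaplotype are all assumptions made up for the example.

```javascript
// Hypothetical sample rows, as a spreadsheet might export them to CSV.
// The layout (id, surname, then marker values) and the values are invented.
const csvText = [
  "id,surname,DYS393,DYS390,DYS19",
  "A001,McDonald,13,25,15",
  "A002,MacDonald,13,24,14",
  "A003,Donald,13,25,15",
].join("\n");

// Convert CSV text into records of the form { id, surname, haplotype: [numbers] }.
// Assumes no quoted fields or embedded commas.
function csvToRecords(text) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const markers = headerLine.split(",").slice(2);
  const records = lines.map((line) => {
    const cells = line.split(",");
    return {
      id: cells[0],
      surname: cells[1],
      haplotype: cells.slice(2).map(Number),
    };
  });
  return { markers, records };
}

// Look up one participant by ID; returns null when there is no match.
function findById(records, id) {
  return records.find((r) => r.id === id) || null;
}

// Return every record whose haplotype differs from the query
// at no more than maxMismatches marker positions.
function findByHaplotype(records, query, maxMismatches = 0) {
  return records.filter((r) => {
    let mismatches = 0;
    for (let i = 0; i < query.length; i++) {
      if (r.haplotype[i] !== query[i]) mismatches++;
    }
    return mismatches <= maxMismatches;
  });
}

const { markers, records } = csvToRecords(csvText);

// The array literal that could be pasted or written into the page's script.
const arraySource = "var people = " + JSON.stringify(records) + ";";
```

With this in place, findById(records, "A002") returns the MacDonald row, and findByHaplotype(records, [13, 25, 15], 1) returns exact matches plus anyone one marker off. A plain linear scan like this should be adequate for a few thousand rows.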

4,000 or 5,000 people would still be fine with Excel and the present search mechanism.

Bigger than that would take a database.

I also do stuff with autosomes. These data files are bigger.
My largest reference dataset is 537,904 x 1033

This is barely small enough for Excel 2007 to work with. In general I don't work on it in Excel,
though I do use Excel to look at it. I use specially written C programs to manipulate the data.
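
For a sense of scale, here is a small sketch that checks a table's dimensions against the Excel 2007 worksheet limits (1,048,576 rows by 16,384 columns). It assumes 537,904 is the row count and 1033 the column count, which the post does not state; the function name fitsInExcel2007 is made up for the example.

```javascript
// Excel 2007 worksheet grid limits.
const EXCEL_2007_MAX_ROWS = 1048576;
const EXCEL_2007_MAX_COLS = 16384;

// True when a rows-by-cols table fits on one Excel 2007 worksheet.
function fitsInExcel2007(rows, cols) {
  return rows <= EXCEL_2007_MAX_ROWS && cols <= EXCEL_2007_MAX_COLS;
}

// Assumed orientation: 537,904 rows by 1033 columns.
const rows = 537904;
const cols = 1033;
const cellCount = rows * cols; // total number of cells in the table

console.log(fitsInExcel2007(rows, cols)); // fits the grid in this orientation
console.log(fitsInExcel2007(cols, rows)); // would not fit if transposed
console.log(cellCount);
```

Under that assumption the table fits the grid, but it still holds more than half a billion cells. It would not fit the 65,536-row worksheets of Excel 2003 and earlier.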

If you go to DNAforums and the Autosome page you will see the results.

Doug McDonald
