YDNA and YDNA Testing: How it all works

You can find all this information in almost any modern book on human biology, but I'll summarize it here and, I hope, make it easy to follow. I found it difficult to understand discussions of DNA mutation, short tandem repeat (STR), haplotype, allele, single nucleotide, single nucleotide polymorphism (SNP, usually pronounced and sometimes written "SNIP"), and haplogroup; therefore, I'll define each term so that anyone interested can look back for reference. Still, I'll try to use most of the terms sparingly, often substituting other words.

To begin, we need to introduce cells, chromosomes, and DNA, all words you probably remember from high-school biology, even if you are as old as I am. Our bodies are formed of cells, and the nucleus of every normal cell in our bodies contains 46 chromosomes, 23 from each parent. Occasionally, there are people with exceptions, but the exceptions are rare. Each chromosome is made up of a DNA molecule wrapped around other material.

Two of these 46 chromosomes are the sex chromosomes. A female has two X chromosomes, one from her mother and one from her father. A male has an X chromosome from his mother and a Y chromosome from his father. Because the Y chromosome is always inherited from the father, it comes down to each male from his father, his paternal grandfather, his great-grandfather, and on back to a very ancient human male.

The DNA molecule looks like two spirals joined by bars, the "double helix." If we uncoil the DNA molecule and stretch it out, it looks like a ladder. Each rung of the ladder is made up of two bases joined together, called a base pair. There are four different kinds of base, and in a color diagram of the ladder each kind is shown in its own color: red represents the base designated C, blue the base designated A, yellow the base designated T, and green the base designated G. The letters A, T, C, and G stand for Adenine, Thymine, Cytosine, and Guanine; however, the only fact important for our purposes is that there are four distinct types of base that can be distinguished from each other. Note that a blue base (A) on one side of a rung is always matched with a yellow base (T) on the other side, and vice versa, and a red base (C) on one side is always matched with a green base (G) on the other side, and vice versa.
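
Because the pairing rule is strict, one side of the ladder completely determines the other. The short Python sketch below reads off the bases on the opposite side, rung by rung, in the same left-to-right order. The function name `opposite_side` is a hypothetical name of our own, and the sketch deliberately ignores the fact that biologists normally read the opposite strand in the reverse direction.

```python
# Pairing rule described above: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_side(strand: str) -> str:
    """Return the base on the other side of each rung, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


print(opposite_side("GATC"))  # CTAG
```

Applying the function twice returns the original sequence, which is just another way of saying that each side of the ladder determines the other.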

A segment of a DNA molecule might consist of a short sequence of four or five bases, perhaps as simple as GATC, repeated over and over, reading along one side of the ladder. Personnel at DNA-testing laboratories and other genetic scientists call such a repeated sequence a short tandem repeat (STR), but the STRs selected for testing may also be called markers because that is the designation often used by laboratories, and marker is the name we will use. The marker being tested ends where the repetitions of GATC, or whatever sequence was present, give way to a different sequence, perhaps TAGCC. We need to consider the bases on only one side because, whenever the sequence changes on one side, it changes on the other. This happens because the bases are always paired the same way; for example, T on one side of the unwound molecule is always paired with A on the other side. Thanks to mapping done during the Human Genome Project, the exact location on the Y chromosome is known for each marker that genetic laboratories have designated as suitable for genealogical testing.
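
A marker value is simply the number of back-to-back copies of the repeated sequence. The Python sketch below counts the longest such run in a made-up stretch of bases; the function name and the sample sequence are hypothetical. Testing laboratories do not necessarily count repeats this way, so treat this only as an illustration of what a marker value means.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    k = len(motif)
    best = 0
    for start in range(len(sequence)):
        copies, pos = 0, start
        # Step forward one motif length at a time while the motif keeps repeating.
        while sequence[pos:pos + k] == motif:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


# Hypothetical stretch: three copies of GATC, then the sequence changes to TAGCC.
print(count_tandem_repeats("GATCGATCGATCTAGCC", "GATC"))  # 3
```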

As already stated, a segment of DNA, which may be called a marker when it is tested in a laboratory, can have many repetitions of the same sequence of bases. Across all markers, the number of repetitions usually ranges from about seven to more than forty; however, the range of values for any given marker is much smaller, and more than ten possible values (alleles, if you wish) for one marker is very unusual. Examples from the Alexander DNA project illustrate how few values are found even for the markers where mutations occur most frequently. Marker DYS439, a rapidly mutating marker, has values ranging from 10 to 14, with almost all of them being 11, 12, or 13. Marker DYS449, also a rapidly mutating marker, has values from 26 to 34, but very few at either extreme; most are 29, 30, 31, or 32. DYS390, which does not mutate rapidly, has values from 21 to 27, but, out of approximately 300 men tested at the time I did the analysis, only two had a value of 21 and only one had a value of 27. DYS454, which mutates extremely slowly, has only one value, 11, for all members of the project, although values of 10 and 12 occur rarely in the male population at large.

For each person tested, each marker has a specific value, and the collection of those values is his haplotype. People who are closely related should have the same haplotype or differ on only a few values; however, there is some chance of a change each time YDNA passes from father to son. Although this means that even a man and his father may have one or more mismatches, the mutation rate is slow enough that, in the Alexander project, people whose lines split in colonial American times have from zero to four or five mismatches out of sixty-seven markers.
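
Comparing two haplotypes comes down to counting the markers whose values differ. In this minimal Python sketch each haplotype is a dictionary from marker name to value; the marker names come from the text, but the values for the father and son are hypothetical. It counts a marker as one mismatch no matter how large the difference, which is a simplification; laboratories may use more elaborate measures.

```python
def count_mismatches(a: dict[str, int], b: dict[str, int]) -> int:
    """Count markers, tested in both men, whose repeat values differ."""
    shared = a.keys() & b.keys()
    return sum(1 for marker in shared if a[marker] != b[marker])


# Hypothetical values for a handful of markers named in the text.
father = {"DYS390": 24, "DYS439": 12, "DYS449": 30, "DYS454": 11}
son = {"DYS390": 24, "DYS439": 13, "DYS449": 30, "DYS454": 11}

print(count_mismatches(father, son))  # 1
```

Only markers tested in both men are compared, so a man tested on 37 markers can still be compared with one tested on 67.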

In addition to mutations that increase or decrease the number of repetitions of a sequence at a given marker, other types of mutations can occur in the YDNA molecule. One of these replaces a single base pair, for example a T-A pair with a G-C pair, or vice versa. So far as is known, a given change of this kind occurs at only one spot on the DNA molecule, and it is the single nucleotide polymorphism, the SNP or SNIP, mentioned earlier. DNA genealogists and genetic anthropologists define a haplogroup by reference to the SNP that distinguishes it from all other groups. The person in whom the SNP occurred would likely have the same haplotype as his brothers, his father, his uncles, and so on, so one might ask whether the SNP alone puts him in a haplogroup different from that of the rest of his family. The answer is that it does not unless he has a very large number of male-line descendants, and that would not be known until several generations after he lived.

Geneticists have estimated the probability of a SNP as about one in every fifty million lifetimes, although the frequency may be higher. Since each man living today should have only three thousand to four thousand male-line ancestors all the way back to the man who was ancestor to all men, a great number of SNPs in any given person's male line is unlikely. Most SNPs should be fairly recent because, as we move toward the present, more and more people are alive at any given time. Every member of a family or group of closely related persons should have the same SNPs in his YDNA unless he or a recent ancestor underwent such a mutation.

With the necessary terms defined simply but, I hope, not incorrectly, let us look far, far back in prehistory to the lifetime of the most recent man from whom all living men descend. This doesn't mean that no other men were alive at that time, but, over the centuries and millennia, the male lines of all the other men died out. To make that concept easier to understand, consider that, if you are an Alexander, your Alexander great-great-grandfather may have had five brothers and that, of all six men, only one, your great-great-grandfather, may have living male-line descendants. Lines die out, as several women seeking male cousins to take the YDNA test have discovered!
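
How easily male lines die out can be illustrated with a toy simulation. The Python sketch below uses a simple branching process, the classic textbook model for surname extinction (not the author's method), in which every man independently has 0, 1, 2, or 3 sons. The probabilities in `WEIGHTS` are hypothetical, chosen only for illustration, and the sketch follows the six brothers from the example down four generations.

```python
import random

SONS = [0, 1, 2, 3]
WEIGHTS = [0.35, 0.35, 0.20, 0.10]  # hypothetical chances of having 0, 1, 2, or 3 sons


def line_survives(generations: int, rng: random.Random) -> bool:
    """Follow one man's male line; True if he still has a male-line descendant."""
    males = 1
    for _ in range(generations):
        if males == 0:
            return False
        # Every living male in this generation independently has 0-3 sons.
        males = sum(rng.choices(SONS, weights=WEIGHTS, k=males))
    return males > 0


rng = random.Random(1753)  # fixed seed so the run is repeatable
brothers = 6     # your great-great-grandfather and his five brothers
generations = 4  # father-to-son steps from them down to your generation

survivors = sum(line_survives(generations, rng) for _ in range(brothers))
print(f"This run: {survivors} of {brothers} lines survive")

trials = 10_000
average = sum(
    sum(line_survives(generations, rng) for _ in range(brothers)) for _ in range(trials)
) / trials
print(f"Average over {trials} runs: {average:.2f} of {brothers} lines survive")
```

Even though the hypothetical average is slightly more than one son per man, many lines end simply because some men have no sons.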

Let us call the haplogroup of this ancestor of all living men "A," with the knowledge that there is nothing special about calling it A. We could just as well have called his haplogroup "Jim," if that was the name he used. His father, his brothers, and probably many uncles and cousins in the neighborhood and surrounding neighborhoods belonged to the same haplogroup and might have objected to their group being dubbed Jim, but that would have been their problem, not ours. In any case, the male lines of the brothers, uncles, and cousins all died out, leaving only Jim's male descendants. We have called him and all those descendants haplogroup A, at least up to the point at which someone in the group had a SNP. SNIP! Before the SNP, there may have been several mutations in the YDNA regions we have called markers, and, if we could test the YDNA of group members, we would likely find several different values for any given marker. Still, they were all haplogroup A up until the SNP.

If the descendant of Jim in whom the SNP occurred had a large number of descendants in his own male line, the SNP produced a new haplogroup that we could call A1, which is probably the designation genetic anthropologists would actually give it, or we could call the new haplogroup B or X or whatever we wished. Names aren't particularly important except to distinguish one group from another, and the responsible organization changes haplogroup names from time to time. If we call the new group A1, the world then had two patrilineal haplogroups, A and A1. Even if the man whose SNP created haplogroup A1 had marker values almost identical to those of most of his cousins who remained in haplogroup A, marker-changing mutations would continue to occur in both groups. Over time, by chance, these mutations would likely make the most common marker values in one haplogroup different from the most common marker values in the other.

Since a genetic mutation produced haplogroup A1 from haplogroup A, it shouldn't come as a surprise that additional mutations occurred in A and A1. Some were SNPs, and some were increases or decreases in marker values. Both types occurred, slowly but inevitably, and produced more haplogroups and more variation in marker values among the haplogroups. Each haplogroup has come to have marker values that are almost a signature for that haplogroup: not the value of any single marker but the values of a group of markers. For example, a genetics expert can look at the marker values for a person in the Spartanburg Alexander family and say with almost, but not quite, perfect certainty that the person belongs to the M222 haplogroup without a test for the M222 SNP being performed. Such a prediction will probably require comparing several dozen markers.
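
The idea of a haplogroup "signature" can be sketched as a nearest-match comparison: score a tester against a typical set of marker values for each haplogroup and pick the one with the fewest mismatches. This Python sketch is a deliberately simplified stand-in for what an expert or a prediction tool does; the signature values for the text's imaginary haplogroups A and A1, and the tester's values, are all hypothetical.

```python
def mismatches(a: dict[str, int], b: dict[str, int]) -> int:
    """Number of shared markers whose values differ."""
    return sum(a[m] != b[m] for m in a.keys() & b.keys())


def predict_haplogroup(tester, signatures):
    """Return the haplogroup whose signature is closest, plus every score."""
    scores = {group: mismatches(tester, sig) for group, sig in signatures.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical signatures for the text's imaginary haplogroups A and A1.
signatures = {
    "A": {"DYS390": 24, "DYS391": 10, "DYS439": 12, "DYS449": 29, "DYS454": 11},
    "A1": {"DYS390": 25, "DYS391": 11, "DYS439": 13, "DYS449": 31, "DYS454": 11},
}
tester = {"DYS390": 25, "DYS391": 11, "DYS439": 12, "DYS449": 31, "DYS454": 11}

print(predict_haplogroup(tester, signatures))  # ('A1', {'A': 3, 'A1': 1})
```

No single marker decides the answer; the tester differs from the A1 signature on one marker but is still much closer to it than to A, which is why a reliable prediction needs many markers.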

YDNA Testing And The Alexander Project

Most of the approximately 370 men participating in the Alexander YDNA project by the middle of 2016 tested at least thirty-seven markers, and almost all had those tests run by the same laboratory. Several have had sixty-seven markers tested, and a few have had tests on one hundred eleven markers or more. The genetic scientists at the laboratory performing the tests don't say that two individuals are definitely related; instead, they provide only estimates of the likelihood that the two have a common male ancestor within a given number of generations. Still, the statements they are willing to make can be taken to mean that a common ancestor since around 1500 or 1600 is unlikely if there are mismatches on more than four or five markers out of thirty-seven, or on more than six or seven out of sixty-seven, the maximum tested by most participants. Based on the laboratory's estimates, several participating Alexanders have so many mismatches with others in the project that they are unlikely to have a common ancestor within the last few thousand years.
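
The rough thresholds in the paragraph above can be written down as a small lookup. This Python sketch only restates those ranges ("four or five" of 37, "six or seven" of 67) and is not the laboratory's probability calculation; the function name and the three verdict labels are hypothetical.

```python
# Rule-of-thumb ranges from the paragraph above: a common ancestor since about
# 1500-1600 is unlikely beyond "four or five" of 37 or "six or seven" of 67.
RANGES = {37: (4, 5), 67: (6, 7)}


def recent_ancestor_verdict(mismatches: int, markers: int) -> str:
    """Classify a mismatch count against the rough ranges quoted in the text."""
    if markers not in RANGES:
        raise ValueError(f"no rule of thumb given for {markers} markers")
    low, high = RANGES[markers]
    if mismatches <= low:
        return "not ruled out"
    if mismatches <= high:
        return "borderline"
    return "unlikely"


print(recent_ancestor_verdict(2, 37))   # not ruled out
print(recent_ancestor_verdict(7, 67))   # borderline
print(recent_ancestor_verdict(19, 67))  # unlikely
```

The "borderline" band reflects the fact that the text gives a range rather than a single cutoff.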

In the project, we have learned that most family groups, as brought together from their YDNA results, have no more than four or five members. This means that there were many origins for the surname Alexander, even within the British Isles, where most of the project participants believe or know their roots to lie.

The largest family in our project, with more than seventy members in 2016, is the family deriving from James "the Weaver" or one of his close kinsmen, the family to which almost all United States Alexanders were assigned by early genealogists. The group is called the "Seven Brothers Family," although it is unlikely that there were as many as seven brothers as was once believed. Some of the project members of this family knew or strongly believed they belonged to the family, but others now in the family had no inkling of their origins. The YDNA mismatches between this family and our family group, which we can trace to the area around Anson County, NC, and Spartanburg County, SC, suggest a separation of at least a thousand years, and analysis of a specific mutation pushes the split back to well beyond a thousand years, meaning we are no closer to them genetically than to almost any person we meet casually on the street.

Only four or five other Alexander families in the project have more than about a dozen members, and our Spartanburg group, with about twenty members, is one of them. A third family, which can trace a few of its ancestors back to Glasgow, Scotland, and a fourth family, which can trace some of its earliest known ancestors back to Campbelltown, Scotland, have nearly thirty members each. A fifth family, which has few common threads except YDNA that matches very closely within the group, can mostly trace its roots to an area of SC near Spartanburg County, and, interestingly, this group and the Spartanburg group match so closely that our Alexander surnames very probably have a common origin. They and we have designated our family groups as "Confused," or Cons. Although the laboratory geneticists are hesitant to say that the two groups have a common Alexander ancestor, I have studied the matches and mismatches and have done a statistical analysis that leads me to conclude that our common ancestor lived around 1300 to 1500, shortly after most Europeans acquired family names, by choice or by assignment. The YDNA of the Campbelltown family differs from that of the Spartanburg Cons and the Cons Too (or Cons Two!) much less than is usually found between two families in our project, and the three groups probably have a common ancestor who lived around 1200 or a bit earlier. Although it is unlikely that we will ever trace the three families, or even two of them, to a common ancestor, we can hope that future advances in DNA analysis will help us trace them to a common time and place.

From the discussion so far, it is likely apparent that YDNA markers for men of a common surname can match closely or very poorly and that, if the men have tested 67 markers or more, it is usually easy to determine whether they have a recent common ancestor of that surname. Good matches between individuals with the same surname mean they are likely related unless there is reason to believe otherwise, for example, if one man's family has roots in Britain and the other's in southern Europe or Russia. Although an exact match on all tested markers between two people whose most recent common ancestor lived in the eighteenth century is fairly unusual, there are likely to be no more than one, two, or three mismatches even then. Remember, however, that the exactness of the match does not depend directly on the closeness of the kinship. For example, I match exactly on all tested markers with three Alexanders with whom I have no common ancestor born later than about 1730, while each of them differs on one or more markers from a closer cousin, and I differ on one marker from a cousin who is more closely related to me. For those of us who match exactly, each marker's value was the most common value found on that marker among several other Alexander YDNA testers, all but one of whom could trace his ancestral roots back to an area near Spartanburg County, SC. None of them differed from us on more than four markers, making all of us almost certainly cousins.

A look at the tables will show that two of the family groups, the Seven Brothers group and the Glasgow group, each have so many DNA mismatches with each other and with the other three groups that no paternal-line kinship within the era of surnames is plausible. The Seven Brothers and the Glasgows are typical of the project's family groups in being quite distinct from the other families. Very few -- in fact, almost no other -- family groups match one another as closely as the Spartanburg group, the Cons Too group, and the Campbelltown group do.

Comparing the Spartanburg Cons and other family groups

Family             Haplogroup (identifying SNP)   Mismatches, 67 markers   Mismatches, 111 markers
Spartanburg Cons   R1b1a2a1a1b4b (SNP: M222)      --                       --
Seven Brothers     R1b1a2                         19                       33
Cons Too           R1b1a2a1a1b4b (SNP: M222)      7                        14
Campbelltowns      R1b1a2a1a1b4b (SNP: M222)      8                        16
Glasgows           R1b1a2a1a1b                    23                       38

Comparing the Seven Brothers and other family groups

Family             Haplogroup (identifying SNP)   Mismatches, 67 markers   Mismatches, 111 markers
Seven Brothers     R1b1a2                         --                       --
Spartanburg Cons   R1b1a2a1a1b4b (SNP: M222)      19                       33
Cons Too           R1b1a2a1a1b4b (SNP: M222)      18                       30
Campbelltowns      R1b1a2a1a1b4b (SNP: M222)      18                       33
Glasgows           R1b1a2a1a1b                    16                       30

Comparing the Cons Too and other family groups

Family             Haplogroup (identifying SNP)   Mismatches, 67 markers   Mismatches, 111 markers
Cons Too           R1b1a2a1a1b4b (SNP: M222)      --                       --
Spartanburg Cons   R1b1a2a1a1b4b (SNP: M222)      7                        14
Seven Brothers     R1b1a2                         18                       30
Campbelltowns      R1b1a2a1a1b4b (SNP: M222)      12                       18
Glasgows           R1b1a2a1a1b                    21                       34

Comparing the Campbelltown family and other family groups

Family             Haplogroup (identifying SNP)   Mismatches, 67 markers   Mismatches, 111 markers
Campbelltowns      R1b1a2a1a1b4b (SNP: M222)      --                       --
Seven Brothers     R1b1a2                         18                       33
Spartanburg Cons   R1b1a2a1a1b4b (SNP: M222)      8                        16
Cons Too           R1b1a2a1a1b4b (SNP: M222)      12                       18
Glasgows           R1b1a2a1a1b                    21                       34

Comparing the Glasgow family and other family groups

Family             Haplogroup (identifying SNP)   Mismatches, 67 markers   Mismatches, 111 markers
Glasgows           R1b1a2a1a1b                    --                       --
Seven Brothers     R1b1a2                         16                       30
Spartanburg Cons   R1b1a2a1a1b4b (SNP: M222)      23                       38
Cons Too           R1b1a2a1a1b4b (SNP: M222)      21                       34
Campbelltowns      R1b1a2a1a1b4b (SNP: M222)      21                       34
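
The five comparison tables are really one symmetric table of pairwise mismatches. The Python sketch below stores the 67-marker figures from those tables once per pair and reports each family's closest group; the data come from the tables, while the function names are our own. Note that "closest" is relative: the Seven Brothers and the Glasgows come out nearest to each other, yet 16 mismatches is still far too many for a recent common ancestor.

```python
# Mismatches on 67 markers between family groups, from the comparison tables.
MISMATCHES_67 = {
    ("Spartanburg Cons", "Seven Brothers"): 19,
    ("Spartanburg Cons", "Cons Too"): 7,
    ("Spartanburg Cons", "Campbelltowns"): 8,
    ("Spartanburg Cons", "Glasgows"): 23,
    ("Seven Brothers", "Cons Too"): 18,
    ("Seven Brothers", "Campbelltowns"): 18,
    ("Seven Brothers", "Glasgows"): 16,
    ("Cons Too", "Campbelltowns"): 12,
    ("Cons Too", "Glasgows"): 21,
    ("Campbelltowns", "Glasgows"): 21,
}
FAMILIES = sorted({name for pair in MISMATCHES_67 for name in pair})


def distance(a: str, b: str) -> int:
    """Look up a pair of families in either order."""
    return MISMATCHES_67[(a, b)] if (a, b) in MISMATCHES_67 else MISMATCHES_67[(b, a)]


def nearest(family: str) -> tuple[str, int]:
    """Closest other family group and its mismatch count."""
    other = min((f for f in FAMILIES if f != family), key=lambda f: distance(family, f))
    return other, distance(family, other)


for family in FAMILIES:
    print(family, "->", nearest(family))
```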

Conclusions From YDNA Genealogy

The Spartanburg Alexanders fathered by James (died 1753) -- and perhaps by a brother or close cousin of James -- make up a separate family group, probably joined to the Cons Too Alexanders and the Campbelltown Alexanders a few centuries before they began migrating from the British Isles to colonial America. Of the approximately three hundred seventy men who have submitted their YDNA for testing, not one who can trace his Alexander ancestry back to earlier than 1700 matches our group extremely well; however, at least two of the Cons Too Alexanders, who match us fairly closely, claim an Alexander ancestor born in the mid-seventeenth century. Future advances in DNA research will likely allow us to narrow down when our Alexander line and the Cons Too Alexander line diverged, and we might even learn the area or village in which they then lived, but that is not currently possible, which leaves James (died 1753) as our earliest known Alexander ancestor.

We know that our original James passed on his YDNA to his sons: William, of whom little is known; James of Spartanburg; John, who died in Williamson County, TN; David, who lived and died in Pendleton District, SC; and Robert, who lived and died in Lincoln County, NC. With each transmission to a son, one of the following occurred: (1) most likely, no change; (2) with a small probability, a mutation in one marker; or (3) with an extremely low probability, mutations in more than one marker. Each son then passed on YDNA to his own sons with zero, one, or more than one change in the series of values, and each succeeding generation of males similarly passed on YDNA to sons with some small probability of mutation at each transfer. My YDNA (I descend from James of Spartanburg) and that of a test participant descending from David matched exactly on all 67 markers tested, while my YDNA and that of a descendant of another brother differed on four markers; however, his YDNA matched that of another descendant of James of Spartanburg much more closely.
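
The three outcomes for each father-to-son transmission can be put into numbers. This Python sketch assumes, for simplicity, that every marker mutates independently with the same per-generation rate, so the count of changed markers follows a binomial distribution. The rate of 0.004 per marker per generation is a hypothetical round figure for illustration only; real rates vary widely from marker to marker.

```python
def transmission_outcomes(markers: int, rate: float) -> tuple[float, float, float]:
    """Probabilities of 0, exactly 1, and 2+ changed markers in one father-to-son step.

    Assumes every marker mutates independently with the same per-generation rate.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    p_none = (1 - rate) ** markers
    p_one = markers * rate * (1 - rate) ** (markers - 1)
    return p_none, p_one, 1 - p_none - p_one


# Hypothetical average rate; real rates differ from marker to marker.
none, one, more = transmission_outcomes(markers=67, rate=0.004)
print(f"no change: {none:.3f}, one marker: {one:.3f}, two or more: {more:.3f}")
```

Under this assumption "no change" is the most likely outcome, a single change is less likely, and two or more changes in one transmission are rarer still, matching the order of outcomes (1), (2), and (3) above.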

Sometimes comparing the values of each marker within a family can reveal additional information. For example, in the line of James of Spartanburg, all tested descendants of Matthew Alexander’s son Thomas have a value of 14 on a marker for which all other family members have a value of 13, meaning that the defining mutation occurred either when Thomas received the YDNA from Matthew or when Matthew received it from James of Spartanburg. Likewise, both tested descendants of another grandson of James of Spartanburg have a value of 29 on a marker for which all others have a value of 30. These two descendants also have higher values than anyone else on a second marker, and, interestingly, their values on this marker differ from each other, meaning that this marker mutated twice in the line of one of them. Mutations such as these, which show up in the tests of some members of a family but not others, are often called branch identifiers because they may allow determination of which of several sons in a family was the patriarch of a branch even when there are no records to help.
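
Finding candidate branch identifiers amounts to looking for markers on which every tested member of one branch shares a value that nobody else in the family has. The Python sketch below does exactly that; the marker names (`marker_A` and so on) and all values are hypothetical, patterned loosely on the Thomas example, since the text does not name the markers involved.

```python
def branch_identifiers(branch, others):
    """Markers where every tester in `branch` shares one value that no one in `others` has."""
    if not branch:
        return {}
    found = {}
    for marker in branch[0]:
        if any(marker not in tester for tester in branch):
            continue  # not tested for every member of the branch
        values = {tester[marker] for tester in branch}
        outside = {tester[marker] for tester in others if marker in tester}
        if len(values) == 1 and not values & outside:
            found[marker] = values.pop()
    return found


# Hypothetical markers and values, patterned on the Thomas example above.
thomas_line = [
    {"marker_A": 14, "marker_B": 30, "marker_C": 15},
    {"marker_A": 14, "marker_B": 30, "marker_C": 15},
]
rest_of_family = [
    {"marker_A": 13, "marker_B": 30, "marker_C": 15},
    {"marker_A": 13, "marker_B": 29, "marker_C": 16},
    {"marker_A": 13, "marker_B": 30, "marker_C": 15},
]
print(branch_identifiers(thomas_line, rest_of_family))  # {'marker_A': 14}
```

As in the text, such a marker narrows down where in the tree the mutation happened, but with only a few testers per branch it cannot pin down the exact generation.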