R.H. Fletcher, M.Sc Hons(Statistics), July 1995.
There are two major factors that influence an individual's measured performance: genotype and environment. These two factors are sometimes loosely called breeding and feeding. Their effects are not immediately distinguishable in an individual animal's performance, but it is of primary concern to the animal breeder to attempt to separate them, as it is only the genetic component, or breeding, that is passed on directly to future generations. Breeding values are an attempt to estimate the genetic component of the actual measured performance of an animal, and hence are more correctly known as Estimated Breeding Values or EBVs.
In order to discern the genetic component of an animal's performance, we need to understand a little about the nature and distribution of the variation of performance figures within a flock.
Some statistical terms:
1. Normal distribution:
This is the distribution of values that is most widely used to represent a frequency graph. Variables that follow a Normal distribution are most likely to have an average or typical value called the mean, and are progressively less likely to occur further away from (above or below) the mean. The frequency graph is a bell-shaped curve like the one shown below for body weights:
The dotted vertical lines are drawn at plus and minus one standard deviation (15kg) from the mean. Approximately two thirds of values will lie within this range, i.e. from 20kg to 50kg in this example.
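The "approximately two thirds" figure can be checked directly. The following is a minimal sketch using Python's standard library, with the 35kg mean and 15kg standard deviation taken from the example above; the variable names are illustrative only.

```python
from statistics import NormalDist

# Body weights from the example: mean 35 kg, standard deviation 15 kg.
weights = NormalDist(mu=35, sigma=15)

# Proportion of animals expected between 20 kg and 50 kg (mean +/- one SD).
within_one_sd = weights.cdf(50) - weights.cdf(20)
print(round(within_one_sd, 3))  # about 0.683, i.e. roughly two thirds
```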
Suppose that we did not know that the mean was 35kg, and we were attempting to estimate the mean by weighing a random sample of say 25 animals. These 25 animals might have an average weight of 33kg with a standard deviation of 12kg. Because of the small sample size, the best we might be able to say about the average of a very large number of such animals is that the mean should lie within say 2.4kg of the observed average of 33kg. This figure of 2.4kg is called the standard error of the estimate of the mean. It is calculated as the standard deviation divided by the square root of the number of animals measured:
12kg / square root(25) = 12 / 5 = 2.4kg
If we were to take a large number of such samples of 25 animals, then we would guess that in approximately two thirds of the samples, we would measure an average between 30.6kg and 35.4kg.
Because of the square root involved in the formula, increasing the sample size four-fold to 100 animals would halve the standard error. The standard errors of EBVs behave in a similar manner.
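The arithmetic above can be written out as a short sketch. The function name `standard_error` is a label chosen here for illustration; the figures (33kg average, 12kg standard deviation, samples of 25 and 100 animals) are those of the example.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

sample_mean = 33.0  # kg, observed average of the sample
sample_sd = 12.0    # kg, observed standard deviation of the sample

se_25 = standard_error(sample_sd, 25)    # 12 / 5
se_100 = standard_error(sample_sd, 100)  # 12 / 10, half of the above

# Range of plus and minus one standard error around the observed average.
low = sample_mean - se_25
high = sample_mean + se_25
print(round(se_25, 2), round(se_100, 2), round(low, 1), round(high, 1))
```

Quadrupling the sample size only halves the standard error, so each extra animal measured adds less accuracy than the one before.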
Thus a standard deviation is a measure of the variability or spread of the data, whereas a standard error is a measure of the likely variability (or accuracy) of an estimate of a figure such as a population mean or an individual's breeding value.
A standard deviation is more or less the same, no matter how much data is used to calculate it, whereas the standard error of an estimate gets smaller (i.e. the accuracy of the estimate is improved) as more information is used to calculate the estimate.
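A small simulation can illustrate this contrast. This is only a sketch: it draws hypothetical weights from a Normal distribution with the 35kg mean and 15kg standard deviation used earlier, and the sample sizes are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

results = {}
for n in (25, 100, 400, 1600):
    # Simulated weights (kg) for a sample of n animals.
    sample = [random.gauss(35, 15) for _ in range(n)]
    sd = statistics.stdev(sample)  # spread of the data
    se = sd / n ** 0.5             # accuracy of the estimated mean
    results[n] = (sd, se)
    print(n, round(sd, 1), round(se, 2))
```

The standard deviation column should hover around 15kg at every sample size, while the standard error column shrinks steadily as more animals are measured.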
2. Breeding Values:
These are defined as a figure for each individual, which represents how much better or worse the average of an infinite number of progeny of this animal should be. Since no individual animal can have this many progeny, it is a figure that can never be known with complete accuracy. However, the idea of a breeding value is still a very useful concept, and using estimated breeding values (EBVs) to help in deciding which animals to keep for breeding purposes can increase the annual rate of genetic (permanent) improvement in a flock's performance. In the final analysis, however, the breeder must also take other things into consideration, such as faults, when selecting animals to be the parents of the next generation.
3. Estimating Breeding Values:
Now, we have to resort to a mathematical model describing how the performance figure for an individual animal arises:
P = g + e,
or, in words, what we observe, the animal's phenotype, is the sum of its genotype and the environmental influences it has faced in its life to date ("breeding" plus "feeding").
We can further refine this model by splitting the environmental and genetic components as follows:
i) g = (gs + gd) / 2 + gi
or, in words, an individual's genetic component is the average of that of its sire and its dam, (gs + gd) / 2, plus a portion gi that derives from the random assortment of genes at conception (this is the bit that can make brothers and sisters in a large family quite different, even though they share a family similarity).
ii) e = ek + eu
where ek is the sum of known environmental influences such as birth and rearing ranks, sex, birthdate and age of dam,
and eu are other (unknown) environmental influences such as feed availability, parasite control etc, etc.
We can see from i) that half sibs (animals with the same sire) can provide some information about the likely value of gs, while full sibs will help in estimating the gd part.
Similarly, in ii), we can use the average difference between twins and singles, for example, to help account for the effects of birth and/or rearing ranks.
Using all the information available to us, we can get a fairly good estimate of gs (the breeding value of the animal's sire), and a less good estimate of gd for the dam because she has fewer progeny. But even under truly ideal conditions, we can only guess at the gi part of an animal's genotype and the eu part of its phenotype. This is why EBVs have quite large standard errors.
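As an illustration of accounting for a known environmental effect, the sketch below adjusts twin records by the average twin versus single difference. The tags and weights are made-up figures, and the approach assumes that twins and singles are genetically similar on average; practical EBV calculations handle several such effects together.

```python
from statistics import mean

# Hypothetical weaning weights (kg): (tag, birth rank, weight).
records = [
    ("A", "single", 32.0), ("B", "single", 30.0),
    ("C", "single", 34.0), ("D", "single", 31.0),
    ("E", "twin", 27.0), ("F", "twin", 26.0),
    ("G", "twin", 29.0), ("H", "twin", 28.0),
]

single_mean = mean(w for _, rank, w in records if rank == "single")
twin_mean = mean(w for _, rank, w in records if rank == "twin")

# Average difference attributed to birth rank (part of ek in the model).
twin_penalty = single_mean - twin_mean

# Put twins on the same footing as singles before comparing animals.
adjusted = {
    tag: (w + twin_penalty if rank == "twin" else w)
    for tag, rank, w in records
}
print(twin_penalty, adjusted["G"])
```

After adjustment, twin G (29kg raw) compares favourably with most of the singles, which the raw weights would have hidden.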
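In fact, using formula i) above, we can prove that, in a randomly mating population (where genetic variation is maintained), the genetic variation due to the random assortment of genes at conception is exactly one-half of the total genetic variation in the population:
Var(g) = Var((gs + gd) / 2) + Var(gi)
= (Var(gs) + Var(gd)) / 4 + Var(gi)
= Var(g) / 2 + Var(gi),
since under random mating gs and gd are uncorrelated and each has variance Var(g). Hence Var(gi) = Var(g) / 2.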
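The one-half result can also be checked numerically. The sketch below assumes a hypothetical total genetic variance of 4.0 and Normally distributed genetic values; it builds offspring from randomly paired parents using model i) with Var(gi) set to half of Var(g), and confirms that the offspring generation has the same genetic variance as the parents.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

total_genetic_var = 4.0  # hypothetical Var(g) in the parent generation
n_offspring = 20000

def draw(variance):
    """Draw one value from a Normal distribution with mean 0 and this variance."""
    return random.gauss(0.0, variance ** 0.5)

offspring = []
for _ in range(n_offspring):
    gs = draw(total_genetic_var)      # sire, chosen at random
    gd = draw(total_genetic_var)      # dam, chosen at random
    gi = draw(total_genetic_var / 2)  # random assortment at conception
    offspring.append((gs + gd) / 2 + gi)

offspring_var = statistics.variance(offspring)
print(round(offspring_var, 2))  # should be close to 4.0
```

With any other share for Var(gi), the genetic variance would grow or shrink from one generation to the next rather than being maintained.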
Culling and mating ratios:
Note that the estimated breeding value for a sire is more accurate (has a smaller standard error) when he has a larger number of offspring measured, and similarly his offspring's EBVs will be more accurate than those from another sire with fewer recorded offspring. Also, excessive culling after weaning will reduce the number of offspring recorded for later traits, and thus reduce the accuracy of EBVs.
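The effect of progeny numbers can be illustrated with a simplified simulation. It is not how EBVs are computed in practice: here the sire's genetic value is estimated as twice the average of his progeny's records, using models i) and ii) with no known environmental effects, and every figure (the standard deviations, the sire's true value, the progeny counts of 10 and 40) is a hypothetical choice.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

GENETIC_SD = 2.0   # hypothetical SD of genetic values (kg)
ENV_SD = 3.0       # hypothetical SD of unknown environmental effects (kg)
TRUE_SIRE_G = 3.0  # hypothetical true genetic value of the sire (kg)

def estimate_sire(n_progeny):
    """Estimate the sire's g as twice the average record of his progeny."""
    records = []
    for _ in range(n_progeny):
        gd = random.gauss(0, GENETIC_SD)             # dam, mated at random
        gi = random.gauss(0, GENETIC_SD / 2 ** 0.5)  # Var(gi) = Var(g) / 2
        eu = random.gauss(0, ENV_SD)                 # unknown environment
        records.append((TRUE_SIRE_G + gd) / 2 + gi + eu)
    return 2 * statistics.mean(records)

def spread_of_estimates(n_progeny, repeats=2000):
    """Standard deviation of the estimate over many repeated progeny groups."""
    estimates = [estimate_sire(n_progeny) for _ in range(repeats)]
    return statistics.stdev(estimates)

spread_10 = spread_of_estimates(10)  # e.g. after heavy culling
spread_40 = spread_of_estimates(40)  # all progeny recorded
print(round(spread_10, 2), round(spread_40, 2))
```

Under these assumptions, culling the recorded progeny from 40 down to 10 roughly doubles the spread of the estimate, mirroring the square-root rule for standard errors described earlier.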
Breeding values, or more correctly Estimated Breeding Values, can be a useful tool for an animal breeder and will likely lead to better selection decisions than merely using the raw data, or the appearance of an animal, but it should always be remembered that they are estimates and have associated standard errors reflecting this fact.