Bayesian analysis of genetic backgrounds of twinning rate in Thoroughbred horses

The objectives of this study were: to evaluate the numerical effi ciency of Gibbs sampling algorithm in an animal threshold model with single gene effect and to verify a hypothesis about a single locus determining twinning rate in Polish Thoroughbred horse population. The properties of implemented algorithm were illustrated with the use of several simulated data sets. The real data set of 1859 mares with full pedigree information was analysed. For these data the hypothesis has been not rejected. It indicates a major gene segregation for horse twinning rate.


INTRODUCTION
By contrast to number of mammal species, in horses the twin births are unfavourable.In most cases, mare multiple pregnancies end with miscarriages and stillbirths.Live foal twins have usually demonstrated lower performance values compared to single ones (Kulisa et al., 1999;Morris and Allen, 2002).Heterosexual twin pregnancies may cause leukocyte chimerism, which determines sterility of the female (Rejduch et al., 2000) and in consequence, they lead to serious eco-SKOTARCZAK E. ET AL. nomic losses to the breeders.It should be noted that multiply pregnancies are the most important reason of reproduction depression.Over last years, some techniques have been applied to twin reduction, for instance early transrectal ultrasonographic detection of twins and manual crush of one embryonic vesicle.
Multiply pregnancies in horses are determined by genetic and environmental factors.Some investigations carried out in human population indicate an impact of some pharmacological resources (especially assisted reproductive techniques) on the increase of twinning rate.It corresponds with a few studies performed in horses.For instance, Faustini et al. (2003) showed relationship between pharmacological induction of oestrus and twin pregnancy in mares.The incidence of multiple ovulation is determined by age of mares and stage of lactation (Górecka, 2003).Unfortunately, the number of reports on the impact of environmental factors on multiple pregnancies in horses is still relatively small.However, it seems that research concerning other mammal species characterized by low fecundity is well-founded.According to Karlsen et al. (2000), for example, twining rate in cattle is affected by age at calving, consecutive parity, season of calving, etc.
Many authors suggest that multiply pregnancies are less frequently recorded in some of the so-called primitive horse breeds than in cultural ones, mainly for Thoroughbred horses (Perkins and Grimmett, 2001).Kulisa et al. (1999) reported that almost 10% Thoroughbred mares have at last one twin pregnancy.It clearly indicates considerable genetic backgrounds of this trait.On the other hand, the heritability estimates of twinning rate obtained via linear models were very small (Bresińska et al., 2004).Similar results, under a linear animal model, were reported for twinning rate in cattle.These estimates ranged from 0.003 to 0.09 ( Van Vleck and Gregory, 1996;Gregory et al., 1997).As expected, the higher heritability estimates of twinning rates were received by the use of threshold animal model both in horses (Wolc et al., 2006) and in cattle (Van Tassel et al., 1998).
Number of reports concern studies on genetic determination of fecundity.Over the last decades many genes have been identifi ed in livestock.Several putative genes have also been detected for cattle twins (Cruickshank et al., 2004), ovulation rate in pigs (Rathje et al., 1997) and sheep (Davis, 2005).Zoldag et al. (2003) suggested segregation of single loci for twin pregnancy in horses.The exact molecular identifi cation of locus/loci determining horse twins is expensive since a number of markers should be analysed.Therefore, the fi rst step of detection is usually based on free marker segregation analysis.These methods provide information on genotypic effects and allele frequencies.Nowadays, Bayesian approach, mainly Gibbs sampling algorithm, has been a very useful tool for this analysis (Sørensen et al., 1995).
The objectives of this study are: to evaluate the numerical effi ciency of Gibbs sampling algorithm proposed and to detect a single locus determining twinning rate in Polish Thoroughbred horse population.

Data
The data of 1859 Thoroughbred mares (with full pedigree information) born between 1929 and 1989 with 356 multiple pregnancies among them were analysed.Information on reproductive performance of mares as well as pedigree information was recovered from the Polish Stud Book (volumes V-XVI).Since the end of the 1980s there were manual crushes of one of twin embryos and unfortunately, these incidences have not been registered in the Polish Thoroughbred Horse Stud Book.Therefore, the data has been excluded from the analysis.The pedigree described the relationships between 2303 Thoroughbred horses (with 444 base individuals) including 259 stallions (Tables 1 and 2).

Model
In this study y i is the observed binary variable, which equals 1 for a mare with multiple pregnancy and 0 in other case.A mixed inheritance threshold model de-scribed among others by Morton and McLean (1974) and recently by Kadarmideen and Janss (2005) was used for the analysis of the data.The model has the following form: u = Xß+ZWµ+Za+e where: u (n x 1) is a vector of liability (unobserved variable), X (n x b) and Z (n x q) are known incidence matrices, W (q x 3) is a matrix assigning one of the three possible genotypes AA, Aa and aa to individuals, ß (b x 1) is a vector of fi xed effects, µ = [µ AA, 0, − µ AA ] ' , where µ AA denotes the additive effect of genotype AA which is assumed to be positive, i.e. µ AA > 0, a (q x 1) is a vector of additive polygenic effects and e (n x 1) is a vector of random errors.
The relation between the unobserved liabilities and the observed phenotypes is such, that if the observed binary variable denotes success (i.e.y i = 1) then the corresponding unobserved liability u i is positive.In the opposite case u i is negative.

Estimation
The estimation of unknown parameters was conducted with the use of Bayesian methodology and Gibbs sampling algorithm.The following prior probability distributions were assumed: the uniform distribution for vectors β and µ and the multidimensional normal distribution for vectors a and e, i.e. , where A (q x q) is the additive relationship matrix and I n denotes n-dimensional identity matrix.According to the theory of threshold model it was fi xed that 1 2 = e σ .Moreover, the inverted chi-square distribution was assumed for the additive variance component 2 a σ .In the fi rst step of Gibbs sampling it has been always set that each individual is heterozygote, i.e.W matrix has the following form: W = [0:1:0].Finally, for the major gene Mendelian transmission probabilities were assumed.
In each cycle of Gibbs sampling the unknown elements were sampled from the following conditional distributions: , where: i where: z i is the i-th column of Z, A i , i is the element in the i-th row and the i-th column of matrix A -1 , A i ,i is the i-th row of matrix A -1 without the i-th element, a -i is vector a without the i-th element, q i ,..., 1 = , where: w i is the i-th column of matrix W, (zw) i is the i-th column of matrix ZW, i = 1, 2, 3; where: R i where: G i is the genotype of the i-th individual, G -i is a table of the genotypes of all individuals out of i-th individual, the product over j refers to the progeny of the i-th individual, i s G is the genotype of the i-th individual's spouse, are the genotypes of the i-th individual's parents, i = 1,…,q.When the individual is not observed the last term is substituted by 1. Further: , where: n A and n a denote the number of alleles A and a respectively in the group of founders.
Apart from the parameters including in the model the single gene variance was calculated.
The verifi cation of the hypotheses concerning the signifi cance of estimated variances and single gene effect was based on the highest posterior density regions (HPDR) introduced by Scott (1992).The HPDR-s not containing zero indicate the signifi cance of the tested parameters.SKOTARCZAK E. ET AL.

Simulation
Seven different data sets were simulated and analysed in nine variants of the threshold model with single gene effect.In every case the binary records were generated with the use of the pedigree information about Thoroughbred horses mentioned above.The results of the parameter estimation for the simulated data are presented in Table 3.In each analysis 2000000 steps of Gibbs sampling were generated.The fi rst 1000000 was treated as a burn-period.After the checking of the autocorrelation in the generated chains it was decided to keep the important samples in 100-round intervals.The mixing properties of Gibbs sampling in the threshold model are rather poor.To avoid the discrepancy of the whole process an additional assumption about the additive variance component seems to be necessary (Kadarmideen and Janss, 2005).In the presented studies it was assumed that  Using the obtained samples the marginal posterior means were taken as the point estimators of the unknown parameters.

RESULTS AND DISCUSSION
Simulation study.The results of simulations indicate the diffi culties in obtaining proper parameter estimates when the observed variable is binary.Generally, it can be stated, that the variability of the unobserved variable is overtaken by the polygenic additive component.In all the analysed cases the σ a 2 component is overestimated.Consequently, the polygenic heritability coeffi cient will be overestimated.On the contrary, the major gene effect is usually underestimated proving that the procedure is rather conservative.Only one case shows the signifi cance of this parameter.In this case the HPDR interval does not include zero (D3).
As it is well known from literature (i.e.Moreno et al., 1997;Kadarmideen and Janss, 2005), many levels of fi xed effects in the threshold animal model can seriously distort the effectiveness of estimation of the other parameters included in the model.In the threshold model considered in this paper the vector of fi xed effects β and the vector of major gene effect μ are sampled in a similar way.So, the question is: is it possible to obtain the correct, mutually independent estimators of vectors β and μ in the applied procedure?The simulations denoted from D5 to D9 try to fi nd the answer to this problem.The general conclusion is that the procedure stays conservative even when the model used in the analysis of the simulated data was not appropriate.For example, the hypothesis concerning the signifi cance of major gene effect was not rejected even in case D7 when the data were simulated without μ vector and analysed with this vector instead of β vector.
From the practical point of view it is very important that the procedure never shows signifi cance of the major gene effect in cases in which the data were simulated assuming the absence of this effect (D5, D7, D8).
Twinning rate.As it was already mentioned, two fi xed effects could be concerned for the real data: the birth year of the recorded mares and the stud.However, the tests of the contingency table conducted in the pre-analysis showed that there did not exist the signifi cant differences in the percentage of multiple pregnancies inside the both groups.Therefore, taking into account the results of these tests and the simulation studies described above it was decided to analyse the real data according to the model without the fi xed effects vector, i.e. u = ZWµ + Za + e.
Marginal posterior density distributions of single locus effect, variance and frequency are visualized in Figures 1-3.The point estimators are summarized in Table 4.The estimated average effect of dominant homozygote was 1.504 with highest posterior density region ranged from 1.176 to 1.835.It suggests a mixed inheritance model for the investigated trait.As it was already mentioned, single loci have been identifi ed for a number of species of livestock.However, twinning rate is a complex trait, having multiple contributing components, expressed in binary scale.It corresponds with results obtained in cattle populations by Cruickshank et al. (2004), who described different localization of putative QTLs depending on family in North American Holsteins.It should be stressed that according to the current knowledge a simple additive biallelic locus model seems to be satisfactory (except for polygenic component) for the modeling of the fecundity traits.On the other hand, as it was mentioned by Kadarmideen and Janss (2005) the estimation of dominance in threshold model is diffi cult.The estimated dominant allele frequency is relatively low.Because the reproductive ability of twins is very limited it is likely infl uenced by natural selection.Pawlak et al. (2000) reported that 1.1% horse offspring from multiple pregnancies is viable.The cited results indicate that inheritance model assumed in the present study seems to be realistic.Considerably genetic dominance effect would reduce a number of heterozygotes and in consequence dominant allele frequency.Both genetic and phenotypic trends in population studied are highly changeable (Bresińska et al., 2004).Positive trends have been reported for cattle population ( Van Vleck and Gregory, 1996), which suggests positive genetic correlation between twinning rate and some traits included in the selection index.Fortunately, the dependencies are not visible in horse population studied.On the other hand, high fl uctuations of phenotypic averages (Bresińska et al., 2004) can be connected with the incorporation of some stallions (with hypothetical major genotype effects) into the population.The hypothesis on mixed inheritance model of twinning in this population was also supported by quasi-family analysis performed by Wachowska (personal communication).It corresponds with results of Karlsen et al. (2000) who concluded that some bulls had several hundred daughters with no multiple births and other ones had daughters with a twinning rate of 12.9%.
The results should be also perceived from statistical and numerical perspective.The limitation of the polygenic additive variance was important.As it is shown on Figure 4 the estimated variance is close to 2.0 and, in consequence, the heritability coeffi cient is overestimated.Similar problems in Bayesian analysis of binary data have been also described in literature (Kadarmideen and Janss, 2005).
It should be stressed that in the presented research all parameters (allele frequency, genotype effects and variance) of major gene are signifi cant -the HPDR (for 1-α = 0.95) did not include zero.Hence, it indicates a major gene segregation for horse twinning rate.However, the molecular analyses are necessary for physical localization of the hypothesed locus.
it is known from literature the heritability of reproductive traits is rather low, so the condition

Figure 1 .
Figure 1.Marginal posterior distribution of major gene effect

Table 1 .
The distribution of recorded mares across studs

Table 2 .
The distribution of recorded mares in consecutive birth decades