Estimates of genetic parameters of milk traits using REML and Gibbs sampling

Two methods of variance component estimation: Restricted Maximum Likelihood (REML) and Gibbs Sampling (GS), were used to analyze data consisting of 305d first lactation milk and fat yield and fat content of 47,574 cows calved from 1989 through 1996. Two three-trait linear models were applied, both including fixed effect of HYS, random effect of animal and error. One included fixed effect of age at calving class, and the other regression on age at calving. The estimates of genetic correlations and heritabilities were higher when Gibbs Sampling was used; the phenotypic correlations did not depend on the method applied (GS or REML). The heritability estimates for milk yield were 0.28-0.29 by GS and 0.24 by REML. For fat yield they were 0.24 with GS and 0.19 with REML. For fat content they were 0.44-0.45 and 0.37, respectively. The biggest difference between h 2 estimates was found for the fat content. The magnitude of heritabilities was not influenced by the linear model used (with age classes or regression on age).


INTRODUCTION
Estimates of genetic parameters such as heritabilities and genetic correlations are necessary for evaluation of breeding values and designing breeding programmes.The variance and covariance components needed for these estimates require multivariate analysis which accounts more accurately for selection than univariate analysis (Meyer, 1991).These components can be obtained by a variety of methods among which restricted maximum likelihood (REML) has been most frequently used in animal breeding during the last fifteen years (Hartley and Rao, 1967;Patterson and Thompson, 1971).Several software packages for REML variance/covariance estimation are available (Meyer, 1988;Jensen and Madsen, 1992;Misztal, 1994;Boldman et al, 1995).
During the last decade the Bayesian approach based on Markov Chain Monte Carlo methods (MCMC) was introduced to estimate parameters of more complicated models in animal breeding (Gianola and Fernando, 1986;Jensen et al., 1994).A variant of MCMC methods known as the Gibbs sampler is used to obtain ML estimates of variances in linear models (Gelfand and Smith, 1990;Casella and George, 1992).In animal breeding, Gibbs sampling was used for the first time by Wang et al. (1993Wang et al. ( , 1994)).
As a general numerical integration method, the Gibbs sampler is particularly useful for high-dimensional integration in situations where ML methods have failed.The method generates random samples from the marginal posterior distribution of all parameters in the model through sampling from and updating conditional posterior distributions (Wang et al., 1994).Using these random samples, the point estimates of variance components and their errors can be calculated.This is the main advantage of Gibbs sampling over other methods of estimation.Although computationally intensive and complicated, the procedure is simple for programming because all calculations are expressed in scalar algebra and no matrix inversion is needed.
This study compares estimates of genetic parameters for milk production traits obtained by Gibbs sampling (GS) and the REML methods, both applied to multiple trait (MT) animal model.

MATERIAL AND METHODS
The data consisted of milk and fat 305d first lactation yield and fat content from 47,574 cows, daughters of 2,504 sires and 43,055 dams.The pedigree file contained 96,606 animals including cows, their parents and grandparents.Cows calved for the first time from 1989 through 1996.There were 371 herds with a minimum of 35 cows, 850 HYS subclasses and 2 seasons of calving: April-September and October-March.
The following linear models was used for multiple trait analysis: y. = Xh.+Zu. + e . (1) where y. is the vector of observations for trait i (i=l, 2, 3 for milk yield, fat yield and fat content, respectively), b ip b 2 .are vectors of fixed effects, u. is the vector of random animal effects and e. is the vector of random residual effects.X p X 2 and Z are corresponding incidence matrices.where G is the 3 x 3 genetic (co)variance matrix, A is the numerator relationship matrix, E is the 3 x 3 residual (co)variance matrix, I is the identity matrix and ® denotes the Kronecker product function for matrices (Searle, 1982).Age at calving was expressed in days in model (1).Six age groups were created for model (2).The characteristics of the data are given in Table 1.
Variance and covariance components were estimated for all three traits simultaneously, assuming one of the two linear models and using Gibbs Sampling (GS) or Restricted Maximum Likelihood (REML) algorithms.The four resulting combinations of models and procedures were as follows:  van Tassell and van Vleck (1995) were used for estimation of (co)variance components by the Bayesian approach.Prior distributions were needed to describe the Bayesian model.For the fixed effects, flat priors were used, indicating no prior knowledge of these effects; for (co)variances of genetic and residual effects an inverted Wishart distribution was assumed.There were 20,000 samples generated by the Gibbs Sampler with 1,000 rounds in the burn-in period.The REML estimates of genetic parameters were obtained using the MTC program of Misztal (1994) with an assumed convergence criterion of 10~8 .Standard errors of heritabilities were estimated using van Raden's algorithm (Misztal, 1994), standard errors of correlations were not calculated.

RESULTS AND DISCUSSION
There were 316 and 349 rounds of Gauss-Seidel iterations for GS-1 and GS-2, respectively, followed by 20,000 Gibbs samples generated for each model.In the case of REML-1 and REML-2, estimates of (co)variances were obtained after 101 and 61 iterations.
Estimates of genetic and residual (co)variances obtained by the GS and REML methods are given in Tables 2 and 3.The variance components for animal effect  obtained by the GS method were 20-30% higher and the residual components 5-10% lower than the corresponding REML estimates.The covariances between negatively correlated traits (milk and fat %) obtained by the two methods were very similar.Also, there were no differences in the magnitude of variances and covariances estimated from the two models, indicating that they describe the analysed traits similarly.The estimates of heritabilities and genetic and phenotypic correlations for milk production traits are presented in Tables 4 and 5.The GS estimates of heritabilities for milk yield were 0.28 and 0.29, depending on the model, slightly higher than those estimated by REML (0.24).The heritabilities of fat yield (0.24 for GS and 0.19 for REML) and of fat % (0.44-0.45 for GS and 0.37 for REML) were also higher when GS was used as the method of estimation.The simple differences between the magnitudes of parameters estimated by GS and REML were small, only 0.05, on average (Tables 4 and 5).The genetic variance compo-  nents obtained by GS were larger than those obtained by REML (Table 2); thus the latter gave smaller heritabilities.The biggest differences between estimated heritabilities were found for the trait with high heritability, i.e. fat content.Within the same method of estimation the magnitude of heritabilities was not influenced by how the age effects were defined in the model.The genetic correlations estimated by REML were slightly lower than the GS estimates.The biggest difference was found between correlations for milk yield and fat content.The phenotypic correlations were the same no matter which method and model were used.Milk and fat yields were highly correlated (genetic correlation 0.79-0.81and phenotypic correlation 0.87-0.89),whereas the correlation between fat % and fat yield was low (genetic correlation 0.21-0.27and phenotypic correlation 0.32), implying that selection for fat % would not increase fat yield.An increase in fat yield could be achieved by direct selection or through selection for milk yield.The genetic correlation between fat content and milk yield was rather low and negative (-0.33 to -0.43).The phenotypic correlation between those two traits was even lower and also negative (-0.16 to -0.17).
All estimates were similar to those cited in the literature (Zuk et al, 1981;Chauhan and Hayes, 1991;Visscher and Thompson, 1992;Jamrozik and Zarnecki, 1993).The h 2 values for fat yield were lower than for milk yield but still within the range of values reported by other authors (Meyer, 1985;Jamrozik and Zarnecki, 1993).Genetic (0.72-0.76) and phenotypic (0.73-0.81) correlations between milk and fat yields were slightly higher than those obtained by Cue et al. (1987) or Visscher and Thompson (1992) but lower than reported by Zuk et al. (1981) and Jamrozik and Zarnecki (1993).Meyer (1985) found values of 0.76, -0.39 and 0.30 for genetic correlations between milk and fat yields, milk yield and fat content, and fat yield and fat content, very similar to the figures in the present study.A genetic correlation of 0.56 between fat and fat content, a lower genetic correlation of 0.45 between milk and fat yield and a high heritability of 0.65 for fat content were found by Chauhan and Hayes (1991).Their results differed from the genetic parameters estimated for the Polish population.
The standard errors of the heritability estimates were less than 0.001 for GS and less or equal to 0.01 for REML method.These errors made up less than 0.40% of the GS estimated values and were 2.5 to 3% of all REML estimates.In the case of GS the standard errors of heritabilities were calculated on the basis of generated samples, while for the REML method they were approximated.
The errors of all estimates derived from GS were significantly smaller than those from REML, giving more accurate estimates.Van Tassell et al. (1995) compared the standard errors of estimates obtained by these two methods and concluded that GS gave smaller standard errors than REML because of the impact of the prior distribution of variance components.This difference will decrease when the amount of data increases.They also stated that GS may have advantages over REML for large data sets because it allows calculation of parameters without approximations or normality assumptions.Jensen et al. (1994) pointed out that REML analysis includes only an approximation of standard errors of estimates, while GS yields the full marginal posterior distribution permitting direct computation of standard errors as well as many other features of this distribution.Wang et al. (1994) stated that an important advantage of GS is that it provides all parameter estimates always within the permissible parameter space.A serious problem of REML analysis is that estimates can be outside of the parameter space.
This study found that genetic parameters could be estimated using a model with age classes or a model with regression on age, alternatively.When the GS and REML estimation methods were compared, the application of Gibbs Sampling (GS) proved more desirable because of the smaller errors of the heritability estimates and because we could directly calculate not only the point characteristics but also the confidence intervals or any other functions of the posterior distribution of the genetic parameters (Gianola et al., 1991;van Tassell et al., 1995).On the other hand, GS is much more time-consuming than REML and the values of the parameters depend on priors.Van Tassell et al. (1995) showed an increase in the bias of GS estimates when the prior value of h 2 was assumed above the true value.They stated also that the dependence on priors lessens for highly heritable traits.The power of Bayesian methods in animal breeding applications might be evident with more complicated problems solved only approximately by traditional methods.

The difference between bj. and b 2i is that the
former contains HYS effects and regressions on age at calving while the latter includes HYS and age at calving classes.The variance-covariance matrices for random effects were defined as

TABLE 1
Numbers of records, means (x) and standard deviations (SD) for 305d milk and fat yield and fat content by year of calving, season of calving and age class XSDXSD

TABLE 2
Variance components (on diagonal) and covariance components (above diagonal) for first lactation milk yield, fat yield and fat content of Polish Black-and-White cows estimated by Gibbs Sampling (GS) using the model with regression on age (1) or with age classes (2)

TABLE 3
Variance components (on diagonal) and covariance components (above diagonal) for first lactation milk yield, fat yield and fat content of Polish Black-and-White cows estimated by Restricted Maximum Likelihood (REML) using the model with regression on age (1) or with age classes (2)

TABLE 5
Heritabilities (on diagonal), genetic correlations (above diagonal) and phenotypic correlations (below diagonal) for milk yield, fat yield and fat content estimated by Gibbs sampling (GS-2) and REML (REML-2) for model with age classes (2)