Impact of data transformation on the heritability estimates of reproductive traits in laying hens

Two reproductive traits, percentage of fertilized eggs (PFE) and percentage of chicks hatched from set eggs (PHC), were studied in four strains of laying hens (15 339 recorded individuals) over nine generations from a pedigree farm. Three types of data sets for each trait (untransformed, arcsine square-root transformed and probit transformed data) were used to estimate direct and maternal genetic variances within strains. Prior to analysis each observation was divided by the average. Error variance estimates and logarithms of the likelihood were taken as comparison criteria. In general, the reproductive traits showed low heritability. Differences between direct heritability estimates of PFE were negligible. However, for PHC the estimates obtained from untransformed data were larger than those from the transformed data. Generally, smaller residual variances were obtained from arcsine square-root transformed data (in the majority of cases, the largest error variances were estimated from probit transformed data). This usually agrees with the second of the employed criteria.


INTRODUCTION
Generally, reproductive traits are troublesome in statistical analysis. One of the main assumptions of the classical methods is normality of residuals. Unfortunately, fertility and hatchability, usually expressed as a percentage per dam, do not satisfy this assumption. Thus, data transformations (see e.g., Foerster, 1993) are recommended. When eggs are treated as units, the threshold model (Gianola and Foulley, 1983) can be used. However, from a computational standpoint the threshold approach is more demanding in complex animal models than the linear model. Therefore, despite the theoretical advantage of the threshold model, transforming the data towards normality is still preferable in practice because the computations are "easier". Moreover, some authors (Hagger and Hofer, 1989; Varona et al., 1999) reported similar estimates of genetic parameters obtained via the linear and threshold models.
Recent decades have seen an increasing role of reproductive traits in livestock improvement programs. A number of investigations concerning fertility and hatchability have been conducted in layer populations (Chaudary et al., 1987; Hartmann, 2001). It is known that the development of a chicken embryo depends on the egg environment during incubation. The so-called "egg environment" is determined by both the dam genotype and the external environment. Hence, a statistical model including maternal (indirect) effects is increasingly applied to the genetic study of these traits. For instance, Sewalem et al. (1998) reported that the maternal additive genetic variance for laying hen reproduction traits made up a considerable part (0.19) of the phenotypic variability.
The objective of this paper is to assess the adequacy of models for heritability estimation applied to transformed and untransformed data sets of reproductive traits in laying hens. The Bliss degrees (arcsine square-root) and probit transformations were examined.

Birds
The study is based on information collected at the Pedigree Laying Hen Farm of Iwno (West Poland) in the years 1989-1997. Four strains were included in the analysis (more details are given in Table 1). The percentage of fertilized eggs (PFE) and the percentage of chicks hatched from set eggs (PHC) were recorded. PFE was examined by candling on day 8 of incubation. The number of observations per generation (year) is relatively small because both traits were registered only for dams (hens chosen as parents). Feeding level and other environmental conditions did not vary considerably. Descriptive statistics of the data files are also shown in Table 1.

Data sets
Prior to analysis the following three data sets were formed for each trait of each strain: DATA-1 (untransformed data), DATA-2 (data transformed as $y_i = \arcsin\sqrt{x_i}$ (see e.g., Foerster, 1993), where $x_i$ is the i-th untransformed observation (percentage, expressed as a proportion)), and DATA-3 (the so-called probit transformation). In the case of the probit transformation, cumulative frequencies are transformed to a normal probability (probit) scale (see e.g., Lynch and Walsh, 1998).
To compare the results from different data sets (untransformed and transformed data) each observation was divided by the mean.
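The three data sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used by the authors: the function names are hypothetical, percentages are divided by 100 before the angular transformation, tied observations are assigned the midpoint of their cumulative frequency interval, and the classical offset of 5 is added to the normal deviate so that probit scores stay positive and can be divided by their mean.

```python
import math
from statistics import NormalDist, mean


def arcsine_sqrt(percentages):
    """Bliss (angular) transformation: arcsin(sqrt(p)), with p = percentage / 100."""
    return [math.asin(math.sqrt(x / 100.0)) for x in percentages]


def probit_scores(values):
    """Map cumulative relative frequencies to the normal (probit) scale.

    Assumptions: ties share the midpoint of their cumulative frequency
    interval (so every frequency lies strictly between 0 and 1), and the
    classical +5 offset is added to the standard normal deviate.
    """
    n = len(values)
    standard_normal = NormalDist()
    score = {}
    for v in set(values):
        below = sum(1 for u in values if u < v)
        ties = sum(1 for u in values if u == v)
        cumulative_frequency = (below + 0.5 * ties) / n
        score[v] = standard_normal.inv_cdf(cumulative_frequency) + 5.0
    return [score[v] for v in values]


def scale_by_mean(values):
    """Divide each observation by the data set mean (rescaled mean equals 1)."""
    m = mean(values)
    return [v / m for v in values]


# Hypothetical fertility records (percent fertilized eggs per dam).
pfe = [100.0, 95.0, 95.0, 90.0, 80.0, 60.0]
data_1 = scale_by_mean(pfe)
data_2 = scale_by_mean(arcsine_sqrt(pfe))
data_3 = scale_by_mean(probit_scores(pfe))
```

Because every data set is divided by its own mean, the variance components estimated from the three versions are expressed relative to a mean of 1 and can be compared in magnitude.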

Genetic model
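A single-trait animal model was used to estimate direct and maternal heritability within strains:

$$y = X_1 b_1 + X_2 b_2 + Z_1 a + Z_2 m + e$$

where: $y$ is the $n \times 1$ vector of observations; $b_1$ is a $p_1 \times 1$ vector of fixed generation (year) effects ($p_1 = 9$); $b_2$ is a $p_2 \times 1$ vector of hatch period effects ($p_2 = 4$); $a$ is a $q \times 1$ vector of random direct additive genetic effects; $m$ is a $q \times 1$ vector of random maternal additive genetic effects; $e$ is an $n \times 1$ vector of random errors; $X_1$, $X_2$, $Z_1$ and $Z_2$ are the $n \times p_1$, $n \times p_2$, $n \times q$ and $n \times q$ incidence matrices, respectively. The first and second moments were assumed to be as follows:

$$E\begin{bmatrix} a \\ m \\ e \end{bmatrix} = 0, \qquad \mathrm{Var}\begin{bmatrix} a \\ m \\ e \end{bmatrix} = \begin{bmatrix} A\sigma^2_a & A\sigma_{am} & 0 \\ A\sigma_{am} & A\sigma^2_m & 0 \\ 0 & 0 & I\sigma^2_e \end{bmatrix}$$

where $A$ is the $q \times q$ additive relationship matrix; $I$ is the $n \times n$ identity matrix; $\sigma^2_a$ is the direct additive genetic variance; $\sigma^2_e$ is the error variance; $\sigma^2_m$ is the maternal additive genetic variance; $\sigma_{am}$ is the covariance between direct and maternal additive effects.
Hence,

$$y \sim N\left(X_1 b_1 + X_2 b_2,\; Z_1 A Z_1' \sigma^2_a + Z_1 A Z_2' \sigma_{am} + Z_2 A Z_1' \sigma_{am} + Z_2 A Z_2' \sigma^2_m + I\sigma^2_e\right).$$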
The following genetic parameters were estimated:
- direct heritability ($h^2_a = \sigma^2_a / \sigma^2_p$),
- maternal additive heritability ($h^2_m = \sigma^2_m / \sigma^2_p$),
- covariance between direct and maternal effects as a proportion of the phenotypic variance ($d_{am} = \sigma_{am} / \sigma^2_p$),
- total heritability ($h^2_T = (\sigma^2_a + 0.5\sigma^2_m + 1.5\sigma_{am}) / \sigma^2_p$),
where $\sigma^2_p$ is the phenotypic variance.
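As a worked illustration of the four parameters listed above, the following minimal Python sketch computes them from a set of (co)variance components. The function name and the numerical values are hypothetical. The phenotypic variance is taken here as the sum of the direct, maternal and error variances plus the direct-maternal covariance, which is the usual convention for this model but is an assumption, since the text does not spell it out.

```python
def genetic_parameters(var_a, var_m, cov_am, var_e):
    """Heritabilities derived from the (co)variance components of the model.

    Assumption: phenotypic variance = var_a + var_m + cov_am + var_e.
    """
    var_p = var_a + var_m + cov_am + var_e
    return {
        "h2_a": var_a / var_p,    # direct heritability
        "h2_m": var_m / var_p,    # maternal additive heritability
        "d_am": cov_am / var_p,   # direct-maternal covariance as a proportion
        "h2_T": (var_a + 0.5 * var_m + 1.5 * cov_am) / var_p,  # total heritability
    }


# Hypothetical components, chosen so that the phenotypic variance is 1.
params = genetic_parameters(var_a=0.10, var_m=0.05, cov_am=-0.02, var_e=0.87)
for name, value in params.items():
    print(f"{name} = {value:.3f}")
```

With a negative direct-maternal covariance, as in this made-up example, the total heritability falls below the direct heritability; a positive covariance has the opposite effect.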

Computing algorithm and comparison criteria
The derivative-free restricted maximum likelihood (DFREML) algorithm (Meyer, 1989) with a simplex procedure was employed. A value of $10^{-8}$ was used as the convergence criterion in all analyses. The following starting values were taken for each data set: 0.5 for $h^2_a$, 0.01 for $h^2_m$ and 0.001 for $d_{am}$.
The residual variance estimates were used as a criterion of the model's adequacy for the same trait within strain. Moreover, the log-likelihood values (log L) were also checked. The computations were performed using the DFREML package of programs of Meyer (1993).
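The derivative-free simplex search and its convergence criterion can be illustrated with a simplified Nelder-Mead routine. This sketch is not the DFREML code: the objective is a toy quadratic standing in for minus the log-likelihood, its minimum at (0.3, 0.1) is hypothetical, only two parameters are searched, and the stopping rule (variance of the function values at the simplex vertices below the tolerance) is an assumption about how the convergence criterion is applied.

```python
from statistics import pvariance


def simplex_minimize(f, start, step=0.1, tol=1e-8, max_iter=5000):
    """Simplified Nelder-Mead search.

    Stops when the variance of the function values at the simplex
    vertices drops below tol. Returns (best_point, best_value).
    """
    n = len(start)
    # Initial simplex: the start point plus one vertex shifted along each axis.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order the vertices from best (smallest value) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if pvariance(values) < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            # Reflection beat the best vertex: try to expand further.
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Contract the worst vertex towards the centroid.
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink every vertex towards the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_index], values[best_index]


def toy_objective(theta):
    """Hypothetical stand-in for -log L, minimal at h2_a = 0.3, h2_m = 0.1."""
    h2_a, h2_m = theta
    return 1000.0 * ((h2_a - 0.3) ** 2 + (h2_m - 0.1) ** 2)


# Starting values follow the text: 0.5 for h2_a and 0.01 for h2_m.
estimate, minimum = simplex_minimize(toy_objective, [0.5, 0.01])
print(estimate, minimum)
```

Because the stopping rule is defined on function values rather than on the parameters, the precision reached in the parameters depends on how steep the objective is around its optimum. This is one reason why starting values and convergence criteria can influence estimates obtained from flat likelihood surfaces.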

RESULTS AND DISCUSSION
Estimates of direct and maternal heritabilities and of the covariance between the effects (as a proportion) for the two traits studied are listed in Tables 2 and 3. As expected, the reproductive traits were found to have low heritability. However, the estimates are higher for PFE than for PHC, which agrees with a number of results reported in the literature (Sewalem, 1998; Szwaczkowski et al., 2000; Hartmann, 2001). It seems that a larger genetic variability of PHC resulted from a relatively long physiological process. On the other hand, the opposite relationship between these traits has also been reported. Additionally, differences between the strains were registered. Thus, many comparisons of estimates from untransformed data and from data transformed in two ways can be performed.
As already mentioned, two different approaches were examined. A number of authors (see e.g., Foerster, 1993) have previously employed the Bliss degrees to transform fertility and hatchability observations. On the other hand, earlier studies conducted by Szwaczkowski and Piotrowski (1998) indicated an unsatisfactory approximation of the empirical distributions (PFE and PHC) to normality. Hence, the probit transformation recommended for discrete data was also checked. In general, negligible differences between direct heritability estimates of PFE were observed, although higher estimates (with the exception of strain H77) were obtained from DATA-2. Analogous relationships were noted for the $h^2_m$ and $d_{am}$ estimates. More pronounced differences between DATA-3 and the other data sets were obtained for total heritability estimates in two strains (H77, N88).
In all four strains, heritability estimates of PHC obtained from DATA-1 were larger than those from the transformed data sets. However, in two strains (H77, S22) the differences between DATA-1 and DATA-2 were small. Similar tendencies were registered for other estimated functions of (co)variance components, including total heritability.
A number of investigations on the effects of non-normal distributions have been carried out on egg production traits (e.g., Besbes et al., 1993; Szwaczkowski et al., 1994; Koerhuis, 1996). Ibe and Hill (1988) pointed out how transformation of egg production could increase the efficiency of selection through a higher heritability of the transformed data. Unfortunately, the heritabilities obtained here do not lead to unambiguous conclusions, except for some estimates from the probit transformation. Therefore, particular (co)variance components were also monitored (see Figures 1 and 2). As already mentioned, to compare the magnitudes of the estimates, each observation (within data set) was divided by the average. Higher direct genetic variance components are usually estimated from untransformed data than from the transformed ones. However, for strain R33 the highest estimates of $\sigma^2_a$ were obtained from the probit transformed data. Basically, similar relationships were obtained for $\sigma^2_m$ and $\sigma_{am}$. Generally, estimates of maternal genetic variances are more strongly influenced by strain than by the transformation approach. Covariances between direct and maternal effects are also affected by the transformation. These covariance estimates are positive for PFE, whereas those for PHC are negative.
It seems that a transformation can improve the statistical properties of a trait, depending on its original distribution. As presented in Table 1, the distributions of the two traits differ considerably across the four strains. Skewness coefficients of PFE are higher than those of PHC. It should also be noted that the PHC distributions for R33 and S22 are more symmetrical than the others.
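The dependence on the original distribution can be checked directly by comparing a skewness coefficient before and after transformation. The sketch below uses the moment-based coefficient $m_3 / m_2^{3/2}$, which may differ from the estimator behind Table 1; the function name and the fertility records are hypothetical.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical percentages with a long left tail, as is typical for fertility.
pfe_percent = [99.0, 98.0, 97.0, 95.0, 92.0, 85.0, 70.0, 40.0]
pfe_angular = [math.asin(math.sqrt(p / 100.0)) for p in pfe_percent]

print("untransformed:", skewness(pfe_percent))
print("arcsine square-root:", skewness(pfe_angular))
```

For these made-up records the angular transformation pulls the skewness closer to zero without removing it, which illustrates why the benefit of a transformation may vary between traits and strains.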
Which data set has the best statistical properties? Two criteria of goodness of the model (the residual variance estimate and the magnitude of the log-likelihood) were employed (see Table 4). Generally, smaller residual variances (for both traits in all strains) were obtained from the Bliss degree transformed data. It should be stressed that the largest error variances were estimated from DATA-3. For the same trait and strain (especially PFE of H77 and N88), the error variances from DATA-2 were 3-4 times smaller than those from DATA-3. However, for PHC of two strains (R33, S22) the error variance estimates from untransformed data are higher than those from the probit transformed data. According to this criterion it may be concluded that the arcsine transformation leads to the most satisfactory estimates of genetic parameters. Generally, this agrees with the second of the criteria used (log-likelihood). Of all the data sets, the largest log-likelihoods were obtained for the Bliss degree transformed data, whereas for PFE intermediate values were obtained from untransformed data. Differences between DATA-1 and DATA-3 for PHC depend strongly on the strain and may consequently also be determined by the trait distribution within strain, as well as by the starting values and convergence criteria, which must be specified for each data file. Nevertheless, the variability of the heritability estimates is not very large.
From the theoretical point of view, the arcsine square-root transformation can be recommended for reproductive traits of layers. On the other hand, differences between strains in heritability estimates were registered. Hence, it seems that other transformation approaches may also be considered, depending on the trait distribution.