Entropy analysis of performance test results of warmblood stallions

The objective of the present study was to evaluate relationships among the traits recorded in a 100-day stationary performance test of warmblood stallions in Poland. The possibility of reducing the number of recorded traits was assessed as well. The data are from 582 warmblood stallions that underwent a 100-day test in 2002-2010 at two Polish training centres. The stallions were pre-selected on nine single conformation and movement traits and then inspected on nineteen performance traits scored by the head of the training centre, the judging commission, and test riders. To establish the relationships within the data, entropy analysis was used. Entropy and conditional entropy in relation to the final assessment, as well as joint entropy, mutual information, and their quotient, were estimated. Strong relationships between traits depend on their source of variability (head of training centre, judging commission, test rider). Variability over time of ranks for many traits is influenced by differences in the genetic structure of the stallions tested over the nine-year period. The estimates of conditional entropy can be helpful in the derivation of weights for the recorded conformation, movement, and performance traits.


INTRODUCTION
Over recent decades, horse breeding has been directed toward performance in sport competition. There are a number of equestrian disciplines such as dressage, show jumping, driving, and eventing competitions. Achievements in competitions at advanced levels are attained late in a sport horse's life. Furthermore, heritability estimates of competition traits are usually low (e.g., Ricard et al., 2000). This points to relatively small selection effectiveness based on sport results. Performance tests are, therefore, conducted to find talented horses for sport horse breeding. Different systems for testing horses and recording traits are used in particular European countries (Thoren-Hellstein et al., 2006). In general, three types of tests are used for young horses (Bruns et al., 2004): station performance tests, field performance tests, and competition tests. That division facilitates the registration of many traits of stallions and mares. It is well known that evaluation of sires is more advanced compared with dams. Relationships between performance test and racing results have been estimated by many authors (e.g., Sobczyńska, 2010). Although these correlations are usually positive, new approaches to performance and genetic evaluation in horses are still in demand (Luehrs-Behnke et al., 2002; Lewczuk et al., 2004).
Evaluation of the performance value of warmblood horses has evolved over the last decades (Lewczuk et al., 2004). Nowadays the system is based on 100-day stationary performance tests. Stallions are pre-selected on the basis of seven conformation and two movement traits, and then scored on seventeen performance traits. The characteristics are recorded on a point scale. This is troublesome for three main reasons. First, statistical analysis of categorical data is difficult since the majority of classical statistical tools apply to continuous variables (traits) with a normal distribution. Second, genetic evaluation requires specific modelling, thus data transformation or nonlinear (especially threshold) models are recommended (Skotarczak et al., 2007). Furthermore, simultaneous estimation or prediction of effects for many categorical traits is computationally very demanding. Third, registration of many categorical traits has an impact on the cost of performance tests.
Hence, efforts to modify the evaluation procedure of stallions are discussed. Attempts to objectively measure the performance of horses using such parameters as heart rate, analysis of blood samples taken after the exercise test, and free jumping parameters measured by computer image analysis were presented by Lewczuk (1999, 2005). Some authors have suggested inclusion of marker-assisted selection into routine genetic evaluation in horses. Schroeder (2010) reported a number of single genes that may affect performance traits. They can be perceived as potential support for genetic improvement programmes in warmblood horses. Also, reduction of the number of recorded traits seems to be an important aspect in optimizing the performance test procedure.
Recently, new approaches to the analysis of categorical traits have been reported in the literature. One of them is the so-called entropy analysis, which enables reduction of recorded traits according to their contribution to the final assessment. The theoretical background of this methodology has been described, among others, by Shannon (1948). This approach has been successfully applied to analyse discrete variables in a number of areas (Moore et al., 2006; Bertram and Gorelick, 2009). Entropy analysis is especially useful when analysing categorical, non-ranked variables. It provides much more information about reciprocal dependencies than standard correlation analysis. It should be mentioned, however, that for ranked variables it gives results similar to Spearman's coefficients. To our knowledge, no reports on entropy analysis in horse breeding are available in the literature.
Therefore, the objective of the present study was to evaluate relationships among the recorded traits in a 100-day stationary performance test in warmblood stallions in Poland. The possibility of reducing the number of recorded traits was assessed as well.

MATERIAL AND METHODS
The dataset comprises all 582 warmblood stallions recorded at a 100-day stationary performance test in the years 2002-2010 at two Polish training centres in Biały Bór and Bogusławice. The recorded individuals represent six breed groups (see Table 1). The stallions were pre-selected on nine conformation and movement traits (type - TY, head and neck - HN, trunk - FR, forelegs - FL, hind legs - HL, hoofs - HO, locomotion walk - LW, locomotion trot - LT, general impression - GI) and then inspected on nineteen performance traits scored by three sources: the head of the training centre (training ability - TA, character - CH, temperament - TE, free jumping - FJD, jumping under rider - JRD, walk - WD, trot - TD, canter - CD, health and feed efficiency - HF); a judging commission of three persons, whose makeup changed slightly over the years (free jumping - FJC, jumping under rider - JRC, walk - WC, trot - TC, canter - CC, general mark - GM, cross-country - EC); and test riders (rideability - RA, dressage-ability - DA, jumping-ability - JA). In the routinely applied procedure, the sum of the conformation and movement trait values is defined as the conformation score. More details on the definition of these traits and the inspection procedure are given on the website of the Polish Horse Breeding Association (http://www.pzhk.pl).
The stationary performance testing procedure changed over time. The following traits were recorded only in 2002-2003: health and feed efficiency - HF, general mark - GM. In turn, cross-country - EC was introduced for registration in 2008-2010. The number of animals recorded for these traits was relatively small, however. On the other hand, registration of these 'incomplete' traits is likely to affect the relationships among all the observed characters and, in consequence, the final evaluation. Hence, the analysis was performed separately within three periods: 2002-2003, 2004-2007, and 2008-2010. Moreover, a joint analysis including all of the periods was done for conformation and movement traits because the scoring system had not changed over the years. A basic statistical description (averages and variability coefficients) of these traits is given in Table 2.
Note to Table 2. Score range: 0.0-5.0 for HN; 0.0-15.0 for TY, FR, GI; 0.0-100 for CS (conformation score); 0.0-10.0 for other traits. Scores: TY, HN, FR, FL, HL, HO, LW, LT, GI, CS, FJC, JRC, WC, TC, CC, EC, GM (by judging commission); TA, CH, HF, TE, FJD, JRD, WD, TD, CD (by head of training centre); RA, DA, JA (by test rider).
Entropy analysis was used to establish the relationships within the data. For each trait the entropy and conditional entropy in relation to the final assessment were estimated. Entropy H(A) of a discrete variable A measures the uncertainty connected with this variable:
$$H(A) = -\sum_{a} p(a) \log p(a)$$
where p(a) denotes the probability of a given value a of A.
In fact, H(A) is the expected value of a discrete random variable, called information, which takes the values -log p(a) with probabilities p(a). This variable takes large values for very rare events and is close to zero for nearly certain events. Conditional entropy H(A|B) quantifies the remaining uncertainty about A given knowledge of B, i.e.:
$$H(A|B) = -\sum_{a,b} p(a,b) \log p(a|b)$$
For each pair of traits the joint entropy, mutual information, and their quotient were determined. To discover the interaction between two variables, mutual information, namely
$$I(A,B) = H(A) + H(B) - H(A,B)$$
where H(A,B) = H(A) + H(B|A) = H(B) + H(A|B) denotes the joint entropy, was estimated. The validity classification of traits and the calculated relationships between them made it possible to assess whether the number of traits used in estimating the genetic value of horses can be reduced, which would reduce error and increase the precision of the evaluation.
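The quantities described above can be illustrated with a minimal Python sketch (the study itself used R). It assumes that probabilities are estimated by relative frequencies of the observed scores and that logarithms are taken to base 2; the text states neither choice. The function names and the toy scores are hypothetical and are not taken from the study data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(A) = -sum_a p(a) * log2 p(a), with p(a) estimated by relative frequency."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(a, b):
    """H(A,B): entropy of the paired observations (a_i, b_i)."""
    return entropy(list(zip(a, b)))


def conditional_entropy(a, b):
    """H(A|B) = H(A,B) - H(B): uncertainty left in A once B is known."""
    return joint_entropy(a, b) - entropy(b)


def mutual_information(a, b):
    """I(A,B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - joint_entropy(a, b)


# Hypothetical scores for eight stallions (illustration only).
trot = [7, 8, 7, 9, 8, 7, 9, 8]
walk = [6, 6, 7, 7, 6, 6, 7, 7]
final = [1, 2, 1, 3, 2, 1, 3, 2]  # fully determined by trot in this toy example

for name, trait in (("trot", trot), ("walk", walk)):
    h_cond = conditional_entropy(final, trait)
    mi = mutual_information(final, trait)
    quotient = mi / joint_entropy(final, trait)  # normed mutual information
    print(f"{name}: H(final|trait)={h_cond:.3f}  I={mi:.3f}  I/H(joint)={quotient:.3f}")
```

In this toy example the trot score determines the final class completely, so no uncertainty about the final assessment remains once trot is known, whereas the walk score leaves part of it unexplained.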
Additional information about the relationships between the analysed traits was provided by the procedure of clustering variables. Following Ward's method (Jobson, 1992), hierarchical clustering was used to construct a dendrogram in which correlated traits are close together. To reduce the effect of the different numbers of trait values, the reciprocal of the normed mutual information, i.e. H(A,B)/I(A,B), was used as the distance measure. Analyses were performed using the statistical package R (R Development Core Team, 2009).
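The clustering step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: it applies the Lance-Williams update for Ward's method directly to the dissimilarity H(A,B)/I(A,B), estimates probabilities by relative frequencies, and uses base-2 logarithms. Whether this matches the exact Ward variant run in R is not stated in the text. The function names and the toy scores are hypothetical.

```python
from collections import Counter
from math import inf, log2


def entropy(values):
    """Shannon entropy in bits, from relative frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def distance(a, b):
    """Reciprocal of normed mutual information: H(A,B) / I(A,B)."""
    h_ab = entropy(list(zip(a, b)))
    i_ab = entropy(a) + entropy(b) - h_ab
    # Empirically independent traits share no information: infinite distance.
    return h_ab / i_ab if i_ab > 1e-12 else inf


def ward_merges(traits):
    """Agglomerative clustering; returns a list of (cluster_1, cluster_2, height)."""
    sizes = {name: 1 for name in traits}  # cluster label -> number of traits
    names = list(traits)
    d = {frozenset((p, q)): distance(traits[p], traits[q])
         for i, p in enumerate(names) for q in names[i + 1:]}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        i, j = sorted(pair)
        height = d.pop(pair)
        ni, nj = sizes.pop(i), sizes.pop(j)
        merged = f"({i},{j})"
        for k, nk in sizes.items():
            dki = d.pop(frozenset((k, i)))
            dkj = d.pop(frozenset((k, j)))
            if inf in (dki, dkj):
                new_d = inf               # avoid inf - inf in the update
            else:
                # Lance-Williams update for Ward's method.
                new_d = ((ni + nk) * dki + (nj + nk) * dkj - nk * height) / (ni + nj + nk)
            d[frozenset((k, merged))] = new_d
        sizes[merged] = ni + nj
        merges.append((i, j, height))
    return merges


# Hypothetical scores (illustration only): TY and GI are scored identically.
toy = {
    "TY": [1, 1, 2, 2, 3, 3, 1, 2],
    "GI": [1, 1, 2, 2, 3, 3, 1, 2],
    "HO": [1, 2, 1, 2, 1, 2, 1, 2],
}
for left, right, height in ward_merges(toy):
    print(f"merge {left} + {right} at height {height:.2f}")
```

Because I(A,B) can never exceed H(A,B), this distance has a minimum of 1, which is reached only when each trait fully determines the other. In the toy data TY and GI are therefore merged first, and HO joins the cluster last.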

RESULTS AND DISCUSSION
The conditional entropy coefficients for conformation and movement traits are presented in Figure 1. As expected, the largest value was obtained for conformation score since it is a sum of scores for nine traits. Considerable differentiation among the nine component traits has been observed, however. The next two traits are the locomotion traits. As reported in Figure 1, locomotion trot is three times more important than locomotion walk. In the case of typical conformation traits, the conditional entropy coefficients ranged from 0.012 (for type) to 0.004 (for forelegs). The coefficient obtained for hind legs was considerably larger compared with forelegs. Magnitudes of these estimates are determined by the number of scores for particular characters.
The conditional entropy coefficients for nineteen performance traits included in the final assessment for three periods are presented in Figures 2-4. Although the coefficients varied over periods, some tendencies can be observed. Large contributions to the final assessment were obtained for training ability and jumping under rider scored by judges. Some traits (WD, CH, TD) are characterized by low importance for the final evaluation. In fact, the above-mentioned traits show moderate variability. Relatively large fluctuations were registered for other traits, dressage ability, for instance. For this trait, the lowest validity was observed in the years 2004-2007, whereas in the first and last periods it was higher. The estimated coefficients may be helpful in establishing the weights for recorded traits.
The results of clustering based on coefficients of mutual information for conformation and movement characters are given in Figure 5. It should be recalled that the coefficient (ranging from 0 to 1) shows relationships between pairs of traits. So, strongly related traits, with a high coefficient of mutual information, are close together on the diagram. This points to the chance to omit one of them. The length of the vertical branches represents relative proximity. It increases in the hierarchical clustering process and does not carry any quantitative information. Based on the applied algorithm, the dendrogram shows two opposite clusters containing scores for head-neck and hoofs (group 1) and the remaining seven traits (group 2). The highest similarity was registered for type and general impression. As shown in Figure 5, the three traits, namely hind legs, trunk, and forelegs, are gradually included. It should be stressed that both movement traits constitute a subcluster.
Fluctuation of validity and relationships between these performance traits over three periods can be influenced by several factors. Huizinga et al. (1991) suggested that subjectivity may enable the judging commission to include previous knowledge in scores given during the stationary performance testing. Moreover, the authors cited above noticed that if members of a commission disagree about what to score for a trait and how to score it, then reproducibility of scoring will consequently be low. Re-rankings over time in particular traits may also be affected by differentiation of the genetic structure of the population. One of the main principles of the genetic improvement programme (approved by the Polish Horse Breeding Association, http://www.pzhk.pl) is to cross the local warmblood population with foreign breeds. This corresponds with data given in Table 1, which include six breed groups. Crossbreeding effects on conformation traits in Polish halfbred stallions were discussed by Lewczuk (2005). In recent decades, globalization of horse breeding has increased considerably. It is mainly stimulated by artificial insemination. International exchange of horse genetic material has been perceived as the main reason to conduct international genetic evaluation of sport horses under the so-called Interstallion (Bruns et al., 2004). Among the objectives of the organization is the review of national testing and evaluation systems as well as improved access to and understanding of breeding information across countries.
The results obtained in the present study require analysis in this context as well.
Generally, the testing systems of stallions in European countries are different (see review by Thoren-Hellstein et al., 2006). There are three basic approaches to performance evaluation: station tests, field tests, and competition. Also, a variety of weighting proportions in the total score for traits has been observed (Albertsdottir et al., 2011). More and more efforts have been undertaken, however, towards genetic connectedness between member countries of Interstallion (Ruhlmann et al., 2009). Furthermore, recommendations for objectification and reduction of traits have been described in the literature. Some attempts to objectively measure the performance of horses using such parameters as heart rate, analysis of blood samples taken after the exercise test, and free jumping parameters measured by computer photo/image analysis were presented by Lewczuk et al. (2004). Stock and Distl (2005) described the application of X-ray in selection in warmblood stallions. Ducro et al. (2007) reported that the procedure was used in routine evaluation in, among others, the Netherlands. The orthopedic status of horses has also been recorded for Swedish Warmblood Riding Horses (Wallin et al., 2003). The range of performance tests is determined by a number of factors, for instance the assumptions of a local genetic improvement programme (including genetic parameters for the traits) or international competition for sport horses. Hence, a reconstruction of the testing procedure of warmblood stallions in Poland seems to be necessary in the coming years.
It should be noted, however, that real evaluation of the suitability of the currently applied procedure can be performed by comparative studies of stationary test and sport results, including both phenotypic and genetic analysis. These magnitudes of conditional entropy can be considered in derivation of weights for the recorded conformation, movement, and performance traits.

CONCLUSIONS
The variability of ranks for many traits over time is influenced by the different genetic structure of the stallions tested over nine years. On the other hand, it may suggest a somewhat unstable testing procedure. Relatively strong relationships between traits depend on the source of variability (head of training centre, judging commission, test rider). The obtained results point to a chance for reducing the number of traits recorded during stationary performance testing.

Table 2. Mean and standard deviation (SD) for the conformation, movement and performance traits in Polish half-breed stallions