Modelling of individual lactation curves of Tunisian Holstein-Friesian cows for milk yield , fat , and protein contents using parametric , orthogonal and spline models

ABSTRACT. Different equations were employed to directly model individual lactation curves of Tunisian Holstein-Friesian cows using 260 241 test day re-cords for milk yield, fat, and protein contents. Eleven mathematical models were compared. Parametric curves (Legendre polynomials, Ali and Schaeffer (AS), Wood, and Wilmink models) and regression splines were tested. Good-ness of fit was assessed by considering the adjusted R -square ranked ac-cording to five classes, statistical criteria, and residuals analysis. Regression splines (quadratic and cubic spline models with three knots) showed better fitting performances and greater flexibility for all milk traits. The sixth-order Leg-endre orthogonal polynomial model and the Ali and Schaeffer model also gave the best fit for milk traits compared with the three-parameter models (Wood and Wilmink) and with the lower-order polynomial models, but the AS model was less correlated for fat content ( R = 0.87), gave a moderate correlation ( R = 0.90) for protein content, and had less similarity between the observed and estimated lactation curves. The performance of Legendre orthogonal poly-nomials and quadratic splines was strongly affected by the models’ order and the number of knots.


Introduction
The accurate description of lactation curves is an interesting topic for breeding and management purposes in dairy farming.Lactation curves are also an efficient tool for the genetic improvement of milk production traits.In the last two decades, a great deal of effort has been devoted to investigating the direct use of test day (TD) yields in the genetic evaluation of dairy cattle (Otwinowska-Mindur et al., 2013) and genetic merit based on test day models was proposed (Ptak and Schaeffer, 1993).Several methodologies have already been presented for genetic evaluation of production traits in dairy cattle based on test day records (Jaffrezic et al., 2002).Currently, lactation curve fit is used for modelling the random part of the lactation curve in the random regression models, the first functions applied were the Ali and Schaeffer and the Wilmink models (Bohmanova et al., 2008).More recently, Legendre polynomials (Misztal, 2006) and spline models (Meyer, 2005) have been used as the basic function in many analyses by random regression models.
The use of individual curves to select the most accurate model to fit the lactation curve is more ABSTRACT.Different equations were employed to directly model individual lactation curves of Tunisian Holstein-Friesian cows using 260 241 test day records for milk yield, fat, and protein contents.Eleven mathematical models were compared.Parametric curves (Legendre polynomials, Ali and Schaeffer (AS), Wood, and Wilmink models) and regression splines were tested.Goodness of fit was assessed by considering the adjusted R-square ranked according to five classes, statistical criteria, and residuals analysis.Regression splines (quadratic and cubic spline models with three knots) showed better fitting performances and greater flexibility for all milk traits.The sixth-order Legendre orthogonal polynomial model and the Ali and Schaeffer model also gave the best fit for milk traits compared with the three-parameter models (Wood and Wilmink) and with the lower-order polynomial models, but the AS model was less correlated for fat content (R = 0.87), gave a moderate correlation (R = 0.90) for protein content, and had less similarity between the observed and estimated lactation curves.The performance of Legendre orthogonal polynomials and quadratic splines was strongly affected by the models' order and the number of knots.
efficient than the use of average curves.Mean lactation curves are usually estimated with a large number of records and are characterized by rather regular patterns (Macciotta et al., 2010).Moreover, individual patterns are of interest for the estimation of the genetic merit of animals (Macciotta et al., 2005) via random regression test day models.
This study examined the properties of eleven selected models.The most popular one was first proposed by Wood (1967) to describe the lactation curve (WD).Wilmink (1987) proposed a combination of an exponential and linear term.The Wilmink model (WIL) can be reduced to a three-parameter model by setting the k exponent to a suitable fixed value.In 1987, Ali and Schaeffer suggested a polynomial regression model (AS) to fit the lactation curve; this model is also an important reference that has been used in numerous studies (Silvestre et al., 2006).Legendre orthogonal polynomials (LEG; Kirkpatrick et al., 1990) have been used to model a variety of curves, and some authors have reported their advantages in comparison with several parametric models (Silvestre et al., 2006;Macciotta et al., 2010;Steri et al., 2012).Splines are a type of segmented regression in which the curve is divided into several sections, joined at points called knots, each fitted with different polynomials; splines have been reported to provide extra flexibility in fitting lactation curves (White et al., 1999;Macciotta et al., 2010).
In order to use lactation curve functions in genetic evaluation, identification of the best fitting function is a prerequisite.The purpose of this study was to investigate the suitability of eleven mathematical models for describing the individual lactation curves of milk yield and fat and protein contents from test-day records of Tunisian Holstein-Friesian cows.

Material and methods
Data were supplied by the National Centre for Genetic Improvement (CNAG: Sidi Thabet, Tunis).A sample of 260 241 test days of 5649 cows, recorded in 188 herds from 1994 to 2002 was used.Lactation records with less than 10 consecutive testdays were not included.Likewise, test day yields recorded up to 5 days following calving and very low milk yield (< 3 kg) were omitted.The lactation period was limited to 5-305 days in milk (DIM) after calving.Table 1 displays the structure of the data (average test day milk yield (MY), and percentages of fat (FP) and protein (PP) in milk composition with parity for first, second, third, and greater.
Eleven mathematical models (Table 2) available in the literature were selected to fit the individual lactation curves of milk yield.The function of time P i (Table 2) was calculated using values published by Schaeffer (2004).Quadratic splines were used in this study with one (QSL1), two (QSL2), and three knots (QSL3), the number and locations of knots are very important in fitting splines.Actually, few procedures are available for determining them.In this study, knot positions were estimated by considering them as additional independent variables in a non-linear estimation procedure (Macciotta et al., 2010) and the position of knots N j was placed at 50, 100, and 200 DIM for cubic splines and at 40, 160, and 200 DIM for quadratic splines.The WD, WIL, and AS models were fitted by the Levenberg-Marquardt's iterative method in a nonlinear regression procedure of SAS (2001); linear regression was used to fit Legendre polynomials (LEG), cubic spline (CSPL) and quadratic spline (QSL).For the WIL model, the parameter k is connected to the time of lactation peak and usually assumes a fixed value derived from a preliminary analysis made on average production (Wilmink, 1987).Goodness of fit was assessed by considering various criteria.The adjusted R square (R 2 adj ) measures how successful the fit is in explaining the variation of the data.The error in absolute terms (RES) is computed from the difference between the predicted and observed yields per test day and refers to the capability of a model to estimate the daily yield on a specific test day (Guo and Swalve, 1995).The correlation between the measured and estimated test day yield (R) enumerates the degree of association between observed and estimated values (Guo and Swalve, 1995).The root means square error (RMSE) is a frequently used measure of the difference between values predicted by a model and the values actually observed.The standard error of the individual predicted values (STDIN) specifies a variable that contains the standard error of the individual predicted yield.The quotient (Q) between the sum of square errors and the observed sum of squares highlights the similarity between the measured lactation curve and the estimated lactation curve (Ali and Schaeffer, 1987).The analysis of residuals is a common technique in the evaluation and comparison of models, in which a residual for a given record of daily yield is the difference between the observed value and the value predicted by the regression equation (Silvestre et al., 2006); the residual distributions were obtained for each model and plotted against days in milk using the SAS GPLOT procedure (SAS, 2001).

Results
Table 3 shows the percentage of individual lactation curves within each class of adjusted R-squared (C1 to C5) obtained by fitting the different models for all milk traits.All models fit better MY and PP than FP and a difference in the proportion of individual curves was observed between models and traits.The three-parameter models (WIL and WD) and the fourth parameter model (LEG3) presented the lowest percent of curves having an R 2 adj > 0.80 for MY, FP, and PP.In particular, the fir of the LEG3 model was very poor for FP and had only about 8% of individual curves with an R 2 adj higher than 0.80.The fits of models characterized by a high number of parameters show the greatest proportion of individual lactations with R 2 adj > 0.80.An increase in the number of curves showing an R 2 adj > 0.80 was observed with the Legendre orthogonal polynomial and quadratic spline models as the order or the number of knots increased.For example, the percentage of C5 class increased from about 29% (LEG3) to 47% (LEG6) and an increase of 10% was observed for QSL between one and three knots.
Table 4 shows selection criteria for the various models fitted to individual lactation curves for MY, FP, and PP.For the comparison criterion RMSE, AS ranked as the best model to fit MY by obtaining the smallest values, followed by the QSL3, CSPL, and LEG6 models.In contrast to this, the three-parameter models gave a higher magnitude of the error illustrated by large values of RMSE (2.80-2.89).All models predicted the average milk yield with RESmargin of < 2 kg.The AS model predicted yield with an absolute error higher than LEG6 and QSL3.In addition, the QSL3 presented the lowest values of STDIN.For FP and PP results, AS and LEG5 produced similar values of RMSE and STDIN, but LEG5 was superior in terms of the difference between observed and predicted values (RES).However, on the basis of this criterion, QSL3 and LEG6 showed better fitting performances for FP and PP.Moreover, the WD model was very poor, especially for PP.The Q parameter showed a significant difference between models for all milk traits and the lowest value was obtained with LEG6, followed by QSL3.The correlation between observed and estimated yield (R) ranged from 0.94 to 0.97, 0.75 to 0.92, and 0.80 to 0.94 for MY, FP, and PP, respectively.The correlation values (R) for milk yield were high for all models (R > 0.94).Similar results were obtained by Otwinowska-Mindur et al. (2013) for Polish Holstein-Friesian cows.Estimated yields (expressed as a percentage for fat and protein) were better correlated for MY and PP than for FP.For MY the AS, LEG6, and QSL3 models showed better performance than the other models that quantified a higher degree of association between observed and estimated yield.In this direction, the AS model was less correlated, especially for FP (R = 0.87) and gave a moderate correlation (R = 0.90) for PP.
Average residual distributions were obtained for each model and plotted against days in milk in the range from 5 to 305 DIM, to determine the pattern of bias in predicting milk yields or contents.Selected residual distributions are presented in Figure 1 as an example of models giving the worst fit (example chosen for the WD model) and for QSL3 and LEG6, chosen among models giving the best performance by considering statistical criterion results.For MY, the distribution of residual for model WD (Figure 1A) fluctuated throughout the  2 and 3; RMSE -the root means square error, RES -the error in absolute term (defined as the absolute values of the difference between the predicted and observed test-day yields), STDIN -the standard error of the individual predicted values, Q -the quotient between the sum of square errors and the observed sum of squares, R -the correlation between the real and estimated values lactation between positive and negative values, it was negative at the beginning of lactation and from about 270 DIM until the end of lactation.A high positive residual variance was observed between 30 and 70 DIM and also from 160 to 260 DIM.The residual pattern obtained for the WD model for MY is related to that reported by Olori et al. (1999).For FP and PP, a positive residual was observed during the initial phase; then residuals were distributed between negative and positive values on the horizontal axis throughout the rest of lactation, with a higher and more fluctuating magnitude for FP than for PP (Figure 1B and 1C).Residuals plots for other models were not included in this paper, but a similar pattern was observed with LEG3 and LEG4 and comparable results were obtained with the LEG5, WD, and WIL models for all milk traits, but a small difference was detected for the WIL model, with an accurate estimation of only the initial milk constituent (FP and PP).
Average values of residual along the lactation for QSL3 model tended to remain close to zero with a lower magnitude for MY and FP, and were very close to the horizontal axis, especially for PP and MY.
Lactation curves were fitted adequately by QSL3, particularly in the first lactation stage and around peak (from DIM 5 to 50 DIM).QSL3 was slightly better in the second part of the lactation trajectory, but estimation improved at the end from about 280 DIM for all traits.Similar trends in the residual distribution were obtained with the CSPL and AS models for all milk traits.Although the performance of LEG6 was among the best according to the statistical criteria results, the patterns found in Figure 1A for MY suggest that the distribution of the errors for LEG6 was not random during lactation and was only close to zero at the end of lactation.While for FP and PP a similar pattern was observed for models QSL3 and LEG6.

Discussion
The coefficient of determination adjusted by the number of parameters (R 2 adj ) is equal to the percentage of the variance of daily yield explained by the model.Not all models had the same efficiency in the fitting of the MY and its compositions, the difference in proportions between the number of curves with R 2 adj > 0.80 was the highest between MY and FP and ranged from 10.42 points (AS) to 23.52 points (QSL3), similar results were noted by Silvestre et al. (2009).Poor fitting performance of individual lactation curves for FP was indicated by Quinn et al., 2006;Silvestre et al., 2009;Steri el al., 2012.In addition, the large relative frequencies of MY curves showing an R 2 adj > 0.80 are in agreement with previous results reported in dairy cattle (Macciotta et al., 2005;Silvestre et al., 2009;Macciotta et al., 2010;Steri et al., 2012).The R 2 adj classification revealed that models with a higher number of parameters (AS, LEG5 and LEG6) and spline models (CSPL, QSL2 and QSL3) showed better fitting performances for MY.However, AS and LEG5 models had very similar performances for MY, in agreement with Steri et al. (2012).The relative frequencies of R 2 adj more than 0.80 for MY were the highest for LEG6 and QSL3.Macciotta et al. (2010) reported an increase in the number of curves showing an R 2 adj > 0.80 with the quadratic and cubic splines in comparison with AS and fourth-order Legendre orthogonal polynomials for MY.For FP and PP, the AS, LEG6, and QSL3 models gave the best fit to individual lactation curves.Moreover, the LEG6 and QSL3 models presented the same relative frequencies of R 2 adj more than 0.80.Steri et al. (2012) illustrated that QSL3 and CSPL showed the best fit for all milk traits, based on the percentage of individual curves having an R 2 adj greater than 0.70.Generally, for all milk traits the goodness of fit increased with the number of function parameters describing the curve, in agreement with earlier suggestions (Kirkpatrick et al., 1990;Meyer, 2005).The high-order orthogonal Legendre polynomial (LEG6) and the quadratic spline with three knots (QSL3) presented similar performance with respect to the RES, STDIN and R parameters for MY.LEG6, recording the lowest Q parameter, indicated a closer similarity between the observed and estimated values.For FP and PP, LEG6 and QSL3 presented exactly the same values of RES and STDIN and almost similar values of the RMSE and R parameters, and presented the same result as MY with respect to the Q parameter.For MY, a general reduction of performance of fit was found with LEG6 in relation to the residual distribution for MY, because the prediction was underestimated at the beginning of lactation and overestimated around the peak yield.Pool et al. (2000) noted similar results in which variance predictions at the extremes of the lactation trajectory were still overestimated, and residuals showed a systematic pattern over the lactation period when the genetic and permanent environmental components were both modelled by a polynomial regression.The QSL3 and the AS models showed a random distribution of residuals for all milk traits.However, Druet et al. (2003) noted that the Legendre polynomials had an increased bias at the beginning and the end of lactation and showed waves in the middle of the lactation.The statistical parameter results obtained in this study for individual lactations indicated that the best performance obtained by a spline model with three knots is close to that of a high-order polynomial function.This may be explained by the suitability of spline models.However, splines play a role similar to higher order polynomials and spline curves are constructed from pieces of low-degree polynomials, joined at knots.
Non-parametric models (QSL3 and CSPL) showed better fitting performance compared with parametric models (WD and WIL), particularly in the first part of lactation and around the peak, WD compared with QSL3 (Figure 1), which is in agreement with some reports (Silvestre et al., 2006;Macciotta et al., 2010;Steri et al., 2012).This problem was also observed with polynomial models only for MY (Figure 1A) but with a lower magnitude.The difficulty for the parametric function to adequately model the peak yield can be explained by the high degree of correlation between its parameters.This is not the case for a Legendre polynomial function because of the orthogonality propriety of polynomials (Steri et al., 2012).The former models (parametric models) were unable to properly model the peak of the lactation curve.Poor fits noted for this part reached around 70 DIM (around peak yield) and were reported in several studies using parametric models (Olori et al., 1999;Silvestre et al., 2006;Macciotta et al., 2011).We clearly observed this disadvantage in this work with the WD, WIL, and orthogonal polynomials.
Variation in individual fit of the various models was observed among animals and among models.The variation in fit among animals, illustrated mainly by the Q values (Silvestre et al., 2006), suggests that the suitability of the considered model depends on the mathematical form of the model and on individual trends in TD milk yield and its composition, which vary between animal and lactation (as an indication Q WD = 7.25; Q LEG6 = 1.97 and Q WD = 1.68;Q LEG6 = 0.23 respectively for fat and protein contents).These results are similar to previous findings by Silvestre et al. (2006).In addition, Olori et al. (1999) used weekly averages of daily milk yield of 488 first-lactation cows recorded in a single herd to investigate the fit of five models and concluded that there was little difference between the models in fitted average herd lactation and that the variation in fit was almost entirely between cows.Guo and Swalve (1995) evaluated the goodness of fit of 16 lactation curve models for milk yield using data collected from three farms and reported that differences between cows is of great importance.For fat and protein contents, Quinn et al. (2006) concluded that the investigated models were similar in term of goodness of fit using the whole data set.
The results of the present study showed that the most adequate functions for modelling the individual lactation curves for all milk traits in the Tunisian population of Holstein-Friesians were LEG 6, AS, QSL3, and CSPL, with an advantage for the quadratic spline with three knots.The advantage of the spline can be explained by its flexibility.However, regression splines offer the possibility of using a number and placement of knots along the lactation trajectory (White et al., 1999;Meyer, 2005).In addition, regression splines have advantages for modelling lactation curves, such as limited number of required parameters, good flexibility, smoothness and limited sensitivity to data (Druet et al., 2003).

Conclusions
The present study used test day data to fit individual lactation curves.Eleven mathematical models were compared for accuracy of modelling lactation curves for milk yield and contents.The results indicate that higher-order Legendre orthogonal polynomials (LEG6), cubic and quadratic splines with three knots are more flexible and suitable to fit individual lactation curves for milk yield and its composition in the Tunisian population of Holstein-Friesians, although the goodness of fit shows a certain superiority of the quadratic splines with 3 knots (QSL3) model.The Ali and Schaeffer model also presents a respectable suitability, especially compared with parametric models and low-order polynomial models.

Figure 1A .
Figure 1A.Distribution of milk average error (residuals) for WD, QSL3 and LEG6 models for milk yield Figure 1B.Distribution of milk average error (residuals) for WD, QSL3 and LEG6 models for fat content

Figure 1C .
Figure 1C.Distribution of milk average error (residuals) for WD, QSL3 and LEG6 models for protein content

Table 1 .
Description of data used in this study based on mean and standard deviations for milk yield, fat and protein percentage with parity

Table 3 .
Relative frequencies of fit for individual curves for milk yield, fat and protein percentage among five classes of adjusted R 2

Table 4 .
Selection criteria for comparison of lactation models fitted with test-day records for milk yield, fat and protein content 1-11 see Tables