Model Selection and Evaluation

Numerical and statistical evaluation

OFV

  • Objective Function Value
  • obtained by minimising -2 × the log-likelihood (-2LL)
  • lower OFV indicates a better fit
  • likelihood ratio test [📖]
    • for nested models (the more complex model can be collapsed into the simpler one)
    • not applicable for comparing non-nested (structurally distinct) models
    • the difference in OFV (ΔOFV) is approximately \(χ^2\)-distributed
    • null hypothesis: no difference between the models
    • alternative hypothesis: the models differ
    • at a significance level (α) of 0.01, the more complex model is accepted if the OFV decreases by at least:
      • 6.63 (1 degree of freedom, i.e. 1 additional parameter)
      • 9.21 (2 degrees of freedom, i.e. 2 additional parameters)
      • etc.
\[OFV_{ELS} = \sum_{i=1}^{n} \left[ \frac{(y_i - \hat{y}_i)^2}{\text{var}(y_i)} + \ln(\text{var}(y_i)) \right] \]

\(n\) is the number of observations;
\(y_i\) is the observed value for the \(i^{th}\) observation;
\(\hat{y}_i\) is the predicted/expected value for the \(i^{th}\) observation;
\(\text{var}(y_i)\) is the variance of the \(i^{th}\) observation.
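As a minimal illustration of the ELS objective function above, the sketch below computes it for a handful of hypothetical observations under an assumed proportional (20% CV) residual error model; all numbers are invented for demonstration only.

```python
import numpy as np

def ofv_els(y_obs, y_pred, var):
    """Extended least squares OFV: squared weighted residuals plus log-variance terms."""
    y_obs, y_pred, var = map(np.asarray, (y_obs, y_pred, var))
    return float(np.sum((y_obs - y_pred) ** 2 / var + np.log(var)))

# hypothetical observed and predicted concentrations
y_obs  = np.array([12.1, 8.4, 5.2, 2.9])
y_pred = np.array([11.5, 8.0, 5.5, 3.1])
var    = (0.2 * y_pred) ** 2          # assumed proportional error model (20% CV)

print(round(ofv_els(y_obs, y_pred, var), 2))
```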

Critical \(χ^2\) values (required decrease in OFV):
Degrees of Freedom α=0.05 α=0.01 α=0.001
1 3.841 6.635 10.828
2 5.991 9.210 13.816
3 7.815 11.345 16.266
4 9.488 13.277 18.467
5 11.070 15.086 20.515
6 12.592 16.812 22.458
7 14.067 18.475 24.322
8 15.507 20.090 26.125
9 16.919 21.666 27.877
10 18.307 23.209 29.588
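The critical values in the table can be reproduced from the \(χ^2\) distribution, and a likelihood ratio test between two nested models then reduces to a threshold check on ΔOFV. A sketch using SciPy, with hypothetical OFV values:

```python
from scipy.stats import chi2

# reproduce the critical ΔOFV values tabulated above
for df in range(1, 11):
    row = [round(chi2.ppf(1 - a, df), 3) for a in (0.05, 0.01, 0.001)]
    print(df, row)

# likelihood ratio test for two nested models (hypothetical OFVs, 1 extra parameter)
ofv_reduced, ofv_full, extra_params = 2510.4, 2502.1, 1
delta_ofv = ofv_reduced - ofv_full
p_value = chi2.sf(delta_ofv, extra_params)
print(delta_ofv, round(p_value, 4), p_value < 0.01)   # ΔOFV = 8.3, significant at α=0.01
```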

AIC

  • Akaike Information Criterion
  • a penalty proportional to the number of estimated parameters is added to the OFV
  • can be used to compare non-nested models [📖] [📖]
  • lower AIC indicates the better fit
\[AIC=OFV+2 \cdot p\]

\(p\) is the number of estimated parameters.

BIC

  • Bayesian Information Criterion
  • a penalty proportional to the number of estimated parameters is added to the OFV, scaled by the logarithm of the sample size [📖] [📖]
  • can be used to compare non-nested models
  • lower BIC indicates the better fit
\[ \text{BIC}_{\text{mixed}} = \text{OFV} + p_{\text{random}} \cdot \ln(n_{\text{id}}) + p_{\text{fixed}} \cdot \ln(n_{\text{obs}}) \]

\(\text{OFV} = -2 \times \ln(L)\) - Objective Function Value, where \(L\) is the likelihood of the model;
\(p_{\text{random}}\) - number of random effects parameters;
\(p_{\text{fixed}}\) - number of fixed effects parameters;
\(n_{\text{id}}\) - number of unique individuals (clusters);
\(n_{\text{obs}}\) - number of observations.
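Given an OFV and the parameter counts, both criteria are straightforward to compute. A sketch with hypothetical values (4 fixed-effect and 3 random-effect parameters, 40 individuals, 320 observations):

```python
import math

def aic(ofv, n_params):
    """AIC = OFV + 2 * p, with p the total number of estimated parameters."""
    return ofv + 2 * n_params

def bic_mixed(ofv, p_random, p_fixed, n_id, n_obs):
    """Mixed-effects BIC: random-effect parameters penalised by ln(n_id),
    fixed-effect parameters by ln(n_obs)."""
    return ofv + p_random * math.log(n_id) + p_fixed * math.log(n_obs)

ofv = 2502.1                                              # hypothetical OFV
print(round(aic(ofv, n_params=7), 1))
print(round(bic_mixed(ofv, p_random=3, p_fixed=4, n_id=40, n_obs=320), 1))
```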


Graphical evaluation

  • models can be evaluated by visual examination of data plots [📖]

GOF

  • Standard Goodness of Fit
  • comparison of the predicted versus the observed concentrations
  • observations should be scattered evenly around the line of identity
  • CWRES (conditional weighted residuals)
    • adjusted based on the FOCE approximation [📖]
    • should be [📖]
      • close to zero (± 2 SD)
      • randomly scattered around zero
  • CWRES vs. population predictions
    • identification of concentration-dependencies
    • to assess the appropriateness of the residual unexplained variability (RUV) model
  • CWRES vs. time
    • identification of time-dependencies
    • indicates whether model misspecification appears in the absorption or the elimination phase (see the plotting sketch below)
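A minimal plotting sketch for the two CWRES diagnostics above, assuming a NONMEM output table has been exported to `sdtab001.csv` with columns `PRED`, `TIME` and `CWRES` (the file name and format are illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd

tab = pd.read_csv("sdtab001.csv")   # assumed export of the NONMEM table file

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, x in zip(axes, ("PRED", "TIME")):
    ax.scatter(tab[x], tab["CWRES"], s=10, alpha=0.5)
    ax.axhline(0, color="black")
    for ref in (-2, 2):             # ±2 SD reference lines
        ax.axhline(ref, color="grey", linestyle="--")
    ax.set_xlabel(x)
    ax.set_ylabel("CWRES")
plt.tight_layout()
plt.show()
```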

VPC

  • Visual Predictive Check
  • model diagnostics [📖]
    • constructing simulated data using the developed model and
    • comparing it with the existing dataset
  • simulation-based graphical evaluation
  • to evaluate predictive performance of a model
  • the ability of a model to reproduce the observed data
  • procedure [📖]
    • percentiles of interest (commonly the 5th, 50th and 95th) are selected
    • the confidence interval of each percentile is derived from the simulated concentrations
    • and compared graphically with the same percentiles of the observed concentrations
    • percentiles are derived for selected time ranges (bins)
    • rather than at every time point, to ease the comparison (see the percentile-binning sketch below)
  • percentiles of the simulated and observed data are compared graphically [📖]
  • Categorical VPC is a useful tool to evaluate performance for categorical data [📖]
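The sketch below illustrates the percentile-binning step with NumPy. The observed data and simulated replicates are drawn from an arbitrary log-normal distribution purely to keep the example self-contained; in practice the simulations come from the developed model.

```python
import numpy as np

def binned_percentiles(time, conc, bin_edges, pctls=(5, 50, 95)):
    """Percentiles of concentration within each time bin."""
    bins = np.digitize(time, bin_edges)
    return np.array([np.percentile(conc[bins == b], pctls) for b in np.unique(bins)])

rng = np.random.default_rng(1)
time = np.tile([0.5, 1, 2, 4, 8, 12, 24], 30)            # hypothetical sampling times (h)
obs  = rng.lognormal(1.0, 0.4, size=time.size)           # placeholder observed data
sims = rng.lognormal(1.0, 0.4, size=(200, time.size))    # placeholder simulated replicates

bin_edges = [1.5, 3, 6, 10, 18]                           # bin boundaries (h)
obs_pct = binned_percentiles(time, obs, bin_edges)        # shape: (n_bins, 3)

# 95% CI of each percentile across the simulated replicates, per bin
sim_pct = np.array([binned_percentiles(time, s, bin_edges) for s in sims])
ci_lo, ci_hi = np.percentile(sim_pct, [2.5, 97.5], axis=0)
print(obs_pct.round(2))
print(ci_lo.round(2), ci_hi.round(2))
```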

Evaluation of uncertainty in parameter estimates

  • variance-covariance matrix
    • generated in NONMEM
    • standard errors of the parameter estimates
      • the square root of the diagonal elements in variance-covariance matrix
  • %RSE (relative standard error)
    • to evaluate parameter precision for fixed-effects parameters
    • values <30% are generally considered acceptable [📖]
\[RSE(\theta) = 100 \cdot \frac{SE(\theta)}{\theta}\]

\(θ\) the final population parameter;
\(SE(θ)\) standard error of the population parameter.

  • %RSE for random-effects parameters (reported on the standard deviation scale, computed from the variance estimate; see the sketch below)
\[RSE(\omega) = 100 \cdot \frac{SE(\omega^2)}{2 \cdot \omega^2}\]

\(\omega^2\) final variance;
\(SE(\omega^2)\) standard error of final variance.
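A short sketch applying both %RSE formulas above to hypothetical NONMEM estimates (CL = 5.2 with SE 0.9; \(\omega^2\) of CL = 0.09 with SE 0.03):

```python
def rse_fixed(theta, se_theta):
    """%RSE of a fixed-effect parameter."""
    return 100 * se_theta / theta

def rse_random(omega_sq, se_omega_sq):
    """%RSE of a random-effect parameter on the SD scale,
    computed from the variance estimate and its standard error."""
    return 100 * se_omega_sq / (2 * omega_sq)

print(round(rse_fixed(5.2, 0.9), 1))       # ~17.3 %
print(round(rse_random(0.09, 0.03), 1))    # ~16.7 %
```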

  • Bootstrap method [📖]

    • new datasets are generated from the original dataset
    • by sampling individuals with replacement (see the sketch after this list)
    • new parameter estimates are generated
    • derived confidence interval (e.g. 95% CI)
    • \(>200\) datasets may be needed to generate the standard errors
    • can be generated using PsN software
  • Log-Likelihood profiling

    • to assess whether the OFV of the final model corresponds to the global minimum
    • maps the likelihood surface between the full and reduced models
      • the model is re-estimated with the respective parameter fixed to slightly different values (e.g. ±5% or ±20%)
      • until the selected significant difference in likelihood (e.g. ΔOFV: 3.84, df=1, α=0.05) is achieved
      • the parameter values at which this occurs define the lower and upper borders of the 95% confidence interval [📖] [📖]
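The resampling step of the bootstrap can be sketched as below; the re-estimation of the model on each resampled dataset (handled in practice by PsN together with NONMEM) is represented here by a placeholder vector of re-estimated CL values, and the demo dataset is invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def bootstrap_datasets(data, n_boot=200, id_col="ID"):
    """Yield datasets in which individuals are resampled with replacement."""
    ids = data[id_col].unique()
    for _ in range(n_boot):
        sampled = rng.choice(ids, size=ids.size, replace=True)
        # relabel IDs so individuals drawn more than once remain distinct
        yield pd.concat(
            [data[data[id_col] == old].assign(**{id_col: new})
             for new, old in enumerate(sampled, start=1)],
            ignore_index=True,
        )

demo = pd.DataFrame({"ID": [1, 1, 2, 2, 3, 3],
                     "TIME": [1, 2] * 3,
                     "DV": [9.1, 6.2, 11.0, 7.5, 8.3, 5.9]})
first = next(bootstrap_datasets(demo, n_boot=1))
print(first["ID"].nunique(), len(first))

# placeholder for the estimates obtained by re-fitting each bootstrap dataset
cl_estimates = rng.normal(5.2, 0.5, size=200)
print(np.percentile(cl_estimates, [2.5, 97.5]).round(2))   # 95% bootstrap CI for CL
```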

Influential Individuals

  • may have a large impact on
    • model selection
    • parameter estimates
  • Comparison of individual OFV in the NONMEM output
  • Case-deletion diagnostics
    • new datasets are created, each with one individual removed
    • an individual is considered influential if its removal causes
      • a relative change in parameter estimates of more than ±20% [📖] (see the sketch below)
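A sketch of the case-deletion check, using hypothetical parameter estimates from the full dataset and from a dataset with one individual removed:

```python
import numpy as np

def relative_change(full, reduced):
    """Relative change (%) in parameter estimates after deleting one individual."""
    full, reduced = np.asarray(full, float), np.asarray(reduced, float)
    return 100 * (reduced - full) / full

full_est    = {"CL": 5.2, "V": 48.0, "KA": 1.1}   # full dataset (hypothetical)
deleted_est = {"CL": 6.5, "V": 47.1, "KA": 1.0}   # dataset without individual 7

change = relative_change(list(full_est.values()), list(deleted_est.values()))
print(dict(zip(full_est, change.round(1))))        # {'CL': 25.0, 'V': -1.9, 'KA': -9.1}
print(dict(zip(full_est, np.abs(change) > 20)))    # CL exceeds ±20% -> influential
```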

Simulations

  • Deterministic
    • do not consider the random-effects parameters of the model
    • generating the typical concentration-time profile for a given set of covariates
    • useful to visualise and assess the impact of dose changes on e.g. exposure (see the simulation sketch after this list)
  • Stochastic
    • consider the random-effects parameters
    • used when generating VPCs
    • require appropriate precision of all parameters
    • to guide dose selection
    • to compare different dosing scenarios
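The sketch below contrasts the two simulation types for an assumed one-compartment model with first-order absorption; the parameter values and the 30% CV inter-individual variability on CL and V are hypothetical.

```python
import numpy as np

def conc_1cmt_oral(t, dose, cl, v, ka):
    """Concentration after a single oral dose, one-compartment model with first-order absorption."""
    ke = cl / v
    return dose * ka / (v * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0.1, 24, 100)
dose, cl, v, ka = 100.0, 5.0, 50.0, 1.2     # hypothetical typical parameter values

# deterministic: typical profile only, random effects ignored
typical = conc_1cmt_oral(t, dose, cl, v, ka)

# stochastic: sample inter-individual variability (log-normal, 30% CV on CL and V)
rng = np.random.default_rng(0)
n = 500
cl_i = cl * np.exp(rng.normal(0, 0.3, n))
v_i  = v  * np.exp(rng.normal(0, 0.3, n))
profiles = np.array([conc_1cmt_oral(t, dose, c, vv, ka) for c, vv in zip(cl_i, v_i)])
p5, p50, p95 = np.percentile(profiles, [5, 50, 95], axis=0)

print(round(float(typical.max()), 2))                       # typical Cmax (deterministic)
print([round(float(x.max()), 2) for x in (p5, p50, p95)])   # simulated Cmax percentiles (stochastic)
```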