Value at risk is a statistical risk management technique that quantifies the level of risk in an investment portfolio. It estimates the loss that will not be exceeded over a specified time horizon at a given confidence level. Backtesting measures the accuracy of value at risk calculations: the loss forecast produced by the value at risk is compared with the actual losses realized at the end of the specified time horizon.
Backtesting is a technique for simulating a model or strategy on past data to gauge its accuracy and effectiveness. Backtesting in value at risk is used to compare the predicted losses from the calculated value at risk with the actual losses realized at the end of the specified time horizon. This comparison identifies the periods where the value at risk is underestimated or where the portfolio losses are greater than the original expected value at risk. The value at risk predictions can be recalculated if the backtesting values are not accurate, thereby reducing the risk of unexpected losses.
Value at risk calculates the potential maximum losses over a specified time horizon with a certain degree of confidence. For example, the one-year value at risk of an investment portfolio is $10 million with a confidence level of 95%. The value at risk indicates that there is a 5% chance of having losses that exceed $10 million at the end of the year. With 95% confidence, the worst expected portfolio loss over one trading year will not exceed $10 million.
If the value at risk is simulated over the past yearly data and the actual portfolio losses have not exceeded the expected value at risk losses, then the calculated value at risk is an appropriate measure. On the other hand, if the actual portfolio losses exceed the calculated value at risk losses, then the expected value at risk calculation may not be accurate.
When the actual portfolio losses are greater than the calculated value at risk estimated loss, it is known as a breach of value at risk. However, if the actual portfolio loss is above the estimated value at risk only a few times, it doesn't mean that the estimated value at risk has failed. The frequency of breaches needs to be determined.
For example, suppose the daily value at risk of an investment portfolio is $500,000 at a 95% confidence level, and it is backtested over 250 trading days. At the 95% confidence level, actual losses are expected to breach $500,000 on about 5% of days, or approximately 13 days out of 250 (0.05 × 250 = 12.5). A breach count close to this figure is consistent with the estimate. When breaches occur on clearly more than 13 days out of 250, this signals that the value at risk estimate is inaccurate and needs to be re-evaluated.
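The breach count in the $500,000 example can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `count_breaches` and `expected_breaches` and the loss series are hypothetical, made up purely to show the comparison.

```python
def count_breaches(losses, var_estimates):
    """Count the days on which the realized loss exceeded that day's VaR."""
    if len(losses) != len(var_estimates):
        raise ValueError("losses and var_estimates must have the same length")
    return sum(1 for loss, var in zip(losses, var_estimates) if loss > var)


def expected_breaches(n_days, confidence):
    """Expected number of breaches if the VaR estimate is well calibrated."""
    return n_days * (1.0 - confidence)


# Made-up data: 250 days with a constant $500,000 VaR and 14 large losses.
var_estimates = [500_000.0] * 250
losses = [100_000.0] * 236 + [600_000.0] * 14

observed = count_breaches(losses, var_estimates)
expected = expected_breaches(250, 0.95)
print(f"observed breaches: {observed}, expected: {expected:.1f}")
```

Whether a gap between the observed and expected counts is large enough to matter is a statistical question, which is what the coverage tests in the following sections address.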
14.3 Backtesting With Coverage Tests
Even before J.P. Morgan’s RiskMetrics Technical Document described a graphical backtest, the concept of backtesting was familiar, at least within institutions then using value-at-risk. Two years earlier, the Group of 30 (1993) had recommended, and one month earlier the Basel Committee (1995) had also recommended, that institutions apply some form of backtesting to their value-at-risk results. Neither specified a methodology. In September 1995, Crnkovic and Drachman circulated to clients of J.P. Morgan a draft paper describing a distribution test and an independence test, which they published the next year. The first published statistical backtests were coverage tests of Paul Kupiec (1995). In 1996, the Basel Committee published its “traffic light” backtest.
14.3.1 A Recommended Standard Coverage Test
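Consider a q quantile-of-loss value-at-risk measure and define a univariate exceedance process I with terms

$$ {}^{t}I = \begin{cases} 1 & \text{if the loss realized at time } t \text{ exceeds the value-at-risk estimate for that period} \\ 0 & \text{otherwise} \end{cases} $$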
To conduct a coverage test, we gather historical exceedance data ${}^{-\alpha}i, {}^{-\alpha+1}i, \ldots, {}^{0}i$. We assume the ${}^{t}I$ are IID, which allows us to treat our data as a realization $i^{[0]}, \ldots, i^{[\alpha-1]}, i^{[\alpha]}$ of a sample $I^{[0]}, \ldots, I^{[\alpha-1]}, I^{[\alpha]}$.
We define the coverage q* of the value-at-risk measure as the actual frequency with which it does not experience exceedances (i.e. one minus the frequency of instances of ${}^{t}i = 1$). This can be expressed as an unconditional expectation:

$$ q^* = 1 - E\left({}^{t}I\right) $$
Coverage tests are hypothesis tests with the null hypothesis that q = q*. Let x denote the number of exceedances observed in the data:

$$ x = \sum_{t=-\alpha}^{0} {}^{t}i $$
We treat x as a realization of a binomial random variable X. Our null hypothesis is then simply that X ~ B(α + 1, 1 – q). To test at some significance level ε, we must determine values x1 and x2 such that

$$ \Pr\left(X \notin [x_1, x_2]\right) \le \varepsilon $$
Multiple intervals [x1, x2] will satisfy this criterion, so we seek a solution that is generally symmetric in the sense that Pr(X < x1) ≈ Pr(x2 < X) ≈ ε/2.
Formally, define a as the maximum integer such that Pr(X < a) ≤ ε/2 and b as the minimum integer such that Pr(b < X) ≤ ε/2. Consider all intervals of the form [a + n, b] or [a, b – n] where n is a non-negative integer. Set [x1, x2] equal to whichever of these maximizes Pr(X ∉ [x1, x2]) subject to the constraint that Pr(X ∉ [x1, x2]) ≤ ε. Our backtest procedure is then to observe the value-at-risk measure’s performance for α + 1 periods and record the number of exceedances X. If X ∉ [x1, x2], we reject the value-at-risk measure at the ε significance level.
Suppose we implement a one-day 95% value-at-risk measure and plan to backtest it at the .05 significance level after 500 trading days (about two years). Then q = 0.95 and α + 1 = 500. Assuming the null hypothesis, we know X ~ B(500, .05). We use this distribution to determine x1 = 16 and x2 = 35. Calculations are summarized in Exhibit 14.2. We will reject the value-at-risk measure if X ∉ [16, 35].
Exhibit 14.2: Calculations to determine the non-rejection interval for our recommended standard coverage test when ε = .05, α + 1 = 500 and q = 0.95.
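The search for the non-rejection interval described above can be sketched in Python using only the standard library. This is one possible reading of the procedure, not the author's own implementation: the function names are hypothetical, and the binomial probabilities are computed in log space with `math.lgamma` purely as a numerical convenience.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability Pr(X = k), computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def prob_outside(x1, x2, pmf):
    """Pr(X not in [x1, x2]) for a pmf stored as a list indexed by k."""
    return 1.0 - sum(pmf[x1:x2 + 1])


def standard_coverage_interval(n, q, eps):
    """Non-rejection interval [x1, x2] for n observations (n = alpha + 1)."""
    pmf = [binom_pmf(k, n, 1.0 - q) for k in range(n + 1)]

    # a: largest integer with Pr(X < a) <= eps / 2
    a, lower_tail = 0, 0.0
    while a < n and lower_tail + pmf[a] <= eps / 2:
        lower_tail += pmf[a]
        a += 1

    # b: smallest integer with Pr(b < X) <= eps / 2
    b, upper_tail = n, 0.0
    while b > 0 and upper_tail + pmf[b] <= eps / 2:
        upper_tail += pmf[b]
        b -= 1

    # Among [a + m, b] and [a, b - m], keep the interval with the largest
    # rejection probability that still does not exceed eps.
    best = None
    for m in range(max(b - a, 0) + 1):
        for x1, x2 in ((a + m, b), (a, b - m)):
            pr = prob_outside(x1, x2, pmf)
            if pr <= eps and (best is None or pr > best[0]):
                best = (pr, x1, x2)
    if best is None:
        raise ValueError("no interval satisfies the significance constraint")
    return best[1], best[2]


x1, x2 = standard_coverage_interval(500, 0.95, 0.05)
print(f"non-rejection interval: [{x1}, {x2}]")
```

The result for ε = .05, α + 1 = 500 and q = 0.95 can be checked against the interval [16, 35] used in the example above.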
Exhibit 14.3 indicates similar .05 significance level non-rejection intervals [x1, x2] for other values of q and α + 1.
Exhibit 14.3: Recommended standard coverage test non-rejection intervals [x1, x2] for various values of q and α+1. The value-at-risk measure is rejected at the .05 significance level if the number of exceedances X is less than x1 or greater than x2.
14.3.2 Kupiec’s PF Coverage Test
Kupiec’s “proportion of failures” (PF) coverage test takes a circuitous—and approximate—route to an answer, offering no particular advantage over our recommended standard coverage test. Comparing the two tests can be informative, illustrating the various respects in which test designs may differ. As the first published backtesting methodology, the PF test has been widely cited.
As with the recommended standard test, a value-at-risk measure is observed for α + 1 periods, experiencing X exceedances. We adopt the same null hypothesis that q = q*. Rather than directly calculate probabilities from the B(α + 1, 1 – q) distribution of X under the null hypothesis, the PF test uses that distribution to construct a likelihood ratio:

$$ \Lambda = \frac{q^{\,\alpha+1-X}\,(1-q)^{X}}{\left(\dfrac{\alpha+1-X}{\alpha+1}\right)^{\alpha+1-X}\left(\dfrac{X}{\alpha+1}\right)^{X}} $$
It is difficult to infer probabilities with this. As described in Section 4.5.4, a standard technique is to consider –2 log(Λ):

$$ -2\log(\Lambda) = 2\log\left[\left(\frac{\alpha+1-X}{q\,(\alpha+1)}\right)^{\alpha+1-X}\left(\frac{X}{(1-q)(\alpha+1)}\right)^{X}\right] \qquad [14.7] $$

which is approximately centrally chi-squared with one degree of freedom (see Lehmann and Romano, 2005). That is, –2 log(Λ) ~ χ²(1,0), assuming the null hypothesis. Kupiec found this approximation to be reasonable based on a Monte Carlo analysis, but Lopez (1999) claims to have found “meaningful” discrepancies using his own Monte Carlo analysis.
For a given significance level ε, we construct a non-rejection interval [x1, x2] such that

$$ \Pr\left(X \notin [x_1, x_2]\right) \le \varepsilon $$

under the null hypothesis. To do so, calculate the 1 – ε quantile of the χ²(1,0) distribution. Setting this equal to [14.7], solve for X. There will be two solutions. Rounding the lower one down and the higher one up yields x1 and x2.
Consider the example we looked at with the recommended standard coverage test. We implement a one-day 95% value-at-risk measure and plan to backtest it at the .05 significance level after 500 trading days, so q = 0.95 and α + 1 = 500. We calculate the 1 – ε = .95 quantile of the χ²(1,0) distribution as 3.841. Setting this equal to [14.7], we solve for X. There are two solutions: 16.05 and 35.11. Rounding down and up, respectively, we set x1 = 16 and x2 = 36. We will reject the value-at-risk measure if X ∉ [16, 36].
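The PF calculation in this example can be reproduced with a short Python sketch. Several choices here are assumptions rather than part of the original method: the function names are hypothetical, the two solutions are found by simple bisection on either side of the expected exceedance count, and the chi-squared critical value is obtained from the standard normal quantile (a chi-squared variable with one degree of freedom is a squared standard normal).

```python
import math
from statistics import NormalDist


def pf_statistic(x, n, q):
    """-2 log(likelihood ratio) for x exceedances in n periods, 0 < x < n."""
    p = 1.0 - q
    return 2.0 * (x * math.log(x / (n * p))
                  + (n - x) * math.log((n - x) / (n * q)))


def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pf_interval(n, q, eps):
    """Return (x1, x2, lower_root, upper_root, critical_value)."""
    # 1 - eps quantile of chi-squared(1) = (normal 1 - eps/2 quantile) squared
    critical = NormalDist().inv_cdf(1.0 - eps / 2.0) ** 2

    def gap(x):
        return pf_statistic(x, n, q) - critical

    # The statistic is zero at the expected count n(1 - q) and grows on either
    # side; this sketch assumes it exceeds the critical value near 0 and near n.
    centre = n * (1.0 - q)
    lower_root = bisect_root(gap, 1e-9, centre)
    upper_root = bisect_root(gap, centre, n - 1e-9)
    return (math.floor(lower_root), math.ceil(upper_root),
            lower_root, upper_root, critical)


x1, x2, lower_root, upper_root, critical = pf_interval(500, 0.95, 0.05)
print(f"critical value: {critical:.3f}")
print(f"solutions: {lower_root:.2f}, {upper_root:.2f}")
print(f"non-rejection interval: [{x1}, {x2}]")
```

Bisection was chosen for transparency; any one-dimensional root finder would serve equally well.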
Exhibit 14.4 indicates similar .05 significance level non-rejection intervals [x1, x2] for other values of q and α + 1.
Exhibit 14.4: PF coverage test non-rejection intervals [x1, x2] for various values of q and α+1. The value-at-risk measure is rejected at the .05 significance level if the number of exceedances X is less than x1 or greater than x2.
14.3.3 The Basel Committee’s Traffic Light Coverage Test
The 1996 Amendment to the Basel Accord imposed a capital charge on banks for market risk. It allowed banks to use their own proprietary value-at-risk measures to calculate the amount. Use of a proprietary measure required approval of regulators. A bank would have to have an independent risk management function and satisfy regulators that it was following acceptable risk management practices. Regulators would also need to be satisfied that the proprietary value-at-risk measure was sound.
Proprietary measures had to support a 10-day 99% value-at-risk metric, but as a practical matter, banks were allowed to calculate 1-day 99% value-at-risk and scale the result by the square root of 10.
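As a small worked illustration of the scaling, the following Python sketch converts a one-day figure to a 10-day figure. The function name `scale_var` and the $1,000,000 one-day value are hypothetical. Square-root-of-time scaling is commonly justified by assuming independent, identically distributed daily returns with zero mean, an assumption not discussed in the text above.

```python
import math


def scale_var(one_day_var, horizon_days):
    """Scale a one-day VaR to a longer horizon by the square root of time."""
    return one_day_var * math.sqrt(horizon_days)


one_day_var = 1_000_000.0  # hypothetical one-day 99% value-at-risk
ten_day_var = scale_var(one_day_var, 10)
print(f"10-day value-at-risk: {ten_day_var:,.2f}")
```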
The Basel Committee (1996b) specified a methodology for backtesting proprietary value-at-risk measures. Banks were to backtest their one-day 99% value-at-risk results (i.e. value-at-risk before scaling by the square root of 10) against daily P&L’s. It was left to national regulators whether backtesting was based on clean or dirty P&L’s. Backtests were to be performed quarterly using the most recent 250 days of data. Based on the number of exceedances experienced during that period, the value-at-risk measure would be categorized as falling into one of three colored zones:
Exhibit 14.5: Basel Committee defined green, yellow and red zones for backtesting proprietary one-day 99% value-at-risk measures, assuming α + 1 = 250 daily observations. For banks whose value-at-risk measures fell in the yellow zone, the Basel Committee recommended that, at national regulators’ discretion, the multiplier k used to calculate market risk capital charges be increased above the base level 3, as indicated in the table. The committee required that the multiplier be increased to 4 if a value-at-risk measure fell in the red zone. Cumulative probabilities indicate the probability of achieving the indicated number of exceedances or less. They were calculated with a binomial distribution, assuming the null hypothesis q* = 0.99.
Value-at-risk measures falling in the green zone raised no particular concerns. Those falling in the yellow zone required monitoring. The Basel Committee recommended that, at national regulators’ discretion, value-at-risk results from yellow-zone value-at-risk measures be weighted more heavily in calculating banks’ capital charges for market risk—the recommended multipliers are indicated in Exhibit 14.5. Value-at-risk measures falling in the red zone had to be weighted more heavily and were presumed flawed—national regulators would investigate what caused so many exceedances and require that the value-at-risk measure be improved.
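The cumulative probabilities behind the zones can be recomputed with a short Python sketch. The binomial calculation follows the description in the exhibit caption (250 observations, null hypothesis q* = 0.99). The zone boundaries used in `basel_zone` (0 to 4 exceedances green, 5 to 9 yellow, 10 or more red) are an assumption supplied here from the Basel Committee's 1996 backtesting framework, since the exhibit itself does not survive in this text; the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j)
               for j in range(k + 1))


def basel_zone(exceedances):
    """Zone for a count of exceedances over 250 days (assumed boundaries)."""
    if exceedances <= 4:
        return "green"
    if exceedances <= 9:
        return "yellow"
    return "red"


n_days, p_exceed = 250, 0.01  # one-day 99% value-at-risk under the null
for count in range(11):
    cumulative = binom_cdf(count, n_days, p_exceed)
    print(f"{count:>2} exceedances  {basel_zone(count):<6}  "
          f"cumulative probability {cumulative:.2%}")
```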
The Basel Committee’s procedure is not based on any statistical theory of hypothesis testing. The three zones were justified as reasonable in light of the probabilities indicated in Exhibit 14.5 (and probabilities assuming q* = 0.98, q* = 0.97, etc., which the committee also considered). Due to its ad hoc nature, the backtesting methodology is not theoretically interesting. It is important because of its wide use by banks.
Suppose we implement a one-day 90% value-at-risk measure and plan to backtest it with our recommended standard coverage test at the .05 significance level after 375 trading days (about eighteen months). Then q = 0.90 and α + 1 = 375. Calculate the non-rejection interval.
Suppose we want to apply Kupiec’s PF backtest to the same one-day 90% value-at-risk measure as in the previous exercise. Again, the significance level is .05, q = 0.90 and α + 1 = 375. Calculate the non-rejection interval. Compare your result with that of the previous exercise.