index.Rmd
---
title: Backtest Overfitting | Translated in R
author: TimelyPortfolio
license: by-nc-sa
widgets: [mathjax, bootstrap]
github: {user: timelyportfolio, repo: research_lopezdePrado}
mode: standalone
hitheme: solarized_light
assets:
css:
- "http://fonts.googleapis.com/css?family=Raleway:300"
- "http://fonts.googleapis.com/css?family=Oxygen"
--- dt:10
<style>
iframe{
height:450px;
width:900px;
margin:auto auto;
}
body{
font-family: 'Oxygen', sans-serif;
}
h1,h2,h3,h4 {
font-family: 'Raleway', sans-serif;
}
h3 {
background-color: #D4DAEC;
text-indent: 100px;
}
h4 {
text-indent: 100px;
}
iframe {height: 300px; width: 900px}
</style>
```{r message = F, warning = F, error = F, echo = F, tidy = F, cache = F}
require(knitr)
opts_chunk$set(
message = F,
warning = F,
error = F,
tidy = F,
cache = F,
fig.width = 11,
fig.height = 5
)
```
### Original Paper
<br>
<address>
<strong style="lineheight:40px;">Pseudo-Mathematics and Financial Charlatanism:</strong><p style="lineheight:40px;">The Effects of Backtest Overfitting on Out-of-Sample Performance</p>
<p class="muted" style="line-height:26px;">Bailey, David H. and Borwein, Jonathan M. and Lopez de Prado, Marcos and Zhu, Qiji Jim<br>
October 7, 2013<br>
Available at SSRN: <a href="http://ssrn.com/abstract=2308659">http://ssrn.com/abstract=2308659</a>
</p>
</address>
---
### Constants - R Equivalents
```{r}
# these are the constants referenced
# Euler–Mascheroni's constant
-digamma(1)
# euler constant
exp(1)
```
---
### Expected Maximum Sharpe $E[max_N]$
$$
\begin{aligned}
(1-\gamma)Z^{-1}\left[1-\frac{1}{N}\right]+\gamma Z^{-1}\left[1-\frac{1}{N}e^{-1}\right] \end{aligned}
$$
<h4>R Translation</h4>
```{r}
### Eq. 4
emax <- function(N) {
( ( 1 + digamma(1) ) * qnorm( 1 - 1/N ) ) +
(-digamma(1) * qnorm( 1 - (1/N) * exp(-1)))
}
emax( N = 10 ) # should be about = 1.57 to match paper
```
---
### Plot $E[max_N]$
```{r}
#implement upper bound
upbound <- function(N) {sqrt(2*log(N))}
#make a ugly plot for reasonableness check
curve(upbound, from = 1, to = 1000, col = "red", lty = 2)
curve(emax, from = 1, to = 1000, add=TRUE)
grid()
```
---
### rChart $E[max_N]$
```{r results = "asis"}
require(rCharts)
df <- data.frame(
list(x=c(1,2:1000), y=c(0,emax(2:1000))))
d1 <- dPlot( y ~ x, groups = "x", data = df, type = "line", height = 270, width = 800)
d1$xAxis(type = "addMeasureAxis",orderBy = "x",outputFormat = ",0.0f")
d1$yAxis( outputFormat = ".2f")
d1
```
---
### Check My Math with *Eq. 6*
```{r}
### Try next example for Eq. 6
# if y = 5
# so solve for annualized Sharpe of 1
# says no more than 45 N should be tried
# first just do this to make sure I understand
N = 45
y = 5
emax( N ) * y^-0.5 #seems like on the right path
```
---
### Minimum Backtest Length $MinBTL$
$$
\begin{aligned}
\left(\frac{(1-\gamma)Z^{-1}\left[1-\frac{1}{N}\right]+\gamma Z^{-1}\left[1-\frac{1}{N}e^{-1}\right]}{\overline{E[max_N]}}\right)^2 \end{aligned}
$$
<h4> R Translation </h4>
```{r}
#use emax from earlier for numerator
minBTL <- function( N, eMaxSharpe = 1 ) {
(emax(N) / eMaxSharpe) ^ 2
}
#then this should equal 5 if correct
minBTL( N = 45, eMaxSharpe = 1 )
```
---
### Plot $MinBTL$
```{r}
#make ugly plot for a reasonableness check
curve( minBTL, from = 1, to = 1000)
```
---
### rChart $MinBTL$
```{r results = 'asis'}
df <- data.frame(
list(x=c(1,2:1000), minBTL=c(0,minBTL(2:1000))))
n1 <- nPlot( minBTL ~ x, data = df, type = "lineChart", height = 270, width = 800)
n1$yAxis( tickFormat = "#!d3.format(',.2f')!#")
n1$chart( useInteractiveGuideline = TRUE )
n1
```
---
### Thanks
- Ramnath Vaidyanathan
- Bailey, David H. and Borwein, Jonathan M. and Lopez de Prado, Marcos and Zhu, Qiji Jim
index.md
---
title: Backtest Overfitting | Translated in R
author: TimelyPortfolio
license: by-nc-sa
widgets: [mathjax, bootstrap]
github: {user: timelyportfolio, repo: research_lopezdePrado}
mode: standalone
hitheme: solarized_light
assets:
css:
- "http://fonts.googleapis.com/css?family=Raleway:300"
- "http://fonts.googleapis.com/css?family=Oxygen"
--- dt:10
<style>
iframe{
height:450px;
width:900px;
margin:auto auto;
}
body{
font-family: 'Oxygen', sans-serif;
}
h1,h2,h3,h4 {
font-family: 'Raleway', sans-serif;
}
h3 {
background-color: #D4DAEC;
text-indent: 100px;
}
h4 {
text-indent: 100px;
}
iframe {height: 300px; width: 900px}
</style>
### Original Paper
<br>
<address>
<strong style="lineheight:40px;">Pseudo-Mathematics and Financial Charlatanism:</strong><p style="lineheight:40px;">The Effects of Backtest Overfitting on Out-of-Sample Performance</p>
<p class="muted" style="line-height:26px;">Bailey, David H. and Borwein, Jonathan M. and Lopez de Prado, Marcos and Zhu, Qiji Jim<br>
October 7, 2013<br>
Available at SSRN: <a href="http://ssrn.com/abstract=2308659">http://ssrn.com/abstract=2308659</a>
</p>
</address>
---
### Constants - R Equivalents
```r
# these are the constants referenced
# Euler–Mascheroni's constant
-digamma(1)
```
```
## [1] 0.5772
```
```r
# euler constant
exp(1)
```
```
## [1] 2.718
```
---
### Expected Maximum Sharpe $E[max_N]$
$$
\begin{aligned}
(1-\gamma)Z^{-1}\left[1-\frac{1}{N}\right]+\gamma Z^{-1}\left[1-\frac{1}{N}e^{-1}\right] \end{aligned}
$$
<h4>R Translation</h4>
```r
### Eq. 4
emax <- function(N) {
( ( 1 + digamma(1) ) * qnorm( 1 - 1/N ) ) +
(-digamma(1) * qnorm( 1 - (1/N) * exp(-1)))
}
emax( N = 10 ) # should be about = 1.57 to match paper
```
```
## [1] 1.575
```
---
### Plot $E[max_N]$
```r
#implement upper bound
upbound <- function(N) {sqrt(2*log(N))}
#make a ugly plot for reasonableness check
curve(upbound, from = 1, to = 1000, col = "red", lty = 2)
curve(emax, from = 1, to = 1000, add=TRUE)
grid()
```

---
### rChart $E[max_N]$
```r
require(rCharts)
df <- data.frame(
list(x=c(1,2:1000), y=c(0,emax(2:1000))))
d1 <- dPlot( y ~ x, groups = "x", data = df, type = "line", height = 270, width = 800)
d1$xAxis(type = "addMeasureAxis",orderBy = "x",outputFormat = ",0.0f")
d1$yAxis( outputFormat = ".2f")
d1
```
<iframe src=assets/fig/unnamed-chunk-5.html seamless></iframe>
---
### Check My Math with *Eq. 6*
```r
### Try next example for Eq. 6
# if y = 5
# so solve for annualized Sharpe of 1
# says no more than 45 N should be tried
# first just do this to make sure I understand
N = 45
y = 5
emax( N ) * y^-0.5 #seems like on the right path
```
```
## [1] 0.9998
```
---
### Minimum Backtest Length $MinBTL$
$$
\begin{aligned}
\left(\frac{(1-\gamma)Z^{-1}\left[1-\frac{1}{N}\right]+\gamma Z^{-1}\left[1-\frac{1}{N}e^{-1}\right]}{\overline{E[max_N]}}\right)^2 \end{aligned}
$$
<h4> R Translation </h4>
```r
#use emax from earlier for numerator
minBTL <- function( N, eMaxSharpe = 1 ) {
(emax(N) / eMaxSharpe) ^ 2
}
#then this should equal 5 if correct
minBTL( N = 45, eMaxSharpe = 1 )
```
```
## [1] 4.998
```
---
### Plot $MinBTL$
```r
#make ugly plot for a reasonableness check
curve( minBTL, from = 1, to = 1000)
```

---
### rChart $MinBTL$
```r
df <- data.frame(
list(x=c(1,2:1000), minBTL=c(0,minBTL(2:1000))))
n1 <- nPlot( minBTL ~ x, data = df, type = "lineChart", height = 270, width = 800)
n1$yAxis( tickFormat = "#!d3.format(',.2f')!#")
n1$chart( useInteractiveGuideline = TRUE )
n1
```
<iframe src=assets/fig/unnamed-chunk-9.html seamless></iframe>
---
### Thanks
- Ramnath Vaidyanathan
- Bailey, David H. and Borwein, Jonathan M. and Lopez de Prado, Marcos and Zhu, Qiji Jim