【课程】Applied Computational Finance
基于格拉斯哥ECON5065的笔记;
General information
Many problems in economics, finance and financial engineering do not possess a closed-form solution which could be computed easily. Typically, these problems are solved by using numerical algorithms and simulation methods. This course covers the study and implementation of such techniques. These include Monte Carlo simulation for pricing complex derivatives; finite difference methods to solve partial differential equations for option pricing; effective methods to compute hedging strategies; numerical dynamic programming for optimal portfolios, resource management and economic growth. Advanced understanding of these techniques and their implementation on computer using software packages and programming languages is the focus of this course.
Aims
The main aims of this course are to introduce students to core techniques in computational finance, such as simulation of asset prices, pricing options using stochastic models, Monte Carlo methods as applied to complex derivatives, solving Black-Scholes type equations numerically, solving dynamic optimization problems numerically, and how to implement these techniques using modelling software packages and programming languages.
ILOs
By the end of this course, students will be able to:
- Calculate the price and hedging portfolios of complex derivatives using Monte Carlo simulations;
- Calculate the price of American and other exotic options by implementing numerical methods to solve the related partial differential equations;
- Implement dynamic programming algorithms to solve problems in portfolio optimisation, resource management and monetary policy;
- Demonstrate advanced skills in modelling software package MATLAB.
Unit 1:Black-Scholes Model
Unit 1.1:P&L Approach to Black-Scholes Equation
- P&L:profit and loss
- T:maturity
- S:underlying asset price
- $F(S_T)$:payoff function
- $P(t, S)$:pricing function
Consider the P&L due to trading between time $t$ and $t + \delta t$
$\delta S$:change in the stock price, $\delta S$ independent of $\delta t$
$\Delta = \frac{dP}{dS}$
- theta:$\frac{dP}{dt}$
corrected theta:$\frac{dP}{dt}-rP+(r-q)S\frac{dP}{dS}$
gamma:$\frac{d^2P}{dS^2}$
- Dollar gamma:$S^2\frac{d^2P}{dS^2}$
Any model is unusable if $A(t, S)$ and $B(t, S)$ have the same sign. The model is usable only if $A(t, S)$ and $B(t, S)$ have different signs.
No money is made if
On average we have
where $\hat{\sigma}$ is the historical volatility of $S$ based on returns $\delta S/S$
For risk-management, our model should satisfy the condition
Thus,
where $P_{\sigma^2}$ accounts for dependence on the volatility level $\hat{\sigma}$
We then get,
For vanilla options in the market, $\hat{\sigma}$ is the implied volatility such that $P_{\sigma^2}$ is equal to the market price.
Unit 1.2:Stochastic Differential Equations
Recall a differential equation
This can be written alternatively as
or approximately as an difference equation with small time interval (increment) $\Delta t$
with the increments of Brownian motion $\Delta W = W(t+\Delta t)-W(t), \Delta W \sim N(0,\Delta t)$
stochastic differential equation
Unit 1.3:Ito Formula
Ito Formula 相当于 Taylor 展开的一种特殊形式,因为它是针对随机过程的,因此在二次导的展开项中直接略过了所有与 dt 相乘的项, 只留下 $F_{XX}$
与 dt 相乘的项都趋近 0 的原因是,对于 dtdt/dtdW,其中一个部分是作用于 $||\pi||\to 0$ 时,把累加部分相连的. $\sum f(x) (t_{i+1}-t_i)\to \int f(x) dt$. 剩下的那一部分由于 $||\pi||$ 极限趋近于 0,即两个 t 之间相隔极近,会使整个式子趋近 0. 比如 dtdt 为 $(dt)^2$, dtdW 为 $(dt)^{1.5}$, 去掉一个 dt 仍然有保留的幂,则整个式子取 0.
We are interested in the dynamics of an expression such as $f(t)=F(t,X(t))$ with $F:[0,\infty]\times \mathbb{R}\to \mathbb{R}$ .
- $F$:option pricing function
- $X(t)$:underlying share price
- $(dW)^2 = dt$
- up to order 1 in $dt$:$(dt)^2=0$, $(dW)(dt)=dt\sqrt{dt}=0$
- $(dX)^2=\sigma(t,X(t))^2dt$
Then, the stochastic process $f$ has the following dynamics
Suppose that the asset price $S$ solves the following stochastic differential equation
At $t = T$, we have
Delta Hedging
Black, Scholes, Merton approach : construct a riskless portfolio, which consists of selling the option and buying the asset
where
Substitution of $dS$ gives
riskless $\Rightarrow \Delta = P_S$
Substitution gives
Recall that
Combining the above results, leads to the Black-Scholes equation
with boundary condition
Under the Black-Scholes-Merton (BSM) model, European call option price is given as
European put option price is given as
Black-Scholes PDE can be transformed to Black-Scholes Pricing Formula through solving the heat equation
Unit 1.4:Greeks
Certain partial derivatives of option prices are widely used for hedging purposes and have been assigned Greek names and symbols :
Delta | Gamma | Rho | Theta | Vega |
---|---|---|---|---|
$\Delta$ | $\Gamma$ | $\rho$ | $\Theta$ | $vega$ |
$\frac{\partial C}{\partial S}$ | $\frac{\partial^2 C}{\partial S^2}$ | $\frac{\partial C}{\partial r}$ | $\frac{\partial C}{\partial t}$ | $\frac{\partial C}{\partial \sigma}$ |
Read more about those functions
Recall that
Unit 1.5:Implied Volatility
Suppose that the market price of a European call option with expiry $T$ and strike $K$ at time $t$ is observed to be $C^{mkt}(t)$
In our model
Now consider vega of the option given by
- It can be shown, that there exists a solution whenever $C^{mkt}(t)$ is contained in the open interval $((S_t-Ke^{-r(T-t)})^+,S_t)$
- The positivity of vega tells us that the function $C(t,T, σ,K) – C^{obs}(t)$ is strictly increasing as a function of $\sigma$
The solution of the equation, $\sigma^{impl}$ , is called implied volatility
We can therefore compute its zeros numerically by applying the Newton method (it always works if the function is convex)
Plot the Implied Volatility - Moneyness ( $\frac{ln(K/F)}{\sigma\sqrt{T}}$ ) and we could have volatility smile
然而实际上 BS model 服从的是 normal distribution ,并且只包函基本的参数。因此对于 BS model 来说 moneyness (in,at,out of money)是不影响 volatility的,即画出的 moneyness-volatility 是一条直线。
对于真正的市场,存在 fat tail 现象,除此之外流动性,手续费,最低报价间隔,以及最小报价单位等变量未包含在 BS model 中,导致了它的不准确。此外,投资者参与期权交易的目的对隐含波动率也有影响。股票期权交易者中有很大一部分的交易目的是防止股票下跌,因此他们选择的策略多是 buy put 或者 covered call ,这样导致波动率曲线向右下侧倾斜;而外汇期权的交易者的交易目的并不偏向两种方向,因此隐含波动率曲线多是smile的形状。
Unit 2:Advanced stochastic models and their calibration
Unit 2.1:local volatility
Local volatility is a class of market models which is an extension of the fixed volatility Black-Scholes-Merton model. In the model, instantaneous volatility is a function of t, S; $\sigma(t,S)$
Given the market observed European option prices, our aim is to find
the function $\sigma(t,S)$. It was shown by Bruno Dupire that
Unit 2.2:Derivation of the Dupire Formula
Suppose that we have the following dynamics
Next, recall that the price of a European call option is given as $C(T,K)=e^{-rT}\mathbb{E}[(S_T-K)^+]$
First, use Ito’s formula for $(S_T-K)^+$ over $[T,T + dT]$ :
We then get
Next, note that
where Heaviside function
and delta function
$\int_{-\infty}^\infty\delta(x-y)f(x)dx=f(y)$
Then,
To use the above formula, let us consider “undiscounted” option prices $\mathscr{C}(T,K)=\mathbb{E}[(S_T-K)^+]$. Taking derivative w.r.t. K
Note that $\mathbb{E}[\delta(S_T-K)]$ is the density of stock prices
$\mathbb{E}[\delta(S_T-K)]=\int_{-\infty}^\infty\delta(S_T-K)f(S_T)dx=f(K)$
From the identity
we get $\mathbb{E}[S_T\theta(S_T-K)]=\mathscr{C}+K\mathbb{E}[\theta(S_T-K)]=\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K}$
Next note that $\mathbb{E}[d(S_T-K)^+]=d\mathbb{E}[(S_T-K)^+]$
Then, by taking expectation in the Ito’s formula applied on $(S_T-K)^+$ with $\mathbb{E}[W_t]=0$
we evaluate the function at $S_t$ equal to $K$
Above yields
Going back to the notation of $C = e^{–rT}\mathscr{C}$ , we obtain the Dupire’se quation
Thus, we get the Dupire’s formula
For a known volatility function $\sigma(t, S)$, we can use the Dupire’s formula to price all European call options on $S_0$
with initial condition $C(T=0,K)=(S_0-K)^+$
Unit 2.3:Positivity of the right hand side of the Dupire formula
Consider the second derivative in the denominator
The right hand side can be thought of a Butterfly spread
As $\epsilon \to 0$, the butterfly spread (more narrow and more much higher) converges to a Dirac-delta function which is either 0 or strictly positive and thus, it’s price must be positive, thus $\frac{d^2C}{dK^2}>0$
The numerator in the Dupire’s formula can be re-written as
For the above to be positive, call option with a strike which is a fixed proportion $K$ of the forward $F_T=Se^{(r-q)T}$ must be an increasing function of $T$
For $T_1 \leq T_2$
In order to prove it, we suppose that the monotonicity condition is violated, that is
Set up the following strategy:
Buy one option of maturity $T_2$, strike $KF_{T_2}$
Sell $e^{–q(T2–T1)}$ options of maturity $T_1$, strike $KF_{T1}$
At inception, setting up the above strategy pays a premium ( > 0 )
At $T_1$, take the following $\Delta$ position on S:
- If $S_{T_1}<KF_{T1}$ : $\Delta =0$
- If $S_{T_1}>KF_{T1}$ : $\Delta =-1$
At $T_2$ , the P&L comprises of the following positions
- receipt of payoff from $T_2$ maturity option
- payment of payoff from $T_1$ maturity option, capitalized up to $T_2$
- P&L generated by the delta position created at $T_1$ and liquidated at $T_2$
For $S_{T_1}^*=\frac{F_{T_2}}{F_{T_1}}S_{T_1}$ , the P&L is written as
The last equation reads as
for $f(x)=(x-KF_{T_2})^+$
$f(y)-f(x)>(y-x)f(x) when f’’(x)>0$
Thus, our strategy produces positive amount at inception and at $T_2$. Therefore, there’s an arbitrage.
Thus, the numerator in the Dupire’s formula is always positive.
Unit 2.4 Further aspects of the Dupire formula and local volatility
To compute the Dupire’s local volatility function in terms of implied volatility $\hat\sigma(T,K)$, first write
Next, define
- forward price as $F_t=S_0e^{(r-q)(t-t_0)}$
- log strike as $y=ln(\frac{K}{F_t})$
- parameterized volatility as $f(t,y)=(t-t_0)\hat\sigma^2(t,K)$
Next, replace $C(S_0,T,K)$ in the Dupire’s formula with $C^{BS}(S_0,T,K,\hat\sigma(T,K))$ and compute derivatives using $f$ and $y$ rather than $\hat\sigma$and $K$
【Method 1】
这个方法需要求 $\frac{\partial f}{\partial K}$, 此处省略了, 计算比较复杂,以下$C$为$C^{BS}$
With
【Hint】
Inserting these equations into
【Method 2】
以下$c$ 为 $C^{BS}$
Step 1
按Method 1同样的方法, 但此时C考虑的是K与f不相关,与t相关
Since $K=S_0e^{(r-q)(t-t_0)}$ so that $\frac{\partial K}{\partial t}=K(r-q)=K\mu$. And $\frac{\partial y}{\partial K}=\frac{1}{K}$
we substitute into Equation
因为考虑的是forward,不用把C折现,所以不减去rC
We get
Step 2
By taking derivatives of the Black-Scholes formula, we obtain
这个部分我还没推过,但在More的第二个连接里有过程
and we have
Inserting these equations into
The local volatility function is then given as
As in the markets, the option prices are only available for discrete strikes and maturities, to use the Dupire’s formula, we need to interpolate between the available strikes and maturities, and extrapolate outside the market range.
Issues with Dupire’s Local Volatility Model
- The local volatility surface obtained by the Dupire’s formula is very sensitive to the method of data interpolation
- The only risky asset used is the underlying asset which does not account for the volatility risk (affected by random shocks)
- The calibrated local volatility surface is not stable with the passage of time (in practice the local volatility surface is is changing all the time)
- The dynamics of the volatility process assumed by the model does not correspond to that observed in the market
More
Derivation of Loval Volatility
Stochastic Volatility and Local Volatility
CEV Model
Constant Elasticity of Variance (CEV) model is given as
If $a$ less than one, then we have that non constant local volatility function
Under the risk-neutral measure, by assuming that the forward price is given as $F_t=S_te^{r(T-t)}$, the CEV model can be re-written as
The implied volatility is given by the asymptotic approximation
with $F_m=\frac{1}{2}(F_0+K)$
Trinomial Pricing Tree
The issue of non-recombining binomial tree is addressed by using a trinomial tree model where the prices are given by
For the trinomial tree to converge to a continuous process, we further impose
For a recombining tree (i.e. the number of nodes increase linearly and not exponentially), we need $u d=(1+\Delta t)^2$
The probabilities are given as
To ensure the absence of arbitrage in the model, we need that $p + q\leq 1$ whic is ensured if
Such that the above inequality is always satisfied, we choose
where $\sigma > \sigma(t, S)$ for all the nodes in the trinomial tree
By choosing $\bar\sigma$ sufficiently large, in the limit, we obtain the following model which does not admit arbitrage
Unit 2.5:Stochastic Volatility
For local volatility
The last part show that the randomness of volatility is totally depended on the randomness of stock. However, the empirical data we will find that the randomness in the stock and the randomness in the volatility differ from each other. So the volatility should affected by other sources of randomness.
The class of models where volatility is supposed to be a stochastic process is called stochastic volatility models
Unit 2.6:Pricing Equation
Incomplete Market
In an incomplete market, the risk from volatility (vega) cannot be eliminated by trading only in the underlying asset
We have two sources of randomness. However we only allow trading in the money market account. So the market is incomplete. We can make an incomplete market complete by adding additional assets (like option or future).
We create a self-financing portfolio using two risky assets $S$ and option $C^0$
The coefficients $a_t$, $b_t$ and $C^0$ are functions of ($t, S_t, \sigma$)
Often the volatility process will be chosen as a mean-reverting process
Suppose $V_t$ denotes the value of a self-financing portfolio with $\delta_t$ shares in $S$ and $\omega_t$ units of $C^0$
By using Ito’s formula, we get
where $\mathscr{L}_t$ is a operator
The value of portfolio to match the option price of C
For the two expressions of dVt to match, we need
Set $V_t=C$ and substitute it to
In the pricing equation for $C$, there is a factor which depends on $C^0$, which we add to our portfolio to complete the market
Denote
Thus, by argument of creating a self-replicating portfolio, we have shown that the price $C$ of a European option with payoff $h(S_T)$, solves the following equation
with
if chosen $\lambda=0$, the difference of BS-PDE is the last two part of $\mathscr{L}$ with $b_t$
Unit 2.7:Risk Neutral Measure and Risk Premia
Finding the Risk-Neutral Measure
From its definition, under the risk-neutral measure, price $C_t$ of a European satisfies
Applying the Ito’s formula and using the equation (PDE) in previous part, we get
Thus, under the risk-neutral measure (cancels out the $dt$ part in parenthese of the function above), we must have the following
Under the risk-neutral measure, the price of a European option is written as
For the choice of model under physical measure
By matching the coefficients, we get
where $\beta_t$(market prices of risk) and $\phi_t$ are bounded processes
$dW_t^\mathbb{P}+\theta_tdt=dW_t^\mathbb{Q}$, $\mathbb{P}:$现实测度
Risk Premiums
$\beta_t$ is typically called the risk premium. Analogously, $\phi_t$ is called the risk premium of volatility
$\lambda_t$ is calibrated directly under the risk-neutral measure (chosen by market)
Unit 2.8:Estimation Error
Estimation of Volatility
We could approximate $\sigma_t$ by its average over a period of T estimated using the historical data. However, estimation of historical volatility faces the following issues
- Variance of the estimator decreases with T
- Bias of the approximation of $\sigma_t$ by average volatility increases with T
Recall that
Thus, the quadratic variation of $log S_t$ is given as
Recall from the definition of quadratic variation of a random process
Denote the log-returns of asset price as
$dlogS_t=(logS_{t+1}-logS_t)dt=log\frac{S_{t+1}}{S_t}dt$
Then, we have
Realised Variance
Therefore, we could estimate the integrated variance by the realised variance
Next, suppose that
- $\sigma_t$ is independent of Brownian motion W driving asset prices S
- Drift function $\mu$ is deterministic
Then, for a fixed volatility path (由已知数据得出), the log-returns $R_i:=r(\frac{i-1}{N}t,\frac{t}{N})$ are independent Gaussian random variables with variance & mean
Error in Realised Variance
Thank the Professor Ankush Agarwal and Christian Ewald. They provide the precise process of following.
If we compute thxe expectation of realised variance $VR^N_t$ for a given volatility path $\sigma_t$, we get
$E(x^2)=Var(x)+E(x)^2$
with
直接取一个固定值来代替这一段的积分,积分在t时间上的宽度是 t/N
Then
Thus, the bias of estimating the integrated variance with realised variance
Similarly, the variance of $VR^N_t$ for a given volatility path $\sigma_t$ is
$R_i$ are iid
For every i
With $E[R_i^4]=m_i^4+6m_i^2v_i+3v_i^2$
关于这一定理可以参考 如何通俗的理解矩母函数,以及 wiki 中展示的 Order 4 的 Non-central moment
本站更多参考 正态分布与矩母函数/)
and
Repeat the same step as above
Also
Approximately, the variance of $VR^N_t$ for a given volatility path $\sigma_t$ is
Next, we note that the root mean squared error (RMSE) of an estimator is given by
Therefore, the dominant term in the error of realised variance is
If the Relative error is large. We should check out the Greek (sensitive of option price to the Volatility). Insensitive $\to$ no matter that much
Supplymental reading
defination
- $p_t$ = log-price of asset at time $t$ = $logS_t$
- $\Delta$ = fraction of a trading session associated with the implied sampling frequency
- $m=1/\Delta$ = number of sampled observations per trading session
- $T$ = number of days in the sample $\Rightarrow$ $mT$ total observations
- $r_{t-1+j\Delta}=p_{t-1+j\Delta}-p_{t-1+(j-1)\Delta}=log\frac{S_{t-1+j\Delta}}{S_{t-1+(j-1)\Delta}}$
- $r_t=r_{t-1+\Delta}+r_{t-1+2\Delta}+…+r_t=log\frac{S_t}{S_{t-1}}$
- Realized variance (RV=VR) on day $t$
Realized volatility (RVOL) on day $t$
Realized log-volatility (RLVOL) :
- Arithematic Brownian Motion (只参考,我们运用的$\mu,\sigma$不是固定的)
- The quadratic variation (QV) of the return process from time $0$ to $t$ is
- Let $p(t)$ be described by the stochastic differential equation
- IV$_t$ denotes integrated variance for day $t$
Example: QV for Wiener process
Q1 What does RV estimate?
That is, daily RV converges in probability to the daily increment in QV.
ABDL (2003)
- Since $\int^t_{t−1}\mu(s)ds$ is generally very small for daily returns and $RV_t^m$ is a consistent estimator of $IV_t$, for Itô processes daily returns should follow a normal mixture distribution with $RV_t^m$ as the mixing variable. As a result, returns standardized by realized volatility should be standard normal
- If there are jumps in $dp (t)$, then $RV_t^m \xrightarrow{p} IV_t$ IVt but returns are no longer conditionally normally distributed.
Define the error
BNS (2001) : For the Ito diffusion model under the assumption that mean and volatility processes are jointly independent of $W(t)$
where is the integrated quarticity (IQ)
Hence,
IQ$_t$ may be consistently estimated using the following scaled version of realized quarticity (RQ)
The feasible asymptotic distribution for $RV_t^m$ is
Unit 2.9:The Heston Model
Heston’s Stochastic Volatility Model
Under the risk-neutral measure, Heston’s stochastic volatility model is given as
本课的模型都是在风险中性 (鞅) 的测度下,对于有些文章会按现实测度写,但实际是效果都相同
If we have $W_1,\tilde W$ where $\langle W _1,\tilde W\rangle=0$
The variance process $v_t$ is also known by the name of Cox-Ingersoll-Rubinstein (CIR) process
To model the variance of asset prices, we need $v_t$ to remain positive
(Advantage)For $\frac{2k\theta}{\delta^2}>1$ the process never hits zero. This constraint on parameters is also known as Feller condition
This process is mean reverting
For the square root process
- $\theta$ is the long term mean
- k is the speed of mean-reversion
- $\delta$ is volatility of volatility
First we define the log-price $X_t$ by $S_t=S_0e^{rt+X_t}$
Next, by using the Ito’s formula, we get
European call option price with maturity T and strike K is given as
For log-strike
We first find the characteristic function of $X_T$
To compute $C(K)=S_0\mathbb{E}[(e^{X_T}-e^k)^+]$ , we then use the Fourier transform $f (t, u, x, v)$ of $X_T$
European Call Option Price Formula
where $Re(a+ib)=a$, We can only solve the $P_j$ numerical
Unit 2.10:Introduction to Calibration
Exotic Option
Not traded on the market and $\to$ need to be priced by a model
A forward start option: $(S_T-aS_{T_1})^+$
A cliquet: $\sum_{i=1}^{n-1}(S_{T_{i+1}}-aS_{T_i})^+$
A barrier option: $(S_T-K)^+1_{sup_{t\in[0,T]}S_t\leq B}$
Identify the hedging instruments : liquid products (small bid-ask spread) quoted on a market
we aim at constructing a hedging portfolio of the form
Choose a model
Reproduce the prices of simple instruments quoted on the market. (This step is model calibration)
Then compute the hedging strategies $\Delta^0,\Delta,\Delta^{C,i},\Delta^{P,j}$
General Considerations
Model calibration = determine the model parameters from market data of liquid instruments
Two approaches in model calibration
- Statistical parameter estimation
- from historical observations
- done under the physical (market) measure
- Calibration
- based on quotes of option prices (typically calls/puts)
- done under the risk-neutral measure
- reflects the market view on the future evolution of the asset
Which price? Bid or Ask?
- Typically one calibrates to the mid-price
Model Calibration
In general, a “good” model should have
A parsimonious parametrization and should be intuitive
An efficient numerical calibration method
We want to find $\theta$ such that
or a least squares approach
Calibration of local volatility models
The inverse problem of finding the local volatility function with finite number of call option prices is then ill posed. To overcome the difficulty, typically two approaches are used
Implied volatility interpolation using either parametric or semi-parametric form
- Non-parametric methods of interpolation, for example, the method of splines(三次样条插值), provide very mediocre performance for such problems
- Implied volatility interpolation methods are very intuitive and fast, but could lead to infeasible volatility surface
Regularisation approach
- Regularisation methods provide the best performance in terms of precision and regularity of the volatility surface
Stochastic Volatility Inspired(SVI) Parametrisation (by Jim Gatheral)
It is a parametrisation of implied volatility as a function of parameters in terms of log-strike $k=log\frac{K}{S}-rT$
In the parametrisation, for a given maturity $T$, the total implied variance $V(T, k) := I^2(T, k)T$ satisfies
with
$a\ge 0$: global level of smile
$b\ge 0$: slope of the wings
$\rho\in[-1,1]$: asymmetry (rotation of the smile)
$m\in\mathbb{R}$: translation to the right
$\sigma\ge 0$: at-the-money convexity
Step
- For a given maturity $T_i$, we compute the implied volatility $I(T_i, k_j)$ for all the strikes $K_j$ of the liquidly traded call options
Then using an optimisation procedure, for example, least-squares, we compute the optimal values for parameters $a,b,\rho,m$ and $\sigma$ for maturity $T_i$
- The parameters are estimated for each maturity separately
For the intermediate maturities without liquidly traded options, the implied volatility can be computed using linear interpolation and the SVI parameters can be computed
Absence of Arbitrage
- Due to the interpolation procedure, SVI parametrisation can lead to arbitrage opportunities in the model
- Option price function $P(T, k)$ which is not convex with respect to $k$
- $\frac{\partial^2P(T,k)}{\partial k^2}<0$
To ensure the absence of arbitrage: $\lim_{k\to\infty}C^{BS}(T,k)=0$
The absence of arbitrage of the “calendar spread” type: $\frac{\partial V(T,k)}{\partial T}\ge 0$
- The above condition implies that the total implied variance is an increasing function with respect to maturity. This is equivalent to non-crossing of the curves of total implied variance for different maturities
The absence of arbitrage of the “butterfly” type
- The condition leads to the convexity of option prices with respect to $k$
Regularisation Approach
The calibration of volatility surface is posed as an optimisation problem using Tikhonov regularisation
Minimise the functional
To minimise $J(\sigma)$, we use, for example, the method of gradient descent or the method of conjugate gradient
Typical algortihm
- Discretise local volatility function on a mesh of space and time
- Make an initial guess $\sigma^{(0)}$,for example, a constant function
- Evaluate the gradient of $J(\sigma)$ at point $\sigma^{(0)}$to calculate the direction of descent $\lambda^{(0)}$
- Solve the one-dimensional minimisation problem
- Set $\sigma^{(1)}:=\sigma^{(0)}+h^*\lambda^{(0)}$ and iterate again
The local volatility function is discretised using different methods, for example, finite differences or trinomial tree model
The regularisation using $||\nabla\sigma||$ is the same in regions of the surface where the influence of volatility on the price is very high and in the regions where it is very small
Unit 3.1:Monte Carlo methods
Readings for the module on Monte Carlo Methods
- Chapter 3 Section 3.1, Section 3.2 - Monte Carlo Methods in Financial Engineering by Paul Glasserman
- Chapter 6 Section 6.1 - Monte Carlo Methods in Financial Engineering by Paul Glasserman
Unit 3.1.1:Introduction
Monte Carlo methods are preferable in high dimensions as their accuracy is independent of dimensionality
If f is integrable over [0, 1], then by strong law of large numbers (SLLN)
大数定律的定义是,当随机事件发生的次数足够多时,随机事件发生的频率趋近于预期的概率。可以简单理解为样本数量越多,其平概率越接近于期望值。大数定律的条件:1、独立重复事件;2、重复次数足够多。
for
w.p. 1 (with probability 1)
If f is also square integrable, for
the error in the estimator $\hat\alpha_N$ is approximately normally distributed with mean 0 and standard deviation $\frac{\sigma_f}{\sqrt{N}}$
Typically $\sigma_f$ is unknown and is estimated as
Monte Carlo Estimation for European Option
Monte Carlo samples $S_i(T)$ of $S(T)$ can be generated as
where $Z_i\sim N(0,1)$
用 $\sqrt TZ_i$ 代替 $W_t$ 可以减少 noise
Call price estimator is given as
The estimator $\hat C_N$ is unbiased in the sense that
Unit 3.1.2:Efficiency of Monte Carlo methods
Confidence Interval
Using estimator $\hat C_N$ makes sense with a confidence interval
where
- $2z_{\delta/2}\frac{s_C}{\sqrt{N}}$ is called the width of the confidence interval
- $z_\delta$ denotes the $1-\delta$ quantile of the standard normal distribution
- $\mathbb{P}(Z\leq z_\delta)=1-\delta$ for $Z\sim N(0,1)$
- For $\delta = 0.05, z_\delta/2 \approx 1.96$, we get a 95% confidence interval
Central limit theorem (CLT)
With unbiased estimator $\hat C_N=\frac1N\sum_{i=1}^NC_i$ for iid samples $C_i$, $\mathbb{E}[C_i]=C$ and $Var(C_i)=\sigma_C^2<\infty$
- $\sigma_C$ can only be estimated
Suppose it takes $\tau$ time to generate $C_i$. Given computational budget $\Gamma$, can generate $\Gamma/\tau$ samples and create estimator $\hat C_{\Gamma/\tau}$
- $\sqrt{\Gamma/\tau}(\hat C_{\Gamma/\tau}-C)\sim N(0,\sigma_C^2)$
as $\Gamma\to\infty$, $\sqrt{\Gamma}(\hat C_{\Gamma/\tau}-C)\sim N(0,\sigma_C^2\tau)$
Given a choice, we should use an estimator with lower value of $\sigma_C^2\tau$
Mean Square Error
For an estimator $\hat\alpha$ of true value $\alpha$
两边可以尝试配平
Landau’s notation: For parameter $h\to 0$
类似于算法复杂度,大 O 是上界,小 o 是下届
Unit 3.1.3:Euler Method
Consider an ordinary differential equation (ODE) in time interval [0, 1]
Divide interval into $N$, $h=1/N$
The above ODE can be solved approximately using an Euler scheme
where
- Error criterion $e(h):=|x(1)-x^h(1)|$
- Under suitable assumptions $e(h)=\mathscr{O}(h)=\mathscr{O}(1/N)$
Euler Method更多解释可参考数值分析的部分
Euler Discretisation for BM
with $h=\frac TN,t_n^h:=nh,\Delta W_n^h:=W(t_n^h)-W(t_{n-1}^h)$, the Euler scheme for BM is given as
Note that $\Delta W_n^h$ are iid increments which are normally distributed $N(0,h)$
Euler Method for SDEs
The Euler scheme for $X(t)$ is given as $X^h(0)=x_0$
If we wish to price options based on the path of the SDE, for example, Asian option with price
we need a good approximation of the sample path of $X$ which leads to the error criterion (strong error)
If we wish to price options based only on the terminal value of the SDE, for example, European option with price
we need a good approximation of the distribution of $X(T)$ which leads to the error criterion (weak error)
for smooth functions g
光滑函数(smooth function):在其定义域内无穷阶数连续可导的函数
g 可以取 $(S(T)-K)^+$
We say that $X^h$ converges to $X$ with order $\beta > 0$
- strongly with order $\beta$ if $e_s(h)=\mathscr O(h^\beta)$
- weakly with order $\beta$ if $e_w(h)=\mathscr O(h^\beta)$
Under some regularity conditions on functions $b$ and $\sigma$, Euler scheme converges strongly with order $\beta = 1/2$ (Euler) and weakly with order $\beta = 1$ (Milstein)
当 $\beta$ 数越大,收敛速度越快,越准确
Unit 3.1.4:Milstein Scheme
Consider the SDE
and its Euler scheme approximation
Thus, the Euler scheme approximates
Milstein scheme is an improvement on the Euler scheme. The main idea is to approximate the stochastic integral in a better way
Let us estimate the error in Euler scheme by applying Itô’s formula on $\sigma(t,X(t))$. Suppose $t_{n-1}=0$
Then
Combined with
So we have
$W(h)\sim o(h^{1/2})$,$\sigma$ 采用 0 点时
这一步过程看 定义
Thus, the Milstein scheme is given as $X^h(0)=x_0$
如果 payoff 只取决于最终的价格,两种方法不明显,如果取决于path,Milstein 会明显提高
Unit 3.2:Variance reduction methods
Readings for the module on variance reduction methods
- Chapter 4 Section 4.1, 4.2 - Monte Carlo Methods in Financial Engineering by Paul Glasserman
Unit 3.2.1:Control variate method
Introduction
In financial markets and the insurance industry, most quantities of interest can be expressed as
Suppose the goal is to compute $p$. Then, use the Monte Carlo method:
Generate $(X_i)_{1\leq i\leq N}$ i.i.d. samples of $X$ & set
By the Central Limit Theorem (CLT)
binomial 分布, p 是存在的概率
95% confidence interval
The relative error is given as
which is large for small value of $p$
Thus, to estimate small $p$, using the crude Monte Carlo method, we may need a large number of samples $N$
如果 $p = 0.01 (10^{-2})$, $N$ 至少 $=10^6$ 才使得 relative error $=(\sqrt{10^4})^{-1}=10^{-2}$
As discussed in the last lecture, the estimator error is given by $s_p/\sqrt{N}$ where $s_p$ is the estimate of standard deviation of estimator $\hat p_N$
The aim of variance reduction methods is to produce an alternative estimator $\hat p_N^{var}$ such that $\mathbb{E}[\hat p_N^{var}]=p$ but with smaller variance than estimator $\hat p_N$
Control Variates
Let $Y_1, Y_2, . . . , Y_N$ be i.i.d. outputs from $N$ replications of a simulation. For example, $Y_i$ could be the discounted payoff of a derivative security on the $i$th simulated path.
Suppose that our aim is to compute $\alpha=\mathbb{E}[Y_i]$ and the crude Monte Carlo estimator $\alpha_N:=\sum_{i=1}^N\frac{Y_i}N$ is unbiased and converges with probability (w.p.) 1 as $N\to\infty$
Suppose on each replication, we generate another output $X_i$ along with $Y_i$
Suppose that $(X_i,Y_i)$ are i.i.d. $i=1,…,N$ and that $\alpha_0=\mathbb{E}[X_i]$ is known
Furthermore, denote $(X, Y)$ as a generic pair of $(X_i,Y_i)$
For fixed $b$, calculate
and compute the sample mean
$\hat\alpha_N(b)$ is a control variate estimator and the observable error $N^{-1}\sum_{i=1}^NX_i-\alpha_0$ acts as a control in estimating $\alpha$
The control variate estimator is unbiased
$\hat\alpha_N(b)$ is also strongly consistent
Each $Y_i(b)$ has variance which is equal to
$\hat\alpha_N(b)$ provides variance reduction if
Optimal coefficient $b^$ *minimizes $\sigma^2(b)$ and is given as
The ratio of the variance of optimally controlled estimator to that of uncontrolled
The variance reduction factor is given as
$Var(\hat\alpha_N)\times(1-\rho_{XY}^2)={Var(\hat\alpha_N(b^*))}$ 如果要达到原来的值,需要用 $N/(1-\rho_{XY}^2)$ 个样本
For example, for $|\rho_{XY}| = 0.95$, leads to 10-times variance reduction, whereas $|\rho_{XY}| = 0.90$, leads to 5-times variance reduction (usually 5)
Computing $b^*$
As $\alpha=\mathbb E[Y]$ is unknown, in practice, $\sigma_Y$ or $\rho_{XY}$ are also unknown
Estimate $b^*$ as
Then, instead of $Y_i(b)$, we use
Replacing $b$ with $\hat b_N$ introduces bias as the samples $Y_i(\hat b_N)$ are not independent any more, thus, application of CLT is not clear
Estimate $\alpha$ as $\hat\alpha_N(\hat b_N)=\frac1N\sum_{i=1}^NY_i(\hat b_N)$
Unit 3.2.2:Application of control variate estimator
Using Underlying Assets
Consider the Black-Scholes model
Suggest a Monte Carlo control variate method to estimate the price of option
Asian Options
An arithmetic Asian call option with discounted payoff $e^{-rT}(\bar S_m^A(T)-K)^+$ where
we have a closed-form formula for geometric Asian call option with discounted payoff $e^{-rT}(\bar S_m^G(T)-K)^+$
Unit 3.2.3:Antithetic variate method
Method of antithetic variates attempts to reduce variance by introducing negative dependence between pair of replications
Note that if $U$ is uniformly distributed over [0, 1] then so is $1 – U$
- $U_i$ and $1 – U_i$ form an antithetic pair in the sense that a big value of one is accompanied by a small value of the other
- Thus, a large value of the function value computed on one path is compensated by the small value on the antithetic path, leading to reduction in variance
If $Z$ is distributed as $N (0, 1)$ then so is $–Z$
Suppose we generate a path $Y_i$ using a sample $Z_1, Z_2 , . . . , Z_n$ of normally distributed random variable, we can create an antithetic path $\tilde Y_i$ using the sample $–Z_1, –Z_2, . . . , –Z_n$
Suppose we generate N antithetic pairs $(Y_i, \tilde Y_i )^N_{i=1}$
The confidence interval is given as
Antithetic sampling reduces variance when $Cov(Y_i,\tilde Y_i)<0$
Unit 4.1:Finite difference methods
Unit 4.1.1:Introduction
The finite difference method solves a partial differential equation on a discrete grid in time and space
Consider a general equation as
- $f$ is called the source term
- $a, b, r,$ are assumed to be bounded and continuous
- $a$ is assumed to be nonnegative
The Black-Scholes equation is given as
we note that
Change of variable
In the Black-Scholes equation, we then plug
The transformed Black-Scholes equation with bounded coefficients is then given as
The above equation has bounded coefficients and is suitable to apply finite difference methods
Unit 4.1.2:Explicit Scheme
Consider the equation
For positive integer N discretise the domain with time step size $h_0=\frac TN$ and space step size $h_1>0$
Denote by $v_j^k\approx V(jh_1,kh_0)$, $j\in \mathbb Z$, $k=0,1,…,N$
Standard explicit finite difference scheme is written as
In the scheme, we define
The ordered form of the equation is written as
The coefficients of the values at previous time step are all nonnegative if the following monotonicity condition holds:
$||a||_\infty$ : Maximum possible value of $a$
Example: Digital Option
Consider the digital option with payoff $g(x)=1_{x>0}(x)$. Suppose $f(x,t)=0$ and $a(x,t)=1$
Suppose the spatial domain is taken as $[–1, 1]$ and $T = 1$. For the boundary condition
we solve the explicit scheme
Unit 4.1.3:Implicit Scheme
For $j\in \mathbb Z$, $k=0,1,…,N$
the fixed-point form of the equation is written as
- The scheme is monotonic without any condition
Unit 4.1.4:Family of $\theta$ Schemes
For $\theta\in [0, 1]$, consider the following scheme
The scheme is explicit if $\theta=0$, and implicit otherwise (standard implicit for $\theta=1$)
The scheme is monotonic if
Application: Option Pricing
Consider the transformed Black-Scholes equation
Choose the domain $D=[0,T]\times[X_{min},X_{max}]$ with $-\infty<X_{min}<X_{max}<\infty$
Approximate $D$ with a uniform mesh
where $t^k=kh_0$ and $x_j=X_{min}+jh_1$ for $h_0=T/N$ and $h_1=(X_{max}-X_{min})/M$
Denote $v(t^k,x_j)=v_j^k$. Then, we use the finite difference derivative approximation based on the $\theta$ scheme
where $\theta\in [0, 1]$ is a constant parameter.
We also replace $v_j^k$ by $\theta v_j^k+(1-\theta)v_j^{k+1}$. For $\theta=0.5$ the method corresponds to Crank-Nicholson method
We can formulate a tridiagonal system at each time step $k = N, . . . , 0$ which can be solved by the well-known Thomas algorithm (MATLAB command: tridiag)
with non-zero coefficients of the tridiagonal matrix $A = (a_{ij})$ given by
The time dependent vector $b^{k+1}$ is given as:
where $v_0^{k+1},v_M^{k+1}$ are given by the boundary conditions and the remaining constants are defined as below
The $i$th coordinate of the vector $v^k$ is the approximation of the value $v(t^k,x_i)$
Unit 4.2:Optimisation methods in finance
Introduction
In finance, the parameters in stochastic models are estimated using the available market data. This procedure is called model calibration
Most calibration problems can be seen as either a root-finding procedure or as an optimisation problem
Convex Functions
Let $\mathscr U$ denote a convex set and let $f :\mathscr U \to \mathbb R$ be a function. Then, $f$ is convex if for all $u_1, u_2 \in \mathscr U$ , for all $\lambda \in [0, 1]$,
Any local minimun of a convex function is also its global minimum
If $w_1,…,w_n\ge0$ and $f_1,…,f_n$ are all convex, then $\sum_{i=1}^n w_if_i$ is also aconvex function
If $f_1,…,f_n$ are convex, then $g(x):=max(f_1,…,f_n)$ is also a convex function
Unit 4.2.1:Newton’s Method
Basic approach:
- Approximate the function $f$ by its tangent at a given point
- The point at which the tangent crosses the x-axis is taken as an approximate solution to the equation $f (x) = 0$
- The process is repeated with the new point
Given an initial guess $x_0$, a sequence of points is constructed using the recurrence relation
For the method to converge, we need, at least, that $|f′′| \leq K$ and the initial guess is close to the true solution
Example: Implied Volatility
Consider $C(\sigma)$ as the call option price formula given strike $K$ and time to maturity $T$ with volatility $\sigma$ as the parameter
Aim is to find the implied volatility for call option with market price $C_0$
Given an initial guess $\sigma_0$, Newton’s method can be used to construct a sequence of points as
For error tolerance $\epsilon$, stop the iteration for some $n + 1$, when $|\sigma_{n+1} – \sigma_n| < \epsilon$ (convergence)
Unit 4.2.2:Descent Methods
To solve optimisation problems, Newton’s method cannot be used as it can guarantee only local convergence for a good initial guess
In practice, to solve optimisation problems, we need methods with global convergence, that is convergence from an arbitrary initial guess (starting point)
The overall idea of descent methods is to construct a sequence of points as
where $\mu_n$ is a non-increasing sequence of learning rate and $p_n$ is direction vector
Gradient Descent Method
For model parameter $u$ taking values in the space $\mathscr U$ , we consider the problem
The sequence of points is constructed as
where $\mu > 0$ and $\Delta J(u_n)$ is the gradient of cost function evaluated at $u_n$
$\mu$ has to be carefully chosen to ensure convergence of the method
Gradient Descent Method with Constraints
Under some constraints on the parameters, we look to solve the problem
where $K$ is a closed convex subset of the parameter space $\mathscr U$
For $\mu > 0$, a sequence of points is constructed as
where $P_K(u)$ is the orthogonal projection of $u$ on $K$
For $x \in \mathscr U$ , its orthogonal projection in $K$ is given by $x_K$ where
Penalisation of Constraints
We can convert a minimisation problem with constraints into an optimisation problem without constraints
The idea is to penalise the constraints and form a new unconstrained minimisation problem
Consider the constrained minimisation problem with continuous convex cost function $J:\mathbb R^d\to\mathbb R$
where $F : \mathbb R^d \to \mathbb R^M$ is a continuous convex function representing thec onstraints on parameter $u \in \mathbb R^d$
An unconstrained problem is then given as
where the constraint $F_i(u) \leq 0$ has been penalised
The above problem can be solved using descent methods
For small value of $\epsilon > 0$, the solution of unconstrained problem approximates the solution of constrained problem
Unit 4.2.3:Stochastic Gradient Descent
Unlike the gradient descent methods which are deterministic, stochastic gradient methods choose the direction of descent randomly
Gradient descent methods perform well for convex cost functions whereas stochastic gradient descent methods can handle non-convex cost functions
Consider the model parameter $\theta \in \mathbb R^d$
For example:
consider the Heston’s stochastic volatility model
The parameter vector $\theta = (\kappa,\theta,\delta,\rho)\in \mathbb R^4$
Next, consider the cost function $J:\mathbb R^d\to\mathbb R$
For example:
Given market prices $P^{market}$ of call options with strike
$K_j, 1 \leq j \leq N$ and time to maturity $T_i, 1 \leq i \leq M$, the cost function for Heston’s model calibration can be written as
where $P^\theta$ is the call option price function under Heston’s model
Unlike the gradient descent methods, which use the entire dataset $\{K\}^N_{j=1}$, $\{T\}^M_{i=1}$, to construct the points sequence, the stochastic gradient descent methods use only a fraction of the dataset to construct the sequence
Stochastic Gradient Descent Algorithm
Stochastic gradient descent (SGD) method constructs the sequence as
where $\mu > 0$ is a positive scalar and the update is performed for a uniformly randomly chosen sample $K_j$ and $T_i$ from the dataset
The SGD method updates frequently with a high variance which leads to a slower convergence
On the other hand, due to high variance, the SGD method is able to better explore the parameter space and is able to jump to better local minima for non-convex cost functions
Mini-batch Gradient Descent
Mini-batch gradient descent algorithm randomly picks a sample from the dataset to compute the derivative of the cost function
Uniformly randomly select m samples $(K_i,T_j)$ from the dataset $\{K\}^N_{j=1}$, $\{T\}^M_{i=1}$, and denote it by $\{X_i\}^m_{i=1}$ , called the mini-batch
Compute the derivative of the cost function using the mini-batch, that is, compute $\Delta_\theta J(\theta;\{X_i\}^m_{i=1})$
Construct the sequence using the recursion
Small batch sizes lead to noisy updates whereas large batch sizes require considerable computation time
Conclusion
Stochastic gradient descent algorithms can be further improved using different statistical features. The most famous algorithms are
- AdaGrad
- AdaDelta
- AdaM
The above variations of the stochastic gradient algorithm are mostly used for very large datasets
For most model calibration purposes, mini-batch gradient descent method provides good performance