基于格拉斯哥ECON5065的笔记；

General information

Many problems in economics, finance and financial engineering do not possess a closed-form solution which could be computed easily. Typically, these problems are solved by using numerical algorithms and simulation methods. This course covers the study and implementation of such techniques. These include Monte Carlo simulation for pricing complex derivatives; finite difference methods to solve partial differential equations for option pricing; effective methods to compute hedging strategies; numerical dynamic programming for optimal portfolios, resource management and economic growth. Advanced understanding of these techniques and their implementation on computer using software packages and programming languages is the focus of this course.

Aims

The main aims of this course are to introduce students to core techniques in computational finance, such as simulation of asset prices, pricing options using stochastic models, Monte Carlo methods as applied to complex derivatives, solving Black-Scholes type equations numerically, solving dynamic optimization problems numerically, and how to implement these techniques using modelling software packages and programming languages.

ILOs

By the end of this course, students will be able to:

Calculate the price and hedging portfolios of complex derivatives using Monte Carlo simulations;
Calculate the price of American and other exotic options by implementing numerical methods to solve the related partial differential equations;
Implement dynamic programming algorithms to solve problems in portfolio optimisation, resource management and monetary policy;
Demonstrate advanced skills in modelling software package MATLAB.

Unit 1：Black-Scholes Model

Unit 1.1：P&L Approach to Black-Scholes Equation

P&L：profit and loss
T：maturity
S：underlying asset price
$F(S_T)$：payoff function
$P(t, S)$：pricing function

Consider the P&L due to trading between time $t$ and $t + \delta t$

$P\&L=-[P(t+\delta t,S+\delta S)-P(t,S)]+rP(t,S)\delta t+\Delta\delta S-(r-q)\Delta S\delta t$ $P(t+\delta t,S+\delta S)-P(t,S) \approx \frac{dP}{dt}\delta t+\frac{dP}{dS}\delta S +\frac{1}{2}\frac{d^2P}{dS^2}(\delta S)^2$

$\delta S$：change in the stock price, $\delta S$ independent of $\delta t$
$\Delta = \frac{dP}{dS}$

$\begin{align} P\&L&=-[\frac{dP}{dt}\delta t+\frac{dP}{dS}\delta S +\frac{1}{2}\frac{d^2P}{dS^2}(\delta S)^2]+rP\delta t+\frac{dP}{dS}\delta S-(r-q)\frac{dP}{dS} S\delta t\\ &=-\underbrace{(\frac{dP}{dt}-rP+(r-q)\frac{dP}{dS} S)}\delta t -\underbrace{\frac{1}{2}S^2\frac{d^2P}{dS^2}(\frac{\delta S}{S})^2}\\ &\qquad \qquad deterministic\ part \quad \quad \quad random\ exposure \end{align}$

theta：$\frac{dP}{dt}$
corrected theta：$\frac{dP}{dt}-rP+(r-q)S\frac{dP}{dS}$
gamma：$\frac{d^2P}{dS^2}$
Dollar gamma：$S^2\frac{d^2P}{dS^2}$

$P\&L=-A(t,S)\delta t -B(t,S)(\frac{\delta S}{S})^2$

Any model is unusable if $A(t, S)$ and $B(t, S)$ have the same sign. The model is usable only if $A(t, S)$ and $B(t, S)$ have different signs.

No money is made if

$\frac{\delta S}{S}=\pm \sqrt{-\frac{A(t, S)}{B(t, S)}}\delta t$

On average we have

$\langle(\frac{\delta S}{S})^2\rangle=\hat{\sigma}^2\delta t$

where $\hat{\sigma}$ is the historical volatility of $S$ based on returns $\delta S/S$

For risk-management, our model should satisfy the condition

$A(t, S) = -\hat{\sigma}^2B(t, S)$

Thus,

$\frac{dP_{\sigma^2}}{dt}-rP_{\sigma^2}+(r-q)\frac{dP_{\sigma^2}}{dS} S= -\frac{\hat{\sigma}^2}{2}S^2\frac{d^2P_{\sigma^2}}{dS^2}$

where $P_{\sigma^2}$ accounts for dependence on the volatility level $\hat{\sigma}$

We then get,

$P\&L=-\frac{S^2}{2}\frac{d^2P_{\sigma}^2}{dS^2}((\frac{\delta S}{S})^2-\hat{\sigma}^2\delta t)$

For vanilla options in the market, $\hat{\sigma}$ is the implied volatility such that $P_{\sigma^2}$ is equal to the market price.

Unit 1.2：Stochastic Differential Equations

Recall a differential equation

$\dot{X}(t)=\frac{dX(t)}{dt}=g(t,X(t))$

This can be written alternatively as

$dX(t)=g(t,X(t))dt$

or approximately as an difference equation with small time interval (increment) $\Delta t$

$\Delta X(t)=X(t+\Delta t)-X(t)=g(t,X(t))\Delta t$

with the increments of Brownian motion $\Delta W = W(t+\Delta t)-W(t), \Delta W \sim N(0,\Delta t)$

$\begin{matrix} \Delta X(t)=\underbrace{g(t,X(t))\Delta t}+\underbrace{\sigma(t,X(t))\Delta W}\\ \qquad\qquad\quad drift\ part \quad \quad diffusion\ part \end{matrix}$

stochastic differential equation

$dX(t)=g(t,X(t))d t+\sigma(t,X(t))d W$

Unit 1.3：Ito Formula

Ito Formula 相当于 Taylor 展开的一种特殊形式，因为它是针对随机过程的，因此在二次导的展开项中直接略过了所有与 dt 相乘的项，只留下 $F_{XX}$

与 dt 相乘的项都趋近 0 的原因是，对于 dtdt/dtdW，其中一个部分是作用于 $||\pi||\to 0$ 时，把累加部分相连的. $\sum f(x) (t_{i+1}-t_i)\to \int f(x) dt$. 剩下的那一部分由于 $||\pi||$ 极限趋近于 0，即两个 t 之间相隔极近，会使整个式子趋近 0. 比如 dtdt 为 $(dt)^2$, dtdW 为 $(dt)^{1.5}$, 去掉一个 dt 仍然有保留的幂，则整个式子取 0.

We are interested in the dynamics of an expression such as $f(t)=F(t,X(t))$ with $F:[0,\infty]\times \mathbb{R}\to \mathbb{R}$ .

$F$：option pricing function
$X(t)$：underlying share price
$(dW)^2 = dt$
up to order 1 in $dt$：$(dt)^2=0$， $(dW)(dt)=dt\sqrt{dt}=0$
$(dX)^2=\sigma(t,X(t))^2dt$

Then, the stochastic process $f$ has the following dynamics

$\begin{align} df&=F_tdt+F_XdX+\frac{1}{2}F_{XX}(dX)^2\\ &=F_tdt+F_Xg(t,X(t))dt+F_X\sigma(t,X(t)) dW+\frac{1}{2}F_{XX}\sigma(t,X(t))^2dt\\ &=(F_t+F_Xg(t,X(t))+\frac{1}{2}F_{XX}\sigma(t,X(t))^2)dt+F_X\sigma(t,X(t)) dW \end{align}$

Suppose that the asset price $S$ solves the following stochastic differential equation

$\frac{dS(t)}{S(t)}=\mu dt+\sigma dW(t) \sim N(\mu,\sigma^2 dt)$

At $t = T$, we have

$S(T)=S(t)exp((\mu-\frac{1}{2}\sigma^2)(T-t)+\sigma(W(T)-W(t)))$

Delta Hedging

Black, Scholes, Merton approach : construct a riskless portfolio, which consists of selling the option and buying the asset

$\begin{align} -dP+\Delta dS&=-(P_tdt+P_SdS+\frac{1}{2}P_{SS}(dS)^2)+\Delta dS\\ &=-(P_t+\frac{1}{2}S^2\sigma^2P_{SS})dt+(-P_S+\Delta)dS \end{align}$

where

$\begin{align} (dS)^2&=(S(t)(\mu dt+\sigma dW(t)))^2\\ &=S(t)^2(\mu^2(dt)^2+2\mu\sigma dtdW(t)+\sigma^2(dW(t))^2)\\ &=S(t)^2(0+0+\sigma^2dt) \end{align}$

Substitution of $dS$ gives

$-dP+\Delta dS=-(P_t+\frac{1}{2}S^2\sigma^2P_{SS}+(P_S-\Delta)S\mu)dt+(-P_S+\Delta)S\sigma dW$

riskless $\Rightarrow \Delta = P_S$

Substitution gives

$-dP+\Delta dS=-(P_t+\frac{1}{2}S^2\sigma^2P_{SS})dt$

Recall that

$-dP+\Delta dS=(-P+\Delta S)rdt$

Combining the above results, leads to the Black-Scholes equation

$-Pr+P_SSr+P_t+\frac{1}{2}S^2\sigma^2P_{SS}=0$

with boundary condition

$P(T,S(T))=F(S(T))$

Under the Black-Scholes-Merton (BSM) model, European call option price is given as

$\begin{align} C(t,S,T,K)&=S\Phi(d_1)-Ke^{-r(T-t)}\Phi(d_2)\\ d_1&=\frac{ln(\frac{S}{Ke^{-r(T-t)}})+\frac{1}{2}\sigma^2(T-t)}{\sigma\sqrt{T-t}}\\ d_2&=\frac{ln(\frac{S}{Ke^{-r(T-t)}})-\frac{1}{2}\sigma^2(T-t)}{\sigma\sqrt{T-t}}\\ &=d_1-\sigma\sqrt{T-t} \end{align}$

European put option price is given as

$P(t,S,T,K)=Ke^{-r(T-t)}\Phi(-d_2)-S\Phi(-d_1)$

Black-Scholes PDE can be transformed to Black-Scholes Pricing Formula through solving the heat equation

Unit 1.4：Greeks

Certain partial derivatives of option prices are widely used for hedging purposes and have been assigned Greek names and symbols :

Delta	Gamma	Rho	Theta	Vega
$\Delta$	$\Gamma$	$\rho$	$\Theta$	$vega$
$\frac{\partial C}{\partial S}$	$\frac{\partial^2 C}{\partial S^2}$	$\frac{\partial C}{\partial r}$	$\frac{\partial C}{\partial t}$	$\frac{\partial C}{\partial \sigma}$

Unit 1.5：Implied Volatility

Suppose that the market price of a European call option with expiry $T$ and strike $K$ at time $t$ is observed to be $C^{mkt}(t)$

In our model

$C(t,T,\sigma,K)=C^{mkt}(t)$

Now consider vega of the option given by

$\frac{\partial}{\partial\sigma}C(t,T,\sigma,K)=S_t\sqrt{T-t}\Phi'(d_1)$

It can be shown, that there exists a solution whenever $C^{mkt}(t)$ is contained in the open interval $((S_t-Ke^{-r(T-t)})^+,S_t)$
The positivity of vega tells us that the function $C(t,T, σ,K) – C^{obs}(t)$ is strictly increasing as a function of $\sigma$

The solution of the equation, $\sigma^{impl}$ , is called implied volatility

We can therefore compute its zeros numerically by applying the Newton method (it always works if the function is convex)

$\sigma_{k+1} = \sigma_k - \frac{C(t,T, \sigma_k,K)-C^{obs}(t)}{\frac{\partial}{\partial \sigma}C(t,T, \sigma_k,K)}$

Plot the Implied Volatility - Moneyness ( $\frac{ln(K/F)}{\sigma\sqrt{T}}$ ) and we could have volatility smile

然而实际上 BS model 服从的是 normal distribution ，并且只包函基本的参数。因此对于 BS model 来说 moneyness （in，at，out of money）是不影响 volatility的，即画出的 moneyness-volatility 是一条直线。

对于真正的市场，存在 fat tail 现象，除此之外流动性，手续费，最低报价间隔，以及最小报价单位等变量未包含在 BS model 中，导致了它的不准确。此外，投资者参与期权交易的目的对隐含波动率也有影响。股票期权交易者中有很大一部分的交易目的是防止股票下跌，因此他们选择的策略多是 buy put 或者 covered call ，这样导致波动率曲线向右下侧倾斜；而外汇期权的交易者的交易目的并不偏向两种方向，因此隐含波动率曲线多是smile的形状。

Unit 2：Advanced stochastic models and their calibration

Unit 2.1：local volatility

Local volatility is a class of market models which is an extension of the fixed volatility Black-Scholes-Merton model. In the model, instantaneous volatility is a function of t, S; $\sigma(t,S)$

Given the market observed European option prices, our aim is to find
the function $\sigma(t,S)$. It was shown by Bruno Dupire that

$\sigma(t,S)^2=2\frac{\frac{dC}{dt}+qC+(r-q)K\frac{dC}{dK}}{K^2\frac{d^2C}{dK^2}}| _ { K = S,T=t }$

Unit 2.2：Derivation of the Dupire Formula

Suppose that we have the following dynamics

$dS_t=(r-q)S_t dt+\sigma_tS_tdW_t$

Next, recall that the price of a European call option is given as $C(T,K)=e^{-rT}\mathbb{E}[(S_T-K)^+]$

First, use Ito’s formula for $(S_T-K)^+$ over $[T,T + dT]$ :

$d(S_T-K)^+=\frac{\partial(S_T-K)^+}{\partial S_T}dS_T+\frac{1}{2}\frac{\partial^2(S_T-K)^+}{\partial S_T^2}(dS_T)^2$

We then get

$d(S_T-K)^+=\frac{\partial(S_T-K)^+}{\partial S_T}((r-q)S_T dT+\sigma_TS_TdW_T)+\frac{1}{2}\frac{\partial^2(S_T-K)^+}{\partial S_T^2}\sigma^2_TS^2_TdT$

Next, note that

$\frac{\partial(S_T-K)^+}{\partial S_T}=\theta(S_T-K)\ Heaviside\ function$ $\frac{\partial^2(S_T-K)^+}{\partial S_T^2}=\delta(S_T-K)\ Dirac\ delta\ function$

where Heaviside function

$\theta(x)=\begin{cases}1&x>0\\0&x<0\end{cases}$

and delta function

$\delta(x)=\begin{cases}\infty&x=0\\0&x\ne0\end{cases},\ \int_{-\infty}^\infty \delta(x)dx=1$

$\int_{-\infty}^\infty\delta(x-y)f(x)dx=f(y)$

Then,

$d(S_T-K)^+=\theta(S_T-K)((r-q)S_T dT+\sigma_TS_TdW_T)+\frac{1}{2}\delta(S_T-K)\sigma^2_TS^2_TdT$

To use the above formula, let us consider “undiscounted” option prices $\mathscr{C}(T,K)=\mathbb{E}[(S_T-K)^+]$. Taking derivative w.r.t. K

$\begin{align} \frac{d\mathbb{E}[(S_T-K)^+]}{d K}&=-\mathbb{E}[\theta(S_T-K)]\\ \frac{d^2\mathbb{E}[(S_T-K)^+]}{d K^2}&=\mathbb{E}[\delta(S_T-K)] \end{align}$

Note that $\mathbb{E}[\delta(S_T-K)]$ is the density of stock prices

$\mathbb{E}[\delta(S_T-K)]=\int_{-\infty}^\infty\delta(S_T-K)f(S_T)dx=f(K)$

From the identity

$\begin{align} \mathscr{C}(T,K)&=\mathbb{E}[(S_T-K)\theta(S_T-K)]\\ &=\mathbb{E}[S_T\theta(S_T-K)]-K\mathbb{E}[\theta(S_T-K)] \end{align}$

we get $\mathbb{E}[S_T\theta(S_T-K)]=\mathscr{C}+K\mathbb{E}[\theta(S_T-K)]=\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K}$

Next note that $\mathbb{E}[d(S_T-K)^+]=d\mathbb{E}[(S_T-K)^+]$

Then, by taking expectation in the Ito’s formula applied on $(S_T-K)^+$ with $\mathbb{E}[W_t]=0$

$\begin{align} \mathbb{E}[d(S_T-K)^+]&=\mathbb{E}[\theta(S_T-K)((r-q)S_T dT+0)]+\frac{1}{2}\mathbb{E}[\delta(S_T-K)\sigma^2_TS^2_TdT]\\ d\mathscr{C}&=(r-q)\mathbb{E}[\theta(S_T-K)S_T]dT+\frac{1}{2}\mathbb{E}[\delta(S_T-K)\sigma^2_TS^2_T]dT\\ \frac{d\mathscr{C}}{dT}dT&=(r-q)(\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K})dT+\frac{1}{2}\mathbb{E}[\delta(S_T-K)\sigma^2_TS^2_T]dT \end{align}$

we evaluate the function at $S_t$ equal to $K$

$\frac{d\mathscr{C}}{dT}dT=(r-q)(\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K})dT+\frac{1}{2}K^2\mathbb{E}[\delta(S_T-K)\sigma^2_T]dT$

Above yields

$\mathbb{E}[\delta(S_T-K)\sigma^2_T]=\frac{2}{K^2}(\frac{d\mathscr{C}}{dT}-(r-q)(\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K}))$ $\frac{\mathbb{E}[\delta(S_T-K)\sigma^2_T]}{\mathbb{E}[\delta(S_T-K)]}=2\frac{\frac{d\mathscr{C}}{dT}-(r-q)(\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K})}{K^2\frac{d^2\mathscr{C}}{dK^2}}$

Going back to the notation of $C = e^{–rT}\mathscr{C}$ , we obtain the Dupire’se quation

$\begin{align} \mathbb{E}[\sigma_T^2|S_T=K]&=\frac{\sigma(T,K)f(K)}{f(K)}=\frac{\mathbb{E}[\delta(S_T-K)\sigma^2_T]}{\mathbb{E}[\delta(S_T-K)]}\\ &=2\frac{\frac{dC}{dT}+qC+(r-q)K\frac{\partial C}{\partial K}}{K^2\frac{d^2C}{dK^2}} \end{align}$

Thus, we get the Dupire’s formula

$\sigma^2(t,S)=2\frac{\frac{dC}{dT}+qC+(r-q)K\frac{\partial C}{\partial K}}{K^2\frac{d^2C}{dK^2}}|_{K=S, T=t}$

For a known volatility function $\sigma(t, S)$, we can use the Dupire’s formula to price all European call options on $S_0$

$\frac{dC}{dT}+(r-q)K\frac{\partial C}{\partial K}-\frac{1}{2}K^2\sigma^2(t=T,S=K)\frac{d^2C}{dK^2}=-qC$

with initial condition $C(T=0,K)=(S_0-K)^+$

Unit 2.3：Positivity of the right hand side of the Dupire formula

Consider the second derivative in the denominator

$\frac{d^2C}{dK^2}=\lim_{\epsilon\to 0}\frac{C(T,K-\epsilon)-2C(T,K)+C(T,K+\epsilon)}{\epsilon^2}$

The right hand side can be thought of a Butterfly spread

As $\epsilon \to 0$, the butterfly spread (more narrow and more much higher) converges to a Dirac-delta function which is either 0 or strictly positive and thus, it’s price must be positive, thus $\frac{d^2C}{dK^2}>0$

The numerator in the Dupire’s formula can be re-written as

$e^{-qT}\frac{d}{dT}(e^{qT}C(T,Ke^{(r-q)T}))|_{K=S}$

For the above to be positive, call option with a strike which is a fixed proportion $K$ of the forward $F_T=Se^{(r-q)T}$ must be an increasing function of $T$

For $T_1 \leq T_2$

$e^{qT_1}C(T_1,KF_{T_1})\leq e^{qT_2}C(T_2,KF_{T_2})$

In order to prove it, we suppose that the monotonicity condition is violated, that is

$e^{qT_1}C(T_1,KF_{T_1})> e^{qT_2}C(T_2,KF_{T_2})$

Set up the following strategy:

Buy one option of maturity $T_2$, strike $KF_{T_2}$
Sell $e^{–q(T2–T1)}$ options of maturity $T_1$, strike $KF_{T1}$

At inception, setting up the above strategy pays a premium ( > 0 )

At $T_1$, take the following $\Delta$ position on S:

If $S_{T_1}<KF_{T1}$ : $\Delta =0$
If $S_{T_1}>KF_{T1}$ : $\Delta =-1$

At $T_2$ , the P&L comprises of the following positions

receipt of payoff from $T_2$ maturity option
payment of payoff from $T_1$ maturity option, capitalized up to $T_2$
P&L generated by the delta position created at $T_1$ and liquidated at $T_2$

For $S_{T_1}^*=\frac{F_{T_2}}{F_{T_1}}S_{T_1}$ , the P&L is written as

$\begin{align} &T_2\ Option\ at\ T_2 \qquad\ \qquad T_1\ Option\ at\ T_2 \qquad\qquad\ \quad Delta\ portfolio\\ &\overbrace{(S_{T_2}-KF_{T_2})^+}-\overbrace{e^{r(T_2-T_1)}e^{-q(T_2-T_1)}(S_{T_1}-KF_{T_1})^+}+\overbrace{\Delta(S_{T_2}-\frac{F_{T_2}}{F_{T_1}}S_{T_1})}\\ &=(S_{T_2}-KF_{T_2})^+-[\frac{F_{T_2}}{F_{T_1}}(S_{T_1}-KF_{T_1})^++1_{S_{T_1}>KF_{T_1}}(S_{T_2}-\frac{F_{T_2}}{F_{T_1}}S_{T_1})]\\ &=(S_{T_2}-KF_{T_2})^+-[(S_{T_1}^*-KF_{T_2})^++1_{S_{T_1}^*>KF_{T_2}}(S_{T_2}-S_{T_1}^*)] \end{align}$

The last equation reads as

$f(S_{T_2})-[f(S_{T_1}^*)+(S_{T_2}-S_{T_1}^*)\frac{df}{dx}(S_{T_1}^*)]$

for $f(x)=(x-KF_{T_2})^+$

$f(y)-f(x)>(y-x)f(x) when f’’(x)>0$

Thus, our strategy produces positive amount at inception and at $T_2$. Therefore, there’s an arbitrage.

Thus, the numerator in the Dupire’s formula is always positive.

Unit 2.4 Further aspects of the Dupire formula and local volatility

To compute the Dupire’s local volatility function in terms of implied volatility $\hat\sigma(T,K)$, first write

$C(S_0,T,K)=C^{BS}(S_0,T,K,\hat\sigma(T,K))$

Next, define

forward price as $F_t=S_0e^{(r-q)(t-t_0)}$
log strike as $y=ln(\frac{K}{F_t})$
parameterized volatility as $f(t,y)=(t-t_0)\hat\sigma^2(t,K)$

Next, replace $C(S_0,T,K)$ in the Dupire’s formula with $C^{BS}(S_0,T,K,\hat\sigma(T,K))$ and compute derivatives using $f$ and $y$ rather than $\hat\sigma$and $K$

$\begin{align} C^{BS}(F_t,y,f)&=F_tN(d_1)-KN(d_2)\\ &=F_t(N(\frac{-y}{\sqrt{f}}+\frac{\sqrt{f}}{2})-e^{y}N(\frac{-y}{\sqrt{f}}-\frac{\sqrt{f}}{2})) \end{align}$

【Method 1】

这个方法需要求 $\frac{\partial f}{\partial K}$, 此处省略了, 计算比较复杂,以下$C$为$C^{BS}$

With

$\frac{\partial C}{\partial K}=\frac{\partial C}{\partial y}\frac{\partial y}{\partial K}+\frac{\partial C}{\partial f}\frac{\partial f}{\partial K}=\frac{1}{K}\frac{\partial C}{\partial y}+\frac{\partial C}{\partial f}\frac{\partial f}{\partial K}$ $\begin{align} \frac{\partial^2 C}{\partial K^2}&=-\frac{1}{K^2}\frac{\partial C}{\partial y}+\frac{1}{K}\frac{\partial}{\partial K}(\frac{\partial C}{\partial y})+\frac{\partial}{\partial K}(\frac{\partial C}{\partial f})\frac{\partial f}{\partial K}+\frac{\partial C}{\partial f}\frac{\partial^2 f}{\partial K^2}\\ &=-\frac{1}{K^2}\frac{\partial C}{\partial y}+\frac{1}{K}(\frac{1}{K}\frac{\partial^2 C}{\partial y^2}+\frac{\partial f}{\partial K}\frac{\partial^2 C}{\partial y\partial f})...\\ &...+\frac{\partial f}{\partial K}(\frac{1}{K}\frac{\partial^2 C}{\partial y\partial f}+\frac{\partial f}{\partial K}\frac{\partial^2 C}{\partial f^2})+\frac{\partial C}{\partial f}\frac{\partial^2 f}{\partial K^2}\\ &=\frac{1}{K^2}(\frac{\partial^2 C}{\partial y^2}-\frac{\partial C}{\partial y})+\frac{2}{K}\frac{\partial f}{\partial K}\frac{\partial^2 C}{\partial y\partial f}+(\frac{\partial f}{\partial K})^2\frac{\partial^2 C}{\partial f^2}+\frac{\partial C}{\partial f}\frac{\partial^2 f}{\partial K^2} \end{align}$

【Hint】

$\begin{align} \frac{\partial}{\partial K}(\frac{\partial C}{\partial y})&=\frac{\partial}{\partial y}(\frac{\partial C}{\partial y})\frac{\partial y}{\partial K}+\frac{\partial}{\partial f}(\frac{\partial C}{\partial y})\frac{\partial f}{\partial K}\\ &=\frac{1}{K}\frac{\partial^2 C}{\partial y^2}+\frac{\partial f}{\partial K}\frac{\partial^2 C}{\partial y\partial f} \end{align}$ $\begin{align} \frac{\partial}{\partial K}(\frac{\partial C}{\partial f})&=\frac{\partial}{\partial y}(\frac{\partial C}{\partial f})\frac{\partial y}{\partial K}+\frac{\partial}{\partial f}(\frac{\partial C}{\partial f})\frac{\partial f}{\partial K}\\ &=\frac{1}{K}\frac{\partial^2 C}{\partial y\partial f}+\frac{\partial f}{\partial K}\frac{\partial^2 C}{\partial f^2} \end{align}$

Inserting these equations into

$\begin{align} \frac{dC}{dt}=\frac{1}{2}K^2\sigma^2(t,S)\frac{d^2C}{dK^2}-(r-q)K\frac{\partial C}{\partial K}-qC \end{align}$

【Method 2】

以下$c$ 为 $C^{BS}$

Step 1

按Method 1同样的方法, 但此时C考虑的是K与f不相关,与t相关

$\frac{\partial C}{\partial K}=\frac{\partial C}{\partial y}\frac{\partial y}{\partial K}=\frac{1}{K}\frac{\partial C}{\partial y}$ $\begin{align} \frac{\partial^2 C}{\partial K^2}&=-\frac{1}{K^2}\frac{\partial C}{\partial y}+\frac{1}{K}\frac{\partial}{\partial K}(\frac{\partial C}{\partial y})\\ &=-\frac{1}{K^2}\frac{\partial C}{\partial y}+\frac{1}{K}(\frac{1}{K}\frac{\partial^2 C}{\partial y^2}+\frac{\partial f}{\partial K}\frac{\partial^2 C}{\partial y\partial f})\\ &=\frac{1}{K^2}(\frac{\partial^2 C}{\partial y^2}-\frac{\partial C}{\partial y}) \end{align}$

Since $K=S_0e^{(r-q)(t-t_0)}$ so that $\frac{\partial K}{\partial t}=K(r-q)=K\mu$. And $\frac{\partial y}{\partial K}=\frac{1}{K}$

$\frac{\partial C}{\partial t}=\frac{\partial C}{\partial t}+\frac{\partial C}{\partial y}\frac{\partial y}{\partial K}K\mu=\frac{\partial C}{\partial t}+\frac{\partial C}{\partial y}\mu$

we substitute into Equation

$\begin{align} \frac{dC}{dt}=\frac{1}{2}K^2\sigma^2(t,S)\frac{d^2C}{dK^2}+(r-q)(C-K\frac{\partial C}{\partial K}) \end{align}$

因为考虑的是forward，不用把C折现，所以不减去rC

We get

$\begin{align} \frac{\partial C}{\partial t}+\frac{\partial C}{\partial y}\mu&=\frac{1}{2}\sigma^2(\frac{\partial^2 C}{\partial y^2}-\frac{\partial C}{\partial y})+\mu(C-\frac{\partial C}{\partial y})\\ \frac{\partial C}{\partial t}&=\frac{1}{2}\sigma^2(\frac{\partial^2 C}{\partial y^2}-\frac{\partial C}{\partial y})+\mu C \end{align}$

Step 2

By taking derivatives of the Black-Scholes formula, we obtain

这个部分我还没推过，但在More的第二个连接里有过程

$\frac{\partial^2c}{\partial f^2}=(-\frac{1}{8}-\frac{1}{2f}+\frac{y^2}{2f^2})\frac{\partial c}{\partial f}$ $\frac{\partial^2 c}{\partial y\partial f}=(\frac{1}{2}-\frac{y}{f})\frac{\partial c}{\partial f}$ $\frac{\partial^2 c}{\partial y^2}-\frac{\partial c}{\partial y}=2\frac{\partial c}{\partial f}$

and we have

$\frac{\partial C}{\partial y}=\frac{\partial c}{\partial y}+\frac{\partial c}{\partial f}\frac{\partial f}{\partial y}$ $\begin{align} \frac{\partial^2 C}{\partial y^2}&=\frac{\partial^2 c}{\partial^2 y}+\frac{\partial^2 c}{\partial y\partial f}\frac{\partial f}{\partial y}+(\frac{\partial^2 c}{\partial f\partial y}+\frac{\partial^2 c}{\partial f^2}\frac{\partial f}{\partial y})\frac{\partial f}{\partial y}+\frac{\partial c}{\partial f}\frac{\partial^2 f}{\partial y^2}\\ &=\frac{\partial^2 c}{\partial^2 y}+2\frac{\partial^2 c}{\partial y\partial f}\frac{\partial f}{\partial y}+\frac{\partial^2 c}{\partial f^2}(\frac{\partial f}{\partial y})^2+\frac{\partial c}{\partial f}\frac{\partial^2 f}{\partial y^2} \end{align}$ $\frac{\partial C}{\partial t}=\frac{\partial c}{\partial t}+\frac{\partial c}{\partial f}\frac{\partial f}{\partial t}=(r-q)c+\frac{\partial c}{\partial f}\frac{\partial f}{\partial t}$

Inserting these equations into

$\frac{\partial C}{\partial t}=\frac{\sigma^2}{2}(\frac{\partial^2 C}{\partial y^2}-\frac{\partial C}{\partial y})+(r-q)C$

The local volatility function is then given as

$\sigma^2(t,S)=\frac{\frac{df}{dt}}{(\frac{y}{2f}\frac{df}{dy}-1)^2+\frac{1}{2}\frac{d^2f}{dy^2}-\frac{1}{4}(\frac{1}{4}+\frac{1}{f})(\frac{df}{dy})^2}|_{y=ln(\frac{S}{F_t})}$

As in the markets, the option prices are only available for discrete strikes and maturities, to use the Dupire’s formula, we need to interpolate between the available strikes and maturities, and extrapolate outside the market range.

Issues with Dupire’s Local Volatility Model

The local volatility surface obtained by the Dupire’s formula is very sensitive to the method of data interpolation
The only risky asset used is the underlying asset which does not account for the volatility risk (affected by random shocks)
The calibrated local volatility surface is not stable with the passage of time (in practice the local volatility surface is is changing all the time)
The dynamics of the volatility process assumed by the model does not correspond to that observed in the market

Derivation of Loval Volatility

Dupire Local Volatility

Local volatility modelling

Stochastic Volatility and Local Volatility

CEV Model

Constant Elasticity of Variance (CEV) model is given as

$\frac{dS_t}{S_t}=rdt+\frac{\sigma}{S_t^{1-a}}dW_t$

If $a$ less than one, then we have that non constant local volatility function

Under the risk-neutral measure, by assuming that the forward price is given as $F_t=S_te^{r(T-t)}$, the CEV model can be re-written as

$dF_t=\sigma_0F_t^adW_t,\ 0<a\leq1$

The implied volatility is given by the asymptotic approximation

$\sigma^{imp}(T,K)=\frac{\sigma_0}{F_m^{(1-a)}}(1+\frac{(1-a)(2+a)}{24}(\frac{F_0-K}{F_m})^2+\frac{(1-a)^2}{24}\frac{\sigma^2_0T}{F_m^{2-2a}}+...)$

with $F_m=\frac{1}{2}(F_0+K)$

Trinomial Pricing Tree

The issue of non-recombining binomial tree is addressed by using a trinomial tree model where the prices are given by

$\hat{S}_{t_{i+1}}=\begin{cases}u\hat{S}_{t_i}&p\\ (1+r\Delta t)\hat{S}_{t_i}&1-p-q\\d\hat{S}_{t_i}&q\end{cases}$

For the trinomial tree to converge to a continuous process, we further impose

$\mathbb{E}[\hat{S}_{t_{i+1}}|\hat{S}_{t_i}]=(1+r\Delta t)\hat{S}_{t_i}$ $Var(\hat{S}_{t_{i+1}}|\hat{S}_{t_i})=\hat{S}_{t_i}^2\sigma^2(t_i,\hat{S}_{t_i})\Delta t$

For a recombining tree (i.e. the number of nodes increase linearly and not exponentially), we need $u d=(1+\Delta t)^2$

The probabilities are given as

$\begin{align} p(t_i,\hat{S}_{t_i})&=\frac{\sigma(t_i,\hat{S}_{t_i})^2\Delta t}{(u-d)(u-1-r\Delta t)}\\ q(t_i,\hat{S}_{t_i})&=\frac{\sigma(t_i,\hat{S}_{t_i})^2\Delta t}{(u-d)(1+r\Delta t-d)} \end{align}$

To ensure the absence of arbitrage in the model, we need that $p + q\leq 1$ whic is ensured if

$1 + r\Delta t - d > \sigma(t_i, \hat S_{t_i}) \sqrt{∆t}$

Such that the above inequality is always satisfied, we choose

$d=1+r\Delta t-\bar\sigma\sqrt{\Delta t}$

where $\sigma > \sigma(t, S)$ for all the nodes in the trinomial tree

By choosing $\bar\sigma$ sufficiently large, in the limit, we obtain the following model which does not admit arbitrage

$\frac{dS_t}{S_t}=rdt+\min(\bar\sigma,\sigma(t,S))dW_t$

Unit 2.5：Stochastic Volatility

For local volatility

$d\sigma(t,S_t)=\frac{\partial}{\partial t}\sigma(t,S_t)dt+\frac{1}{2}\frac{\partial^2}{\partial S^2}\sigma(t,S_t)dS_tdS_t+\frac{\partial}{\partial S}\sigma(t,S_t)dS_t$

The last part show that the randomness of volatility is totally depended on the randomness of stock. However, the empirical data we will find that the randomness in the stock and the randomness in the volatility differ from each other. So the volatility should affected by other sources of randomness.

The class of models where volatility is supposed to be a stochastic process is called stochastic volatility models

$\frac{dS_t}{S_t}=\mu_tdt+\sigma_tdW_t$ $d\sigma_t=a_tdt+b_tdW'_t,\ d\langle W,W'\rangle_t=\rho dt$

Unit 2.6：Pricing Equation

Incomplete Market

In an incomplete market, the risk from volatility (vega) cannot be eliminated by trading only in the underlying asset

We have two sources of randomness. However we only allow trading in the money market account. So the market is incomplete. We can make an incomplete market complete by adding additional assets (like option or future).

We create a self-financing portfolio using two risky assets $S$ and option $C^0$

$\frac{dS_t}{S_t}=\mu_tdt+\sigma_tdW_t$ $d\sigma_t=a_tdt+b_tdW'_t,\ d\langle W,W'\rangle_t=\rho dt$ $\frac{dB_t}{B_t}=rdt$

The coefficients $a_t$, $b_t$ and $C^0$ are functions of ($t, S_t, \sigma$)

Often the volatility process will be chosen as a mean-reverting process

Suppose $V_t$ denotes the value of a self-financing portfolio with $\delta_t$ shares in $S$ and $\omega_t$ units of $C^0$

$dV_t=(V_t-\delta_tS_t-\omega_tC^0_t)rdt+\delta_tdS_t+\omega_tdC^0_t$

By using Ito’s formula, we get

$\begin{align} dC^0_t&=\frac{\partial C^0}{\partial t}dt+\frac{\partial C^0}{\partial S_t}dS_t+\frac{\partial C^0}{\partial \sigma_t}d\sigma_t...\\ &...+\frac{1}{2}\frac{\partial^2 C^0}{\partial S_t^2}(dS_t)^2+\frac{1}{2}\frac{\partial^2 C^0}{\partial \sigma_t^2}(d\sigma_t)^2+\frac{\partial^2 C^0}{\partial S_t\partial \sigma_t}dS_td\sigma_t\\ &=(\frac{\partial C^0}{\partial t}+\frac{1}{2}S_t^2\sigma_t^2\frac{\partial^2 C^0}{\partial S_t^2}+\frac{1}{2}b_t^2\frac{\partial^2 C^0}{\partial \sigma_t^2}+S_t\sigma_tb_t\rho\frac{\partial^2 C^0}{\partial S_t\partial \sigma_t})dt...\\ &...+\frac{\partial C^0}{\partial S_t}dS_t+\frac{\partial C^0}{\partial \sigma_t}d\sigma_t\\ &=\mathscr{L}_tC^0dt+\frac{\partial C^0}{\partial S_t}dS_t+\frac{\partial C^0}{\partial \sigma_t}d\sigma_t \end{align}$

where $\mathscr{L}_t$ is a operator

$dV_t=(V_t-\delta_tS_t-\omega_tC^0_t)rdt+(\delta_t+\omega_t\frac{\partial C^0}{\partial S_t})dS_t+\omega_t\frac{\partial C^0}{\partial \sigma_t}d\sigma_t+\omega_t\mathscr{L}_tC^0dt$

The value of portfolio to match the option price of C

$dV_t\Rightarrow dC_t=\mathscr{L}_tCdt+\frac{\partial C}{\partial S_t}dS_t+\frac{\partial C}{\partial \sigma_t}d\sigma_t$

For the two expressions of dVt to match, we need

$\omega_t=\frac{\frac{\partial C}{\partial \sigma_t}}{\frac{\partial C^0}{\partial \sigma_t}},\ \delta_t=\frac{\partial C}{\partial S_t}-\omega_t\frac{\partial C^0}{\partial S_t}$

Set $V_t=C$ and substitute it to

$\begin{align} \mathscr{L}_tCdt-\frac{\frac{\partial C}{\partial \sigma_t}}{\frac{\partial C^0}{\partial \sigma_t}}\mathscr{L}_tC^0dt&=(V_t-S_t\frac{\partial C}{\partial S_t}+S_t\frac{\frac{\partial C}{\partial \sigma_t}}{\frac{\partial C^0}{\partial \sigma_t}}\frac{\partial C^0}{\partial S_t}-\frac{\frac{\partial C}{\partial \sigma_t}}{\frac{\partial C^0}{\partial \sigma_t}}C^0_t)rdt\\ \mathscr{L}_tC-rC+rS_t\frac{\partial C}{\partial S_t}+&=\frac{\partial C}{\partial \sigma_t}\frac{rS_t\frac{\partial C^0}{\partial S_t}-rC^0_t+\mathscr{L}_tC^0}{\frac{\partial C^0}{\partial \sigma_t}}\\ \end{align}$

In the pricing equation for $C$, there is a factor which depends on $C^0$, which we add to our portfolio to complete the market

Denote

$\lambda(t,S_t,\sigma_t)=-\frac{rS_t\frac{\partial C^0}{\partial S_t}-rC^0_t+\mathscr{L}_tC^0}{\frac{\partial C^0}{\partial \sigma_t}}$

Thus, by argument of creating a self-replicating portfolio, we have shown that the price $C$ of a European option with payoff $h(S_T)$, solves the following equation

$\mathscr{L}_tC-rC+rS_t\frac{\partial C}{\partial S_t}+\lambda(t,S_t,\sigma_t)\frac{\partial C}{\partial \sigma_t}=0,\ C(T,S_T,\sigma_T)=h(S)$

with

$\mathscr{L}_t=\frac{\partial}{\partial t}+\frac{1}{2}S_t^2\sigma_t^2\frac{\partial^2}{\partial S_t^2}+\frac{1}{2}b_t^2\frac{\partial^2}{\partial \sigma_t^2}+S_t\sigma_tb_t\rho\frac{\partial^2}{\partial S_t\partial \sigma_t}$

if chosen $\lambda=0$, the difference of BS-PDE is the last two part of $\mathscr{L}$ with $b_t$

Unit 2.7：Risk Neutral Measure and Risk Premia

Finding the Risk-Neutral Measure

From its definition, under the risk-neutral measure, price $C_t$ of a European satisfies

$dC_t=rC_tdt+\text{ a random term with zero mean}$

Applying the Ito’s formula and using the equation (PDE) in previous part, we get

$\begin{align} dC_t&=\mathscr{L}_tC_tdt+\frac{\partial C_t}{\partial S_t}dS_t+\frac{\partial C_t}{\partial \sigma_t}d\sigma_t\\ &=(rC_t-rS_t\frac{\partial C_t}{\partial S_t}-\lambda(t,S_t,\sigma_t)\frac{\partial C_t}{\partial \sigma_t})dt+\frac{\partial C_t}{\partial S_t}dS_t+\frac{\partial C_t}{\partial \sigma_t}d\sigma_t\\ &=rC_tdt+\frac{\partial C_t}{\partial S_t}(dS_t-rS_tdt)+\frac{\partial C_t}{\partial \sigma_t}(d\sigma_t-\lambda_tdt) \end{align}$

Thus, under the risk-neutral measure (cancels out the $dt$ part in parenthese of the function above), we must have the following

$\frac{dS_t}{S_t}=rdt+\sigma_tdW_t$ $d\sigma_t=\lambda_tdt+b_tdW'_t,\ d\langle W,W'\rangle_t=\rho dt$

Under the risk-neutral measure, the price of a European option is written as

$C(t,s,\sigma)=\mathbb{E}[e^{-rT}h(S_T)|\sigma_t=\sigma,S_t=s]$

For the choice of model under physical measure

$\frac{dS_t}{S_t}=\mu_tdt+\sigma_tdW_t$ $d\sigma_t=\alpha_tdt+b_tdW'_t$

By matching the coefficients, we get

$\mu_t=r+\sigma_t\beta_t,\ \alpha_t=\lambda_t+b_t\phi_t$

where $\beta_t$(market prices of risk) and $\phi_t$ are bounded processes

$dW_t^\mathbb{P}+\theta_tdt=dW_t^\mathbb{Q}$, $\mathbb{P}:$现实测度

Risk Premiums

$\beta_t$ is typically called the risk premium. Analogously, $\phi_t$ is called the risk premium of volatility

$\beta_t=\frac{\mu_t-r}{\sigma_t},\ \phi_t=\frac{\alpha_t-\lambda_t}{b_t}$

$\lambda_t$ is calibrated directly under the risk-neutral measure (chosen by market)

Unit 2.8：Estimation Error

Estimation of Volatility

We could approximate $\sigma_t$ by its average over a period of T estimated using the historical data. However, estimation of historical volatility faces the following issues

Variance of the estimator decreases with T
Bias of the approximation of $\sigma_t$ by average volatility increases with T

Recall that

$dlogS_t=(\mu_t-\frac{1}{2}\sigma_t^2)dt+\sigma_tdW_t$

Thus, the quadratic variation of $log S_t$ is given as

$\langle log S\rangle_t=\int_0^t\sigma_s^2ds$

Recall from the definition of quadratic variation of a random process

$\langle X\rangle_t=\lim_{N\to\infty}\sum_{i=1}^N(X_{\frac{i}{N}t}-X_{\frac{i-1}{N}t})^2$

Denote the log-returns of asset price as

$r(t,h):=log\frac{S_{t+h}}{S_t}$

$dlogS_t=(logS_{t+1}-logS_t)dt=log\frac{S_{t+1}}{S_t}dt$

Then, we have

$\langle log S\rangle_t=\int_0^t\sigma_s^2ds=\lim_{N\to\infty}\sum_{i=1}^Nr(\frac{i-1}{N}t,\frac{t}{N})^2$

Realised Variance

Therefore, we could estimate the integrated variance by the realised variance

$VR_t^N:=\sum_{i=1}^Nr(\frac{i-1}{N}t,\frac{t}{N})^2$

Next, suppose that

$\sigma_t$ is independent of Brownian motion W driving asset prices S
Drift function $\mu$ is deterministic

Then, for a fixed volatility path (由已知数据得出), the log-returns $R_i:=r(\frac{i-1}{N}t,\frac{t}{N})$ are independent Gaussian random variables with variance & mean

$v_i=\int_{\frac{i-1}{N}t}^{\frac{i}{N}t}\sigma_s^2ds,\ m_i=\int_{\frac{i-1}{N}t}^{\frac{i}{N}t}(\mu_s-\frac{\sigma_s^2}{2})ds$

Error in Realised Variance

Thank the Professor Ankush Agarwal and Christian Ewald. They provide the precise process of following.

If we compute thxe expectation of realised variance $VR^N_t$ for a given volatility path $\sigma_t$, we get

$\begin{align} \mathbb{E}_\sigma[VR_t^N]&=\sum_{i=1}^NE_\sigma[R_i^2]=\sum_{i=1}^N(v_i+m_i^2)\\ &=\sum_{i=1}^N\int_{\frac{i-1}{N}t}^{\frac{i}{N}t}\sigma_s^2ds+\sum_{i=1}^Nm_i^2\\ &=\int_0^t\sigma_s^2ds+\sum_{i=1}^Nm_i^2 \end{align}$

$E(x^2)=Var(x)+E(x)^2$

with

$m_i^2=(\int_{\frac{i-1}{N}t}^{\frac{i}{N}t}(\mu_s-\frac{\sigma_s^2}{2})ds)^2 \approx(\mu_{\frac{i-1}{N}t}-\sigma_{\frac{i-1}{N}t}^2/2)^2\frac{t^2}{N^2}$

直接取一个固定值来代替这一段的积分，积分在t时间上的宽度是 t/N

Then

$\begin{align} \sum_{i=1}^Nm_i^2&\approx\sum_{i=1}^N\frac{t^2}{N^2}(\mu_{\frac{i-1}{N}t}-\sigma_{\frac{i-1}{N}t}^2/2)^2\\ &=\frac{t}{N}\sum_{i=1}^N\frac{t}{N}(\mu_{\frac{i-1}{N}t}-\sigma_{\frac{i-1}{N}t}^2/2)^2\\ &\approx\frac{t}{N}\sum_{i=1}^N\int_{\frac{i-1}{N}t}^{\frac{i}{N}t}(\mu_{s}-\frac{\sigma_{s}^2}{2})^2ds\\ &=\frac{t}{N}\int_{0}^{t}(\mu_{s}-\frac{\sigma_{s}^2}{2})^2ds \end{align}$ $\begin{align} &\ (Estimator) (True\ Value) \\ &\quad\ \overbrace{\mathbb{E}_\sigma[VR_t^N]}\approx\overbrace{\int_0^t\sigma_s^2ds}+\frac{t}{N}\int_0^t(\mu_s-\frac{\sigma_s^2}{2})^2ds \end{align}$

Thus, the bias of estimating the integrated variance with realised variance

$Bias:=\mathbb{E}[True\ Value-Estimator]\approx\mathscr{O}(\frac{t}{N})$

Similarly, the variance of $VR^N_t$ for a given volatility path $\sigma_t$ is

$Var(VR_t^N)=E[(VR_t^N-(E[VR_t^N])^2)]=E[(\sum_{i=1}^N(R_i^2-v_i-m_i^2))^2]$

$R_i$ are iid

$E[(\sum_{i=1}^N(R_i^2-v_i-m_i^2))^2]=\sum_{i=1}^NE[(R_i^2-v_i-m_i^2)^2]$

For every i

$\begin{align} E[(R_i^2-v_i-m_i^2)^2]&=E[R_i^4-2v_iR_i^2-2m_i^2R_i^2+v_i^2+2v_im_i^2+m_i^4]\\ &=E[R_i^4]-2(v_i+m_i^2)(v_i+m_i^2)+v_i^2+2v_im_i^2+m_i^4\\ &=E[R_i^4]-v_i^2-2v_im_i^2-m_i^4\\ \end{align}$

With $E[R_i^4]=m_i^4+6m_i^2v_i+3v_i^2$

关于这一定理可以参考如何通俗的理解矩母函数，以及 wiki 中展示的 Order 4 的 Non-central moment

本站更多参考正态分布与矩母函数/)

$E[(R^2-v-m^2)^2]=2v_i^2+4v_im_i^2$

and

$Var(VR_t^N)=\sum_{i=1}^N\mathbb{E}_\sigma[(R_i^2-v_i-m_i^2)^2]=\sum_{i=1}^N(2v_i^2+4v_im_i^2)$

Repeat the same step as above

$\sum_{i=1}^Nv_i^2\approx\frac{t}{N}\int_0^t\sigma_s^4ds$

Also

$\begin{align} \sum_{i=1}^Nv_im_i^2&\approx\sum_{i=1}^N\frac{t}{N}\sigma_{\frac{i-1}{N}t}^2\frac{t^2}{N^2}(\mu_{\frac{i-1}{N}t}-\sigma_{\frac{i-1}{N}t}^2/2)^2\\ &\approx\frac{t^2}{N^2}\sum_{i=1}^N\int_{\frac{i-1}{N}t}^{\frac{i}{N}t}\sigma_s^2(\mu_{s}-\frac{\sigma_{s}^2}{2})^2ds\\ &=\frac{t^2}{N^2}\int_{0}^{t}\sigma^2_s(\mu_{s}-\frac{\sigma_{s}^2}{2})^2ds \end{align}$

Approximately, the variance of $VR^N_t$ for a given volatility path $\sigma_t$ is

$Var(VR_t^N)\approx\frac{2t}{N}\int_0^t\sigma_s^4ds+\frac{4t^2}{N^2}\int_0^t\sigma_s^2(\mu_s-\frac{\sigma_s^2}{2})^2ds$

Next, we note that the root mean squared error (RMSE) of an estimator is given by

$RMSE=\sqrt{Bias+Variance}$

Therefore, the dominant term in the error of realised variance is

$\sqrt{\frac{2t}{N}\int_0^t\sigma_s^4ds}$ $Relative\ error=\frac{\sqrt{\frac{2t}{N}\int_0^t\sigma_s^4ds}}{\int_0^t\sigma_s^2ds}$

If the Relative error is large. We should check out the Greek (sensitive of option price to the Volatility). Insensitive $\to$ no matter that much

Supplymental reading

link

defination

$p_t$ = log-price of asset at time $t$ = $logS_t$
$\Delta$ = fraction of a trading session associated with the implied sampling frequency
$m=1/\Delta$ = number of sampled observations per trading session
$T$ = number of days in the sample $\Rightarrow$ $mT$ total observations
$r_{t-1+j\Delta}=p_{t-1+j\Delta}-p_{t-1+(j-1)\Delta}=log\frac{S_{t-1+j\Delta}}{S_{t-1+(j-1)\Delta}}$
$r_t=r_{t-1+\Delta}+r_{t-1+2\Delta}+…+r_t=log\frac{S_t}{S_{t-1}}$
Realized variance (RV=VR) on day $t$

$RV_t^m=\sum_{j=1}^mr_{t-1+j\Delta}^2$

Realized volatility (RVOL) on day $t$
$RVOL_t^m=\sqrt{RV_t^m}$
Realized log-volatility (RLVOL) :

$RLVOL_t^m=ln(RVOL_t^m)$

Arithematic Brownian Motion (只参考，我们运用的$\mu,\sigma$不是固定的)

$dp(t)=\mu dt+\sigma dW(t)$ $p(t)=p(0)+\mu t+\sigma W(t)$ $r(t,t-1)=p(t)-p(t-1)$

The quadratic variation (QV) of the return process from time $0$ to $t$ is

$[r](t)=\lim_{m\to\infty}\sum_{j=0}^{m-1}(p(t_{j+1})-p(t_j))^2$

Let $p(t)$ be described by the stochastic differential equation

$dp(t)=\mu(t) dt+\sigma(t) dW(t)$ $r(t,t-1)=\int_{t-1}^t\mu(s)ds+\int_{t-1}^t\sigma(s)dW(s)$ $[r](t)=\int_0^t\sigma^2(s)ds$

IV$_t$ denotes integrated variance for day $t$

$QV_T \equiv [r](t)-[r](t-1)=\int_{t-1}^t\sigma^2(s)ds=IV_t$

Example: QV for Wiener process

$dp(t)=dW(t),\ \sigma(t)=1$ $[r](t)=\int_0^t\sigma^2(s)ds=\int_0^tds=t$ $QV_t=\int_{t-1}^t\sigma^2(s)ds=\int_{t-1}^tds=1$

Q1 What does RV estimate?

$RV_t^m \xrightarrow{p} [r](t)-[r](t-1)\equiv QV_t,\ as\ m\to\infty$

That is, daily RV converges in probability to the daily increment in QV.

$var(r(t,t-1)|\mathcal{F}_{t-1})=E[QV_t|\mathcal{F}_{t-1}]=E[RV_t^m|\mathcal{F}_{t-1}]$

ABDL (2003)

$r(t,t-1)\sim N(\int_{t-1}^t\mu(s)ds,IV_t)$

Since $\int^t_{t−1}\mu(s)ds$ is generally very small for daily returns and $RV_t^m$ is a consistent estimator of $IV_t$, for Itô processes daily returns should follow a normal mixture distribution with $RV_t^m$ as the mixing variable. As a result, returns standardized by realized volatility should be standard normal

$r_t/RVOL_t^m \approx N(0,1)$

If there are jumps in $dp (t)$, then $RV_t^m \xrightarrow{p} IV_t$ IVt but returns are no longer conditionally normally distributed.

Define the error

$u_t(\Delta)=RV_t^m-IV_t$

BNS (2001) : For the Ito diffusion model under the assumption that mean and volatility processes are jointly independent of $W(t)$

$\sqrt{m}\frac{u_t(\Delta)}{\sqrt{2\ IQ_t}}=\sqrt{m}\frac{(RV_t^m-IV_t)}{\sqrt{2\ IQ_t}}\xrightarrow{d} N(0,1),\ as\ m\to\infty$

where is the integrated quarticity (IQ)

$IQ_t=\int_{t-1}^t\sigma^4(s)ds$

Hence,

$RV_t^m\sim N(IV_t,\frac{2\ IQ_t}{m})$

IQ$_t$ may be consistently estimated using the following scaled version of realized quarticity (RQ)

$RQ_t^m=\sum_{j=1}^mr_{t-1+j\Delta}^4$ $\frac{m}{3}RQ_t^m\xrightarrow{p}IQ_t,\ as\ m\to\infty$

The feasible asymptotic distribution for $RV_t^m$ is

$RV_t^m\sim N(IV_t,\frac{2}{3}RQ_t^m)$

Unit 2.9：The Heston Model

Heston’s Stochastic Volatility Model

Under the risk-neutral measure, Heston’s stochastic volatility model is given as

$\frac{dS_t}{S_t}=rdt+\sigma_tdW_t$ $\begin{align} &d\sigma_t^2=k(\theta-\sigma_t^2)dt+\delta\sigma_tdW'_t,\ d\langle W,W'\rangle_t=\rho dt\\ or,\ &dv_t=k(\theta-v_t)dt+\delta\sqrt{v_t}dW'_t,\ with\ v_t=\sigma_t^2 \end{align}$

本课的模型都是在风险中性 (鞅) 的测度下，对于有些文章会按现实测度写，但实际是效果都相同

If we have $W_1,\tilde W$ where $\langle W _1,\tilde W\rangle=0$

$W_2=\rho W_1+\sqrt{1-\rho^2}\tilde W$ $\begin{align} d\langle W_1,W_2\rangle&=d\langle W_1,\rho W_1+\sqrt{1-\rho^2}\tilde W\rangle_t\\ &=\rho d\langle W_1,W_1\rangle+\sqrt{1-\rho^2} d\langle W _1,\tilde W\rangle\\ &=\rho dt \end{align}$

The variance process $v_t$ is also known by the name of Cox-Ingersoll-Rubinstein (CIR) process

To model the variance of asset prices, we need $v_t$ to remain positive

（Advantage）For $\frac{2k\theta}{\delta^2}>1$ the process never hits zero. This constraint on parameters is also known as Feller condition

This process is mean reverting

For the square root process

$dv_t=k(\theta-v_t)dt+\delta\sqrt{v_t}dW'_t$

$\theta$ is the long term mean
k is the speed of mean-reversion
$\delta$ is volatility of volatility

First we define the log-price $X_t$ by $S_t=S_0e^{rt+X_t}$

Next, by using the Ito’s formula, we get

$dX_t=-\frac{1}{2}\sigma_t^2dt+\sigma_tdW_t$

European call option price with maturity T and strike K is given as

$C(K)=e^{-rT}\mathbb{E}[(S_0e^{rt+X_T}-K)^+]=S_0\mathbb{E}[(e^{X_T}-\frac{Ke^{-rT}}{S_0})^+]$

For log-strike

$K:=log\frac{Ke^{-rT}}{S_0},\ C(K)=S_0\mathbb{E}[(e^{X_T}-e^k)^+]$

We first find the characteristic function of $X_T$

$f(t,u,x,v):=\mathbb{E}[e^{iuX_T}|X_t=x,v_t=v]$

To compute $C(K)=S_0\mathbb{E}[(e^{X_T}-e^k)^+]$ , we then use the Fourier transform $f (t, u, x, v)$ of $X_T$

European Call Option Price Formula

$C(x,v;T-t,K)=e^xP_1-Ke^{-r(T-t)}P_2, where\ for j=1,2$ $P_j(x,v;T-t,lnK):=\frac{1}{2}+\frac{1}{\pi}\int_0^\infty Re(\frac{e^{-i\phi lnK}f_j(\phi;T-t,x,v)}{i\phi})d\phi$

where $Re(a+ib)=a$, We can only solve the $P_j$ numerical

$f_j(\phi;T-t,x,v):=exp(C_j(T-t,\phi)+D_j(T-t,\phi)+i\phi x)$ $C_j(\tau,\phi):=ri\phi\tau+\frac{a}{\delta^2}((b_j-\rho\delta i\phi+d_j)\tau-2ln\frac{1-g_je^{d_j\tau}}{1-g_j})$ $D_j(\tau,\phi):=\frac{b_j-\rho\delta i\phi+d_j}{\delta^2}(\frac{1-e^{d_j\tau}}{1-g_je^{d_j\tau}})$ $g_j:=\frac{b_j-\rho\delta i\phi+d_j}{b_j-\rho\delta i\phi-d_j}$ $d_j=\sqrt{(b_j-\rho\delta i\phi)^2-\delta^2(2iu_j\phi-\phi^2)}$

Unit 2.10：Introduction to Calibration

Exotic Option

Not traded on the market and $\to$ need to be priced by a model

A forward start option: $(S_T-aS_{T_1})^+$
A cliquet: $\sum_{i=1}^{n-1}(S_{T_{i+1}}-aS_{T_i})^+$
A barrier option: $(S_T-K)^+1_{sup_{t\in[0,T]}S_t\leq B}$

Identify the hedging instruments : liquid products (small bid-ask spread) quoted on a market

we aim at constructing a hedging portfolio of the form

$V_t=\Delta_t^0B_t+\Delta_tS_t+\sum_{i=1}^n\Delta_t^{C,i}Call^i+\sum_{j=1}^m\Delta_t^{P,j}Put^j$

Choose a model

Reproduce the prices of simple instruments quoted on the market. (This step is model calibration)

Then compute the hedging strategies $\Delta^0,\Delta,\Delta^{C,i},\Delta^{P,j}$

General Considerations

Model calibration = determine the model parameters from market data of liquid instruments

Two approaches in model calibration

Statistical parameter estimation
- from historical observations
- done under the physical (market) measure
Calibration
- based on quotes of option prices (typically calls/puts)
- done under the risk-neutral measure
- reflects the market view on the future evolution of the asset

Which price? Bid or Ask?

Typically one calibrates to the mid-price

Model Calibration

In general, a “good” model should have

A parsimonious parametrization and should be intuitive
An efficient numerical calibration method

We want to find $\theta$ such that

$P^{model(\theta)}(T_i,K_j)=P^{market(\theta)}(T_i,K_j)$

or a least squares approach

$\min_{\theta}\sum_{i,j}(P^{model}(T_i,K_j)-P^{market}(T_i,K_j))^2$

Calibration of local volatility models

The inverse problem of finding the local volatility function with finite number of call option prices is then ill posed. To overcome the difficulty, typically two approaches are used

Implied volatility interpolation using either parametric or semi-parametric form
- Non-parametric methods of interpolation, for example, the method of splines(三次样条插值), provide very mediocre performance for such problems
- Implied volatility interpolation methods are very intuitive and fast, but could lead to infeasible volatility surface
Regularisation approach
- Regularisation methods provide the best performance in terms of precision and regularity of the volatility surface

Stochastic Volatility Inspired(SVI) Parametrisation (by Jim Gatheral)

It is a parametrisation of implied volatility as a function of parameters in terms of log-strike $k=log\frac{K}{S}-rT$

In the parametrisation, for a given maturity $T$, the total implied variance $V(T, k) := I^2(T, k)T$ satisfies

$V(T, k)=a+b(\rho(k-m)+\sqrt{(k-m)^2+\sigma^2})$

with

$a\ge 0$: global level of smile
$b\ge 0$: slope of the wings
$\rho\in[-1,1]$: asymmetry (rotation of the smile)
$m\in\mathbb{R}$: translation to the right
$\sigma\ge 0$: at-the-money convexity

Step

For a given maturity $T_i$, we compute the implied volatility $I(T_i, k_j)$ for all the strikes $K_j$ of the liquidly traded call options

$\sigma^2(T,k)=\frac{\frac{\partial V(T,k)}{\partial T}}{(1-\frac{k}{2V}\frac{\partial V}{\partial k})^2+\frac{1}{2}\frac{\partial^2V}{\partial k^2}-\frac{1}{4}(\frac{1}{4}+\frac{1}{V})(\frac{\partial V}{d k})^2}$

Then using an optimisation procedure, for example, least-squares, we compute the optimal values for parameters $a,b,\rho,m$ and $\sigma$ for maturity $T_i$
- The parameters are estimated for each maturity separately
For the intermediate maturities without liquidly traded options, the implied volatility can be computed using linear interpolation and the SVI parameters can be computed

Absence of Arbitrage

Due to the interpolation procedure, SVI parametrisation can lead to arbitrage opportunities in the model
- Option price function $P(T, k)$ which is not convex with respect to $k$
- $\frac{\partial^2P(T,k)}{\partial k^2}<0$

To ensure the absence of arbitrage: $\lim_{k\to\infty}C^{BS}(T,k)=0$
The absence of arbitrage of the “calendar spread” type: $\frac{\partial V(T,k)}{\partial T}\ge 0$
- The above condition implies that the total implied variance is an increasing function with respect to maturity. This is equivalent to non-crossing of the curves of total implied variance for different maturities
The absence of arbitrage of the “butterfly” type
- The condition leads to the convexity of option prices with respect to $k$

$d(T,k)=(1-\frac{k}{2V}\frac{\partial V}{\partial k})^2+\frac{1}{2}\frac{\partial^2V}{\partial k^2}-\frac{1}{4}(\frac{1}{4}+\frac{1}{V})(\frac{\partial V}{d k})^2\ge 0$

Regularisation Approach

The calibration of volatility surface is posed as an optimisation problem using Tikhonov regularisation

Minimise the functional

$J(\sigma):=\sum_{i,j=1}^{n,m}w_{ij}(C(T_i,K_j,\sigma)-C^{market}(T_i,K_j))^2+\alpha||\nabla\sigma||_2^2$ $||\nabla\sigma||_2^2=\int_{K_{min}}^{K_{max}}dK\int_{T_{min}}^{T_{max}}dT((\frac{\partial\sigma}{\partial K})^2+(\frac{\partial\sigma}{\partial T})^2)$

To minimise $J(\sigma)$, we use, for example, the method of gradient descent or the method of conjugate gradient

Typical algortihm

Discretise local volatility function on a mesh of space and time
Make an initial guess $\sigma^{(0)}$,for example, a constant function
Evaluate the gradient of $J(\sigma)$ at point $\sigma^{(0)}$to calculate the direction of descent $\lambda^{(0)}$
Solve the one-dimensional minimisation problem

$h^*=arg\min_{h}J(\sigma^{(0)}+h\lambda^{(0)})$

Set $\sigma^{(1)}:=\sigma^{(0)}+h^*\lambda^{(0)}$ and iterate again

The local volatility function is discretised using different methods, for example, finite differences or trinomial tree model

The regularisation using $||\nabla\sigma||$ is the same in regions of the surface where the influence of volatility on the price is very high and in the regions where it is very small

Unit 3.1：Monte Carlo methods

Readings for the module on Monte Carlo Methods

Chapter 3 Section 3.1, Section 3.2 - Monte Carlo Methods in Financial Engineering by Paul Glasserman
Chapter 6 Section 6.1 - Monte Carlo Methods in Financial Engineering by Paul Glasserman

Unit 3.1.1：Introduction

Monte Carlo methods are preferable in high dimensions as their accuracy is independent of dimensionality

If f is integrable over [0, 1], then by strong law of large numbers (SLLN)

大数定律的定义是，当随机事件发生的次数足够多时，随机事件发生的频率趋近于预期的概率。可以简单理解为样本数量越多，其平概率越接近于期望值。大数定律的条件：1、独立重复事件；2、重复次数足够多。

for

$\alpha=\int_0^1f(x)dx=\mathbb{E}[f(U)]$ $\hat{\alpha}_N=\frac{1}{N}\sum_{i=1}^Nf(U_i),U_i\sim U$ $\hat{\alpha}_N\to \alpha\ w.p.\ 1\ as\ N\to\infty$

w.p. 1 (with probability 1)

If f is also square integrable, for

$\sigma_f^2=\int_0^1(f(x)-\alpha)^2dx=\mathbb{E}[(f(U)-\alpha)^2]$

the error in the estimator $\hat\alpha_N$ is approximately normally distributed with mean 0 and standard deviation $\frac{\sigma_f}{\sqrt{N}}$

Typically $\sigma_f$ is unknown and is estimated as

$s_f=\sqrt{\frac{1}{N-1}\sum_{i=1}^N(f(U_i)-\hat\alpha_N)^2}$

Monte Carlo Estimation for European Option

Monte Carlo samples $S_i(T)$ of $S(T)$ can be generated as

$S_i(T)=S_0exp((r-\frac{1}{2}\sigma^2)T+\sigma\sqrt{T}Z_i)$

where $Z_i\sim N(0,1)$

用 $\sqrt TZ_i$ 代替 $W_t$ 可以减少 noise

Call price estimator is given as

$\hat C_N=\frac1N\sum_{i=1}^Ne^{-rT}(S_i(T)-K)^+$

The estimator $\hat C_N$ is unbiased in the sense that

$\mathbb{E}[\hat C_N]=\mathbb{E}[\frac1N\sum_{i=1}^Ne^{-rT}(S_i(T)-K)^+]=\mathbb{E}[\frac1N\sum_{i=1}^NC_i]=C(T,K)$

Unit 3.1.2：Efficiency of Monte Carlo methods

Confidence Interval

Using estimator $\hat C_N$ makes sense with a confidence interval

$[\hat C_N-z_{\delta/2}\frac{s_C}{\sqrt{N}},\hat C_N+z_{\delta/2}\frac{s_C}{\sqrt{N}}]$

where

$s_C=\sqrt{\frac{1}{N-1}\sum_{i=1}^N(C_i-\hat C_N)^2}$

$2z_{\delta/2}\frac{s_C}{\sqrt{N}}$ is called the width of the confidence interval
$z_\delta$ denotes the $1-\delta$ quantile of the standard normal distribution
- $\mathbb{P}(Z\leq z_\delta)=1-\delta$ for $Z\sim N(0,1)$
- For $\delta = 0.05, z_\delta/2 \approx 1.96$, we get a 95% confidence interval

Central limit theorem (CLT)

With unbiased estimator $\hat C_N=\frac1N\sum_{i=1}^NC_i$ for iid samples $C_i$, $\mathbb{E}[C_i]=C$ and $Var(C_i)=\sigma_C^2<\infty$

$\frac{\hat C_N-C}{\sigma_C/\sqrt{N}}\sim N(0,1)$

$\sigma_C$ can only be estimated

Suppose it takes $\tau$ time to generate $C_i$. Given computational budget $\Gamma$, can generate $\Gamma/\tau$ samples and create estimator $\hat C_{\Gamma/\tau}$

$\sqrt{\Gamma/\tau}(\hat C_{\Gamma/\tau}-C)\sim N(0,\sigma_C^2)$
as $\Gamma\to\infty$, $\sqrt{\Gamma}(\hat C_{\Gamma/\tau}-C)\sim N(0,\sigma_C^2\tau)$
Given a choice, we should use an estimator with lower value of $\sigma_C^2\tau$

Mean Square Error

For an estimator $\hat\alpha$ of true value $\alpha$

$MSE(\hat\alpha)=\mathbb{E}[(\hat\alpha-\alpha)^2]=(\mathbb{E}\hat\alpha-\alpha)^2+\mathbb{E}[(\hat\alpha-\mathbb{E}\hat\alpha)^2]=Bias^2(\hat\alpha)+Variance(\hat\alpha)$

两边可以尝试配平

Landau’s notation: For parameter $h\to 0$

$f(h)=\mathscr{O}(h^n)\Rightarrow |f(h)|\leq Mh^n$ $f(h)=o(h^n)\Rightarrow \lim_{h\to 0}f(h)/h^n \to 0$

类似于算法复杂度，大 O 是上界，小 o 是下届

Unit 3.1.3：Euler Method

Consider an ordinary differential equation (ODE) in time interval [0, 1]

$\frac{dX(t)}{dt}=\dot x(t)=b(t,x(t)),\ x(0)=x_0$

Divide interval into $N$, $h=1/N$

The above ODE can be solved approximately using an Euler scheme

$x_0^h=x_0,\ x_n^h=x_{n-1}^h+b(t_{n-1}^h,x_{n-1}^h)h$

where

$x_n^h:=x^h(t_n^h),\ t_n^h:=nh,\ h=\frac1N$

Error criterion $e(h):=|x(1)-x^h(1)|$
Under suitable assumptions $e(h)=\mathscr{O}(h)=\mathscr{O}(1/N)$

Euler Method更多解释可参考数值分析的部分

Euler Discretisation for BM

with $h=\frac TN,t_n^h:=nh,\Delta W_n^h:=W(t_n^h)-W(t_{n-1}^h)$, the Euler scheme for BM is given as

$W_n^h:=W(t_n^h)=\Delta W_1^h+...+\Delta W_n^h$

Note that $\Delta W_n^h$ are iid increments which are normally distributed $N(0,h)$

Euler Method for SDEs

$dX(t)=b(t,X(t))dt+\sigma(t,X(t))dW_t$

The Euler scheme for $X(t)$ is given as $X^h(0)=x_0$

$X_n^h:=X(t_n^h)=X_{n-1}^h+b(t_{n-1}^h,X_{n-1}^h)h+\sigma(t_{n-1}^h,X_{n-1}^h)\Delta W_n^h$

If we wish to price options based on the path of the SDE, for example, Asian option with price

$\mathbb{E}[e^{-rT}(\frac1M\sum_{j=1}^MS(t_j)-K)^+]$

we need a good approximation of the sample path of $X$ which leads to the error criterion (strong error)

$e_s(h):=\mathbb E|X(T)-X_N^h(T)|$

If we wish to price options based only on the terminal value of the SDE, for example, European option with price

$\mathbb{E}[e^{-rT}(S(T)-K)^+]$

we need a good approximation of the distribution of $X(T)$ which leads to the error criterion (weak error)

$e_w(h):=|\mathbb E_gX(T)-\mathbb E_gX_N^h(T)|$

for smooth functions g

光滑函数（smooth function）:在其定义域内无穷阶数连续可导的函数

g 可以取 $(S(T)-K)^+$

We say that $X^h$ converges to $X$ with order $\beta > 0$

strongly with order $\beta$ if $e_s(h)=\mathscr O(h^\beta)$
weakly with order $\beta$ if $e_w(h)=\mathscr O(h^\beta)$

Under some regularity conditions on functions $b$ and $\sigma$, Euler scheme converges strongly with order $\beta = 1/2$ (Euler) and weakly with order $\beta = 1$ (Milstein)

当 $\beta$ 数越大，收敛速度越快，越准确

Unit 3.1.4：Milstein Scheme

Consider the SDE

$X(t_{n-1}+h)-X(t_{n-1})=\int_{t_{n-1}}^{t_{n-1}+h}b(s,X(s))ds+\int_{t_{n-1}}^{t_{n-1}+h}\sigma(s,X(s))dW(s)$

and its Euler scheme approximation

$X_n^h-X_{n-1}^h=b(t_{n-1}^h,X_{n-1}^h)h+\sigma(t_{n-1}^h,X_{n-1}^h)\Delta W_n^h$

Thus, the Euler scheme approximates

$\int_{t_{n-1}}^{t_{n-1}+h}\sigma(s,X(s))dW(s)\sim\sigma(t_{n-1}^h,X_{n-1}^h)\Delta W_n^h$

Milstein scheme is an improvement on the Euler scheme. The main idea is to approximate the stochastic integral in a better way

Let us estimate the error in Euler scheme by applying Itô’s formula on $\sigma(t,X(t))$. Suppose $t_{n-1}=0$

$\begin{align} d(\sigma(s,X(s)))&=\partial_t\sigma ds+\partial_x\sigma dX(s)+\frac12\partial_x^2\sigma dX(s)dX(s)\\ &=\partial_t\sigma ds+\partial_x\sigma (bds+\sigma dW(s))+\frac12\partial_x^2\sigma\sigma^2 ds \end{align}$

Then

$\begin{align} &\int_0^h\sigma(t,X(t))dW(t)-\sigma(0,X(0))W(h)\\ =&\int_0^h(\sigma(t,X(t))-\sigma(0,X(0)))dW(t) \end{align}$

Combined with

$\int_0^hd(\sigma(s,X(s)))=\sigma(t,X(t))-\sigma(0,X(0))$

So we have

$\begin{align} \int_0^h\int_0^t[\partial_t\sigma+\partial_x\sigma b+\frac12\partial_x^2\sigma\sigma^2]dsdW(t)+\int_0^h\int_0^t\partial_x\sigma\sigma dW(s)dW(t)\\ \sim\mathscr{O}(h^{3/2})+\partial_x\sigma(0,X(0))\sigma(0,X(0))\int_0^h\int_0^tdW(s)dW(t) \end{align}$

$W(h)\sim o(h^{1/2})$，$\sigma$ 采用 0 点时

$\int_0^h\int_0^tdW(s)dW(t)=\int_0^hW(t)dW(t)=\frac12(W^2(h)-h)$

这一步过程看定义

Thus, the Milstein scheme is given as $X^h(0)=x_0$

$X_n^h=X(t_n^h)=X_{n-1}^h+bh+\sigma\Delta W_n^h+\frac12\partial_x\sigma\sigma((\Delta W_n^h)^2-h)$

如果 payoff 只取决于最终的价格，两种方法不明显，如果取决于path，Milstein 会明显提高

Unit 3.2：Variance reduction methods

Readings for the module on variance reduction methods

Chapter 4 Section 4.1, 4.2 - Monte Carlo Methods in Financial Engineering by Paul Glasserman

Unit 3.2.1：Control variate method

Introduction

In financial markets and the insurance industry, most quantities of interest can be expressed as

$\alpha=\mathbb{E}[\psi(x)],\ p=\mathbb{P}(x\in A)$

Suppose the goal is to compute $p$. Then, use the Monte Carlo method:

Generate $(X_i)_{1\leq i\leq N}$ i.i.d. samples of $X$ & set
$\hat p_N:=(N)^{-1}\sum_{i=1}^N1_{X_i\in A}$
By the Central Limit Theorem (CLT)
$\sqrt{N}(\hat p_N-\mathbb{P}(x\in A))\Rightarrow N(0,p(1-p))$

binomial 分布, p 是存在的概率
95% confidence interval
$(\hat p_N-1.96\sqrt{\frac{p(1-p)}{N}},\hat p_N+1.96\sqrt{\frac{p(1-p)}{N}})$

The relative error is given as

$\frac{Estimator\ error}{True\ Value}=\sqrt{p(1-p)}/(p\sqrt{N})\approx(\sqrt{Np})^{-1}$

which is large for small value of $p$

Thus, to estimate small $p$, using the crude Monte Carlo method, we may need a large number of samples $N$

如果 $p = 0.01 (10^{-2})$, $N$ 至少 $=10^6$ 才使得 relative error $=(\sqrt{10^4})^{-1}=10^{-2}$

As discussed in the last lecture, the estimator error is given by $s_p/\sqrt{N}$ where $s_p$ is the estimate of standard deviation of estimator $\hat p_N$

The aim of variance reduction methods is to produce an alternative estimator $\hat p_N^{var}$ such that $\mathbb{E}[\hat p_N^{var}]=p$ but with smaller variance than estimator $\hat p_N$

Control Variates

Let $Y_1, Y_2, . . . , Y_N$ be i.i.d. outputs from $N$ replications of a simulation. For example, $Y_i$ could be the discounted payoff of a derivative security on the $i$th simulated path.

Suppose that our aim is to compute $\alpha=\mathbb{E}[Y_i]$ and the crude Monte Carlo estimator $\alpha_N:=\sum_{i=1}^N\frac{Y_i}N$ is unbiased and converges with probability (w.p.) 1 as $N\to\infty$

Suppose on each replication, we generate another output $X_i$ along with $Y_i$

Suppose that $(X_i,Y_i)$ are i.i.d. $i=1,…,N$ and that $\alpha_0=\mathbb{E}[X_i]$ is known

Furthermore, denote $(X, Y)$ as a generic pair of $(X_i,Y_i)$

For fixed $b$, calculate

$Y_i(b):=Y_i-b(X_i-\alpha_0)$

and compute the sample mean

$\hat\alpha_N(b):=\hat\alpha_N-b(N_{-1}\sum_{i=1}^NX_i-\alpha_0)=\frac1N\sum_{i=1}^NY_i(b)$

$\hat\alpha_N(b)$ is a control variate estimator and the observable error $N^{-1}\sum_{i=1}^NX_i-\alpha_0$ acts as a control in estimating $\alpha$

The control variate estimator is unbiased

$\mathbb{E}[\hat\alpha_N(b)]=\frac1N\sum_{i=1}^N\mathbb{E}[Y_i(b)]=\alpha$ $\mathbb{E}[\hat\alpha_N(b)]=\mathbb{E}[\hat\alpha_N]-\mathbb{E}[b(N_{-1}\sum_{i=1}^NX_i-\alpha_0)]=\alpha-(\alpha_0-\alpha_0)$

$\hat\alpha_N(b)$ is also strongly consistent

$Var(\hat\alpha_N(b))=\frac{Var(\sum_{i=1}^NY_i(b))}{N^2}=\frac{N\sigma^2(b)}{N^2}=\frac{\sigma^2(b)}{N}$

Each $Y_i(b)$ has variance which is equal to

$\begin{align} Var(Y_i(b))&=Var(Y_i-b(X_i-\alpha_0))\\ &=\sigma_Y^2-2b\sigma_X\sigma_Y\rho_{XY}+b^2\sigma_X^2\equiv\sigma^2(b) \end{align}$ $\sigma_X^2=Var(X),\sigma_Y^2=Var(Y),\rho_{XY}=Correl(X,Y)$ $Var(\hat\alpha_N(b))=\frac{\sigma^2(b)}N,Var(\hat\alpha_N)=\frac{\sigma^2_Y}N$

$\hat\alpha_N(b)$ provides variance reduction if

$b\sigma_X<2\sigma_Y\rho_{XY}$

Optimal coefficient $b^$ *minimizes $\sigma^2(b)$ and is given as

$b^*=\frac{\sigma_Y\rho_{XY}}{\sigma_X}=\frac{Cov(X,Y)}{Var(X)}$

The ratio of the variance of optimally controlled estimator to that of uncontrolled

$\frac{Var(\hat\alpha_N(b^*))}{Var(\hat\alpha_N)}=1-\rho_{XY}^2$

The variance reduction factor is given as

$\frac{Var(\hat\alpha_N)}{Var(\hat\alpha_N(b^*))}=\frac{1}{1-\rho_{XY}^2}$

$Var(\hat\alpha_N)\times(1-\rho_{XY}^2)={Var(\hat\alpha_N(b^*))}$ 如果要达到原来的值，需要用 $N/(1-\rho_{XY}^2)$ 个样本

For example, for $|\rho_{XY}| = 0.95$, leads to 10-times variance reduction, whereas $|\rho_{XY}| = 0.90$, leads to 5-times variance reduction (usually 5)

Computing $b^*$

As $\alpha=\mathbb E[Y]$ is unknown, in practice, $\sigma_Y$ or $\rho_{XY}$ are also unknown

Estimate $b^*$ as

$\hat b_N=\frac{\sum_{i=1}^N(X_i-\frac{\sum_{j=1}^NX_j}N)(Y_i-\frac{\sum_{j=1}^NY_j}N)}{\sum_{i=1}^N(X_i-\frac{\sum_{j=1}^N X_j}N)^2}$

Then, instead of $Y_i(b)$, we use

$Y_i(\hat b_N):=Y_i-\hat b_N(X_i-\alpha_0)$

Replacing $b$ with $\hat b_N$ introduces bias as the samples $Y_i(\hat b_N)$ are not independent any more, thus, application of CLT is not clear

Estimate $\alpha$ as $\hat\alpha_N(\hat b_N)=\frac1N\sum_{i=1}^NY_i(\hat b_N)$

Unit 3.2.2：Application of control variate estimator

Using Underlying Assets

Consider the Black-Scholes model

$\frac{dS_t}{S_t}=rdt+\sigma dW_t,\ S_0=s_0$

Suggest a Monte Carlo control variate method to estimate the price of option

Asian Options

An arithmetic Asian call option with discounted payoff $e^{-rT}(\bar S_m^A(T)-K)^+$ where

$\bar S_m^A(T):=\frac{1}{m}\sum_{j=1}^mS_{t_j},\ t_j=j\ \frac Tm,\ j=1,...,m$

we have a closed-form formula for geometric Asian call option with discounted payoff $e^{-rT}(\bar S_m^G(T)-K)^+$

$\bar S_m^G(T):=(\prod_{j=1}^mS_{t_j})^{1/m},\ t_j=j\ \frac Tm,\ j=1,...,m$

Unit 3.2.3：Antithetic variate method

Method of antithetic variates attempts to reduce variance by introducing negative dependence between pair of replications

Note that if $U$ is uniformly distributed over [0, 1] then so is $1 – U$
- $U_i$ and $1 – U_i$ form an antithetic pair in the sense that a big value of one is accompanied by a small value of the other
- Thus, a large value of the function value computed on one path is compensated by the small value on the antithetic path, leading to reduction in variance
If $Z$ is distributed as $N (0, 1)$ then so is $–Z$
- Suppose we generate a path $Y_i$ using a sample $Z_1, Z_2 , . . . , Z_n$ of normally distributed random variable, we can create an antithetic path $\tilde Y_i$ using the sample $–Z_1, –Z_2, . . . , –Z_n$
- Suppose we generate N antithetic pairs $(Y_i, \tilde Y_i )^N_{i=1}$
  $\hat Y_N^{AV}=\frac{1}{2N}(\sum_{i=1}^NY_i+\sum_{i=1}^N\tilde Y_i )=\frac{1}{N}\sum_{i=1}^N\frac12(Y_i+\tilde Y_i )$
- The confidence interval is given as
  $(\hat Y_N^{AV}-z_{\delta/2}\frac{s_N^{AV}}{\sqrt{N}},\hat Y_N^{AV}+z_{\delta/2}\frac{s_N^{AV}}{\sqrt{N}})$
- Antithetic sampling reduces variance when $Cov(Y_i,\tilde Y_i)<0$

Unit 4.1：Finite difference methods

Unit 4.1.1：Introduction

The finite difference method solves a partial differential equation on a discrete grid in time and space

Consider a general equation as

$\partial_t V+b(x,t)\partial_x V+\frac12 a(x,t)\partial_x^2 V-r(x,t)V+f(x,t)=0$ $V(x,T)=g(x),\ (x,t)\in \Omega\times[0,T)$

$f$ is called the source term
$a, b, r,$ are assumed to be bounded and continuous
$a$ is assumed to be nonnegative

The Black-Scholes equation is given as

$\partial_t V+xr\partial_x V+\frac12 \sigma^2x^2\partial_x^2 V-rV=0$ $V(x,T)=g(x),\ (x,t)\in \mathbb R_+\times[0,T)$

we note that

$b(x,t)=rx,a(x,t)=\sigma^2x^2\ and\ r(x,t)=r$

Change of variable

$z=logx\ and\ v(z,t)=V(e^z,t)$ $v_t=V_t,\ v_z=e^zV_x=xV_x,\ v_{zz}=e^{2z}V_{xx}+e^zV_x=x^2V_{xx}+xV_x$

In the Black-Scholes equation, we then plug

$xV_x=v_z,\ x^2V_{xx}=v_{zz}-v_z$

The transformed Black-Scholes equation with bounded coefficients is then given as

$v_t+(r-\frac12 \sigma^2)v_z+\frac12 \sigma^2v_{zz}-rv=0$ $V(e^z,T)=g(e^z),\ (z,t)\in \mathbb R\times[0,T)$

The above equation has bounded coefficients and is suitable to apply finite difference methods

Unit 4.1.2：Explicit Scheme

Consider the equation

$V_t+\frac12 a(x,t)V_{xx}+f(x,t)=0$ $V(x,T)=g(x),\ (x,t)\in \Omega\times[0,T)$

For positive integer N discretise the domain with time step size $h_0=\frac TN$ and space step size $h_1>0$

Denote by $v_j^k\approx V(jh_1,kh_0)$, $j\in \mathbb Z$, $k=0,1,…,N$

Standard explicit finite difference scheme is written as

$\frac{v_j^k-v_j^{k-1}}{h_0}+\frac12a_j^k\frac{v_{j+1}^k+v_{j-1}^k-2v_j^k}{h_1^2}+f_j^k=0$ $v_j^N=g_j$

In the scheme, we define

$a_j^k:=a(jh_1,kh_0),\ f_j^k:=f(jh_1,kh_0),\ g_j:=g(jh_1)$

The ordered form of the equation is written as

$v_j^{k-1}=(1-\frac{h_0}{h_1^2}a_j^k)v_j^k+\frac12\frac{h_0}{h_1^2}a_j^k(v_{j-1}^k+v_{j+1}^k)+h_0f_j^k$

The coefficients of the values at previous time step are all nonnegative if the following monotonicity condition holds:

$\frac{h_0}{h_1^2}||a||_\infty\leq 1$

$||a||_\infty$ : Maximum possible value of $a$

Example: Digital Option

Consider the digital option with payoff $g(x)=1_{x>0}(x)$. Suppose $f(x,t)=0$ and $a(x,t)=1$

$\begin{cases} v_j^{k-1}=\frac12(v_{j-1}^k+v_{j+1}^k),& if\ h_0=h_1^2\\ v_j^{k-1}=\frac12v_j^k+\frac14(v_{j-1}^k+v_{j+1}^k),& if\ h_0=\frac12h_1^2\\ \end{cases}$

Suppose the spatial domain is taken as $[–1, 1]$ and $T = 1$. For the boundary condition

$V(-1,t)=0,\ V(1,t)=1,\ t\in[0,T]$

we solve the explicit scheme

Unit 4.1.3：Implicit Scheme

For $j\in \mathbb Z$, $k=0,1,…,N$

$\frac{v_j^{k+1}-v_j^k}{h_0}+\frac12a_j^k\frac{v_{j+1}^k+v_{j-1}^k-2v_j^k}{h_1^2}+f_j^k=0$ $v_j^N=g_j$

the fixed-point form of the equation is written as

$v_j^k=(1+\frac{h_0}{h_1^2}a_j^k)^{-1}(\frac12\frac{h_0}{h_1^2}a_j^k(v_{j-1}^k+v_{j+1}^k)+v_j^{k+1}+h_0f_j^k)$

The scheme is monotonic without any condition

Unit 4.1.4：Family of $\theta$ Schemes

For $\theta\in [0, 1]$, consider the following scheme

$\frac{v_j^{k+1}-v_j^k}{h_0}+\frac12\theta a_j^k\frac{v_{j+1}^k+v_{j-1}^k-2v_j^k}{h_1^2}+\frac12(1-\theta) a_j^{k+1}\frac{v_{j+1}^{k+1}+v_{j-1}^{k+1}-2v_j^{k+1}}{h_1^2}+f_j^k=0$ $v_j^N=g_j$

The scheme is explicit if $\theta=0$, and implicit otherwise (standard implicit for $\theta=1$)

The scheme is monotonic if

$(1-\theta)\frac{h_0}{h_1^2}||a||_\infty\leq 1$

Application: Option Pricing

Consider the transformed Black-Scholes equation

$v_t+\frac12 \sigma^2v_{xx}+(r-\frac12 \sigma^2)v_x-rv=0$ $V(T,x)=(e^x-K)^+$

Choose the domain $D=[0,T]\times[X_{min},X_{max}]$ with $-\infty<X_{min}<X_{max}<\infty$

Approximate $D$ with a uniform mesh

$\mathscr D=\{(t^k,x_j):k=0,1,...,N,j=0,1,...,M\}$

where $t^k=kh_0$ and $x_j=X_{min}+jh_1$ for $h_0=T/N$ and $h_1=(X_{max}-X_{min})/M$

Denote $v(t^k,x_j)=v_j^k$. Then, we use the finite difference derivative approximation based on the $\theta$ scheme

$\frac{\partial v}{\partial t}(t^k,x_j)\approx \frac{v_j^{k+1}-v_j^k}{h_0}$ $\frac{\partial v}{\partial x}(t^k,x_j)\approx \theta\frac{v_{j+1}^k-v_j^k}{h_1}+(1-\theta)\frac{v_{j+1}^{k+1}-v_j^{k+1}}{h_1}$ $\frac{\partial^2 v}{\partial x^2}(t^k,x_j)\approx \theta \frac{v_{j+1}^k+v_{j-1}^k-2v_j^k}{h_1^2}+ (1-\theta) \frac{v_{j+1}^{k+1}+v_{j-1}^{k+1}-2v_j^{k+1}}{h_1^2}$

where $\theta\in [0, 1]$ is a constant parameter.

We also replace $v_j^k$ by $\theta v_j^k+(1-\theta)v_j^{k+1}$. For $\theta=0.5$ the method corresponds to Crank-Nicholson method

We can formulate a tridiagonal system at each time step $k = N, . . . , 0$ which can be solved by the well-known Thomas algorithm (MATLAB command: tridiag)

$Av_k=b^{k+1}$

with non-zero coefficients of the tridiagonal matrix $A = (a_{ij})$ given by

$\begin{align} &a_{0,0}=1,\ a_{0,1}=0,\\ &a_{M,M-1}=0,\ a_{M,M}=1,\\ &a_{i,i}=1+2\omega\theta+\kappa\theta+\rho\theta,\\ &a_{i,i+1}=-\omega\theta-\kappa\theta,\\ &a_{i-1,i}=-\omega\theta,\quad i=1,...,M-1 \end{align}$

The time dependent vector $b^{k+1}$ is given as:

$b_0^{k+1}=v_0^{k+1},\ b_M^{k+1}=v_M^{k+1},$ $\begin{align} b_i^{k+1}&=\omega(1-\theta)v_{i-1}^{k+1}+(1-2\omega(1-\theta)-\kappa(1-\theta)-\rho(1-\theta))v_i^{k+1}...\\ &...+(\omega(1-\theta)+\kappa(1-\theta))v_{i+1}^{k+1} \end{align}$

where $v_0^{k+1},v_M^{k+1}$ are given by the boundary conditions and the remaining constants are defined as below

$\omega=\frac{h_0\sigma^2}{2h_1^2},\ \kappa=\frac{(r-\frac12\sigma^2)h_0}{h_1},\ \rho=rh_0$

The $i$th coordinate of the vector $v^k$ is the approximation of the value $v(t^k,x_i)$

Unit 4.2：Optimisation methods in finance

Introduction

In finance, the parameters in stochastic models are estimated using the available market data. This procedure is called model calibration

Most calibration problems can be seen as either a root-finding procedure or as an optimisation problem

Convex Functions

Let $\mathscr U$ denote a convex set and let $f :\mathscr U \to \mathbb R$ be a function. Then, $f$ is convex if for all $u_1, u_2 \in \mathscr U$ , for all $\lambda \in [0, 1]$,

$f(\lambda u_1+(1-\lambda)u_2)\leq\lambda f(u_1)+(1-\lambda)f(u_2)$

Any local minimun of a convex function is also its global minimum

If $w_1,…,w_n\ge0$ and $f_1,…,f_n$ are all convex, then $\sum_{i=1}^n w_if_i$ is also aconvex function

If $f_1,…,f_n$ are convex, then $g(x):=max(f_1,…,f_n)$ is also a convex function

Unit 4.2.1：Newton’s Method

Basic approach:

Approximate the function $f$ by its tangent at a given point
The point at which the tangent crosses the x-axis is taken as an approximate solution to the equation $f (x) = 0$
The process is repeated with the new point

Given an initial guess $x_0$, a sequence of points is constructed using the recurrence relation

$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$

For the method to converge, we need, at least, that $|f′′| \leq K$ and the initial guess is close to the true solution

Example: Implied Volatility

Consider $C(\sigma)$ as the call option price formula given strike $K$ and time to maturity $T$ with volatility $\sigma$ as the parameter

Aim is to find the implied volatility for call option with market price $C_0$

Given an initial guess $\sigma_0$, Newton’s method can be used to construct a sequence of points as

$\sigma_{n+1}=\sigma_n-\frac{C(\sigma_n)-C_0}{\frac{\partial_\sigma C(\sigma_n)}{\partial \sigma}}$

For error tolerance $\epsilon$, stop the iteration for some $n + 1$, when $|\sigma_{n+1} – \sigma_n| < \epsilon$ (convergence)

Unit 4.2.2：Descent Methods

To solve optimisation problems, Newton’s method cannot be used as it can guarantee only local convergence for a good initial guess

In practice, to solve optimisation problems, we need methods with global convergence, that is convergence from an arbitrary initial guess (starting point)

The overall idea of descent methods is to construct a sequence of points as

$x_{n+1}=x_n+\mu_np_n$

where $\mu_n$ is a non-increasing sequence of learning rate and $p_n$ is direction vector

Gradient Descent Method

For model parameter $u$ taking values in the space $\mathscr U$ , we consider the problem

$\min_{u\in\mathscr U} J(u)$

The sequence of points is constructed as

$u_{n+1}=u_n-\mu\Delta J(u_n)$

where $\mu > 0$ and $\Delta J(u_n)$ is the gradient of cost function evaluated at $u_n$

$\mu$ has to be carefully chosen to ensure convergence of the method

Gradient Descent Method with Constraints

Under some constraints on the parameters, we look to solve the problem

$\min_{u\in K} J(u)$

where $K$ is a closed convex subset of the parameter space $\mathscr U$

For $\mu > 0$, a sequence of points is constructed as

$u_{n+1}=P_K(u_n-\mu\Delta J(u_n))$

where $P_K(u)$ is the orthogonal projection of $u$ on $K$

For $x \in \mathscr U$ , its orthogonal projection in $K$ is given by $x_K$ where

$||x-x_K||=\min_{y\in K}||x-y||$

Penalisation of Constraints

We can convert a minimisation problem with constraints into an optimisation problem without constraints

The idea is to penalise the constraints and form a new unconstrained minimisation problem

Consider the constrained minimisation problem with continuous convex cost function $J:\mathbb R^d\to\mathbb R$

$\min_{F(u)\leq 0}J(u)$

where $F : \mathbb R^d \to \mathbb R^M$ is a continuous convex function representing thec onstraints on parameter $u \in \mathbb R^d$

An unconstrained problem is then given as

$\min_{u\in\mathbb R^d}(J(u)+\frac1\epsilon\sum_{i=1}^M[\max(F_i(u),0)]^2)$

where the constraint $F_i(u) \leq 0$ has been penalised

The above problem can be solved using descent methods

For small value of $\epsilon > 0$, the solution of unconstrained problem approximates the solution of constrained problem

Unit 4.2.3：Stochastic Gradient Descent

Unlike the gradient descent methods which are deterministic, stochastic gradient methods choose the direction of descent randomly

Gradient descent methods perform well for convex cost functions whereas stochastic gradient descent methods can handle non-convex cost functions

Consider the model parameter $\theta \in \mathbb R^d$

For example:

consider the Heston’s stochastic volatility model

$\frac{d S_t}{S_t}=rdt+\sqrt{v_t}dW_t$ $dv_t=\kappa(\theta-v_t)dt+\delta\sqrt{v_t}dW_t',\ d\langle W,W'\rangle=\rho dt$

The parameter vector $\theta = (\kappa,\theta,\delta,\rho)\in \mathbb R^4$

Next, consider the cost function $J:\mathbb R^d\to\mathbb R$

For example:

Given market prices $P^{market}$ of call options with strike
$K_j, 1 \leq j \leq N$ and time to maturity $T_i, 1 \leq i \leq M$, the cost function for Heston’s model calibration can be written as

$J(\theta;\{K\}_{j=1}^N,\{T\}_{i=1}^M)=\frac1{MN}\sum_{i=1}^M\sum_{j=1}^N(P^\theta(T_i,K_j)-P^{market}(T_i,K_j))^2$

where $P^\theta$ is the call option price function under Heston’s model

Unlike the gradient descent methods, which use the entire dataset $\{K\}^N_{j=1}$, $\{T\}^M_{i=1}$, to construct the points sequence, the stochastic gradient descent methods use only a fraction of the dataset to construct the sequence

Stochastic Gradient Descent Algorithm

Stochastic gradient descent (SGD) method constructs the sequence as

$\theta_{n+1}=\theta_n-\mu\Delta_\theta J(\theta_n;K_j,T_i)$

where $\mu > 0$ is a positive scalar and the update is performed for a uniformly randomly chosen sample $K_j$ and $T_i$ from the dataset

The SGD method updates frequently with a high variance which leads to a slower convergence

On the other hand, due to high variance, the SGD method is able to better explore the parameter space and is able to jump to better local minima for non-convex cost functions

Mini-batch Gradient Descent

Mini-batch gradient descent algorithm randomly picks a sample from the dataset to compute the derivative of the cost function

Uniformly randomly select m samples $(K_i,T_j)$ from the dataset $\{K\}^N_{j=1}$, $\{T\}^M_{i=1}$, and denote it by $\{X_i\}^m_{i=1}$ , called the mini-batch

Compute the derivative of the cost function using the mini-batch, that is, compute $\Delta_\theta J(\theta;\{X_i\}^m_{i=1})$

Construct the sequence using the recursion

$\theta_{n+1}=\theta_n-\mu\Delta_\theta J(\theta_n;\{X_i\}_{i=1}^m)$

Small batch sizes lead to noisy updates whereas large batch sizes require considerable computation time

Conclusion

Stochastic gradient descent algorithms can be further improved using different statistical features. The most famous algorithms are

AdaGrad
AdaDelta
AdaM

The above variations of the stochastic gradient algorithm are mostly used for very large datasets

For most model calibration purposes, mini-batch gradient descent method provides good performance

Finance

#Financial Engineering

【课程】Applied Computational Finance

http://achlier.github.io/2021/05/12/Applied_Computational_Finance/

Author

Hailey

Posted on

May 12, 2021

Licensed under

【课程】Economic Fundamentals and Financial Markets Previous

【练习】Option Pricing Next