【课程】Applied Computational Finance

基于格拉斯哥ECON5065的笔记;

General information

Many problems in economics, finance and financial engineering do not possess a closed-form solution which could be computed easily. Typically, these problems are solved by using numerical algorithms and simulation methods. This course covers the study and implementation of such techniques. These include Monte Carlo simulation for pricing complex derivatives; finite difference methods to solve partial differential equations for option pricing; effective methods to compute hedging strategies; numerical dynamic programming for optimal portfolios, resource management and economic growth. Advanced understanding of these techniques and their implementation on computer using software packages and programming languages is the focus of this course.

Aims

The main aims of this course are to introduce students to core techniques in computational finance, such as simulation of asset prices, pricing options using stochastic models, Monte Carlo methods as applied to complex derivatives, solving Black-Scholes type equations numerically, solving dynamic optimization problems numerically, and how to implement these techniques using modelling software packages and programming languages.

ILOs

By the end of this course, students will be able to:

  1. Calculate the price and hedging portfolios of complex derivatives using Monte Carlo simulations;
  2. Calculate the price of American and other exotic options by implementing numerical methods to solve the related partial differential equations;
  3. Implement dynamic programming algorithms to solve problems in portfolio optimisation, resource management and monetary policy;
  4. Demonstrate advanced skills in modelling software package MATLAB.

Unit 1:Black-Scholes Model

Unit 1.1:P&L Approach to Black-Scholes Equation

  • P&L:profit and loss
  • T:maturity
  • S:underlying asset price
  • $F(S_T)$:payoff function
  • $P(t, S)$:pricing function

Consider the P&L due to trading between time $t$ and $t + \delta t$

  • $\delta S$:change in the stock price, $\delta S$ independent of $\delta t$

  • $\Delta = \frac{dP}{dS}$

  • theta:$\frac{dP}{dt}$
  • corrected theta:$\frac{dP}{dt}-rP+(r-q)S\frac{dP}{dS}$

  • gamma:$\frac{d^2P}{dS^2}$

  • Dollar gamma:$S^2\frac{d^2P}{dS^2}$

Any model is unusable if $A(t, S)$ and $B(t, S)$ have the same sign. The model is usable only if $A(t, S)$ and $B(t, S)$ have different signs.

No money is made if

On average we have

where $\hat{\sigma}$ is the historical volatility of $S$ based on returns $\delta S/S$

For risk-management, our model should satisfy the condition

Thus,

where $P_{\sigma^2}$ accounts for dependence on the volatility level $\hat{\sigma}$

We then get,

For vanilla options in the market, $\hat{\sigma}$ is the implied volatility such that $P_{\sigma^2}$ is equal to the market price.

Unit 1.2:Stochastic Differential Equations

Recall a differential equation

This can be written alternatively as

or approximately as an difference equation with small time interval (increment) $\Delta t$

with the increments of Brownian motion $\Delta W = W(t+\Delta t)-W(t), \Delta W \sim N(0,\Delta t)$

stochastic differential equation

Unit 1.3:Ito Formula

Ito Formula 相当于 Taylor 展开的一种特殊形式,因为它是针对随机过程的,因此在二次导的展开项中直接略过了所有与 dt 相乘的项, 只留下 $F_{XX}$

与 dt 相乘的项都趋近 0 的原因是,对于 dtdt/dtdW,其中一个部分是作用于 $||\pi||\to 0$ 时,把累加部分相连的. $\sum f(x) (t_{i+1}-t_i)\to \int f(x) dt$. 剩下的那一部分由于 $||\pi||$ 极限趋近于 0,即两个 t 之间相隔极近,会使整个式子趋近 0. 比如 dtdt 为 $(dt)^2$, dtdW 为 $(dt)^{1.5}$, 去掉一个 dt 仍然有保留的幂,则整个式子取 0.

We are interested in the dynamics of an expression such as $f(t)=F(t,X(t))$ with $F:[0,\infty]\times \mathbb{R}\to \mathbb{R}$ .

  • $F$:option pricing function
  • $X(t)$:underlying share price
  • $(dW)^2 = dt$
  • up to order 1 in $dt$:$(dt)^2=0$, $(dW)(dt)=dt\sqrt{dt}=0$
  • $(dX)^2=\sigma(t,X(t))^2dt$

Then, the stochastic process $f$ has the following dynamics

Suppose that the asset price $S$ solves the following stochastic differential equation

At $t = T$, we have

Delta Hedging

Black, Scholes, Merton approach : construct a riskless portfolio, which consists of selling the option and buying the asset

where

Substitution of $dS$ gives

riskless $\Rightarrow \Delta = P_S$

Substitution gives

Recall that

Combining the above results, leads to the Black-Scholes equation

with boundary condition

Under the Black-Scholes-Merton (BSM) model, European call option price is given as

European put option price is given as

Black-Scholes PDE can be transformed to Black-Scholes Pricing Formula through solving the heat equation

Unit 1.4:Greeks

Certain partial derivatives of option prices are widely used for hedging purposes and have been assigned Greek names and symbols :

Delta Gamma Rho Theta Vega
$\Delta$ $\Gamma$ $\rho$ $\Theta$ $vega$
$\frac{\partial C}{\partial S}$ $\frac{\partial^2 C}{\partial S^2}$ $\frac{\partial C}{\partial r}$ $\frac{\partial C}{\partial t}$ $\frac{\partial C}{\partial \sigma}$

Read more about those functions

Recall that

Unit 1.5:Implied Volatility

Suppose that the market price of a European call option with expiry $T$ and strike $K$ at time $t$ is observed to be $C^{mkt}(t)$

In our model

Now consider vega of the option given by

  • It can be shown, that there exists a solution whenever $C^{mkt}(t)$ is contained in the open interval $((S_t-Ke^{-r(T-t)})^+,S_t)$
  • The positivity of vega tells us that the function $C(t,T, σ,K) – C^{obs}(t)$ is strictly increasing as a function of $\sigma$

The solution of the equation, $\sigma^{impl}$ , is called implied volatility

We can therefore compute its zeros numerically by applying the Newton method (it always works if the function is convex)

Plot the Implied Volatility - Moneyness ( $\frac{ln(K/F)}{\sigma\sqrt{T}}$ ) and we could have volatility smile

然而实际上 BS model 服从的是 normal distribution ,并且只包函基本的参数。因此对于 BS model 来说 moneyness (in,at,out of money)是不影响 volatility的,即画出的 moneyness-volatility 是一条直线。

对于真正的市场,存在 fat tail 现象,除此之外流动性,手续费,最低报价间隔,以及最小报价单位等变量未包含在 BS model 中,导致了它的不准确。此外,投资者参与期权交易的目的对隐含波动率也有影响。股票期权交易者中有很大一部分的交易目的是防止股票下跌,因此他们选择的策略多是 buy put 或者 covered call ,这样导致波动率曲线向右下侧倾斜;而外汇期权的交易者的交易目的并不偏向两种方向,因此隐含波动率曲线多是smile的形状。

Unit 2:Advanced stochastic models and their calibration

Unit 2.1:local volatility

Local volatility is a class of market models which is an extension of the fixed volatility Black-Scholes-Merton model. In the model, instantaneous volatility is a function of t, S; $\sigma(t,S)$

Given the market observed European option prices, our aim is to find
the function $\sigma(t,S)$. It was shown by Bruno Dupire that

Unit 2.2:Derivation of the Dupire Formula

Suppose that we have the following dynamics

Next, recall that the price of a European call option is given as $C(T,K)=e^{-rT}\mathbb{E}[(S_T-K)^+]$

First, use Ito’s formula for $(S_T-K)^+$ over $[T,T + dT]$ :

We then get

Next, note that

where Heaviside function

and delta function

$\int_{-\infty}^\infty\delta(x-y)f(x)dx=f(y)$

Then,

To use the above formula, let us consider “undiscounted” option prices $\mathscr{C}(T,K)=\mathbb{E}[(S_T-K)^+]$. Taking derivative w.r.t. K

Note that $\mathbb{E}[\delta(S_T-K)]$ is the density of stock prices

$\mathbb{E}[\delta(S_T-K)]=\int_{-\infty}^\infty\delta(S_T-K)f(S_T)dx=f(K)$

From the identity

we get $\mathbb{E}[S_T\theta(S_T-K)]=\mathscr{C}+K\mathbb{E}[\theta(S_T-K)]=\mathscr{C}-K\frac{\partial\mathscr{C}}{\partial K}$

Next note that $\mathbb{E}[d(S_T-K)^+]=d\mathbb{E}[(S_T-K)^+]$

Then, by taking expectation in the Ito’s formula applied on $(S_T-K)^+$ with $\mathbb{E}[W_t]=0$

we evaluate the function at $S_t$ equal to $K$

Above yields

Going back to the notation of $C = e^{–rT}\mathscr{C}$ , we obtain the Dupire’se quation

Thus, we get the Dupire’s formula

For a known volatility function $\sigma(t, S)$, we can use the Dupire’s formula to price all European call options on $S_0$

with initial condition $C(T=0,K)=(S_0-K)^+$

Unit 2.3:Positivity of the right hand side of the Dupire formula

Consider the second derivative in the denominator

The right hand side can be thought of a Butterfly spread

As $\epsilon \to 0$, the butterfly spread (more narrow and more much higher) converges to a Dirac-delta function which is either 0 or strictly positive and thus, it’s price must be positive, thus $\frac{d^2C}{dK^2}>0$

The numerator in the Dupire’s formula can be re-written as

For the above to be positive, call option with a strike which is a fixed proportion $K$ of the forward $F_T=Se^{(r-q)T}$ must be an increasing function of $T$

For $T_1 \leq T_2$

In order to prove it, we suppose that the monotonicity condition is violated, that is

Set up the following strategy:

  • Buy one option of maturity $T_2$, strike $KF_{T_2}$

  • Sell $e^{–q(T2–T1)}$ options of maturity $T_1$, strike $KF_{T1}$

At inception, setting up the above strategy pays a premium ( > 0 )

At $T_1$, take the following $\Delta$ position on S:

  • If $S_{T_1}<KF_{T1}$ : $\Delta =0$
  • If $S_{T_1}>KF_{T1}$ : $\Delta =-1$

At $T_2$ , the P&L comprises of the following positions

  • receipt of payoff from $T_2$ maturity option
  • payment of payoff from $T_1$ maturity option, capitalized up to $T_2$
  • P&L generated by the delta position created at $T_1$ and liquidated at $T_2$

For $S_{T_1}^*=\frac{F_{T_2}}{F_{T_1}}S_{T_1}$ , the P&L is written as

The last equation reads as

for $f(x)=(x-KF_{T_2})^+$

$f(y)-f(x)>(y-x)f(x) when f’’(x)>0$

Thus, our strategy produces positive amount at inception and at $T_2$. Therefore, there’s an arbitrage.

Thus, the numerator in the Dupire’s formula is always positive.

Unit 2.4 Further aspects of the Dupire formula and local volatility

To compute the Dupire’s local volatility function in terms of implied volatility $\hat\sigma(T,K)$, first write

Next, define

  • forward price as $F_t=S_0e^{(r-q)(t-t_0)}$
  • log strike as $y=ln(\frac{K}{F_t})$
  • parameterized volatility as $f(t,y)=(t-t_0)\hat\sigma^2(t,K)$

Next, replace $C(S_0,T,K)$ in the Dupire’s formula with $C^{BS}(S_0,T,K,\hat\sigma(T,K))$ and compute derivatives using $f$ and $y$ rather than $\hat\sigma$and $K$

【Method 1】

这个方法需要求 $\frac{\partial f}{\partial K}$, 此处省略了, 计算比较复杂,以下$C$为$C^{BS}$

With

【Hint】

Inserting these equations into

【Method 2】

以下$c$ 为 $C^{BS}$

Step 1

按Method 1同样的方法, 但此时C考虑的是K与f不相关,与t相关

Since $K=S_0e^{(r-q)(t-t_0)}$ so that $\frac{\partial K}{\partial t}=K(r-q)=K\mu$. And $\frac{\partial y}{\partial K}=\frac{1}{K}$

we substitute into Equation

因为考虑的是forward,不用把C折现,所以不减去rC

We get

Step 2

By taking derivatives of the Black-Scholes formula, we obtain

这个部分我还没推过,但在More的第二个连接里有过程

and we have

Inserting these equations into

The local volatility function is then given as

As in the markets, the option prices are only available for discrete strikes and maturities, to use the Dupire’s formula, we need to interpolate between the available strikes and maturities, and extrapolate outside the market range.

Issues with Dupire’s Local Volatility Model

  • The local volatility surface obtained by the Dupire’s formula is very sensitive to the method of data interpolation
  • The only risky asset used is the underlying asset which does not account for the volatility risk (affected by random shocks)
  • The calibrated local volatility surface is not stable with the passage of time (in practice the local volatility surface is is changing all the time)
  • The dynamics of the volatility process assumed by the model does not correspond to that observed in the market

More

Derivation of Loval Volatility

Dupire Local Volatility

Local volatility modelling

Stochastic Volatility and Local Volatility

CEV Model

Constant Elasticity of Variance (CEV) model is given as

If $a$ less than one, then we have that non constant local volatility function

Under the risk-neutral measure, by assuming that the forward price is given as $F_t=S_te^{r(T-t)}$, the CEV model can be re-written as

The implied volatility is given by the asymptotic approximation

with $F_m=\frac{1}{2}(F_0+K)$

Trinomial Pricing Tree

The issue of non-recombining binomial tree is addressed by using a trinomial tree model where the prices are given by

For the trinomial tree to converge to a continuous process, we further impose

For a recombining tree (i.e. the number of nodes increase linearly and not exponentially), we need $u d=(1+\Delta t)^2$

The probabilities are given as

To ensure the absence of arbitrage in the model, we need that $p + q\leq 1$ whic is ensured if

Such that the above inequality is always satisfied, we choose

where $\sigma > \sigma(t, S)$ for all the nodes in the trinomial tree

By choosing $\bar\sigma$ sufficiently large, in the limit, we obtain the following model which does not admit arbitrage

Unit 2.5:Stochastic Volatility

For local volatility

The last part show that the randomness of volatility is totally depended on the randomness of stock. However, the empirical data we will find that the randomness in the stock and the randomness in the volatility differ from each other. So the volatility should affected by other sources of randomness.

The class of models where volatility is supposed to be a stochastic process is called stochastic volatility models

Unit 2.6:Pricing Equation

Incomplete Market

In an incomplete market, the risk from volatility (vega) cannot be eliminated by trading only in the underlying asset

We have two sources of randomness. However we only allow trading in the money market account. So the market is incomplete. We can make an incomplete market complete by adding additional assets (like option or future).

We create a self-financing portfolio using two risky assets $S$ and option $C^0$

The coefficients $a_t$, $b_t$ and $C^0$ are functions of ($t, S_t, \sigma$)

Often the volatility process will be chosen as a mean-reverting process

Suppose $V_t$ denotes the value of a self-financing portfolio with $\delta_t$ shares in $S$ and $\omega_t$ units of $C^0$

By using Ito’s formula, we get

where $\mathscr{L}_t$ is a operator

The value of portfolio to match the option price of C

For the two expressions of dVt to match, we need

Set $V_t=C$ and substitute it to

In the pricing equation for $C$, there is a factor which depends on $C^0$, which we add to our portfolio to complete the market

Denote

Thus, by argument of creating a self-replicating portfolio, we have shown that the price $C$ of a European option with payoff $h(S_T)$, solves the following equation

with

if chosen $\lambda=0$, the difference of BS-PDE is the last two part of $\mathscr{L}$ with $b_t$

Unit 2.7:Risk Neutral Measure and Risk Premia

Finding the Risk-Neutral Measure

From its definition, under the risk-neutral measure, price $C_t$ of a European satisfies

Applying the Ito’s formula and using the equation (PDE) in previous part, we get

Thus, under the risk-neutral measure (cancels out the $dt$ part in parenthese of the function above), we must have the following

Under the risk-neutral measure, the price of a European option is written as

For the choice of model under physical measure

By matching the coefficients, we get

where $\beta_t$(market prices of risk) and $\phi_t$ are bounded processes

$dW_t^\mathbb{P}+\theta_tdt=dW_t^\mathbb{Q}$, $\mathbb{P}:$现实测度

Risk Premiums

$\beta_t$ is typically called the risk premium. Analogously, $\phi_t$ is called the risk premium of volatility

$\lambda_t$ is calibrated directly under the risk-neutral measure (chosen by market)

Unit 2.8:Estimation Error

Estimation of Volatility

We could approximate $\sigma_t$ by its average over a period of T estimated using the historical data. However, estimation of historical volatility faces the following issues

  • Variance of the estimator decreases with T
  • Bias of the approximation of $\sigma_t$ by average volatility increases with T

Recall that

Thus, the quadratic variation of $log S_t$ is given as

Recall from the definition of quadratic variation of a random process

Denote the log-returns of asset price as

$dlogS_t=(logS_{t+1}-logS_t)dt=log\frac{S_{t+1}}{S_t}dt$

Then, we have

Realised Variance

Therefore, we could estimate the integrated variance by the realised variance

Next, suppose that

  • $\sigma_t$ is independent of Brownian motion W driving asset prices S
  • Drift function $\mu$ is deterministic

Then, for a fixed volatility path (由已知数据得出), the log-returns $R_i:=r(\frac{i-1}{N}t,\frac{t}{N})$ are independent Gaussian random variables with variance & mean

Error in Realised Variance

Thank the Professor Ankush Agarwal and Christian Ewald. They provide the precise process of following.

If we compute thxe expectation of realised variance $VR^N_t$ for a given volatility path $\sigma_t$, we get

$E(x^2)=Var(x)+E(x)^2$

with

直接取一个固定值来代替这一段的积分,积分在t时间上的宽度是 t/N​

Then

Thus, the bias of estimating the integrated variance with realised variance

Similarly, the variance of $VR^N_t$ for a given volatility path $\sigma_t$ is

$R_i$ are iid

For every i

With $E[R_i^4]=m_i^4+6m_i^2v_i+3v_i^2$

关于这一定理可以参考 如何通俗的理解矩母函数,以及 wiki 中展示的 Order 4 的 Non-central moment

本站更多参考 正态分布与矩母函数/)

and

Repeat the same step as above

Also

Approximately, the variance of $VR^N_t$ for a given volatility path $\sigma_t$ is

Next, we note that the root mean squared error (RMSE) of an estimator is given by

Therefore, the dominant term in the error of realised variance is

If the Relative error is large. We should check out the Greek (sensitive of option price to the Volatility). Insensitive $\to$ no matter that much

Supplymental reading

link

defination

  • $p_t$ = log-price of asset at time $t$ = $logS_t$
  • $\Delta$ = fraction of a trading session associated with the implied sampling frequency
  • $m=1/\Delta$ = number of sampled observations per trading session
  • $T$ = number of days in the sample $\Rightarrow$ $mT$ total observations
  • $r_{t-1+j\Delta}=p_{t-1+j\Delta}-p_{t-1+(j-1)\Delta}=log\frac{S_{t-1+j\Delta}}{S_{t-1+(j-1)\Delta}}$
  • $r_t=r_{t-1+\Delta}+r_{t-1+2\Delta}+…+r_t=log\frac{S_t}{S_{t-1}}$
  • Realized variance (RV=VR) on day $t$
  • Realized volatility (RVOL) on day $t$

  • Realized log-volatility (RLVOL) :

  • Arithematic Brownian Motion (只参考,我们运用的$\mu,\sigma$不是固定的)
  • The quadratic variation (QV) of the return process from time $0$ to $t$ is
  • Let $p(t)$ be described by the stochastic differential equation
  • IV$_t$ denotes integrated variance for day $t$

Example: QV for Wiener process

Q1 What does RV estimate?

That is, daily RV converges in probability to the daily increment in QV.

ABDL (2003)

  • Since $\int^t_{t−1}\mu(s)ds$ is generally very small for daily returns and $RV_t^m$ is a consistent estimator of $IV_t$, for Itô processes daily returns should follow a normal mixture distribution with $RV_t^m$ as the mixing variable. As a result, returns standardized by realized volatility should be standard normal
  • If there are jumps in $dp (t)$, then $RV_t^m \xrightarrow{p} IV_t$ IVt but returns are no longer conditionally normally distributed.

Define the error

BNS (2001) : For the Ito diffusion model under the assumption that mean and volatility processes are jointly independent of $W(t)$

where is the integrated quarticity (IQ)

Hence,

IQ$_t$ may be consistently estimated using the following scaled version of realized quarticity (RQ)

The feasible asymptotic distribution for $RV_t^m$ is

Unit 2.9:The Heston Model

Heston’s Stochastic Volatility Model

Under the risk-neutral measure, Heston’s stochastic volatility model is given as

本课的模型都是在风险中性 (鞅) 的测度下,对于有些文章会按现实测度写,但实际是效果都相同

If we have $W_1,\tilde W$ where $\langle W _1,\tilde W\rangle=0$

The variance process $v_t$ is also known by the name of Cox-Ingersoll-Rubinstein (CIR) process

To model the variance of asset prices, we need $v_t$ to remain positive

(Advantage)For $\frac{2k\theta}{\delta^2}>1$ the process never hits zero. This constraint on parameters is also known as Feller condition

This process is mean reverting

For the square root process

  • $\theta$ is the long term mean
  • k is the speed of mean-reversion
  • $\delta$ is volatility of volatility

First we define the log-price $X_t$ by $S_t=S_0e^{rt+X_t}$

Next, by using the Ito’s formula, we get

European call option price with maturity T and strike K is given as

For log-strike

We first find the characteristic function of $X_T$

To compute $C(K)=S_0\mathbb{E}[(e^{X_T}-e^k)^+]$ , we then use the Fourier transform $f (t, u, x, v)$ of $X_T$

European Call Option Price Formula

where $Re(a+ib)=a$, We can only solve the $P_j$ numerical

Unit 2.10:Introduction to Calibration

Exotic Option

Not traded on the market and $\to$ need to be priced by a model

  • A forward start option: $(S_T-aS_{T_1})^+$

  • A cliquet: $\sum_{i=1}^{n-1}(S_{T_{i+1}}-aS_{T_i})^+$

  • A barrier option: $(S_T-K)^+1_{sup_{t\in[0,T]}S_t\leq B}$

Identify the hedging instruments : liquid products (small bid-ask spread) quoted on a market

we aim at constructing a hedging portfolio of the form

Choose a model

Reproduce the prices of simple instruments quoted on the market. (This step is model calibration)

Then compute the hedging strategies $\Delta^0,\Delta,\Delta^{C,i},\Delta^{P,j}$

General Considerations

Model calibration = determine the model parameters from market data of liquid instruments

Two approaches in model calibration

  1. Statistical parameter estimation
    • from historical observations
    • done under the physical (market) measure
  2. Calibration
    • based on quotes of option prices (typically calls/puts)
    • done under the risk-neutral measure
    • reflects the market view on the future evolution of the asset

Which price? Bid or Ask?

  • Typically one calibrates to the mid-price

Model Calibration

In general, a “good” model should have

  • A parsimonious parametrization and should be intuitive

  • An efficient numerical calibration method

We want to find $\theta$ such that

or a least squares approach

Calibration of local volatility models

The inverse problem of finding the local volatility function with finite number of call option prices is then ill posed. To overcome the difficulty, typically two approaches are used

  • Implied volatility interpolation using either parametric or semi-parametric form

    • Non-parametric methods of interpolation, for example, the method of splines(三次样条插值), provide very mediocre performance for such problems
    • Implied volatility interpolation methods are very intuitive and fast, but could lead to infeasible volatility surface
  • Regularisation approach

    • Regularisation methods provide the best performance in terms of precision and regularity of the volatility surface

Stochastic Volatility Inspired(SVI) Parametrisation (by Jim Gatheral)

It is a parametrisation of implied volatility as a function of parameters in terms of log-strike $k=log\frac{K}{S}-rT$

In the parametrisation, for a given maturity $T$, the total implied variance $V(T, k) := I^2(T, k)T$ satisfies

with

  • $a\ge 0$: global level of smile

  • $b\ge 0$: slope of the wings

  • $\rho\in[-1,1]$: asymmetry (rotation of the smile)

  • $m\in\mathbb{R}$: translation to the right

  • $\sigma\ge 0$: at-the-money convexity

Step

  • For a given maturity $T_i$, we compute the implied volatility $I(T_i, k_j)$ for all the strikes $K_j$ of the liquidly traded call options
  • Then using an optimisation procedure, for example, least-squares, we compute the optimal values for parameters $a,b,\rho,m$ and $\sigma$ for maturity $T_i$

    • The parameters are estimated for each maturity separately
  • For the intermediate maturities without liquidly traded options, the implied volatility can be computed using linear interpolation and the SVI parameters can be computed

Absence of Arbitrage

  • Due to the interpolation procedure, SVI parametrisation can lead to arbitrage opportunities in the model
    • Option price function $P(T, k)$ which is not convex with respect to $k$
    • $\frac{\partial^2P(T,k)}{\partial k^2}<0$

  • To ensure the absence of arbitrage: $\lim_{k\to\infty}C^{BS}(T,k)=0$

  • The absence of arbitrage of the “calendar spread” type: $\frac{\partial V(T,k)}{\partial T}\ge 0$

    • The above condition implies that the total implied variance is an increasing function with respect to maturity. This is equivalent to non-crossing of the curves of total implied variance for different maturities
  • The absence of arbitrage of the “butterfly” type

    • The condition leads to the convexity of option prices with respect to $k$

Regularisation Approach

The calibration of volatility surface is posed as an optimisation problem using Tikhonov regularisation

Minimise the functional

To minimise $J(\sigma)$, we use, for example, the method of gradient descent or the method of conjugate gradient

Typical algortihm

  • Discretise local volatility function on a mesh of space and time
  • Make an initial guess $\sigma^{(0)}$,for example, a constant function
  • Evaluate the gradient of $J(\sigma)$ at point $\sigma^{(0)}$to calculate the direction of descent $\lambda^{(0)}$
  • Solve the one-dimensional minimisation problem
  • Set $\sigma^{(1)}:=\sigma^{(0)}+h^*\lambda^{(0)}$ and iterate again

The local volatility function is discretised using different methods, for example, finite differences or trinomial tree model

The regularisation using $||\nabla\sigma||$ is the same in regions of the surface where the influence of volatility on the price is very high and in the regions where it is very small

Unit 3.1:Monte Carlo methods

Readings for the module on Monte Carlo Methods

  • Chapter 3 Section 3.1, Section 3.2 - Monte Carlo Methods in Financial Engineering by Paul Glasserman
  • Chapter 6 Section 6.1 - Monte Carlo Methods in Financial Engineering by Paul Glasserman

Unit 3.1.1:Introduction

Monte Carlo methods are preferable in high dimensions as their accuracy is independent of dimensionality

If f is integrable over [0, 1], then by strong law of large numbers (SLLN)

大数定律的定义是,当随机事件发生的次数足够多时,随机事件发生的频率趋近于预期的概率。可以简单理解为样本数量越多,其平概率越接近于期望值。大数定律的条件:1、独立重复事件;2、重复次数足够多。

for

w.p. 1 (with probability 1)

If f is also square integrable, for

the error in the estimator $\hat\alpha_N$ is approximately normally distributed with mean 0 and standard deviation $\frac{\sigma_f}{\sqrt{N}}$

Typically $\sigma_f$ is unknown and is estimated as

Monte Carlo Estimation for European Option

Monte Carlo samples $S_i(T)$ of $S(T)$ can be generated as

where $Z_i\sim N(0,1)$

用 $\sqrt TZ_i$ 代替 $W_t$ 可以减少 noise

Call price estimator is given as

The estimator $\hat C_N$ is unbiased in the sense that

Unit 3.1.2:Efficiency of Monte Carlo methods

Confidence Interval

Using estimator $\hat C_N$ makes sense with a confidence interval

where

  • $2z_{\delta/2}\frac{s_C}{\sqrt{N}}$ is called the width of the confidence interval
  • $z_\delta$ denotes the $1-\delta$ quantile of the standard normal distribution
    • $\mathbb{P}(Z\leq z_\delta)=1-\delta$ for $Z\sim N(0,1)$
    • For $\delta = 0.05, z_\delta/2 \approx 1.96$, we get a 95% confidence interval

Central limit theorem (CLT)

With unbiased estimator $\hat C_N=\frac1N\sum_{i=1}^NC_i$ for iid samples $C_i$, $\mathbb{E}[C_i]=C$ and $Var(C_i)=\sigma_C^2<\infty$

  • $\sigma_C$ can only be estimated

Suppose it takes $\tau$ time to generate $C_i$. Given computational budget $\Gamma$, can generate $\Gamma/\tau$ samples and create estimator $\hat C_{\Gamma/\tau}$

  • $\sqrt{\Gamma/\tau}(\hat C_{\Gamma/\tau}-C)\sim N(0,\sigma_C^2)$
  • as $\Gamma\to\infty$, $\sqrt{\Gamma}(\hat C_{\Gamma/\tau}-C)\sim N(0,\sigma_C^2\tau)$

  • Given a choice, we should use an estimator with lower value of $\sigma_C^2\tau$

Mean Square Error

For an estimator $\hat\alpha$ of true value $\alpha$

两边可以尝试配平

Landau’s notation: For parameter $h\to 0$

类似于算法复杂度,大 O 是上界,小 o 是下届

Unit 3.1.3:Euler Method

Consider an ordinary differential equation (ODE) in time interval [0, 1]

Divide interval into $N$, $h=1/N$

The above ODE can be solved approximately using an Euler scheme

where

  • Error criterion $e(h):=|x(1)-x^h(1)|$
  • Under suitable assumptions $e(h)=\mathscr{O}(h)=\mathscr{O}(1/N)$

Euler Method更多解释可参考数值分析的部分

Euler Discretisation for BM

with $h=\frac TN,t_n^h:=nh,\Delta W_n^h:=W(t_n^h)-W(t_{n-1}^h)$, the Euler scheme for BM is given as

Note that $\Delta W_n^h$ are iid increments which are normally distributed $N(0,h)$

Euler Method for SDEs

The Euler scheme for $X(t)$ is given as $X^h(0)=x_0$

If we wish to price options based on the path of the SDE, for example, Asian option with price

we need a good approximation of the sample path of $X$ which leads to the error criterion (strong error)

If we wish to price options based only on the terminal value of the SDE, for example, European option with price

we need a good approximation of the distribution of $X(T)$ which leads to the error criterion (weak error)

for smooth functions g

光滑函数(smooth function):在其定义域内无穷阶数连续可导的函数

g 可以取 $(S(T)-K)^+$

We say that $X^h$ converges to $X$ with order $\beta > 0$

  • strongly with order $\beta$ if $e_s(h)=\mathscr O(h^\beta)$
  • weakly with order $\beta$ if $e_w(h)=\mathscr O(h^\beta)$

Under some regularity conditions on functions $b$ and $\sigma$, Euler scheme converges strongly with order $\beta = 1/2$ (Euler) and weakly with order $\beta = 1$ (Milstein)

当 $\beta$ 数越大,收敛速度越快,越准确

Unit 3.1.4:Milstein Scheme

Consider the SDE

and its Euler scheme approximation

Thus, the Euler scheme approximates

Milstein scheme is an improvement on the Euler scheme. The main idea is to approximate the stochastic integral in a better way

Let us estimate the error in Euler scheme by applying Itô’s formula on $\sigma(t,X(t))$. Suppose $t_{n-1}=0$

Then

Combined with

So we have

$W(h)\sim o(h^{1/2})$,$\sigma$ 采用 0 点时

这一步过程看 定义

Thus, the Milstein scheme is given as $X^h(0)=x_0$

如果 payoff 只取决于最终的价格,两种方法不明显,如果取决于path,Milstein 会明显提高

Unit 3.2:Variance reduction methods

Readings for the module on variance reduction methods

  • Chapter 4 Section 4.1, 4.2 - Monte Carlo Methods in Financial Engineering by Paul Glasserman

Unit 3.2.1:Control variate method

Introduction

In financial markets and the insurance industry, most quantities of interest can be expressed as

Suppose the goal is to compute $p$. Then, use the Monte Carlo method:

  • Generate $(X_i)_{1\leq i\leq N}$ i.i.d. samples of $X$ & set

  • By the Central Limit Theorem (CLT)

    binomial 分布, p 是存在的概率

  • 95% confidence interval

The relative error is given as

which is large for small value of $p$

Thus, to estimate small $p$, using the crude Monte Carlo method, we may need a large number of samples $N$

如果 $p = 0.01 (10^{-2})$, $N$ 至少 $=10^6$ 才使得 relative error $=(\sqrt{10^4})^{-1}=10^{-2}$

As discussed in the last lecture, the estimator error is given by $s_p/\sqrt{N}$ where $s_p$ is the estimate of standard deviation of estimator $\hat p_N$

The aim of variance reduction methods is to produce an alternative estimator $\hat p_N^{var}$ such that $\mathbb{E}[\hat p_N^{var}]=p$ but with smaller variance than estimator $\hat p_N$

Control Variates

Let $Y_1, Y_2, . . . , Y_N$ be i.i.d. outputs from $N$ replications of a simulation. For example, $Y_i$ could be the discounted payoff of a derivative security on the $i$th simulated path.

Suppose that our aim is to compute $\alpha=\mathbb{E}[Y_i]$ and the crude Monte Carlo estimator $\alpha_N:=\sum_{i=1}^N\frac{Y_i}N$ is unbiased and converges with probability (w.p.) 1 as $N\to\infty$

Suppose on each replication, we generate another output $X_i$ along with $Y_i$

Suppose that $(X_i,Y_i)$ are i.i.d. $i=1,…,N$ and that $\alpha_0=\mathbb{E}[X_i]$ is known

Furthermore, denote $(X, Y)$ as a generic pair of $(X_i,Y_i)$

For fixed $b$, calculate

and compute the sample mean

$\hat\alpha_N(b)$ is a control variate estimator and the observable error $N^{-1}\sum_{i=1}^NX_i-\alpha_0$ acts as a control in estimating $\alpha$

The control variate estimator is unbiased

$\hat\alpha_N(b)$ is also strongly consistent

Each $Y_i(b)$ has variance which is equal to

$\hat\alpha_N(b)$ provides variance reduction if

Optimal coefficient $b^$ *minimizes $\sigma^2(b)$ and is given as

The ratio of the variance of optimally controlled estimator to that of uncontrolled

The variance reduction factor is given as

$Var(\hat\alpha_N)\times(1-\rho_{XY}^2)={Var(\hat\alpha_N(b^*))}$ 如果要达到原来的值,需要用 $N/(1-\rho_{XY}^2)$ 个样本

For example, for $|\rho_{XY}| = 0.95$, leads to 10-times variance reduction, whereas $|\rho_{XY}| = 0.90$, leads to 5-times variance reduction (usually 5)

Computing $b^*$

As $\alpha=\mathbb E[Y]$ is unknown, in practice, $\sigma_Y$ or $\rho_{XY}$ are also unknown

Estimate $b^*$ as

Then, instead of $Y_i(b)$, we use

Replacing $b$ with $\hat b_N$ introduces bias as the samples $Y_i(\hat b_N)$ are not independent any more, thus, application of CLT is not clear

Estimate $\alpha$ as $\hat\alpha_N(\hat b_N)=\frac1N\sum_{i=1}^NY_i(\hat b_N)$

Unit 3.2.2:Application of control variate estimator

Using Underlying Assets

Consider the Black-Scholes model

Suggest a Monte Carlo control variate method to estimate the price of option

Asian Options

An arithmetic Asian call option with discounted payoff $e^{-rT}(\bar S_m^A(T)-K)^+$ where

we have a closed-form formula for geometric Asian call option with discounted payoff $e^{-rT}(\bar S_m^G(T)-K)^+$

Unit 3.2.3:Antithetic variate method

Method of antithetic variates attempts to reduce variance by introducing negative dependence between pair of replications

  • Note that if $U$ is uniformly distributed over [0, 1] then so is $1 – U$

    • $U_i$ and $1 – U_i$ form an antithetic pair in the sense that a big value of one is accompanied by a small value of the other
    • Thus, a large value of the function value computed on one path is compensated by the small value on the antithetic path, leading to reduction in variance
  • If $Z$ is distributed as $N (0, 1)$ then so is $–Z$

    • Suppose we generate a path $Y_i$ using a sample $Z_1, Z_2 , . . . , Z_n$ of normally distributed random variable, we can create an antithetic path $\tilde Y_i$ using the sample $–Z_1, –Z_2, . . . , –Z_n$

    • Suppose we generate N antithetic pairs $(Y_i, \tilde Y_i )^N_{i=1}$

    • The confidence interval is given as

    • Antithetic sampling reduces variance when $Cov(Y_i,\tilde Y_i)<0$

Unit 4.1:Finite difference methods

Unit 4.1.1:Introduction

The finite difference method solves a partial differential equation on a discrete grid in time and space

Consider a general equation as

  • $f$ is called the source term
  • $a, b, r,$ are assumed to be bounded and continuous
  • $a$ is assumed to be nonnegative

The Black-Scholes equation is given as

we note that

Change of variable

In the Black-Scholes equation, we then plug

The transformed Black-Scholes equation with bounded coefficients is then given as

The above equation has bounded coefficients and is suitable to apply finite difference methods

Unit 4.1.2:Explicit Scheme

Consider the equation

For positive integer N discretise the domain with time step size $h_0=\frac TN$ and space step size $h_1>0$

Denote by $v_j^k\approx V(jh_1,kh_0)$, $j\in \mathbb Z$, $k=0,1,…,N$

Standard explicit finite difference scheme is written as

In the scheme, we define

The ordered form of the equation is written as

The coefficients of the values at previous time step are all nonnegative if the following monotonicity condition holds:

$||a||_\infty$ : Maximum possible value of $a$

Example: Digital Option

Consider the digital option with payoff $g(x)=1_{x>0}(x)$. Suppose $f(x,t)=0$ and $a(x,t)=1$

Suppose the spatial domain is taken as $[–1, 1]$ and $T = 1$. For the boundary condition

we solve the explicit scheme

Unit 4.1.3:Implicit Scheme

For $j\in \mathbb Z$, $k=0,1,…,N$

the fixed-point form of the equation is written as

  • The scheme is monotonic without any condition

Unit 4.1.4:Family of $\theta$ Schemes

For $\theta\in [0, 1]$, consider the following scheme

The scheme is explicit if $\theta=0$, and implicit otherwise (standard implicit for $\theta=1$)

The scheme is monotonic if

Application: Option Pricing

Consider the transformed Black-Scholes equation

Choose the domain $D=[0,T]\times[X_{min},X_{max}]$ with $-\infty<X_{min}<X_{max}<\infty$

Approximate $D$ with a uniform mesh

where $t^k=kh_0$ and $x_j=X_{min}+jh_1$ for $h_0=T/N$ and $h_1=(X_{max}-X_{min})/M$

Denote $v(t^k,x_j)=v_j^k$. Then, we use the finite difference derivative approximation based on the $\theta$ scheme

where $\theta\in [0, 1]$ is a constant parameter.

We also replace $v_j^k$ by $\theta v_j^k+(1-\theta)v_j^{k+1}$. For $\theta=0.5$ the method corresponds to Crank-Nicholson method

We can formulate a tridiagonal system at each time step $k = N, . . . , 0$ which can be solved by the well-known Thomas algorithm (MATLAB command: tridiag)

with non-zero coefficients of the tridiagonal matrix $A = (a_{ij})$ given by

The time dependent vector $b^{k+1}$ is given as:

where $v_0^{k+1},v_M^{k+1}$ are given by the boundary conditions and the remaining constants are defined as below

The $i$th coordinate of the vector $v^k$ is the approximation of the value $v(t^k,x_i)$

Unit 4.2:Optimisation methods in finance

Introduction

In finance, the parameters in stochastic models are estimated using the available market data. This procedure is called model calibration

Most calibration problems can be seen as either a root-finding procedure or as an optimisation problem

Convex Functions

Let $\mathscr U$ denote a convex set and let $f :\mathscr U \to \mathbb R$ be a function. Then, $f$ is convex if for all $u_1, u_2 \in \mathscr U$ , for all $\lambda \in [0, 1]$,

Any local minimun of a convex function is also its global minimum

If $w_1,…,w_n\ge0$ and $f_1,…,f_n$ are all convex, then $\sum_{i=1}^n w_if_i$ is also aconvex function

If $f_1,…,f_n$ are convex, then $g(x):=max(f_1,…,f_n)$ is also a convex function

Unit 4.2.1:Newton’s Method

Basic approach:

  • Approximate the function $f$ by its tangent at a given point
  • The point at which the tangent crosses the x-axis is taken as an approximate solution to the equation $f (x) = 0$
  • The process is repeated with the new point

Given an initial guess $x_0$, a sequence of points is constructed using the recurrence relation

For the method to converge, we need, at least, that $|f′′| \leq K$ and the initial guess is close to the true solution

Example: Implied Volatility

Consider $C(\sigma)$ as the call option price formula given strike $K$ and time to maturity $T$ with volatility $\sigma$ as the parameter

Aim is to find the implied volatility for call option with market price $C_0$

Given an initial guess $\sigma_0$, Newton’s method can be used to construct a sequence of points as

For error tolerance $\epsilon$, stop the iteration for some $n + 1$, when $|\sigma_{n+1} – \sigma_n| < \epsilon$ (convergence)

Unit 4.2.2:Descent Methods

To solve optimisation problems, Newton’s method cannot be used as it can guarantee only local convergence for a good initial guess

In practice, to solve optimisation problems, we need methods with global convergence, that is convergence from an arbitrary initial guess (starting point)

The overall idea of descent methods is to construct a sequence of points as

where $\mu_n$ is a non-increasing sequence of learning rate and $p_n$ is direction vector

Gradient Descent Method

For model parameter $u$ taking values in the space $\mathscr U$ , we consider the problem

The sequence of points is constructed as

where $\mu > 0$ and $\Delta J(u_n)$ is the gradient of cost function evaluated at $u_n$

$\mu$ has to be carefully chosen to ensure convergence of the method

Gradient Descent Method with Constraints

Under some constraints on the parameters, we look to solve the problem

where $K$ is a closed convex subset of the parameter space $\mathscr U$

For $\mu > 0$, a sequence of points is constructed as

where $P_K(u)$ is the orthogonal projection of $u$ on $K$

For $x \in \mathscr U$ , its orthogonal projection in $K$ is given by $x_K$ where

Penalisation of Constraints

We can convert a minimisation problem with constraints into an optimisation problem without constraints

The idea is to penalise the constraints and form a new unconstrained minimisation problem

Consider the constrained minimisation problem with continuous convex cost function $J:\mathbb R^d\to\mathbb R$

where $F : \mathbb R^d \to \mathbb R^M$ is a continuous convex function representing thec onstraints on parameter $u \in \mathbb R^d$

An unconstrained problem is then given as

where the constraint $F_i(u) \leq 0$ has been penalised

The above problem can be solved using descent methods

For small value of $\epsilon > 0$, the solution of unconstrained problem approximates the solution of constrained problem

Unit 4.2.3:Stochastic Gradient Descent

Unlike the gradient descent methods which are deterministic, stochastic gradient methods choose the direction of descent randomly

Gradient descent methods perform well for convex cost functions whereas stochastic gradient descent methods can handle non-convex cost functions

Consider the model parameter $\theta \in \mathbb R^d$

For example:

consider the Heston’s stochastic volatility model

The parameter vector $\theta = (\kappa,\theta,\delta,\rho)\in \mathbb R^4$

Next, consider the cost function $J:\mathbb R^d\to\mathbb R$

For example:

Given market prices $P^{market}$ of call options with strike
$K_j, 1 \leq j \leq N$ and time to maturity $T_i, 1 \leq i \leq M$, the cost function for Heston’s model calibration can be written as

where $P^\theta$ is the call option price function under Heston’s model

Unlike the gradient descent methods, which use the entire dataset $\{K\}^N_{j=1}$, $\{T\}^M_{i=1}$, to construct the points sequence, the stochastic gradient descent methods use only a fraction of the dataset to construct the sequence

Stochastic Gradient Descent Algorithm

Stochastic gradient descent (SGD) method constructs the sequence as

where $\mu > 0$ is a positive scalar and the update is performed for a uniformly randomly chosen sample $K_j$ and $T_i$ from the dataset

The SGD method updates frequently with a high variance which leads to a slower convergence

On the other hand, due to high variance, the SGD method is able to better explore the parameter space and is able to jump to better local minima for non-convex cost functions

Mini-batch Gradient Descent

Mini-batch gradient descent algorithm randomly picks a sample from the dataset to compute the derivative of the cost function

Uniformly randomly select m samples $(K_i,T_j)$ from the dataset $\{K\}^N_{j=1}$, $\{T\}^M_{i=1}$, and denote it by $\{X_i\}^m_{i=1}$ , called the mini-batch

Compute the derivative of the cost function using the mini-batch, that is, compute $\Delta_\theta J(\theta;\{X_i\}^m_{i=1})$

Construct the sequence using the recursion

Small batch sizes lead to noisy updates whereas large batch sizes require considerable computation time

Conclusion

Stochastic gradient descent algorithms can be further improved using different statistical features. The most famous algorithms are

  • AdaGrad
  • AdaDelta
  • AdaM

The above variations of the stochastic gradient algorithm are mostly used for very large datasets

For most model calibration purposes, mini-batch gradient descent method provides good performance


【课程】Applied Computational Finance
http://achlier.github.io/2021/05/12/Applied_Computational_Finance/
Author
Hailey
Posted on
May 12, 2021
Licensed under