From (Cont, Stoikov, and Talreja 2010):

We propose a continuous-time stochastic model for the dynamics of a limit order book. The model strikes a balance between three desirable features: it can be estimated easily from data, it captures key empirical properties of order book dynamics, and its analytical tractability allows for fast computation of various quantities of interest without resorting to simulation. We describe a simple parameter estimation procedure based on high-frequency observations of the order book and illustrate the results on data from the Tokyo Stock Exchange. Using simple matrix computations and Laplace transform methods, we are able to efficiently compute probabilities of various events, conditional on the state of the order book: an increase in the midprice, execution of an order at the bid before the ask quote moves, and execution of both a buy and a sell order at the best quotes before the price moves. Using high-frequency data, we show that our model can effectively capture the short-term dynamics of a limit order book. We also evaluate the performance of a simple trading strategy based on our results.

In this notebook we use data from the OBADiah database to analyze to what degree the proposed model captures key empirical properties of order book dynamics and, if it does, whether it could be used for trading today.

1. A Continuous-Time Model for a Stylized Limit Order Book

1.1 Limit Order Books

From (Cont, Stoikov, and Talreja 2010):

We consider a market where limit orders can be placed on a price grid $\{1, \ldots ,n\}$ representing multiples of a price tick. The upper boundary $n$ is chosen large enough so that it is highly unlikely that orders for the stock in question are placed at prices higher than $n$ within the time frame of our analysis. Because the model is intended to be used on the time scale of hours or days, this finite boundary assumption is reasonable.

Note that the model is intended to be used on the time scale of hours and days.

From (Cont, Stoikov, and Talreja 2010):

We track the state of the order book with a continuous-time process $\mathbf{X}(t) \equiv (X_1(t), \ldots , X_n(t))_{t \geq 0}$, where $|X_p(t)|$ is the number of outstanding limit orders at price $p$, $1 \leq p \leq n$. If $X_p(t) < 0$, then there are $-X_p(t)$ bid orders at price $p$; if $X_p(t) > 0$, then there are $X_p(t)$ ask orders at price $p$.

As further described below the authors assume that all orders are of unit size and in empirical examples they take this unit to be the average size (in lots) of limit orders observed for the asset.

From (Cont, Stoikov, and Talreja 2010):

The ask price $p_A(t)$ at time $t$ is defined by \[p_A(t) \equiv \inf\{p=1,\ldots,n, X_p(t) > 0\} \land (n+1)\] Similarly, the bid price is defined by \[p_B(t) \equiv \sup\{p=1,\ldots,n, X_p(t) < 0\} \lor 0\]

The authors emphasize that when there are no ask orders in the book the ask price is set to $n + 1$, and when there are no bid orders in the book the bid price is set to $0$. In our opinion, the ask price should be “+infinity” when there are no ask orders in the book, to reflect the fact that nothing can be bought at any price.

From (Cont, Stoikov, and Talreja 2010):

Because most of the trading activity takes place in the vicinity of the bid and ask prices, it is useful to keep track of the number of outstanding orders at a given distance from the bid/ask. To this end, we define \[ Q_i^B(t) = \begin{cases} X_{p_A(t) - i}(t), & 1 \leq i < p_A(t) \\ 0, & p_A(t) \leq i < n \end{cases} \tag{1} \] the number of buy orders at a distance $i$ from the ask, and \[ Q_i^A(t) = \begin{cases} X_{p_B(t) + i}(t), & 1 \leq i \leq n - p_B(t) \\ 0, & n - p_B(t) \leq i < n \end{cases} \] the number $Q_i^A(t)$ of sell orders at a distance $i$ from the bid

It is often said that the dynamics of a limit order book resembles in many aspects that of a queuing system. Limit orders wait in a queue to be executed against market orders (or canceled). We guess that it explains the choice of the letter $Q$ in $Q_i^B(t)$ and $Q_i^A(t)$.
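
The definitions of $p_A$, $p_B$, $Q_i^B$ and $Q_i^A$ above can be illustrated with a minimal Python sketch. The function names and the sample book are hypothetical and are not taken from the article or from OBADiah. As an assumption of this sketch, `bid_queue` returns $-X_p$ so that the number of buy orders is reported as a non-negative count.

```python
from typing import Sequence


def ask_price(x: Sequence[int]) -> int:
    """Lowest price p (1-based) with X_p > 0, or n + 1 if the book has no asks."""
    n = len(x)
    return min((p for p in range(1, n + 1) if x[p - 1] > 0), default=n + 1)


def bid_price(x: Sequence[int]) -> int:
    """Highest price p (1-based) with X_p < 0, or 0 if the book has no bids."""
    n = len(x)
    return max((p for p in range(1, n + 1) if x[p - 1] < 0), default=0)


def bid_queue(x: Sequence[int], i: int) -> int:
    """Number of buy orders i ticks below the best ask (0 outside the grid)."""
    p = ask_price(x) - i
    # Assumption: bids are stored as negative numbers, so flip the sign.
    return -x[p - 1] if 1 <= p <= len(x) else 0


def ask_queue(x: Sequence[int], i: int) -> int:
    """Number of sell orders i ticks above the best bid (0 outside the grid)."""
    p = bid_price(x) + i
    return x[p - 1] if 1 <= p <= len(x) else 0


# Hypothetical book on a grid of n = 5 ticks: bids at prices 1-2, asks at 4-5.
book = [-2, -1, 0, 3, 1]
print(ask_price(book), bid_price(book))
print([bid_queue(book, i) for i in range(1, 5)])
print([ask_queue(book, i) for i in range(1, 5)])
```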

1.2 Dynamics of the Order Book

Remember that the state of the order book is tracked with a continuous-time process $\mathbf{X}(t) \in \mathbb{Z}^n$

From (Cont, Stoikov, and Talreja 2010):

For a state $\mathbf{X} \in \mathbb{Z}^n$ and $1 \leq p \leq n$, define
\[ \mathbf{X}^{p \pm 1} \equiv \mathbf{X} \pm (0, \ldots, 1, \ldots, 0)\] where $1$ in the vector on the right-hand side is in the $p$th component. Assuming that all orders are of unit size (in empirical examples we will take this unit to be the average size of limit orders observed for the asset),

a limit buy order at price level $p < p_A$ increases the quantity at level $p$: $\mathbf{X} \rightarrow \mathbf{X}^{p-1}$

a limit sell order at price level $p > p_B$ increases the quantity at level $p$: $\mathbf{X} \rightarrow \mathbf{X}^{p+1}$

a market buy order decreases the quantity at the ask price: $\mathbf{X} \rightarrow \mathbf{X}^{p_A(t)-1}$

a market sell order decreases the quantity at the bid price: $\mathbf{X} \rightarrow \mathbf{X}^{p_B(t)+1}$

a cancellation of an outstanding limit buy order at price level $p < p_A$ decreases the quantity at level $p$: $\mathbf{X} \rightarrow \mathbf{X}^{p+1}$

a cancellation of an outstanding limit sell order at price level $p > p_B$ decreases the quantity at level $p$: $\mathbf{X} \rightarrow \mathbf{X}^{p-1}$

The evolution of the order book is thus driven by the incoming flow of market orders, limit orders, and cancellations at each price level, each of which can be represented as a counting process.
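
The six unit-size transitions listed above can be sketched as a single state-update function. This is only an illustration under stated assumptions: the event names, the `apply_event` helper and the sample book are hypothetical, and the validity checks (for example, refusing a market buy when there are no asks) are choices of this sketch rather than something the article specifies.

```python
def ask_price(x):
    """Lowest 1-based price with X_p > 0, or n + 1 if there are no asks."""
    return min((p for p in range(1, len(x) + 1) if x[p - 1] > 0), default=len(x) + 1)


def bid_price(x):
    """Highest 1-based price with X_p < 0, or 0 if there are no bids."""
    return max((p for p in range(1, len(x) + 1) if x[p - 1] < 0), default=0)


def apply_event(x, event, p=None):
    """Return the new state after one unit-size event; x itself is not modified."""
    y = list(x)
    n = len(y)
    pa, pb = ask_price(y), bid_price(y)
    if event in ("limit_buy", "limit_sell", "cancel_buy", "cancel_sell"):
        if p is None or not 1 <= p <= n:
            raise ValueError("a price level in 1..n is required")
    if event == "limit_buy":          # X -> X^{p-1}, requires p < p_A
        if p >= pa:
            raise ValueError("limit buy must be below the ask")
        y[p - 1] -= 1
    elif event == "limit_sell":       # X -> X^{p+1}, requires p > p_B
        if p <= pb:
            raise ValueError("limit sell must be above the bid")
        y[p - 1] += 1
    elif event == "market_buy":       # X -> X^{p_A-1}
        if pa > n:
            raise ValueError("no ask orders to hit")
        y[pa - 1] -= 1
    elif event == "market_sell":      # X -> X^{p_B+1}
        if pb < 1:
            raise ValueError("no bid orders to hit")
        y[pb - 1] += 1
    elif event == "cancel_buy":       # X -> X^{p+1}
        if y[p - 1] >= 0:
            raise ValueError("no bid order at this level")
        y[p - 1] += 1
    elif event == "cancel_sell":      # X -> X^{p-1}
        if y[p - 1] <= 0:
            raise ValueError("no ask order at this level")
        y[p - 1] -= 1
    else:
        raise ValueError(f"unknown event: {event}")
    return y


book = [-2, -1, 0, 3, 1]              # hypothetical book, n = 5
print(apply_event(book, "limit_buy", 3))
print(apply_event(book, "market_sell"))
```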

Let’s analyze under which circumstances the model described above adequately captures the dynamics of the order book.

Consider a single bid price level $p_0$. Suppose that the volume $v$ of each order is drawn from a scaled normal distribution, \[ v \sim v_0\mathcal{N}(\mu, \sigma) \]

where $v_0 = 10$, $\mu = 20$ and $\sigma = 10$. All placed orders are eventually cancelled (in a random order). Table 1 shows the first order placements and cancellations, and the black line in figure 1 shows the overall dynamics of $Q_{p_0}^B(t)$.

Now, in accordance with the above description, we assume that all placed orders are of “unit size” (column volume.units) and calculate $Q_{p_0}^B(t)$ in units (column balance.units). Then we multiply balance.units by the “average size of limit orders observed” (in our case 211) and get the values in column balance.model, shown as the red line in figure 1. The black and red lines are quite close to each other, so the model is a good representation of the actual data in this case.

Table 1: Sample order flow and resulting balance on a single price level.
time	volume	volume.units	balance	balance.units	balance.model
1	144	1	144	1	211
2	177	1	321	2	422
3	356	1	677	3	633
4	207	1	884	4	844
5	213	1	1097	5	1055
6	372	1	1469	6	1266
7	246	1	1715	7	1477
8	73	1	1788	8	1688
9	131	1	1919	9	1899
10	155	1	2074	10	2110
11	322	1	2396	11	2321
12	-356	-1	2040	10	2110

Figure 1: The actual (black) and model (red) dynamics of the bid price level queue size using the artificial order flow drawn from a normal distribution. The model is a good representation of reality.

Now suppose that the order volume $v$ is drawn from a power-law distribution, \[ v \sim x^{-(\alpha-1)} \]

where $\alpha = 3$. As figure 2 shows, the fit is noticeably worse now.

Figure 2: The actual (black) and model (red) dynamics of the bid price level queue size using the artificial order flow drawn from a power-law distribution. The model is not a good representation of reality.

We can conclude that the ability of the model to reproduce the empirical behaviour of order book queues depends on the distribution of limit order volumes.
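
The experiment behind figures 1 and 2 can be approximated with a short Python sketch. It is not the code that produced the figures. It assumes that all placements happen first and all cancellations follow in random order, that volumes are rounded to whole shares, that normal draws are floored at 1 share, and that the power-law case means a density proportional to $v^{-\alpha}$ above a hypothetical minimum volume `v_min`.

```python
import random


def normal_volumes(n, rng, v0=10, mu=20, sigma=10):
    # Assumption: round to whole shares and floor at 1 to avoid non-positive volumes.
    return [max(1, round(v0 * rng.gauss(mu, sigma))) for _ in range(n)]


def power_law_volumes(n, rng, alpha=3.0, v_min=100):
    # paretovariate(a) has density proportional to x**-(a + 1) for x >= 1,
    # so a = alpha - 1 gives a density proportional to x**-alpha.
    return [round(v_min * rng.paretovariate(alpha - 1)) for _ in range(n)]


def queue_paths(volumes, rng):
    """Return the actual queue size and the unit-size model of it after each event."""
    cancels = list(volumes)
    rng.shuffle(cancels)                        # cancellations in random order
    events = [(1, v) for v in volumes] + [(-1, v) for v in cancels]
    unit = sum(volumes) / len(volumes)          # average size of a placed order
    actual, model = [], []
    balance, units = 0, 0
    for sign, v in events:
        balance += sign * v                     # true number of shares in the queue
        units += sign                           # number of unit-size orders
        actual.append(balance)
        model.append(units * unit)
    return actual, model


def max_relative_gap(actual, model):
    """Largest |actual - model| relative to the peak actual queue size."""
    return max(abs(a - m) for a, m in zip(actual, model)) / max(actual)


rng = random.Random(42)
for name, draw in [("normal", normal_volumes), ("power law", power_law_volumes)]:
    actual, model = queue_paths(draw(1000, rng), rng)
    print(name, round(max_relative_gap(actual, model), 3))
```

The printed gap is only one possible summary of how far the red line departs from the black one, and it varies from seed to seed.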

From (Cont, Stoikov, and Talreja 2010):

It is empirically observed (Bouchaud, Mézard, and Potters 2002) that incoming orders arrive more frequently in the vicinity of the current bid/ask price and the rate of arrival of these orders depends on the distance to the bid/ask. To capture these empirical features in a model that is analytically tractable and allows computation of quantities of interest in applications, most notably conditional probabilities of various events, we propose a stochastic model where the events outlined above are modelled using independent Poisson processes. More precisely, we assume that for $i \geq 1$,

Limit buy (respectively sell) orders arrive at a distance of $i$ ticks from the opposite best quote at independent, exponential times with rate $\lambda(i)$,

Market buy (respectively sell) orders arrive at independent, exponential times with rate $\mu$,

Cancellations of limit orders at a distance of $i$ ticks from the opposite best quote occur at a rate proportional to the number of outstanding orders: If the number of outstanding orders at that level is $x$, then the cancellation rate is $\theta(i)x$.

The above events are mutually independent.
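
Because the events are independent with exponential waiting times, the time to the next event is exponential with the sum of all rates, and the type of the event is chosen with probability proportional to its rate. The sketch below illustrates this for one side of the book. The parameter values and the names `event_rates` and `next_event` are hypothetical and are not estimates from the article.

```python
import random


def event_rates(queues, lam, theta, mu):
    """Rates of all competing events for one side of the book.

    queues[i - 1] is the number of outstanding orders at distance i from the
    opposite best quote; lam and theta hold lambda(i) and theta(i).
    """
    rates = {("market", None): mu}
    for i, q in enumerate(queues, start=1):
        rates[("limit", i)] = lam[i - 1]
        rates[("cancel", i)] = theta[i - 1] * q   # proportional to queue size
    return rates


def next_event(rates, rng):
    """Sample the waiting time and the type of the next event."""
    total = sum(rates.values())
    wait = rng.expovariate(total)                 # minimum of exponential clocks
    event = rng.choices(list(rates), weights=list(rates.values()))[0]
    return wait, event


# Hypothetical parameters for two price levels.
rates = event_rates(queues=[2, 0], lam=[1.0, 0.5], theta=[0.1, 0.2], mu=0.3)
print(sum(rates.values()))
print(next_event(rates, random.Random(1)))
```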

It should be noted that while the majority of market orders hit only the best price level, as we show below, there is a notable number of cases where a market order hits two or more levels. We represent these ‘multilevel’ market orders as a sequence of ‘best price market orders’, but those are not independent.

2. Parameter estimation

2.1 Description of the Data Set

Analysis of the Data Set used in the article

From (Cont, Stoikov, and Talreja 2010):

Our data consist of time-stamped sequences of trades (market orders) and quotes (prices and quantities of outstanding limit orders) for the five best price levels on each side of the order book, for stocks traded on the Tokyo stock exchange over a period of 125 days (Aug.–Dec. 2006).

Note that it is not correct to say that a trade and a market order are the same thing. By definition, a single market order may generate several trades.

From (Cont, Stoikov, and Talreja 2010):

In Table 1, we display a sample of three consecutive trades for Sky Perfect Communications. Each row provides the time, size, and price of a market order. We also display a sample of Level II bid-side quotes. Each row displays the five bid prices (pb1, pb2, pb3, pb4, pb5), as well as the quantity of shares bid at these respective prices (qb1, qb2,qb3, qb4, qb5).

Figure 3 shows Table 1 from the article:

Figure 3: A copy of Table 1 from the article

The sample seems to be incomplete. The direction of a trade (i.e. buy or sell) is not shown, the currency is not specified, the tick size is not provided, and it is not even clear whether 74,300 means seventy-four thousand three hundred or 74.30. The total number of records in the data set is not specified either.

Sky Perfect Communications ceased to exist as a separate company in 2007 due to its merger with JSAT Corporation, so today it is difficult to find information about the prices of its shares in 2006. Yahoo Finance tells us that the “close” price of SKY Perfect JSAT Holdings Inc. (9412.T) on August 22, 2006 was 742.50 JPY, which is either ten times higher or a hundred times lower than the average price in Table 1.

If we used today’s trading rules of domestic stocks at Japan Exchange Group, we would think that the tick size would be 0.1 JPY if the share price were 74.30 or 5 JPY if the share price were 74300.

Assuming that the minimal price change in Table 1 equals the tick size and that the price is 74300 JPY (since the authors use a dot (.) elsewhere in the article to separate decimals from whole numbers), we can conclude that the tick size for Sky Perfect Communications used by the authors is 100 JPY, or approximately $\frac{1}{740}$ of the share price ($100 / 74300 \approx 1/743$). We will use that conclusion to choose a comparable tick size for our data set.

Overall, it is a rather unorthodox choice of data set for authors who are affiliated with U.S. universities and none of whom is Japanese.

Description of our data

We have uploaded into the OBADiah database the historical data publicly provided by MOEX (Moscow Exchange) for the ordinary share of Sberbank of Russia (SBER) for the period from 2014-09-01 to 2014-09-05. We use the ticker SBERRUR, and prices are in Russian Rubles. The tick size is $\frac{1}{100}$ of $1$ Russian Ruble.

Root datasets

Our data consists of two ‘root’ data sets, as we call them, which the OBADiah database calculates directly from the raw data provided by MOEX:

Trades
Depth changes

We use the root data sets to produce what we call derived data sets: the Market orders, Limit order placements and Limit order cancellations data sets.

The Trades dataset has one row per trade and contains 438,432 rows. An excerpt from it is shown in table 2. The dataset has the following columns:

timestamp - the timestamp of the trade, millisecond precision
price - the price per share in Russian Rubles, with $\frac{1}{100}$ tick size
volume - the volume of the trade, in shares
direction - either “buy” or “sell”, depending on the type of order that initiated the trade
maker - the id of the order which was sitting in the order book and was matched against “taker” to produce the trade
taker - the id of the order which initiated the trade

Table 2: An excerpt from Trades dataset showing nine trades generated by the single market order with taker id 3247013529504000
timestamp	price	volume	direction	maker	taker
2014-09-01 13:12:14.141	73.11	1250	sell	3246759814176000	3247013529504000
2014-09-01 13:12:14.141	73.10	2500	sell	3246710480640000	3247013529504000
2014-09-01 13:12:14.141	73.10	400	sell	3246758404646400	3247013529504000
2014-09-01 13:12:14.141	73.09	1250	sell	3246703432992000	3247013529504000
2014-09-01 13:12:14.141	73.09	700	sell	3246725985465600	3247013529504000
2014-09-01 13:12:14.141	73.09	700	sell	3246730214054400	3247013529504000
2014-09-01 13:12:14.141	73.09	300	sell	3246962786438400	3247013529504000
2014-09-01 13:12:14.141	73.08	3000	sell	3245671657324800	3247013529504000
2014-09-01 13:12:14.141	73.08	1039900	sell	3246699204403200	3247013529504000

Each row in the Depth changes dataset represents a change in the order book. The dataset contains 7,340,775 rows. An excerpt from it is shown in the table 3. The dataset has the following columns:

timestamp - the timestamp of the change, millisecond precision
side - the side of the order book where the change has happened (“bid” or “ask”)
price - the price level of the order book at which the change happened, showing price per share in Russian Rubles with $\frac{1}{100}$ tick size
volume - an increase (positive) or decrease (negative) in the number of shares which may be bought (if side is “ask”) or sold (if side is “bid”) at this price
bid.price - the best bid price in the order book just before the change
ask.price - the best ask price in the order book just before the change

If column volume is greater than zero, the row always represents a placement of a limit order. If volume is negative the row represents either a limit order cancellation or a trade.

Table 3: An excerpt from Depth changes dataset. It contains changes due to limit order placements, cancellations as well as due to trades initiated by market order 3247013529504000. Note how trades with the same timestamp and price are combined into a single depth change
timestamp	side	price	volume	bid.price	ask.price
2014-09-01 13:12:13.809	ask	73.16	-6000	73.11	73.13
2014-09-01 13:12:13.810	ask	73.16	5900	73.11	73.13
2014-09-01 13:12:13.829	ask	78.02	-200	73.11	73.13
2014-09-01 13:12:13.881	bid	72.66	-3000	73.11	73.13
2014-09-01 13:12:14.023	bid	72.67	3000	73.11	73.13
2014-09-01 13:12:14.059	bid	73.09	-400	73.11	73.13
2014-09-01 13:12:14.062	bid	73.09	300	73.11	73.13
2014-09-01 13:12:14.078	ask	73.52	-3000	73.11	73.13
2014-09-01 13:12:14.141	bid	73.11	-1250	73.11	73.13
2014-09-01 13:12:14.141	bid	73.10	-2900	73.11	73.13
2014-09-01 13:12:14.141	bid	73.09	-2950	73.11	73.13
2014-09-01 13:12:14.141	bid	73.08	-1042900	73.11	73.13
2014-09-01 13:12:14.143	bid	73.09	3000	73.08	73.13

Derived data sets

Market orders

As we’ve already noted above, a market order is not synonymous with a trade. Consider again the data in table 2, where a single taker order with taker id 3247013529504000 generated nine trades.

We will use the following terms to refer to all these significantly different entities.

Taker - it is a real market order as defined for example here, uniquely identified by taker column.
Market order - an entity produced by summation of volume of several trades with the same timestamp, price and direction columns. Note that a single market order may combine several taker orders that arrived at the same time and executed at the same price.
Trade - a usual trade, i.e. a match of a single taker against a single maker order.

The market order defined as above is the closest fit to the model’s notion of a ‘market order’ (except for independence and volume, as noted above and discussed below).

Thus in order to produce the Market orders dataset we take Trades data set and combine all trades with the same timestamp, price, direction and taker columns into a single market order with the volume equal to the sum of trades’ volumes. This procedure transforms the table 2 of trades into the table 4 of market orders and into the table 5 of taker orders. In our analysis we will not use taker orders as the authors themselves have not used them.

Table 4: Market orders produced by taker with id 3247013529504000. Their volume equals the sum of the volumes of the trades combined into them.
timestamp	price	direction	volume	side
2014-09-01 13:12:14.141	73.11	sell	1250	bid
2014-09-01 13:12:14.141	73.10	sell	2900	bid
2014-09-01 13:12:14.141	73.09	sell	2950	bid
2014-09-01 13:12:14.141	73.08	sell	1042900	bid

Table 5: A single taker order with id 3247013529504000. Its volume equals the sum of the volumes of the trades it generated. Note that the real market order impacted 4 price levels, contrary to the assumption in section 1.2 Dynamics of the Order Book that a market order decreases the quantity at the best price only.
timestamp	taker	direction	price	volume	levels
2014-09-01 13:12:14.141	3247013529504000	sell	73.08	1050000	4
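
The step from table 2 to tables 4 and 5 can be sketched in plain Python. The function names are hypothetical and this is not the OBADiah implementation. The sketch groups trades by timestamp, price and direction, following the definition of a market order given above, and it assumes that the price reported for a taker order is the price of its last trade. The maker column is omitted for brevity.

```python
# Trades from table 2: (timestamp, price, volume, direction, taker).
TS = "2014-09-01 13:12:14.141"
TAKER = 3247013529504000
TRADES = [
    (TS, "73.11", 1250, "sell", TAKER),
    (TS, "73.10", 2500, "sell", TAKER),
    (TS, "73.10", 400, "sell", TAKER),
    (TS, "73.09", 1250, "sell", TAKER),
    (TS, "73.09", 700, "sell", TAKER),
    (TS, "73.09", 700, "sell", TAKER),
    (TS, "73.09", 300, "sell", TAKER),
    (TS, "73.08", 3000, "sell", TAKER),
    (TS, "73.08", 1039900, "sell", TAKER),
]


def market_orders(trades):
    """Sum trade volumes per (timestamp, price, direction)."""
    totals = {}
    for ts, price, volume, direction, _taker in trades:
        key = (ts, price, direction)
        totals[key] = totals.get(key, 0) + volume
    return [key + (volume,) for key, volume in totals.items()]


def taker_orders(trades):
    """Sum trade volumes per taker id and count the price levels it touched."""
    orders = {}
    for ts, price, volume, direction, taker in trades:
        o = orders.setdefault(
            taker, {"timestamp": ts, "direction": direction, "volume": 0, "prices": set()}
        )
        o["volume"] += volume
        o["prices"].add(price)
        o["price"] = price                      # assumption: price of the last trade
    return {
        taker: {
            "timestamp": o["timestamp"],
            "direction": o["direction"],
            "price": o["price"],
            "volume": o["volume"],
            "levels": len(o["prices"]),
        }
        for taker, o in orders.items()
    }


for row in market_orders(TRADES):
    print(row)
print(taker_orders(TRADES)[TAKER])
```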

The numbers of taker orders (276,601) and of market orders (305,687) in our data set are notably smaller than the number of trades, which is 438,432.

The number of taker orders which impacted more than one price level is 26,766, or roughly 10% of the total number of taker orders. The taker orders which impacted the largest number of levels are shown in table 6.

Table 6: Taker orders which impacted the largest number of levels of the order book. Level size is equal to effective tick size, i.e. 0.01 of Russian Ruble
timestamp	taker	direction	price	volume	levels
2014-09-01 17:52:12.052	9218876119603200	sell	73.59	1069500	35
2014-09-01 16:10:21.733	6888385250496000	buy	74.60	1000000	26
2014-09-03 13:07:37.703	2911069288857600	buy	76.99	400000	25
2014-09-03 13:04:49.194	2699054277004800	sell	77.00	118330	23
2014-09-03 13:03:42.835	2606786435520000	buy	77.48	100000	22
2014-09-03 13:20:06.579	3631575234009600	sell	77.62	215470	20
2014-09-03 11:14:40.756	539254868774400	sell	73.86	338930	19
2014-09-03 13:21:33.013	3698781386227200	sell	77.35	200000	19
2014-09-04 15:41:29.439	4701909278505600	buy	78.61	755450	19
2014-09-03 13:03:17.257	2550685918809600	sell	76.18	100000	18

Adjustment of Depth changes data set

Compare again table 4 with table 3. For each market order there is a corresponding depth change with the same timestamp and price and with the negative of its volume. This is what happens typically, but not always. In order to deduce limit order placements and cancellations we need to remove from the Depth changes data set the changes due to market orders.

In our data set 293,877 market orders have exactly one corresponding row in the Depth changes data set. The remaining 11,810 market orders arrived together with one or more orders at the same price but in the opposite direction, so the volume of the corresponding order book change is different and may sometimes even be zero, i.e. the order book was not changed at all by the market order.

The adjusted Depth changes data set has 7,051,909 rows.

Limit order placements

The limit order placements are extracted from the adjusted Depth changes data set: all rows with positive volume column are either limit order placements or market limit order placements. A market limit order is a limit order with the price greater than the opposite best price and which is not executed in full.

We have 3,682,051 rows in Limit order placements data set.

The number of market limit order placements is 14,061 which is small in comparison with the number of limit orders placed. The model does not have market limit orders so we ignore them too.

Limit order cancellations

Rows with negative volume column in the adjusted Depth changes data set are produced by limit order cancellations.

We have 3,355,797 rows in Limit order cancellations data set.

2.2 Estimation Procedure

From (Cont, Stoikov, and Talreja 2010):

Recall that in our stylized model we assume orders to be of “unit” size. In the data set, we first compute the average sizes of market orders $S_m$, limit orders $S_l$, and canceled orders $S_c$ and choose the size unit to be the average size of a limit order $S_l$.

Let’s stop for a moment and think about whether it is a good idea to calculate the above averages. Are they meaningful? The highly cited article (Clauset, Shalizi, and Newman 2009) starts with a brief explanation of when the use of a mean value is reasonable:

Many empirical quantities cluster around a typical value. The speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on a midsummer’s day: all of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that an adult male American is about 180cm tall because no one deviates very far from this height. Even the largest deviations, which are exceptionally rare, are still only about a factor of two from the mean in either direction and hence the distribution can be well characterized by quoting just its mean and standard deviation.

Our data set contains 305,687 market orders. Their average volume is

\[ S_m = 2{,}814 \] shares. The largest volume is 1,042,900 shares or 371 times larger than the average market order. The smallest volume is 10 shares or 281 times smaller than the average. The standard deviation of market order volume is 14,195.8 shares. Clearly, the average volume is not a good characterization of a “typical” market order.

The same holds true for limit order placements and cancellations. Our data set contains information about the placement of 3,682,051 limit orders and about 3,355,797 cancellations. The average volume of a placed limit order is

\[ S_l = 12{,}708 \] shares. The largest volume is 2,000,000 shares or 157 times larger than the average. The smallest volume is 10 shares or 1271 times smaller than the average. The standard deviation of placed limit order volume is 24,684.7 shares.

The average volume of cancelled limit orders is \[ S_c = 13{,}750 \] shares. The largest volume is 2,000,000 shares or 145 times larger than the average. The smallest volume is 10 shares or 1375 times smaller than the average. The standard deviation of cancelled limit order volume is 25,858 shares.

From (Cont, Stoikov, and Talreja 2010):

The limit order arrival rate function for $1 \leq i \leq 5$ can be estimated by
\[ \hat{\lambda}(i) = \frac{N_l(i)}{T_*} \tag{2} \] where $N_l(i)$ is the total number of limit orders that arrived at a distance $i$ from the opposite best quote, and $T_*$ is the total trading time in the sample (in minutes). $N_l(i)$ is obtained by enumerating the number of times that a quote increases in size at a distance of $1 \leq i \leq 5$ ticks from the opposite best quote. We then extrapolate by fitting a power law function of the form \[ \hat{\lambda}(i) = \frac{k}{i^\alpha} \tag{3} \] (suggested by (Zovko and Farmer 2002) or (Bouchaud, Mézard, and Potters 2002)). The power law parameters $k$ and $\alpha$ are obtained by a least-squares fit \[ \min_{k, \alpha}\sum_{i=1}^{5}\Big(\hat{\lambda}(i) - \frac{k}{i^\alpha} \Big)^2 \]

Let’s start by counting $N_l(i)$ using the Limit order placements data set. To do that we need to count the number of times a limit order was placed at distance $i$ from the opposite best price. We choose tick.size to be 0.1 Russian Ruble, or approximately $\frac{1}{740}$ of the share price, i.e. the same proportion as in the article. It is ten times bigger than the actual tick size, so we need to round prices. We calculate the distance $i$ separately for “ask” and “bid” orders, as shown in tables 7 and 8 below:

Table 7: A sample of $i$ calculation for ‘bid’ limit orders placements. Note that prices of orders are rounded downward to the closest multiple of tick size while best ask price in the distance calculation column $i$ is rounded upward.
timestamp	side	price	volume	bid.price	ask.price	price.big.tick.size	ask.price.big.tick.size	i
2014-09-01 11:00:00.049	bid	73.47	1000	73.5	73.51	73.4	73.6	2
2014-09-01 11:00:00.069	bid	73.46	1000	73.5	73.51	73.4	73.6	2
2014-09-01 11:00:00.157	bid	71.34	5000	73.5	73.51	71.3	73.6	23
2014-09-01 11:00:00.209	bid	72.58	20000	73.5	73.51	72.5	73.6	11
2014-09-01 11:00:00.328	bid	70.56	10	73.5	73.51	70.5	73.6	31
2014-09-01 11:00:00.360	bid	73.12	15710	73.5	73.51	73.1	73.6	5

Table 8: A sample of $i$ calculation for ‘ask’ limit orders placements. Note that prices of orders are rounded upward to the closest multiple of tick size while best bid price in the distance calculation column $i$ is rounded downward.
timestamp	side	price	volume	bid.price	ask.price	price.big.tick.size	bid.price.big.tick.size	i
2014-09-01 11:00:00.073	ask	73.56	1000	73.5	73.51	73.6	73.5	1
2014-09-01 11:00:00.078	ask	73.57	1000	73.5	73.51	73.6	73.5	1
2014-09-01 11:00:00.210	ask	74.43	20000	73.5	73.51	74.5	73.5	10
2014-09-01 11:00:00.286	ask	73.51	100	73.5	73.51	73.6	73.5	1
2014-09-01 11:00:00.367	ask	73.67	14950	73.5	73.51	73.7	73.5	2
2014-09-01 11:00:00.369	ask	73.87	59800	73.5	73.51	73.9	73.5	4
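
The rounding rule used in tables 7 and 8 can be written as a small Python sketch. The `distance` function is hypothetical and is not part of OBADiah. Prices are passed as strings and handled with `decimal` so that rounding to the 0.1 tick is exact.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

TICK = Decimal("0.1")  # the enlarged tick size, ten times the actual one


def to_ticks(price: str, rounding: str) -> int:
    """Express a price in whole big ticks, rounding down or up as requested."""
    return int((Decimal(price) / TICK).to_integral_value(rounding=rounding))


def distance(side: str, price: str, bid_price: str, ask_price: str) -> int:
    """Distance i, in big ticks, from the opposite best quote."""
    if side == "bid":
        # Bid order prices are rounded down, the best ask is rounded up.
        return to_ticks(ask_price, ROUND_CEILING) - to_ticks(price, ROUND_FLOOR)
    if side == "ask":
        # Ask order prices are rounded up, the best bid is rounded down.
        return to_ticks(price, ROUND_CEILING) - to_ticks(bid_price, ROUND_FLOOR)
    raise ValueError(f"unknown side: {side}")


# First rows of tables 7 and 8.
print(distance("bid", "73.47", "73.5", "73.51"))
print(distance("ask", "73.56", "73.5", "73.51"))
```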

Then if we substitute equation (2) into equation (3) and take the logarithm of both sides we get:

\[ \begin{aligned} N_l(i) &= \frac{k T_*}{i^\alpha} \\ \log{N_l(i)} &= \log(k T_*) - \alpha \log{i} \end{aligned} \tag{4} \] So $\hat{\lambda}(i)$ follows the power law specified by equation (3) if and only if the logarithm of $N_l(i)$, the total number of limit orders that arrived at a distance $i$ from the opposite best quote, is a linear function of the logarithm of the distance $i$. As figure 4 shows, this is, in fact, the case.

Figure 4: Number of limit order placements by distance from the opposite best price in ticks. Tick size is 0.1 Russian Ruble

In our case $T_*$ equals 2,625 minutes. The calculated arrival rates $\hat{\lambda}(i)$ are shown in table 9 below.

Table 9: Number of limit order placements and arrival rate per minute by distance from the opposite best price in ticks. Tick size is 0.1 Russian Ruble
Distance:	1	2	3	4	5	6	7	8	9	10
Lambda	583.846	560.147	168.833	25.3566	17.8937	13.011	5.92648	12.845	4.28571	1.69371
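
One simple way to obtain $k$ and $\alpha$ from the rates in table 9 is an ordinary least-squares line through the points $(\log i, \log \hat{\lambda}(i))$, in the spirit of equation (4). Note that this is an assumption of the sketch below and not the article's procedure, which minimizes the squared error of the rates themselves, so the two fits will in general give different parameters. The function name `fit_power_law` is hypothetical.

```python
import math

# Arrival rates per minute from table 9, distances 1..10.
LAMBDA = [583.846, 560.147, 168.833, 25.3566, 17.8937,
          13.011, 5.92648, 12.845, 4.28571, 1.69371]


def fit_power_law(rates):
    """Fit rates[i - 1] ~ k / i**alpha by least squares on the log-log scale."""
    xs = [math.log(i) for i in range(1, len(rates) + 1)]
    ys = [math.log(r) for r in rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope          # (k, alpha)


# Use the first five distances, as in the article.
k_hat, alpha_hat = fit_power_law(LAMBDA[:5])
print(k_hat, alpha_hat)
```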

Let’s check whether the number of limit order arrivals per unit of time follows the Poisson distribution, as the model assumes. We will use a very simple approach from (Feller 1950):

Suppose that a physical experiment is repeated a great number $N$ of times, and that each time we count the number of events in an interval of fixed length $t$. Let $N_k$ be the number of times that exactly $k$ events are observed. Then \[ N_0 + N_1 + N_2 + \cdots = N \] The total number of points observed in the $N$ experiments is \[ N_1 + 2N_2 + 3N_3 + \cdots = T \tag{5} \] and $\frac{T}{N}$ is the average. If $N$ is large, we expect that \[ N_k \approx N e^{-\lambda t} \frac{(\lambda t)^k}{k!} \tag{6} \] Substituting from (6) into (5), we find \[ T \approx N e^{-\lambda t}\lambda t \left( 1 + \frac{\lambda t}{1} + \frac{(\lambda t)^2}{2!} + \cdots \right) = N\lambda t \] and hence \[ \lambda t \approx \frac{T}{N} \tag{7} \] This relation gives us a means of estimating $\lambda$ from observations and of comparing theory with experiments.

In our case we will repeat the experiment every second ($t = 1$) and will measure the number of limit orders placed at the distance $i$ from the opposite best price per second.

Thus the number of experiments $N$ will be the same for every $i$: $N = 60T_* = 157{,}500$ seconds. The total number of points observed will depend on $i$: $T = T_i = N_l(i)$. The product $\lambda t$ will also depend on $i$: $(\lambda t)_i = \frac{N_l(i)}{60T_*} = \frac{\hat{\lambda}(i)}{60}$.
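
Feller's comparison can be sketched in Python as follows. Given the number of placements observed in each one-second interval, the function estimates $\lambda t$ by relation (7) and returns, for each $k$, the observed $N_k$ next to the value expected from relation (6). The input counts below are made up for illustration and are not taken from our data set, and the function name is hypothetical.

```python
import math
from collections import Counter


def poisson_comparison(counts):
    """Return (lambda*t, rows) where each row is (k, observed N_k, expected N_k)."""
    n = len(counts)                    # number of experiments N
    total = sum(counts)                # total number of points T
    lam_t = total / n                  # relation (7)
    observed = Counter(counts)
    rows = []
    for k in range(max(counts) + 1):
        if lam_t > 0:
            # Relation (6), evaluated on the log scale for numerical stability.
            log_p = -lam_t + k * math.log(lam_t) - math.lgamma(k + 1)
            expected = n * math.exp(log_p)
        else:
            expected = float(n) if k == 0 else 0.0
        rows.append((k, observed.get(k, 0), expected))
    return lam_t, rows


# Hypothetical counts of placements in eight consecutive seconds.
lam_t, rows = poisson_comparison([0, 1, 1, 2, 0, 3, 1, 0])
for k, n_k, expected in rows:
    print(k, n_k, round(expected, 3))
```

A large and systematic difference between the observed and expected columns is what figures 5 and 6 display graphically.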

Figure 5 shows the number of experiments (i.e. periods, or seconds) $N_k$ with a given number of limit order placements $k$ at distance $i=1$ tick from the opposite best price. The tick size equals 0.1 Russian Ruble. The Poisson distribution is shown for comparison; it is clear that the number of limit order placements per second is not sampled from the Poisson distribution.

Figure 5: An empirical distribution of limit order placements is not the Poisson distribution. Red points are empirical numbers of experiments $N_k$ (or periods or seconds) with given number of limit order placements $k$ at distance $i=1$ tick (with tick size = 0.1 Russian Ruble) from the opposite best price. Black points are from the Poisson distribution with $(\lambda t)_1 = \frac{\hat{\lambda}(1)}{60} = 9.73076$ for comparison.

Figure 6 demonstrates that the situation is similar at distances $i=1,\ldots,8$. From figure 4 we see that 99% of limit orders are placed at these levels.

Figure 6: Empirical numbers of experiments $N_k$ (or periods or seconds) with given number of limit order placements $k$ at various distances $i$ in ticks from the opposite best price. Tick size is 0.1 Russian Ruble. It appears that none of them is sampled from the Poisson distribution.

Let’s return to (Cont, Stoikov, and Talreja 2010):

The arrival rate of market orders is then estimated by \[ \hat{\mu} = \frac{N_m}{T_*}\frac{S_m}{S_l} \] where $T_*$ is the total trading time in the sample (in minutes) and $N_m$ is the number of market orders. Note that we ignore market orders that do not affect the best quotes, as is the case when a market order is matched by a hidden order.

There are no hidden orders at MOEX, so we do not need to exclude any market orders on that account. Thus the number of market orders $N_m$ in our data set is equal to its size: \[ \begin{aligned} N_m &= 305{,}687 \\ \hat{\mu} &= \frac{305687}{2625}\cdot\frac{2814}{12708} = 25.786628 \end{aligned} \]
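
The arithmetic behind $\hat{\mu}$ can be reproduced with a few lines of Python. The function name is hypothetical, and the inputs are the rounded figures quoted in the text. The factor $S_m / S_l$ converts the rate from market orders per minute into unit-size orders per minute.

```python
def market_order_rate(n_m: int, t_minutes: float, s_m: float, s_l: float) -> float:
    """Market order arrival rate in unit-size orders per minute."""
    orders_per_minute = n_m / t_minutes          # raw arrival rate
    return orders_per_minute * s_m / s_l         # rescale to unit-size orders


mu_hat = market_order_rate(n_m=305_687, t_minutes=2_625, s_m=2_814, s_l=12_708)
print(round(mu_hat, 6))
```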

From (Cont, Stoikov, and Talreja 2010):

Because the cancellation rate in our model is proportional to the number of orders at a particular price level, in order to estimate the cancellation rates we first need to estimate the steady-state shape of the order book $Q_i$ , which is the average number of orders at a distance of $i$ ticks from the opposite best quote, for $1 \leq i \leq 5$. If $M$ is the number of quote rows and $S_i^B(j)$ the number of shares bid at a distance of $i$ ticks from the ask on the $j$th row, for $1 \leq j \leq M$, we have \[ Q_i^B = \frac{1}{S_l}\frac{1}{M}\sum_{j=1}^{M}S_i^B(j) \tag{8} \] The vector $Q_i^A$ is obtained analogously, and $Q_i$ is the average of $Q_i^A$ and $Q_i^B$.

As figure 3 shows, the time interval between quote rows is not constant, yet formula (8) does not take the interval into account.

Note that $Q_i^B$ is measured in “orders” while $S_i^B(j)$ is measured in “shares”. The conversion rate between these units of measure is $1 \text{ order } = S_l \text{ shares}$, so $Q_i^B(j)$, the number of orders bid at a distance of $i$ ticks from the ask on the $j$th row, may be calculated as shown in formula (9):

\[ Q_i^B(j) = \frac{S_i^B(j)}{S_l} \tag{9} \]

Table 10 below shows an example of the evolution of $S_i^B(t)$ and $S_i^A(t)$ as returned by the function obadiah::queues(), which calculates them from the Depth changes data set. Note that when bid.price increased from $72.88$ to $72.89$, the whole ask queue a2 temporarily moved to a1 and then moved back when bid.price returned to $72.88$. Thus a queue size changes not only when a limit order is placed or cancelled but also when the opposite best price changes.
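The re-indexing of queues can be illustrated with a minimal sketch. It assumes a price step of 0.01, which is inferred from the prices in Table 10, and the function name is hypothetical; this is not how obadiah::queues() is implemented.

```python
def distance_in_ticks(price, opposite_best, tick=0.01):
    """Distance of a price level from the opposite best quote, in whole ticks.

    tick=0.01 is an ASSUMPTION read off the price step in Table 10.
    """
    return round(abs(price - opposite_best) / tick)

ask_level = 72.90

# Best bid at 72.88: the asks resting at 72.90 form queue a2
d_before = distance_in_ticks(ask_level, 72.88)

# Best bid moves up to 72.89: the same resting asks are now queue a1
d_after = distance_in_ticks(ask_level, 72.89)

print(d_before, d_after)
```

No ask order was placed or cancelled between the two calls; only the reference price moved, so the same shares are reported under a different queue index.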

Table 10: An example of the evolution of bid and ask queues over time. Each queue aN and bN shows the number of shares outstanding in the queue. Note the changes in the ask queues at 2014-09-01 14:00:02 caused by the change of bid.price.
timestamp	bid.price	ask.price	b1	b2	b3	b4	b5	a1	a2	a3	a4	a5
2014-09-01 14:00:00	72.88	72.9	0	9070	5710	1660	19700	0	16910	0	3500	8850
2014-09-01 14:00:01	72.88	72.9	0	6900	5710	1660	21400	0	15570	0	3500	8850
2014-09-01 14:00:02	72.89	72.9	3000	6900	6370	9700	18000	15570	0	2500	5750	2740
2014-09-01 14:00:03	72.88	72.9	0	6900	5710	1660	22800	0	15570	0	2500	5750
2014-09-01 14:00:04	72.88	72.9	0	6900	5710	1660	22600	0	15570	0	2500	5750
2014-09-01 14:00:05	72.88	72.9	0	6800	5710	2160	22600	0	15570	0	2500	5750
2014-09-01 14:00:06	72.88	72.9	0	6800	3210	2160	21200	0	8050	0	1140	11450
2014-09-01 14:00:07	72.88	72.9	0	6800	3210	2160	21200	0	8050	0	1140	8450
2014-09-01 14:00:08	72.88	72.9	0	6800	3210	4660	35130	0	320	9750	1140	6350
2014-09-01 14:00:09	72.88	72.9	0	6800	3210	4660	34730	0	320	9750	1140	6350
2014-09-01 14:00:10	72.88	72.9	0	6800	3210	4660	32230	0	320	9750	1140	6350
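As an illustration of formulas (8) and (9), the sketch below applies them to the bid columns of Table 10. The eleven rows are only a small excerpt, so the result is not the estimate shown in figure 7; the function name is our own.

```python
S_L = 12708  # average limit order size in shares, as quoted in the text

# Bid columns b1..b5 of Table 10, one list per quote row j
bid_rows = [
    [0, 9070, 5710, 1660, 19700],
    [0, 6900, 5710, 1660, 21400],
    [3000, 6900, 6370, 9700, 18000],
    [0, 6900, 5710, 1660, 22800],
    [0, 6900, 5710, 1660, 22600],
    [0, 6800, 5710, 2160, 22600],
    [0, 6800, 3210, 2160, 21200],
    [0, 6800, 3210, 2160, 21200],
    [0, 6800, 3210, 4660, 35130],
    [0, 6800, 3210, 4660, 34730],
    [0, 6800, 3210, 4660, 32230],
]

def average_queue_sizes(rows, order_size):
    """Formula (8): average shares per level over all rows, in units of orders."""
    m = len(rows)
    n_levels = len(rows[0])
    return [
        sum(row[i] for row in rows) / (m * order_size)
        for i in range(n_levels)
    ]

Q_bid = average_queue_sizes(bid_rows, S_L)
for i, q in enumerate(Q_bid, start=1):
    print(f"Q_{i}^B = {q:.4f} orders")
```

Every row carries the same weight $1/M$ here, regardless of how long the book stayed in that state, which is exactly the property of formula (8) noted above.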

Figure 7 shows the calculated average numbers of orders $Q_i^A$, $Q_i^B$ and $Q_i$ at each distance $i$.

Figure 7: Average number of orders. The order size is 12708 shares.

From (Cont, Stoikov, and Talreja 2010):

An estimator for the cancellation rate function is then given by \[ \hat{\theta}(i) = \begin{cases} \dfrac{N_c(i)}{T_* Q_i}\dfrac{S_c}{S_l}, & i \leq 5, \\ \hat{\theta}(5), & i > 5, \end{cases} \tag{10} \] where $N_c(i)$ is obtained by counting the number of times that a quote decreases in size at a distance of $1 \leq i \leq 5$ ticks from the opposite best quote, excluding decreases due to market orders.

In our case we are not limited to five ticks. Apart from that, we calculate $\hat{\theta}(i)$ in accordance with equation (10).

But before we do that, let’s have a look at figure 8, where we plot $\log{N_c(i)}$ against $\log{i}$. The similarity with figure 4 is striking, so the idea of modelling order placements and cancellations as independent random variables is doubtful at the very least. Today HFT traders cancel almost every order soon after it has been placed.
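A minimal sketch of this calculation, applied to every level for which data is available rather than to the first five only, is given below. The function name and all numeric inputs are hypothetical; $S_c$ is read as the average size of cancelled orders, as in the paper.

```python
def cancellation_rates(n_cancel, q_avg, t_star, s_c, s_l):
    """Equation (10) without the five-tick cut-off.

    n_cancel[i-1] -- N_c(i), number of cancellations at distance i
    q_avg[i-1]    -- Q_i, average queue size at distance i, in orders
    t_star        -- total trading time in the sample
    s_c, s_l      -- average sizes of cancelled and limit orders, in shares
    """
    if len(n_cancel) != len(q_avg):
        raise ValueError("n_cancel and q_avg must have the same length")
    return [
        n_c / (t_star * q) * (s_c / s_l)
        for n_c, q in zip(n_cancel, q_avg)
    ]

# HYPOTHETICAL inputs chosen so that the result is easy to verify by hand
theta_hat = cancellation_rates(
    n_cancel=[200, 300],
    q_avg=[2.0, 1.5],
    t_star=100.0,
    s_c=1.0,
    s_l=1.0,
)
print(theta_hat)
```

Because $Q_i$ appears in the denominator, a level with a small average queue gets a large estimated rate even when its cancellation count is unremarkable, and a level with $Q_i = 0$ cannot be estimated at all.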

Figure 8: The number of limit order cancellations by distance from the opposite best price in ticks is almost exactly the same as the number of limit order placements. Tick size is 0.1 Russian Ruble.

The calculated values of $\hat{\theta}(i)$ are shown in figure 9. While the similarity between figures 8 and 4 is easy to see, no comparable regularity is apparent in the calculated values of $\hat{\theta}(i)$. It appears that including the queue sizes $Q_i$ in the calculation of $\hat{\theta}(i)$ is not empirically justified.

Figure 9: Calculated values of the cancellation rate $\hat{\theta}(i)$. In our opinion, the inclusion of queue sizes in the formula is not justified.

3. Laplace Transform Methods for Computing Conditional Probabilities

TBD

References

Bouchaud, Jean-Philippe, Marc Mézard, and Marc Potters. 2002. “Statistical Properties of Stock Order Books: Empirical Results and Models.” Quantitative Finance 2 (4).

Clauset, Aaron, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. “Power-Law Distributions in Empirical Data.” SIAM Review 51 (4).

Cont, Rama, Sasha Stoikov, and Rishi Talreja. 2010. “A Stochastic Model for Order Book Dynamics.” Operations Research 58 (3).

Feller, William. 1950. An Introduction to Probability Theory and Its Applications. Vol. 1.

Zovko, Ilija, and J. Doyne Farmer. 2002. “The Power of Patience: A Behavioural Regularity in Limit-Order Placement.” Quantitative Finance 2 (5).

An empirical evaluation of the Stochastic Model for Order Book Dynamics

Petr Fedorov

2020-03-17
