From (Cont, Stoikov, and Talreja 2010):

We propose a continuous-time stochastic model for the dynamics of a limit order book. The model strikes a balance between three desirable features: it can be estimated easily from data, it captures key empirical properties of order book dynamics, and its analytical tractability allows for fast computation of various quantities of interest without resorting to simulation. We describe a simple parameter estimation procedure based on high-frequency observations of the order book and illustrate the results on data from the Tokyo Stock Exchange. Using simple matrix computations and Laplace transform methods, we are able to efficiently compute probabilities of various events, conditional on the state of the order book: an increase in the midprice, execution of an order at the bid before the ask quote moves, and execution of both a buy and a sell order at the best quotes before the price moves. Using high-frequency data, we show that our model can effectively capture the short-term dynamics of a limit order book. We also evaluate the performance of a simple trading strategy based on our results.

In this notebook we use data from the OBADiah database to analyze how well the proposed model captures key empirical properties of order book dynamics and, if it does, whether it may be used for trading today.

1. A Continuous-Time Model for a Stylized Limit Order Book

1.1 Limit Order Books

From (Cont, Stoikov, and Talreja 2010):

We consider a market where limit orders can be placed on a price grid \(\{1, \ldots ,n\}\) representing multiples of a price tick. The upper boundary \(n\) is chosen large enough so that it is highly unlikely that orders for the stock in question are placed at prices higher than \(n\) within the time frame of our analysis. Because the model is intended to be used on the time scale of hours or days, this finite boundary assumption is reasonable.

Note that the model is intended to be used on the time scale of hours or days.

From (Cont, Stoikov, and Talreja 2010):

We track the state of the order book with a continuous-time process \(\mathbf{X}(t) \equiv (X_1(t), \ldots , X_n(t))_{t \geq 0}\), where \(|X_p(t)|\) is the number of outstanding limit orders at price \(p\), \(1 \leq p \leq n\). If \(X_p(t) < 0\), then there are \(-X_p(t)\) bid orders at price \(p\); if \(X_p(t) > 0\), then there are \(X_p(t)\) ask orders at price \(p\).

As further described below the authors assume that all orders are of unit size and in empirical examples they take this unit to be the average size (in lots) of limit orders observed for the asset.

From (Cont, Stoikov, and Talreja 2010):

The ask price \(p_A(t)\) at time \(t\) is defined by \[p_A(t) \equiv \inf\{p=1,\ldots,n, X_p(t) > 0\} \land (n+1)\] Similarly, the bid price is defined by \[p_B(t) \equiv \sup\{p=1,\ldots,n, X_p(t) < 0\} \lor 0\]

The authors emphasize that when there are no ask orders in the book the ask price is set to \(n + 1\), and when there are no bid orders in the book the bid price is set to \(0\). In our opinion, the ask price should be “+infinity” when there are no ask orders in the book, to reflect the fact that nothing can be bought at any price.
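
The two definitions can be sketched in a few lines of Python. This is only an illustration of the convention quoted above (ask set to \(n+1\) and bid set to \(0\) for an empty side); the function name `best_quotes` and the sample book are our own assumptions, not part of the article.

```python
def best_quotes(x):
    """Return (p_A, p_B) for a state vector x over prices 1..n.

    x[p - 1] > 0 means ask orders at price p, x[p - 1] < 0 means bid orders.
    """
    n = len(x)
    asks = [p for p in range(1, n + 1) if x[p - 1] > 0]
    bids = [p for p in range(1, n + 1) if x[p - 1] < 0]
    p_a = min(asks) if asks else n + 1  # no asks: convention n + 1
    p_b = max(bids) if bids else 0      # no bids: convention 0
    return p_a, p_b


# Hypothetical book on a 5-tick grid: bids at prices 1-2, asks at prices 4-5.
book = [-3, -1, 0, 2, 5]
print(best_quotes(book))
```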

From (Cont, Stoikov, and Talreja 2010):

Because most of the trading activity takes place in the vicinity of the bid and ask prices, it is useful to keep track of the number of outstanding orders at a given distance from the bid/ask. To this end, we define \[ Q_i^B(t) = \begin{cases} |X_{p_A(t) - i}(t)|, & 1 \leq i < p_A(t) \\ 0, & p_A(t) \leq i < n \end{cases} \tag{1} \] the number of buy orders at a distance \(i\) from the ask, and \[ Q_i^A(t) = \begin{cases} X_{p_B(t) + i}(t), & 1 \leq i \leq n - p_B(t) \\ 0, & n - p_B(t) < i < n \end{cases} \] the number \(Q_i^A(t)\) of sell orders at a distance \(i\) from the bid.

It is often said that the dynamics of a limit order book resemble in many respects those of a queuing system. Limit orders wait in a queue to be executed against market orders (or canceled). This presumably explains the choice of the letter \(Q\) in \(Q_i^B(t)\) and \(Q_i^A(t)\).
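
As a rough illustration, the queue sizes relative to the opposite best quote can be computed from the state vector as follows. The sketch takes absolute values on the bid side so that the counts are non-negative; the name `queue_sizes` and the sample book are hypothetical.

```python
def queue_sizes(x):
    """Return (Q^B, Q^A) as lists indexed by distance i = 1..n-1."""
    n = len(x)
    asks = [p for p in range(1, n + 1) if x[p - 1] > 0]
    bids = [p for p in range(1, n + 1) if x[p - 1] < 0]
    p_a = min(asks) if asks else n + 1
    p_b = max(bids) if bids else 0
    # Buy orders i ticks below the best ask (bids are stored as negatives).
    q_bid = [abs(x[p_a - i - 1]) if i < p_a else 0 for i in range(1, n)]
    # Sell orders i ticks above the best bid.
    q_ask = [x[p_b + i - 1] if i <= n - p_b else 0 for i in range(1, n)]
    return q_bid, q_ask


book = [-3, -1, 0, 2, 5]  # hypothetical 5-tick book
q_bid, q_ask = queue_sizes(book)
print(q_bid, q_ask)
```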

1.2 Dynamics of the Order Book

Remember that the state of the order book is tracked with a continuous-time process \(\mathbf{X}(t) \in \mathbb{Z}^n\).

From (Cont, Stoikov, and Talreja 2010):

For a state \(\mathbf{X} \in \mathbb{Z}^n\) and \(1 \leq p \leq n\), define
\[ \mathbf{X}^{p \pm 1} \equiv \mathbf{X} \pm (0, \ldots, 1, \ldots, 0)\] where \(1\) in the vector on the right-hand side is in the \(p\)th component. Assuming that all orders are of unit size (in empirical examples we will take this unit to be the average size of limit orders observed for the asset),

  • a limit buy order at price level \(p < p_A\) increases the quantity at level \(p\): \(\mathbf{X} \rightarrow \mathbf{X}^{p-1}\)
  • a limit sell order at price level \(p > p_B\) increases the quantity at level \(p\): \(\mathbf{X} \rightarrow \mathbf{X}^{p+1}\)
  • a market buy order decreases the quantity at the ask price: \(\mathbf{X} \rightarrow \mathbf{X}^{p_A(t)-1}\)
  • a market sell order decreases the quantity at the bid price: \(\mathbf{X} \rightarrow \mathbf{X}^{p_B(t)+1}\)
  • a cancellation of an outstanding limit buy order at price level \(p < p_A\) decreases the quantity at level \(p\): \(\mathbf{X} \rightarrow \mathbf{X}^{p+1}\)
  • a cancellation of an outstanding limit sell order at price level \(p > p_B\) decreases the quantity at level \(p\): \(\mathbf{X} \rightarrow \mathbf{X}^{p-1}\)

The evolution of the order book is thus driven by the incoming flow of market orders, limit orders, and cancellations at each price level, each of which can be represented as a counting process.
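
The six transitions listed above can be sketched as a single state-update function. This is a minimal illustration assuming unit-size orders; the event names and the function `apply_event` are our own labels, not the article's.

```python
def apply_event(x, event, p=None):
    """Return the new state after one unit-size event on state x (prices 1..n)."""
    n = len(x)
    y = list(x)
    asks = [q for q in range(1, n + 1) if x[q - 1] > 0]
    bids = [q for q in range(1, n + 1) if x[q - 1] < 0]
    p_a = min(asks) if asks else n + 1
    p_b = max(bids) if bids else 0
    if event == "limit_buy" and p < p_a:
        y[p - 1] -= 1        # X -> X^{p-1}
    elif event == "limit_sell" and p > p_b:
        y[p - 1] += 1        # X -> X^{p+1}
    elif event == "market_buy" and p_a <= n:
        y[p_a - 1] -= 1      # consumes one unit at the best ask
    elif event == "market_sell" and p_b >= 1:
        y[p_b - 1] += 1      # consumes one unit at the best bid
    elif event == "cancel_buy" and x[p - 1] < 0:
        y[p - 1] += 1        # X -> X^{p+1}
    elif event == "cancel_sell" and x[p - 1] > 0:
        y[p - 1] -= 1        # X -> X^{p-1}
    else:
        raise ValueError(f"event {event!r} is not admissible in this state")
    return y


book = [-3, -1, 0, 2, 5]  # hypothetical 5-tick book
print(apply_event(book, "market_buy"))
print(apply_event(book, "limit_buy", p=3))
```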

Let’s analyze under which circumstances the model described above is able to adequately capture the dynamics of the order book.

Consider a single bid price level \(p_0\). Suppose that each order’s volume \(v\) is drawn from a normal distribution, \[ v \sim v_0\mathcal{N}(\mu, \sigma) \]

where \(v_0 = 10\), \(\mu = 20\) and \(\sigma = 10\). All placed orders are eventually cancelled (in a random order). Table 1 shows the first order placements and cancellations, and the black line in figure 1 shows the overall dynamics of \(Q_{p_0}^B(t)\).

Now, in accordance with the above description, we assume that all placed orders are of “unit size” (column volume.units) and calculate \(Q_{p_0}^B(t)\) in units (column balance.units). Then we multiply balance.units by the “average size of limit orders observed” (in our case 211) to get the values in column balance.model, shown as the red line in figure 1. We see that the black and red lines are quite close to each other, so the model is a good representation of the actual data in this case.

Table 1: Sample order flow and resulting balance on a single price level.
time volume volume.units balance balance.units balance.model
1 144 1 144 1 211
2 177 1 321 2 422
3 356 1 677 3 633
4 207 1 884 4 844
5 213 1 1097 5 1055
6 372 1 1469 6 1266
7 246 1 1715 7 1477
8 73 1 1788 8 1688
9 131 1 1919 9 1899
10 155 1 2074 10 2110
11 322 1 2396 11 2321
12 -356 -1 2040 10 2110

Figure 1: The actual (black) and model (red) dynamics of the bid price level queue size using the artificial order flow drawn from a normal distribution. The model is a good representation of reality.
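
The columns of Table 1 can be reproduced with a short Python sketch. The volumes and the unit size of 211 are taken from the table and the text above; the variable names are ours.

```python
from itertools import accumulate

# Signed order volumes from Table 1 (positive: placement, negative: cancellation).
volumes = [144, 177, 356, 207, 213, 372, 246, 73, 131, 155, 322, -356]
UNIT = 211  # "average size of limit orders observed" quoted in the text

balance = list(accumulate(volumes))              # actual queue size, in shares
volume_units = [1 if v > 0 else -1 for v in volumes]
balance_units = list(accumulate(volume_units))   # queue size in unit orders
balance_model = [UNIT * u for u in balance_units]

for row in zip(range(1, 13), volumes, volume_units, balance, balance_units, balance_model):
    print(*row)
```

The last row shows where the approximation can drift: the cancellation removes 356 shares from the actual queue but exactly one unit (211 shares) from the model queue.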

Now suppose that each order’s volume \(v\) is drawn from a power-law distribution, \[ v \sim x^{-(\alpha-1)} \]

where \(\alpha\) = 3. As figure 2 shows, the fit is not so good now.


Figure 2: The actual (black) and model (red) dynamics of the bid price level queue size using the artificial order flow drawn from a power-law distribution. The model is not a good representation of reality.

We can conclude that the ability of the model to reproduce the empirical behaviour of order book queues depends on the distribution of limit order volumes.

From (Cont, Stoikov, and Talreja 2010):

It is empirically observed (Bouchaud, Mézard, and Potters 2002) that incoming orders arrive more frequently in the vicinity of the current bid/ask price and the rate of arrival of these orders depends on the distance to the bid/ask. To capture these empirical features in a model that is analytically tractable and allows computation of quantities of interest in applications, most notably conditional probabilities of various events, we propose a stochastic model where the events outlined above are modelled using independent Poisson processes. More precisely, we assume that for \(i \geq 1\),

  • Limit buy (respectively sell) orders arrive at a distance of \(i\) ticks from the opposite best quote at independent, exponential times with rate \(\lambda(i)\),
  • Market buy (respectively sell) orders arrive at independent, exponential times with rate \(\mu\),
  • Cancellations of limit orders at a distance of \(i\) ticks from the opposite best quote occur at a rate proportional to the number of outstanding orders: If the number of outstanding orders at that level is \(x\), then the cancellation rate is \(\theta(i)x\).
  • The above events are mutually independent.
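
To make these assumptions concrete, here is a minimal Python sketch that simulates a single queue as a birth-death process with exponential waiting times. It is a deliberate simplification and not the article's full model: the level is assumed to stay at the best quote, market orders are dropped when the queue is empty, and the rate values are hypothetical.

```python
import random


def simulate_queue(lam, mu, theta, x0, horizon, seed=0):
    """Simulate one queue: arrivals at rate lam, departures at rate mu + theta * x."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        # Market orders and cancellations can only remove existing orders.
        death = (mu + theta * x) if x > 0 else 0.0
        total = lam + death
        t += rng.expovariate(total)       # exponential time to the next event
        if t > horizon:
            break
        if rng.random() < lam / total:
            x += 1                        # limit order arrival
        else:
            x -= 1                        # market order or cancellation
        path.append((t, x))
    return path


# Hypothetical rates per minute; not estimates from any data set.
path = simulate_queue(lam=2.0, mu=1.0, theta=0.5, x0=3, horizon=60.0)
print(len(path), path[-1])
```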

It should be noted that while the majority of market orders hit only the best price level, as we show below, there is a notable number of cases where a market order hits two or more levels. We represent these ‘multilevel’ market orders as a sequence of ‘best price market orders’, but those are not independent.

2. Parameter estimation

2.1 Description of the Data Set

Analysis of the Data Set used in the article

From (Cont, Stoikov, and Talreja 2010):

Our data consist of time-stamped sequences of trades (market orders) and quotes (prices and quantities of outstanding limit orders) for the five best price levels on each side of the order book, for stocks traded on the Tokyo stock exchange over a period of 125 days (Aug.–Dec. 2006).

Note that it is not correct to say that a trade and a market order are the same thing. By definition, a single market order may generate several trades.

From (Cont, Stoikov, and Talreja 2010):

In Table 1, we display a sample of three consecutive trades for Sky Perfect Communications. Each row provides the time, size, and price of a market order. We also display a sample of Level II bid-side quotes. Each row displays the five bid prices (pb1, pb2, pb3, pb4, pb5), as well as the quantity of shares bid at these respective prices (qb1, qb2,qb3, qb4, qb5).

Figure 3 shows Table 1 from the article:


Figure 3: A copy of Table 1 from the article

The sample seems to be incomplete. The direction of a trade (i.e. buy or sell) is not shown, the currency is not specified, the tick size is not provided, and it is not even clear whether 74,300 means seventy-four thousand three hundred or 74.30. The total number of records in the data set is not specified either.

Sky Perfect Communications ceased to exist in 2007 due to its merger with JSAT Corporation, so today it is difficult to find information about its share prices in 2006. Yahoo Finance tells us that the “close” price of SKY Perfect JSAT Holdings Inc. (9412.T) on August 22, 2006 was 742.50 JPY, which is either ten times higher or a hundred times lower than the average price in Table 1.

If we used today’s trading rules of domestic stocks at Japan Exchange Group, we would think that the tick size would be 0.1 JPY if the share price were 74.30 or 5 JPY if the share price were 74300.

Assuming that the minimal price change in Table 1 equals the tick size and that the price is 74300 JPY (since the authors use a dot (.) elsewhere in the article to separate decimals from whole numbers), we can conclude that the tick size for Sky Perfect Communications used by the authors is 100 JPY, or approximately \(\frac{1}{743}\) of the share price. We will use that conclusion to choose a comparable tick size for our data set.

Overall, it is a rather unorthodox choice of data set for authors who are affiliated with U.S. universities and none of whom is Japanese.

Description of our data

We have uploaded into the OBADiah database historical data publicly provided by MOEX (Moscow Exchange) for the ordinary share of Sberbank of Russia (SBER) for the period from 2014-09-01 to 2014-09-05. We use the ticker SBERRUR, and prices are in Russian Rubles. The tick size is \(\frac{1}{100}\) of \(1\) Russian Ruble.

Root datasets

Our data consists of two ‘root’ data sets, as we call them, which are calculated directly by the OBADiah database from the raw data provided by MOEX:

  • Trades

  • Depth changes

We use the root datasets to produce what we call derived datasets: the Market orders, Limit order placements and Limit order cancellations datasets.

The Trades dataset has one row per trade and contains 438,432 rows. An excerpt from it is shown in table 2. The dataset has the following columns:

  • timestamp - the timestamp of the trade, millisecond precision
  • price - the price per share in Russian Rubles, with \(\frac{1}{100}\) tick size
  • volume - the volume of the trade, in shares
  • direction - either “buy” or “sell”, depending on the type of order that initiated the trade
  • maker - the id of the order which was sitting in the order book and was matched against “taker” to produce the trade
  • taker - the id of the order which initiated the trade

Table 2: An excerpt from the Trades dataset showing nine trades generated by the single market order with taker id 3247013529504000
timestamp price volume direction maker taker
2014-09-01 13:12:14.141 73.11 1250 sell 3246759814176000 3247013529504000
2014-09-01 13:12:14.141 73.10 2500 sell 3246710480640000 3247013529504000
2014-09-01 13:12:14.141 73.10 400 sell 3246758404646400 3247013529504000
2014-09-01 13:12:14.141 73.09 1250 sell 3246703432992000 3247013529504000
2014-09-01 13:12:14.141 73.09 700 sell 3246725985465600 3247013529504000
2014-09-01 13:12:14.141 73.09 700 sell 3246730214054400 3247013529504000
2014-09-01 13:12:14.141 73.09 300 sell 3246962786438400 3247013529504000
2014-09-01 13:12:14.141 73.08 3000 sell 3245671657324800 3247013529504000
2014-09-01 13:12:14.141 73.08 1039900 sell 3246699204403200 3247013529504000

Each row in the Depth changes dataset represents a change in the order book. The dataset contains 7,340,775 rows. An excerpt from it is shown in the table 3. The dataset has the following columns:

  • timestamp - the timestamp of the change, millisecond precision
  • side - the side of the order book where the change has happened (“bid” or “ask”)
  • price - the price level of the order book at which the change happened, showing price per share in Russian Rubles with \(\frac{1}{100}\) tick size
  • volume - an increase (positive) or decrease (negative) of the number of shares which may be bought (if side is “ask”) or sold (if side is “bid”) at this price depending on the side
  • bid.price - the best bid price in the order book just before the change
  • ask.price - the best ask price in the order book just before the change

If column volume is greater than zero, the row always represents a placement of a limit order. If volume is negative the row represents either a limit order cancellation or a trade.

Table 3: An excerpt from Depth changes dataset. It contains changes due to limit order placements, cancellations as well as due to trades initiated by market order 3247013529504000. Note how trades with the same timestamp and price are combined into a single depth change
timestamp side price volume bid.price ask.price
2014-09-01 13:12:13.809 ask 73.16 -6000 73.11 73.13
2014-09-01 13:12:13.810 ask 73.16 5900 73.11 73.13
2014-09-01 13:12:13.829 ask 78.02 -200 73.11 73.13
2014-09-01 13:12:13.881 bid 72.66 -3000 73.11 73.13
2014-09-01 13:12:14.023 bid 72.67 3000 73.11 73.13
2014-09-01 13:12:14.059 bid 73.09 -400 73.11 73.13
2014-09-01 13:12:14.062 bid 73.09 300 73.11 73.13
2014-09-01 13:12:14.078 ask 73.52 -3000 73.11 73.13
2014-09-01 13:12:14.141 bid 73.11 -1250 73.11 73.13
2014-09-01 13:12:14.141 bid 73.10 -2900 73.11 73.13
2014-09-01 13:12:14.141 bid 73.09 -2950 73.11 73.13
2014-09-01 13:12:14.141 bid 73.08 -1042900 73.11 73.13
2014-09-01 13:12:14.143 bid 73.09 3000 73.08 73.13

Derived data sets

Market orders

As we’ve already noted above, a market order is not synonymous with a trade. Consider again the data in table 2, where a single taker order with taker id 3247013529504000 has generated 9(!) trades.

We will use the following terms to refer to all these significantly different entities.

  • Taker - a real market order in the conventional sense, uniquely identified by the taker column.
  • Market order - an entity produced by summation of volume of several trades with the same timestamp, price and direction columns. Note that a single market order may combine several taker orders that arrived at the same time and executed at the same price.
  • Trade - a usual trade, i.e. a match of a single taker against a single maker order.

The market order defined as above is the closest fit to the model’s assumptions about a ‘market order’ (except for independence and volume, as noted above and discussed below).

Thus, in order to produce the Market orders dataset, we take the Trades data set and combine all trades with the same timestamp, price and direction columns into a single market order with volume equal to the sum of the trades’ volumes. Applied to table 2, this procedure yields table 4 of market orders; summing the same trades by the taker column yields table 5 of taker orders. In our analysis we will not use taker orders, as the authors themselves have not used them.

Table 4: Market orders produced by the taker with id 3247013529504000. The volume of each equals the sum of the volumes of the trades combined into it.
timestamp price direction volume side
2014-09-01 13:12:14.141 73.11 sell 1250 bid
2014-09-01 13:12:14.141 73.10 sell 2900 bid
2014-09-01 13:12:14.141 73.09 sell 2950 bid
2014-09-01 13:12:14.141 73.08 sell 1042900 bid

Table 5: A single taker order with id 3247013529504000. Its volume equals the sum of the volumes of the trades it generated. Note that this real market order impacted 4 price levels, contrary to the assumption in section 1.2 Dynamics of the Order Book that a market order decreases the quantity at the best price only.
timestamp taker direction price volume levels
2014-09-01 13:12:14.141 3247013529504000 sell 73.08 1050000 4
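
The aggregation of trades into market orders and taker orders can be sketched in plain Python using the nine trades of table 2. This is only an illustration of the grouping logic, not the OBADiah implementation; maker ids are omitted because they play no role in the sums, and prices are kept as strings to avoid floating-point keys.

```python
from collections import defaultdict

TS = "2014-09-01 13:12:14.141"
TAKER = 3247013529504000

# (timestamp, price, volume, direction, taker) for the trades in table 2.
trades = [
    (TS, "73.11", 1250, "sell", TAKER),
    (TS, "73.10", 2500, "sell", TAKER),
    (TS, "73.10", 400, "sell", TAKER),
    (TS, "73.09", 1250, "sell", TAKER),
    (TS, "73.09", 700, "sell", TAKER),
    (TS, "73.09", 700, "sell", TAKER),
    (TS, "73.09", 300, "sell", TAKER),
    (TS, "73.08", 3000, "sell", TAKER),
    (TS, "73.08", 1039900, "sell", TAKER),
]

# Market orders: sum trade volumes per (timestamp, price, direction).
market_orders = defaultdict(int)
for ts, price, volume, direction, taker in trades:
    market_orders[(ts, price, direction)] += volume

# Taker orders: total volume and number of distinct price levels per taker id.
taker_volume = defaultdict(int)
taker_levels = defaultdict(set)
for ts, price, volume, direction, taker in trades:
    taker_volume[taker] += volume
    taker_levels[taker].add(price)

for key, volume in market_orders.items():
    print(key, volume)
print(TAKER, taker_volume[TAKER], len(taker_levels[TAKER]))
```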

The numbers of taker orders (276,601) and of market orders (305,687) in our data set are notably smaller than the number of trades, which is 438,432.

The number of taker orders which impacted more than one price level is 26,766, or roughly 10% of the total number of taker orders. The taker orders which impacted the largest number of levels are shown in table 6.

Table 6: Taker orders which impacted the largest number of levels of the order book. Level size is equal to the effective tick size, i.e. 0.01 Russian Ruble
timestamp taker direction price volume levels
2014-09-01 17:52:12.052 9218876119603200 sell 73.59 1069500 35
2014-09-01 16:10:21.733 6888385250496000 buy 74.60 1000000 26
2014-09-03 13:07:37.703 2911069288857600 buy 76.99 400000 25
2014-09-03 13:04:49.194 2699054277004800 sell 77.00 118330 23
2014-09-03 13:03:42.835 2606786435520000 buy 77.48 100000 22
2014-09-03 13:20:06.579 3631575234009600 sell 77.62 215470 20
2014-09-03 11:14:40.756 539254868774400 sell 73.86 338930 19
2014-09-03 13:21:33.013 3698781386227200 sell 77.35 200000 19
2014-09-04 15:41:29.439 4701909278505600 buy 78.61 755450 19
2014-09-03 13:03:17.257 2550685918809600 sell 76.18 100000 18

Adjustment of Depth changes data set

Compare again table 4 with table 3. For each market order there is a corresponding depth change with the same timestamp and price and with the negated volume. This is what typically happens, but not always. In order to deduce limit order placements and cancellations we need to remove the changes due to market orders from the Depth changes data set.

In our data set, 293,877 market orders have exactly one corresponding row in the Depth changes data set. The remaining 11,810 market orders arrived together with one or more orders at the same price but in the opposite direction, so the volume of the corresponding order book change differs from the market order volume and may even be zero, i.e. the order book was not changed at all by the market order.

Adjusted Depth changes data set has 7,051,909 rows.

Limit order placements

The limit order placements are extracted from the adjusted Depth changes data set: all rows with a positive volume column are either limit order placements or market limit order placements. A market limit order is a limit order whose price crosses the opposite best price (above the best ask for a buy, below the best bid for a sell) and which is not executed in full.

We have 3,682,051 rows in Limit order placements data set.

The number of market limit order placements is 14,061 which is small in comparison with the number of limit orders placed. The model does not have market limit orders so we ignore them too.

Limit order cancellations

Rows with negative volume column in the adjusted Depth changes data set are produced by limit order cancellations.

We have 3,355,797 rows in Limit order cancellations data set.

2.2 Estimation Procedure

From (Cont, Stoikov, and Talreja 2010):

Recall that in our stylized model we assume orders to be of “unit” size. In the data set, we first compute the average sizes of market orders \(S_m\), limit orders \(S_l\), and canceled orders \(S_c\) and choose the size unit to be the average size of a limit order \(S_l\).

Let’s stop for a moment and think whether it is a good idea to calculate the above averages. Are they meaningful? The highly-cited article (Clauset, Shalizi, and Newman 2009) starts from the brief explanation of when the use of mean value is reasonable:

Many empirical quantities cluster around a typical value. The speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on a midsummer’s day: all of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that an adult male American is about 180cm tall because no one deviates very far from this height. Even the largest deviations,which are exceptionally rare, are still only about a factor of two from the mean in either direction and hence the distribution can be well characterized by quoting just its mean and standard deviation.

Our data set contains 305,687 market orders. Their average volume is

\[ S_m = 2,814 \] shares. The largest volume is 1,042,900 shares, or 371 times larger than the average market order. The smallest volume is 10 shares, or 281 times smaller than the average. The standard deviation of market order volume is 14,195.8 shares. Clearly, the average volume is not a good characterization of a “typical” market order.

The same holds true for limit order placements and cancellations. Our data set contains information about the placement of 3,682,051 limit orders and about 3,355,797 cancellations. The average volume of a placed limit order is

\[ S_l = 12,708 \] shares. The largest volume is 2,000,000 shares or 157 times larger than the average. The smallest volume is 10 shares or 1271 times smaller than the average. The standard deviation of placed limit order volume is 24,684.7 shares.

The average volume of cancelled limit orders is \[ S_c = 13,750 \] shares. The largest volume is 2,000,000 shares, or 145 times larger than the average. The smallest volume is 10 shares, or 1375 times smaller than the average. The standard deviation of cancelled limit order volume is 25,858 shares.

From (Cont, Stoikov, and Talreja 2010):

The limit order arrival rate function for \(1 \leq i \leq 5\) can be estimated by
\[ \hat{\lambda}(i) = \frac{N_l(i)}{T_*} \tag{2} \] where \(N_l(i)\) is the total number of limit orders that arrived at a distance \(i\) from the opposite best quote, and \(T_*\) is the total trading time in the sample (in minutes). \(N_l(i)\) is obtained by enumerating the number of times that a quote increases in size at a distance of \(1 \leq i \leq 5\) ticks from the opposite best quote. We then extrapolate by fitting a power law function of the form \[ \hat{\lambda}(i) = \frac{k}{i^\alpha} \tag{3} \] (suggested by (Zovko and Farmer 2002) or (Bouchaud, Mézard, and Potters 2002)). The power law parameters \(k\) and \(\alpha\) are obtained by a least-squares fit \[ \min_{k, \alpha}\sum_{i=1}^{5}\Big(\hat{\lambda}(i) - \frac{k}{i^\alpha} \Big)^2 \]

Let’s start by counting \(N_l(i)\) using the Limit order placements data set. To do that, we count the number of times a limit order was placed at distance \(i\) from the opposite best price. We choose tick.size to be 0.1 Russian Ruble, or approximately \(\frac{1}{735}\) of the share price, i.e. approximately the same ratio as in the article. It is ten times bigger than the actual tick size, so we need to round prices. We calculate distance \(i\) separately for “ask” and “bid” orders, as shown in tables 7 and 8 below:

Table 7: A sample of \(i\) calculation for ‘bid’ limit order placements. Note that order prices are rounded downward to the closest multiple of the tick size, while the best ask price used in the calculation of the distance \(i\) is rounded upward.
timestamp side price volume bid.price ask.price price.big.tick.size ask.price.big.tick.size i
2014-09-01 11:00:00.049 bid 73.47 1000 73.5 73.51 73.4 73.6 2
2014-09-01 11:00:00.069 bid 73.46 1000 73.5 73.51 73.4 73.6 2
2014-09-01 11:00:00.157 bid 71.34 5000 73.5 73.51 71.3 73.6 23
2014-09-01 11:00:00.209 bid 72.58 20000 73.5 73.51 72.5 73.6 11
2014-09-01 11:00:00.328 bid 70.56 10 73.5 73.51 70.5 73.6 31
2014-09-01 11:00:00.360 bid 73.12 15710 73.5 73.51 73.1 73.6 5

Table 8: A sample of \(i\) calculation for ‘ask’ limit order placements. Note that order prices are rounded upward to the closest multiple of the tick size, while the best bid price used in the calculation of the distance \(i\) is rounded downward.
timestamp side price volume bid.price ask.price price.big.tick.size bid.price.big.tick.size i
2014-09-01 11:00:00.073 ask 73.56 1000 73.5 73.51 73.6 73.5 1
2014-09-01 11:00:00.078 ask 73.57 1000 73.5 73.51 73.6 73.5 1
2014-09-01 11:00:00.210 ask 74.43 20000 73.5 73.51 74.5 73.5 10
2014-09-01 11:00:00.286 ask 73.51 100 73.5 73.51 73.6 73.5 1
2014-09-01 11:00:00.367 ask 73.67 14950 73.5 73.51 73.7 73.5 2
2014-09-01 11:00:00.369 ask 73.87 59800 73.5 73.51 73.9 73.5 4
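
The rounding rule used in tables 7 and 8 can be sketched as follows. To avoid floating-point rounding issues the sketch works in kopecks (1/100 of a Russian Ruble), so the enlarged tick of 0.1 Russian Ruble is 10; the function name `distance_in_ticks` and the integer representation are our own choices.

```python
def distance_in_ticks(side, price, best_bid, best_ask, tick=10):
    """Distance i from the opposite best quote; all prices in kopecks."""
    if side == "bid":
        own = (price // tick) * tick             # order price rounded down
        opposite = -(-best_ask // tick) * tick   # best ask rounded up
        return (opposite - own) // tick
    if side == "ask":
        own = -(-price // tick) * tick           # order price rounded up
        opposite = (best_bid // tick) * tick     # best bid rounded down
        return (own - opposite) // tick
    raise ValueError(f"unknown side: {side!r}")


# First rows of tables 7 and 8: best bid 73.50, best ask 73.51.
print(distance_in_ticks("bid", 7347, 7350, 7351))
print(distance_in_ticks("ask", 7356, 7350, 7351))
```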

Then if we substitute equation (2) into equation (3) and take the logarithm of both sides we get:

\[ \begin{aligned} N_l(i) &= \frac{k T_*}{i^\alpha} \\ \log N_l(i) &= \log(k T_*) - \alpha \log i \end{aligned} \tag{4} \] So \(\hat{\lambda}(i)\) follows the power law specified by equation (3) if and only if the logarithm of \(N_l(i)\), the total number of limit orders that arrived at a distance \(i\) from the opposite best quote, is a linear function of the logarithm of the distance \(i\). As Figure 4 shows, this is in fact the case.


Figure 4: Number of limit order placements by distance from the opposite best price in ticks. Tick size is 0.1 Russian Ruble

In our case \(T_*\) equals 2,625 minutes, calculated arrival rates \(\hat{\lambda}(i)\) are shown in the table 9 below.

Table 9: Number of limit order placements and arrival rate per minute by distance from the opposite best price in ticks. Tick size is 0.1 Russian Ruble
Distance: 1 2 3 4 5 6 7 8 9 10
Lambda 583.846 560.147 168.833 25.3566 17.8937 13.011 5.92648 12.845 4.28571 1.69371
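
As a rough cross-check of the power-law form, the parameters \(k\) and \(\alpha\) can be estimated from the first five rates in table 9 by an ordinary least-squares line in log-log coordinates. Note that this is a swapped-in, simpler technique: the article minimizes squared errors of the rates themselves, whereas the sketch below fits the logarithms, so the two estimates will generally differ.

```python
import math

distance = [1, 2, 3, 4, 5]
lam_hat = [583.846, 560.147, 168.833, 25.3566, 17.8937]  # from table 9

xs = [math.log(i) for i in distance]
ys = [math.log(rate) for rate in lam_hat]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for log(lambda) = log(k) - alpha * log(i).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

alpha = -slope
k = math.exp(intercept)
print(f"alpha = {alpha:.3f}, k = {k:.1f}")
```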

Let’s check whether the number of limit order arrivals per unit of time follows the Poisson distribution, as the model assumes. We will use a very simple approach from (Feller 1950):

Suppose that a physical experiment is repeated a great number \(N\) of times, and that each time we count the number of events in an interval of fixed length \(t\). Let \(N_k\) be the number of times that exactly \(k\) events are observed. Then \[ N_0 + N_1 + N_2 + \cdots = N \] The total number of points observed in the \(N\) experiments is \[ N_1 + 2N_2 + 3N_3 + \cdots = T \tag{5} \] and \(\frac{T}{N}\) is the average. If \(N\) is large, we expect that \[ N_k \approx N e^{-\lambda t} \frac{(\lambda t)^k}{k!} \tag{6} \] Substituting from (6) into (5), we find \[ T \approx N e^{-\lambda t}\lambda t \Big( 1 + \frac{\lambda t}{1} + \frac{(\lambda t)^2}{2!} + \cdots \Big) = N\lambda t \] and hence \[ \lambda t \approx \frac{T}{N} \tag{7} \] This relation gives us a means of estimating \(\lambda\) from observations and of comparing theory with experiments.

In our case we will repeat the experiment every second (\(t = 1\)) and will measure the number of limit orders placed at the distance \(i\) from the opposite best price per second.

Thus the number of experiments \(N\) will be the same for every \(i\): \(N = 60T_* = 157,500\) (one experiment per second). The total number of points observed will depend on \(i\): \(T = T_i = N_l(i)\). \(\lambda t\) will also depend on \(i\): \((\lambda t)_i = \frac{N_l(i)}{60T_*} = \frac{\hat{\lambda}(i)}{60}\).
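
The theoretical counts of equation (6) and the estimate of equation (7) are straightforward to compute. The sketch below uses \(T_* = 2625\) minutes and \(\hat{\lambda}(1) = 583.846\) per minute from the text; the function names are hypothetical, and the empirical counts \(N_k\) themselves are not reproduced here.

```python
import math

N = 60 * 2625          # number of one-second experiments
lam_t = 583.846 / 60   # expected placements per second at distance i = 1


def expected_counts(n_experiments, lam_t, k_max):
    """Expected N_k for k = 0..k_max under a Poisson law, equation (6)."""
    return [
        n_experiments * math.exp(-lam_t) * lam_t ** k / math.factorial(k)
        for k in range(k_max + 1)
    ]


def estimate_lam_t(counts):
    """Estimate lambda * t as T / N from a list of counts N_k, equation (7)."""
    total_points = sum(k * n_k for k, n_k in enumerate(counts))
    return total_points / sum(counts)


theory = expected_counts(N, lam_t, k_max=60)
print(N, round(lam_t, 5), round(estimate_lam_t(theory), 5))
```

Feeding the empirical \(N_k\) into `estimate_lam_t` and comparing them with `expected_counts` is the comparison that the text describes.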

Figure 5 shows the number of experiments (i.e. one-second periods) \(N_k\) with a given number of limit order placements \(k\) at distance \(i=1\) tick from the opposite best price. The tick size equals 0.1 Russian Ruble. The Poisson distribution is shown for comparison: it is clear that the number of limit order placements per second is not sampled from the Poisson distribution.


Figure 5: The empirical distribution of limit order placements is not the Poisson distribution. Red points are empirical numbers of experiments \(N_k\) (i.e. one-second periods) with a given number of limit order placements \(k\) at distance \(i=1\) tick (with tick size = 0.1 Russian Ruble) from the opposite best price. Black points are from the Poisson distribution with \((\lambda t)_1 = \frac{\hat{\lambda}(1)}{60} =\) 9.73076 for comparison.

Figure 6 demonstrates that the situation is similar at distances \(i=1,\ldots,8\). From figure 4 we see that 99% of limit orders are placed at these levels.


Figure 6: Empirical numbers of experiments \(N_k\) (i.e. one-second periods) with a given number of limit order placements \(k\) at various distances \(i\) in ticks from the opposite best price. Tick size is 0.1 Russian Ruble. It appears that none of them is sampled from the Poisson distribution.

Let’s return to (Cont, Stoikov, and Talreja 2010):

The arrival rate of market orders is then estimated by \[ \hat{\mu} = \frac{N_m}{T_*}\frac{S_m}{S_l} \] where \(T_*\) is the total trading time in the sample (in minutes) and \(N_m\) is the number of market orders. Note that we ignore market orders that do not affect the best quotes, as is the case when a market order is matched by a hidden order.

There are no hidden orders at MOEX, so there is nothing to exclude here. Thus the number of market orders \(N_m\) in our data set is equal to the size of the Market orders dataset: \[ N_m = 305,687, \qquad \hat{\mu} = \frac{305687}{2625}\cdot\frac{2814}{12708} = 25.786628 \]
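
The arithmetic can be checked with a few lines of Python, using only figures quoted in this notebook (the variable names are ours):

```python
N_m = 305_687   # number of market orders
T_star = 2625   # total trading time, minutes
S_m = 2814      # average market order size, shares
S_l = 12708     # average limit order size, shares (the size unit)

# Market order arrival rate in unit-size orders per minute.
mu_hat = (N_m / T_star) * (S_m / S_l)
print(round(mu_hat, 4))
```

The factor \(S_m / S_l\) converts the raw count of about 116 market orders per minute into units of the average limit order size.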

From (Cont, Stoikov, and Talreja 2010):

Because the cancellation rate in our model is proportional to the number of orders at a particular price level, in order to estimate the cancellation rates we first need to estimate the steady-state shape of the order book \(Q_i\) , which is the average number of orders at a distance of \(i\) ticks from the opposite best quote, for \(1 \leq i \leq 5\). If \(M\) is the number of quote rows and \(S_i^B(j)\) the number of shares bid at a distance of \(i\) ticks from the ask on the \(j\)th row, for \(1 \leq j \leq M\), we have \[ Q_i^B = \frac{1}{S_l}\frac{1}{M}\sum_{j=1}^{M}S_i^B(j) \tag{8} \] The vector \(Q_i^A\) is obtained analogously, and \(Q_i\) is the average of \(Q_i^A\) and \(Q_i^B\).

As figure 3 shows, the time interval between quote rows is not constant, but equation (8) does not take the interval into account: every row receives the same weight regardless of how long it remained in force.
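To illustrate how much the weighting can matter, here is a minimal Python sketch that contrasts the unweighted mean of equation (8) with a time-weighted mean in which each row counts in proportion to the time until the next row. The time-weighted variant is our assumption of one possible alternative, not something proposed in the paper, and the timestamps and queue sizes are invented.

```python
def simple_mean(values):
    """Unweighted mean over rows, as in equation (8) before dividing by S_l."""
    return sum(values) / len(values)


def time_weighted_mean(times, values, end_time):
    """Mean of a piecewise-constant series.

    Each value is weighted by the time until the next observation;
    the last value is held until end_time.
    """
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equally long")
    total = 0.0
    for idx, value in enumerate(values):
        next_time = times[idx + 1] if idx + 1 < len(times) else end_time
        total += value * (next_time - times[idx])
    return total / (end_time - times[0])


# Hypothetical quote rows: a large queue that exists for only 0.1 seconds.
times = [0.0, 1.0, 1.1, 10.0]      # seconds
sizes = [1000, 5000, 1000, 1000]   # shares in the queue

print(simple_mean(sizes))                      # the short-lived spike counts as one row in four
print(time_weighted_mean(times, sizes, 20.0))  # the spike is weighted by its 0.1 s duration
```

In this made-up example the short-lived spike raises the unweighted mean far more than the time-weighted one.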

Note that \(Q_i^B\) is measured in “orders” while \(S_i^B(j)\) is measured in “shares”. The conversion rate between these units is \(1 \text{ order} = S_l \text{ shares}\), so \(Q_i^B(j)\), the number of orders bid at a distance of \(i\) ticks from the ask on the \(j\)th row, can be calculated as in equation (9):

\[ Q_i^B(j) = \frac{S_i^B(j)}{S_l} \tag{9} \]

Table 10 shows an example of the evolution of \(S_i^B(t)\) and \(S_i^A(t)\) as returned by the function obadiah::queues(), which calculates them from the Depth changes data set. Note that when bid.price increased from \(72.88\) to \(72.89\), the whole ask queue a2 temporarily moved to a1, because distances are measured from the opposite best price, and then returned when bid.price fell back to \(72.88\). Thus a queue size changes not only when a limit order is placed or cancelled but also when the best bid or ask price changes.

Table 10: An example of the evolution of bid and ask queues in time. Each queue aN and bN shows the number of shares outstanding in the queue. Note the changes in the ask queues at 2014-09-01 14:00:02 due to the bid.price change.
timestamp bid.price ask.price b1 b2 b3 b4 b5 a1 a2 a3 a4 a5
2014-09-01 14:00:00 72.88 72.9 0 9070 5710 1660 19700 0 16910 0 3500 8850
2014-09-01 14:00:01 72.88 72.9 0 6900 5710 1660 21400 0 15570 0 3500 8850
2014-09-01 14:00:02 72.89 72.9 3000 6900 6370 9700 18000 15570 0 2500 5750 2740
2014-09-01 14:00:03 72.88 72.9 0 6900 5710 1660 22800 0 15570 0 2500 5750
2014-09-01 14:00:04 72.88 72.9 0 6900 5710 1660 22600 0 15570 0 2500 5750
2014-09-01 14:00:05 72.88 72.9 0 6800 5710 2160 22600 0 15570 0 2500 5750
2014-09-01 14:00:06 72.88 72.9 0 6800 3210 2160 21200 0 8050 0 1140 11450
2014-09-01 14:00:07 72.88 72.9 0 6800 3210 2160 21200 0 8050 0 1140 8450
2014-09-01 14:00:08 72.88 72.9 0 6800 3210 4660 35130 0 320 9750 1140 6350
2014-09-01 14:00:09 72.88 72.9 0 6800 3210 4660 34730 0 320 9750 1140 6350
2014-09-01 14:00:10 72.88 72.9 0 6800 3210 4660 32230 0 320 9750 1140 6350
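As an illustration of equations (8) and (9), the Python sketch below applies them to the eleven rows of Table 10 only. It is a toy calculation under that assumption, not the estimate over the full data set, and the function name is hypothetical.

```python
S_L = 12708  # order size in shares, as used in the text

# Rows of Table 10: shares at distances 1..5 (b1..b5 and a1..a5).
BID_ROWS = [
    [0, 9070, 5710, 1660, 19700],
    [0, 6900, 5710, 1660, 21400],
    [3000, 6900, 6370, 9700, 18000],
    [0, 6900, 5710, 1660, 22800],
    [0, 6900, 5710, 1660, 22600],
    [0, 6800, 5710, 2160, 22600],
    [0, 6800, 3210, 2160, 21200],
    [0, 6800, 3210, 2160, 21200],
    [0, 6800, 3210, 4660, 35130],
    [0, 6800, 3210, 4660, 34730],
    [0, 6800, 3210, 4660, 32230],
]
ASK_ROWS = [
    [0, 16910, 0, 3500, 8850],
    [0, 15570, 0, 3500, 8850],
    [15570, 0, 2500, 5750, 2740],
    [0, 15570, 0, 2500, 5750],
    [0, 15570, 0, 2500, 5750],
    [0, 15570, 0, 2500, 5750],
    [0, 8050, 0, 1140, 11450],
    [0, 8050, 0, 1140, 8450],
    [0, 320, 9750, 1140, 6350],
    [0, 320, 9750, 1140, 6350],
    [0, 320, 9750, 1140, 6350],
]


def average_queue(rows, order_size):
    """Equation (8): unweighted mean over rows, converted from shares to orders."""
    n_rows = len(rows)
    n_levels = len(rows[0])
    return [
        sum(row[i] for row in rows) / n_rows / order_size
        for i in range(n_levels)
    ]


q_bid = average_queue(BID_ROWS, S_L)
q_ask = average_queue(ASK_ROWS, S_L)
q = [(b + a) / 2 for b, a in zip(q_bid, q_ask)]  # Q_i as the average of both sides

for i, (b, a, avg) in enumerate(zip(q_bid, q_ask, q), start=1):
    print(f"i={i}: Q_B={b:.4f} Q_A={a:.4f} Q={avg:.4f}")
```

Note that the single row at 14:00:02 is the only contribution to the averages at distance 1, which shows how a brief change of the best price feeds into \(Q_1\).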

Figure 7 shows the calculated average number of orders at distance \(i\): \(Q_i^A\), \(Q_i^B\) and \(Q_i\).

Figure 7: Average number of orders. The order size is 12708 shares.

From (Cont, Stoikov, and Talreja 2010):

An estimator for the cancellation rate function is then given by \[ \hat{\theta}(i) = \frac{N_c(i)}{T_*Q_i}\frac{S_c}{S_l} \text{ for } i \leq 5 \text{ and } \\ \hat{\theta}(i) = \hat{\theta}(5) \text{ for } i > 5 \tag{10} \] where \(N_c(i)\) is obtained by counting the number of times that a quote decreases in size at a distance of \(1 \leq i \leq 5\) ticks from the opposite best quote, excluding decreases due to market orders.

In our case we are not limited to five ticks; otherwise we calculate \(\hat{\theta}(i)\) in accordance with equation (10).
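A minimal Python sketch of the estimator in equation (10) follows. The values of \(T_*\) and \(S_l\) are those used elsewhere in this notebook; the cancellation counts, average queue sizes and \(S_c\) are invented for illustration, and the function names are hypothetical. Reusing the last estimated rate beyond the last estimated level generalizes the paper's rule \(\hat{\theta}(i) = \hat{\theta}(5)\) for \(i > 5\) and is our assumption.

```python
import math


def cancellation_rates(cancel_counts, avg_queue, total_minutes, s_c, s_l):
    """Equation (10) for each distance i = 1..len(cancel_counts)."""
    rates = []
    for n_c, q_i in zip(cancel_counts, avg_queue):
        if q_i > 0:
            rates.append(n_c / (total_minutes * q_i) * (s_c / s_l))
        else:
            rates.append(math.nan)  # undefined when the average queue is empty
    return rates


def theta(i, rates):
    """Rate at distance i (1-based); beyond the last level, reuse the last value."""
    return rates[min(i, len(rates)) - 1]


T_STAR = 2625   # minutes, from the text
S_L = 12708     # shares, from the text
S_C = 10000     # hypothetical average size of a cancelled order, shares

n_c = [52500, 26250, 13125]   # hypothetical N_c(i), i = 1..3
q = [0.5, 2.0, 1.0]           # hypothetical Q_i, in orders

rates = cancellation_rates(n_c, q, T_STAR, S_C, S_L)
for i, rate in enumerate(rates, start=1):
    print(f"theta({i}) = {rate:.4f}")
print(f"theta(7) = {theta(7, rates):.4f}")  # falls back to the last estimated level
```

Because \(Q_i\) sits in the denominator, a level with a small average queue yields a large \(\hat{\theta}(i)\) even when its cancellation count is unremarkable.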

But before we do that, let’s look at figure 8, where we plot \(\log{N_c(i)}\) against \(\log{i}\). The similarity to figure 4 is striking, so modelling order placements and cancellations as independent random variables is at least doubtful. Today high-frequency traders cancel almost every order soon after placing it.

Figure 8: The number of limit order cancellations by distance from the opposite best price, in ticks, is almost exactly the same as the number of limit order placements. Tick size is 0.1 Russian ruble.

The calculated values of \(\hat{\theta}(i)\) are shown in figure 9. While the similarity between figures 8 and 4 is easy to see, no comparable pattern emerges from the calculated values of \(\hat{\theta}(i)\). It appears that including the queue sizes \(Q_i\) in the calculation of \(\hat{\theta}(i)\) is not empirically justified.

Figure 9: Calculated values of the cancellation rate. In our opinion, the inclusion of queue sizes in the calculation formula is not justified.

3. Laplace Transform Methods for Computing Conditional Probabilities

TBD

References

Bouchaud, Jean-Philippe, Marc Mézard, and Marc Potters. 2002. “Statistical Properties of Stock Order Books: Empirical Results and Models.” Quantitative Finance 2 (4).

Clauset, Aaron, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. “Power-Law Distributions in Empirical Data.” SIAM Review 51 (4).

Cont, Rama, Sasha Stoikov, and Rishi Talreja. 2010. “A Stochastic Model for Order Book Dynamics.” Operations Research 58 (3).

Feller, William. 1950. An Introduction to Probability Theory and Its Applications. Vol. 1.

Zovko, Ilija, and J. Doyne Farmer. 2002. “The Power of Patience: A Behavioural Regularity in Limit-Order Placement.” Quantitative Finance 2 (5).