Order book reconstruction from quotes and trades streams
2020-03-30
1 Introduction
1.1 Continuous-time double auction and LOB
Today, most liquid markets, including stocks, futures, and foreign exchange, are electronic and adopt a continuous-time double auction mechanism using a limit order book (LOB), in which a transaction occurs whenever a buyer and a seller agree on a price (Bouchaud and Bonart 2018). Cryptocurrencies are no exception.
The mechanics of the continuous double auction, or LOB trading, may be briefly described as follows:
- Traders submit limit orders (also called quotes) and market or market-limit orders (the latter being quotes whose limit price is at or better than the best opposite quote already in the LOB).
- Unmatched quotes, or the unmatched amounts of market-limit orders, reside in the LOB’s queues until they are matched with another market or market-limit order or cancelled by the participant who submitted them.
Market or market-limit orders are also often called taker orders while limit orders sitting in queues are called maker orders.
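The mechanics above can be illustrated with a toy matching engine. This is only a minimal sketch under simplifying assumptions: price-time priority, no pure market orders, no self-trade prevention, and no cancellations. The names `Order` and `ToyBook` are hypothetical and do not correspond to any exchange's implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    side: str      # "buy" or "sell"
    price: float   # limit price
    amount: float  # remaining (unmatched) amount


class ToyBook:
    """Toy limit order book with price-time priority (illustrative only)."""

    def __init__(self):
        # side -> {price -> FIFO queue of resting (maker) orders}
        self.queues = {"buy": {}, "sell": {}}

    def best(self, side):
        """Best resting price on a side, or None if that side is empty."""
        prices = self.queues[side]
        if not prices:
            return None
        return max(prices) if side == "buy" else min(prices)

    def submit(self, order):
        """Match the incoming order, rest any remainder, return the trades."""
        opposite = "sell" if order.side == "buy" else "buy"
        trades = []  # (maker_id, taker_id, price, amount)
        while order.amount > 0:
            best = self.best(opposite)
            if best is None:
                break
            if order.side == "buy":
                crosses = order.price >= best
            else:
                crosses = order.price <= best
            if not crosses:
                break
            queue = self.queues[opposite][best]
            maker = queue[0]  # oldest order at the best price
            traded = min(order.amount, maker.amount)
            trades.append((maker.order_id, order.order_id, best, traded))
            maker.amount -= traded
            order.amount -= traded
            if maker.amount == 0:
                queue.popleft()
                if not queue:
                    del self.queues[opposite][best]
        if order.amount > 0:
            # The unmatched amount rests in the book as a maker quote.
            self.queues[order.side].setdefault(order.price, deque()).append(order)
        return trades


book = ToyBook()
book.submit(Order(1, "sell", 101.0, 5))          # rests as a maker quote
book.submit(Order(2, "sell", 102.0, 5))          # rests as a maker quote
trades = book.submit(Order(3, "buy", 101.5, 8))  # market-limit order
```

In this example the third order acts as a taker for 5 units at 101.0, and its unmatched remainder of 3 units stays in the bid queue at 101.5 as a maker quote.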
Note that not all matched orders are executed and produce trades. This happens because of self-match or self-trade prevention rules, which stop a participant's order from executing against another order of the same participant. See, for example, CME Globex Self-Match Prevention or Coinbase Markets Trading Rules 2.4 Self-trade prevention.
Thus, in order to reconstruct the dynamics of the trading process and of the order book, one needs information about submitted quotes, market and market-limit orders, and the trades produced. As we will see below, this information is not always provided by exchanges.
1.2 Available data sets
Most data sets containing information about quotes and trades consist of two separate, loosely coupled files: a trades file recording trades and a quotes file recording quote placements, changes and cancellations. Loosely coupled in this context means that records in the trades file do not always have clearly identifiable corresponding records in the quotes file, as one would expect. By definition, every trade should change some quote in the LOB, so a matching or coupling procedure is required in order to establish the link between the trades and quotes files. This link is necessary to:
- Distinguish between quote changes due to limit order cancellation and market order execution
- Estimate size of submitted market orders
- Distinguish between limit order placements and market-limit order placements
and, overall, to achieve the ultimate goal: a complete reconstruction of the order book at every moment in time.
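To make the idea of such a coupling procedure concrete, here is a deliberately naive sketch. It assumes that the quotes file has already been reduced to a list of quote size decreases and that a trade corresponds to a decrease with exactly the same price and amount within a time tolerance. The function name `match_trades`, the tuple layouts and the `tolerance` parameter are all hypothetical; this is not the procedure used in any of the cited works.

```python
def match_trades(trades, decreases, tolerance=1.0):
    """Greedily match trades to quote size decreases.

    trades:    list of (time, price, amount)
    decreases: list of (time, side, price, amount), where side is the book
               side ("bid" or "ask") whose quote shrank
    Returns (matches, cancellations):
      matches       - list of (trade_index, decrease_index, trade_sign)
      cancellations - indices of decreases not explained by any trade
    """
    used = set()
    matches = []
    for ti, (t_time, t_price, t_amount) in enumerate(trades):
        best = None  # (time gap, decrease index, book side)
        for di, (d_time, side, d_price, d_amount) in enumerate(decreases):
            if di in used or d_price != t_price or d_amount != t_amount:
                continue
            gap = abs(d_time - t_time)
            if gap <= tolerance and (best is None or gap < best[0]):
                best = (gap, di, side)
        if best is not None:
            _, di, side = best
            used.add(di)
            # A shrinking ask quote means the taker was a buyer.
            sign = "buy" if side == "ask" else "sell"
            matches.append((ti, di, sign))
    cancellations = [di for di in range(len(decreases)) if di not in used]
    return matches, cancellations


trades = [(10.2, 101.0, 5), (11.0, 100.0, 2)]
decreases = [
    (10.0, "ask", 101.0, 5),  # explained by the first trade
    (10.5, "ask", 102.0, 3),  # no trade nearby: treated as a cancellation
    (11.1, "bid", 100.0, 2),  # explained by the second trade
]
matches, cancellations = match_trades(trades, decreases)
```

Decreases left unmatched are interpreted as cancellations, and the side of the matched quote gives the sign of the trade. A realistic procedure would also have to handle trades that consume several quotes, partial fills and ambiguous candidates.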
A recently published book (Abergel 2016), which uses the Thomson Reuters Tick History (TRTH) database, tells us that
Because one cannot distinguish market orders from cancellations just by observing changes in the limit order book (the “event” file), and since, the timestamps of the “trade” and “event” files are asynchronous, we use a matching procedure to reconstruct the order book events.
The reported matching rate of the above procedure is about 85% for CAC 40 stocks. As a byproduct, the procedure outputs the sign of each matched trade, that is, whether the trade was buyer-initiated or seller-initiated. Note that the TRTH data set does not even provide information about trade direction; it has to be deduced!
A description of similar issues can be found in (Hautsch 2004):
A typical problem occurs when trades and quotes are recorded in separate trade and quote databases, like, for example, in the Trade and Quote (TAQ) database released by the NYSE. In this case, it is not directly identifiable whether a quote which has been posted some seconds before a transaction was already valid at the corresponding trade.
The Websocket API v2 of the cryptocurrency exchange Bitstamp gives access to the following information for every instrument traded:
- Live ticker channel - information about trades. The unique ids of the participating quote and market order are provided for each trade.
- Live orders channel - information about quotes and market orders (all order creation, change and deletion events are reported).
As we will see later, events in these channels are not always sent in the correct time order, and some events appear to be omitted. Timestamps in the Live ticker and Live orders channels are not synchronized: the market order, the changes to quotes and to the market order itself caused by its execution, and the trades produced may all have different timestamps. A substantial number of matched orders are not executed because of Bitstamp's self-trade prevention policy.
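Because trades carry the ids of both participating orders, the two channels can be coupled by order id rather than by timestamp. The sketch below illustrates this under stated assumptions: the dictionary keys (`id`, `time`, `event`, `amount`, `buy_id`, `sell_id`) are hypothetical and are not the actual Bitstamp message schema, and the first event of each order is assumed to report its initial amount.

```python
from collections import defaultdict


def build_order_histories(order_events, trades):
    """Group order events and traded amounts by order id.

    order_events: dicts with hypothetical keys "id", "time", "event", "amount"
    trades:       dicts with hypothetical keys "time", "buy_id", "sell_id",
                  "amount"
    """
    histories = defaultdict(lambda: {"events": [], "traded": 0})
    for ev in order_events:
        histories[ev["id"]]["events"].append(ev)
    for tr in trades:
        for key in ("buy_id", "sell_id"):
            histories[tr[key]]["traded"] += tr["amount"]
    for hist in histories.values():
        # Restore the per-order sequence; arrival order may differ.
        hist["events"].sort(key=lambda ev: ev["time"])
    return dict(histories)


def untraded_amount(hist):
    """Initial amount minus traded amount, or None if no events were seen."""
    if not hist["events"]:
        return None  # the order's events were never received
    return hist["events"][0]["amount"] - hist["traded"]


order_events = [
    {"id": 7, "time": 3, "event": "deleted", "amount": 0},  # arrives first
    {"id": 7, "time": 1, "event": "created", "amount": 10},
    {"id": 7, "time": 2, "event": "changed", "amount": 4},
    {"id": 9, "time": 2, "event": "created", "amount": 6},
]
trades = [{"time": 2, "buy_id": 9, "sell_id": 7, "amount": 6}]

histories = build_order_histories(order_events, trades)
event_times = [ev["time"] for ev in histories[7]["events"]]
left_7 = untraded_amount(histories[7])
left_9 = untraded_amount(histories[9])
```

Here order 7 ends up with 4 units that never traded, so its deletion cannot be attributed to execution alone; it was either cancelled or removed for another reason such as self-trade prevention. Sorting by timestamp only helps when timestamps within one channel are themselves reliable, which has to be checked against the data.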
Similarly, the Websocket API version 2.0 of the cryptocurrency exchange Bitfinex has the following channels (for every instrument traded):
- Trades channel - information about trades. The ids of the participating quote and market orders are not reported.
- Raw book channel - provides information about the 100 best bid and 100 best ask quotes. Market orders are not reported. Quotes are reported as deleted when they fall outside the 100 best quotes and are reported as created again, with the same id, when they return. What happened to them between these moments is not known. Moreover, since Bitfinex allows traders to change the price and volume of submitted quotes, a quote reported as deleted may have been changed or simply cancelled.
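The visibility gap described for the Raw book channel can be tracked with a small amount of state. The following sketch uses a hypothetical, simplified event format `(order_id, price, amount)` in which `price == 0` marks removal from the visible book; the actual message layout should be taken from the exchange documentation. The function name and the labels are illustrative assumptions.

```python
def classify_raw_book_events(events):
    """Label each (order_id, price, amount) update of a raw book stream.

    Hypothetical convention: price == 0 marks removal from the visible book.
    """
    visible = {}    # order_id -> (price, amount) currently in the window
    last_seen = {}  # order_id -> (price, amount) at the time of removal
    labels = []
    for order_id, price, amount in events:
        if price == 0:
            if order_id in visible:
                # Cancelled, executed, or pushed out of the visible window:
                # the stream alone does not tell which.
                last_seen[order_id] = visible.pop(order_id)
                labels.append("removed")
            else:
                labels.append("unknown_removal")
        elif order_id in visible:
            visible[order_id] = (price, amount)
            labels.append("changed")
        elif order_id in last_seen:
            same = last_seen.pop(order_id) == (price, amount)
            visible[order_id] = (price, amount)
            labels.append("reappeared_unchanged" if same else "reappeared_changed")
        else:
            visible[order_id] = (price, amount)
            labels.append("created")
    return labels


events = [
    (1, 100.0, 5),  # first sighting of order 1
    (1, 0, 0),      # order 1 leaves the visible window
    (1, 100.0, 5),  # order 1 returns with the same price and amount
    (2, 99.0, 3),   # first sighting of order 2
    (2, 0, 0),      # order 2 leaves the visible window
    (2, 98.5, 3),   # order 2 returns at a different price
]
labels = classify_raw_book_events(events)
```

A quote that reappears with the same id proves that the earlier removal was not a cancellation or a full execution, while a quote that never reappears remains ambiguous.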
As at Bitstamp, records in the Trades and Raw book channels are not synchronized.
It should be clear from the above that substantial effort is required to reconstruct the true dynamics of order submission, matching and execution.