Efficient Estimation of Volatility using High Frequency Data
The paper “Efficient Estimation of Volatility using High Frequency Data” tests a set of volatility estimators on high-frequency data. I will attempt to summarize it briefly.
For anyone working in the financial markets, not a day goes by without hearing the word “volatility”. Yet volatility is not directly observed. If you assume that stock prices follow a random process such as geometric Brownian motion (GBM), the relevant question is: “How does one estimate the diffusion parameter (or process) in the model?” A principle from classical statistics, minimal sufficiency, implies that every increment of the price process is needed to estimate the volatility. This means that any discrete sampling implies a loss of information.
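A standard textbook fact makes the point concrete. It is not specific to the paper. Under GBM, the log price X_t = log S_t is a Brownian motion with drift. Its quadratic variation over a period of length T equals σ²T. Without noise, summing squared increments on an ever finer grid therefore recovers σ² exactly. This is why, in principle, every increment carries information about the volatility:

```latex
\begin{aligned}
dS_t &= \mu S_t\,dt + \sigma S_t\,dW_t, \qquad X_t = \log S_t,\\
dX_t &= \left(\mu - \tfrac{1}{2}\sigma^2\right)dt + \sigma\,dW_t,\\
[X]_T &= \lim_{\max_i (t_i - t_{i-1}) \to 0} \; \sum_{i=1}^{n} \left(X_{t_i} - X_{t_{i-1}}\right)^2 = \sigma^2 T,
\qquad 0 = t_0 < t_1 < \dots < t_n = T.
\end{aligned}
```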
One of the standard ways to measure daily volatility is realized volatility. You chop the trading day into small intervals, compute the squared return over each interval, and add these up over the whole day. The sum estimates the daily variance, and its square root is the daily volatility. The formula looks straightforward, yet it has a problem. As you increase the sampling frequency, microstructure noise comes into the picture. Realized volatility computed from squared returns over very fine intervals then becomes strongly biased upward. To keep this bias negligible, the time between observations has to be on the order of 30 minutes, which means throwing away most of the high-frequency data.
This noise is in addition to the usual bid-ask bounce. The bounce can easily be removed from the data by taking the midpoint of the quotes. The remaining noise component, arising from the market microstructure, is termed the incoherent price process. Zhou’s paper takes this noise into account and models the observed log price as the sum of a Brownian motion and an i.i.d. noise process. In this setup, Zhou proposes an estimator that is simple to implement.
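The simulation below illustrates both effects described above. It is a minimal sketch, and it assumes the noise model just described: observed log price = Brownian motion + i.i.d. Gaussian noise. The daily variance, noise level, tick count and the helper names (`simulate_day`, `realized_variance`, `zhou_variance`) are hypothetical choices for illustration. The Zhou-type estimator is written in a common textbook form: the sum of squared returns plus twice the sum of adjacent-return cross products. It is not necessarily the exact variant used in the paper.

```python
import numpy as np

def realized_variance(log_prices: np.ndarray) -> float:
    """Plain realized variance: sum of squared log returns."""
    r = np.diff(log_prices)
    return float(np.sum(r ** 2))

def zhou_variance(log_prices: np.ndarray) -> float:
    """Realized variance plus twice the sum of adjacent-return cross products.

    With i.i.d. noise, adjacent returns share one noise term with opposite
    signs, so E[r_i * r_{i+1}] = -eta**2. Adding the cross products cancels
    the noise bias of the plain sum, up to an end-point term of 2 * eta**2.
    """
    r = np.diff(log_prices)
    return float(np.sum(r ** 2) + 2.0 * np.sum(r[1:] * r[:-1]))

def simulate_day(n_ticks: int, daily_var: float, noise_std: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Hypothetical day: observed log price = Brownian motion + i.i.d. noise."""
    steps = rng.normal(0.0, np.sqrt(daily_var / n_ticks), size=n_ticks)
    efficient = np.concatenate(([0.0], np.cumsum(steps)))
    return efficient + rng.normal(0.0, noise_std, size=n_ticks + 1)

rng = np.random.default_rng(42)
true_var = 1e-4    # hypothetical daily variance (1% daily volatility)
noise_std = 1e-4   # hypothetical noise standard deviation, in log-price units
days = [simulate_day(10_000, true_var, noise_std, rng) for _ in range(200)]

results = {}
for step in (1, 10, 100):  # keep every `step`-th tick (coarser sampling)
    rv = np.mean([realized_variance(d[::step]) for d in days])
    zh = np.mean([zhou_variance(d[::step]) for d in days])
    results[step] = (rv, zh)
    print(f"step={step:4d}  RV={rv:.3e}  Zhou={zh:.3e}  true={true_var:.1e}")
```

With n returns per day, the expected bias of the plain sum is about 2·n·η², where η is the noise standard deviation. Under these assumed parameters, realized variance at the finest sampling should come out near three times the true value and shrink as sampling gets coarser. The Zhou-type estimate should stay close to the true value at every step.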
What’s this paper about?
The paper is mainly about computing a set of estimators, including variants of Zhou’s estimator computed in “tick” time rather than “homogeneous” (clock) time. It then compares the performance of all these estimators on simulated data.
What are the estimators considered in the paper?

A simple one-day volatility estimate using the close-to-close return for the day

The widely used RiskMetrics definition, which is essentially an IGARCH model

Zhou’s estimator

Zhou’s estimator with the correction term computed on a longer sample

Quadratic variation of the filtered series. The construction of the filtered series is described in a paper by Corsi

Zhou’s estimator applied to the filtered series

Bias-corrected Zhou’s estimator
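The first two items in the list above use only daily data. They can be sketched as follows, assuming the classic RiskMetrics exponentially weighted recursion. That recursion is an IGARCH(1,1) with zero constant, and the usual daily decay factor is λ = 0.94. The function names are hypothetical, and the price series is made up for illustration. Seeding the recursion with the sample mean of squared returns is an assumption of this sketch, and the paper's exact setup may differ.

```python
import numpy as np

def close_to_close_variance(daily_log_closes) -> np.ndarray:
    """Daily variance proxies: squared close-to-close log returns."""
    r = np.diff(np.asarray(daily_log_closes, dtype=float))
    return r ** 2

def riskmetrics_variance(returns, lam: float = 0.94, init=None) -> np.ndarray:
    """EWMA recursion sigma2[t] = lam * sigma2[t-1] + (1 - lam) * r[t-1]**2.

    This is an IGARCH(1,1) with zero constant. sigma2[t] is the variance
    forecast for day t given returns up to day t-1. Seeding with the sample
    mean of r**2 when no init is given is an assumption of this sketch.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r) + 1)
    sigma2[0] = np.mean(r ** 2) if init is None else init
    for t in range(1, len(r) + 1):
        sigma2[t] = lam * sigma2[t - 1] + (1.0 - lam) * r[t - 1] ** 2
    return sigma2

# Hypothetical closing prices for five days.
log_closes = np.log([100.0, 101.0, 99.5, 100.2, 100.8])
returns = np.diff(log_closes)
print(close_to_close_variance(log_closes))
print(np.sqrt(riskmetrics_variance(returns)))  # daily volatility forecasts
```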
Data are simulated from a constant-volatility random walk and from a GARCH(1,1) process, and each of the above estimators is tested for its forecasting effectiveness. The paper goes into great detail on the various issues that come up in such a backtesting exercise. This makes it extremely valuable for anyone working with high-frequency data (HFD).
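The two data-generating processes can be sketched as follows. This is a minimal sketch with hypothetical parameter values and function names. It simulates returns at a single fixed step. The paper's actual intraday design (tick arrivals, noise, aggregation) is not reproduced here. The GARCH(1,1) recursion is the standard one: r_t = σ_t z_t and σ_t² = ω + α r_{t-1}² + β σ_{t-1}², with Gaussian z_t.

```python
import numpy as np

def simulate_random_walk(n: int, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Constant-volatility random walk: i.i.d. N(0, sigma^2) returns."""
    return rng.normal(0.0, sigma, size=n)

def simulate_garch11(n: int, omega: float, alpha: float, beta: float,
                     rng: np.random.Generator):
    """GARCH(1,1) returns and conditional variances.

    r_t = sigma_t * z_t, sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    r = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return r, sigma2

rng = np.random.default_rng(7)
rw = simulate_random_walk(100_000, 0.01, rng)  # hypothetical sigma = 1%
r, sigma2 = simulate_garch11(100_000, omega=1e-6, alpha=0.05, beta=0.90, rng=rng)
print(rw.var(), r.var(), 1e-6 / (1 - 0.05 - 0.90))  # sample vs. unconditional variance
```

In a backtest, the simulated σ_t² path provides the "true" volatility against which each estimator's forecasts can be scored. That is the main reason simulated data is useful here.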
What are the conclusions of the tests?

The best high-frequency estimators clearly outperform the estimators using only daily data

Zhou’s estimator is not a good estimator for aggregated tick returns

Zhou’s estimator with lagged covariance has a small variance, but its probability of producing negative values is large

Zhou’s estimator on the filtered series is efficient, and on empirical data its probability of producing negative values is small enough

The quadratic variation estimate applied to the filtered series is efficient and positive. The flip side is that this estimator does not correct for misspecification of the incoherent filter
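The sign issue in the conclusions above comes from the cross-product term. A Zhou-type estimator adds products of adjacent returns, which can be negative. A quadratic variation estimate is a sum of squares and cannot go below zero. The sketch below illustrates only this general mechanism. It is not a reproduction of the paper's lagged-covariance or filtered variants. It assumes the Brownian-motion-plus-i.i.d.-noise model, with hypothetical parameters chosen so that noise dominates and there are few returns per day.

```python
import numpy as np

def realized_variance(r: np.ndarray) -> float:
    """Quadratic-variation estimate: a sum of squares, hence never negative."""
    return float(np.sum(r ** 2))

def zhou_variance(r: np.ndarray) -> float:
    """Zhou-type estimate: sum of squares plus twice the adjacent cross products.

    The cross-product term can be negative and large enough to flip the sign.
    """
    return float(np.sum(r ** 2) + 2.0 * np.sum(r[1:] * r[:-1]))

rng = np.random.default_rng(1)
n_returns, n_days = 50, 2_000       # hypothetical: few returns per day
true_var, noise_std = 1e-4, 2e-3    # hypothetical: noise dominates each return

rv_values, zhou_values = [], []
for _ in range(n_days):
    steps = rng.normal(0.0, np.sqrt(true_var / n_returns), size=n_returns)
    efficient = np.concatenate(([0.0], np.cumsum(steps)))
    observed = efficient + rng.normal(0.0, noise_std, size=n_returns + 1)
    r = np.diff(observed)
    rv_values.append(realized_variance(r))
    zhou_values.append(zhou_variance(r))

rv_values, zhou_values = np.array(rv_values), np.array(zhou_values)
print("share of negative Zhou estimates:", np.mean(zhou_values < 0))
print("share of negative RV estimates:  ", np.mean(rv_values < 0))
```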
Overall, the authors recommend Zhou’s estimator on the filtered time series as the best available estimator. However, they conclude the paper with a word of caution:
The optimal choice of the volatility estimator is still an open problem!