# The Invisible-hand Explanation by Nozick

This is not a quant post, nor an Econ 101 revisit. In his political philosophy book Anarchy, State, and Utopia, Robert Nozick gives a precise and elegant epistemological account of what “invisible-hand explanations” actually are, analogous to the way we model market behavior using information theory. I believe this concept is monumentally important, yet we spend far too little time on it in our education.

> There is a certain lovely quality to explanations of this sort. They show how some overall pattern or design, which one would have thought had to be produced by an individual’s or group’s successful attempt to realize the pattern, instead was produced and maintained by a process that in no way had the overall pattern or design “in mind”. After Adam Smith, we shall call such explanations invisible-hand explanations…. Consider now complicated patterns which one would have thought could arise only through intelligent design, only through some attempt to realize the pattern. One might attempt straightforwardly to explain such patterns in terms of the desires, wants, beliefs, and so on, of individuals, directed toward realizing the pattern. But within such explanations will appear descriptions of the pattern, at least within quotation marks, as objects of belief and desire. The explanation itself will say that some individuals desire to bring about something with (some of) the pattern-features, that some individuals believe that the only (or the best, or the . . .) way to bring about the realization of the pattern features is to . . . , and so on. Invisible-hand explanations minimize the use of notions constituting the phenomena to be explained; in contrast to the straightforward explanations, they don’t explain complicated patterns by including the full-blown pattern-notions as objects of people’s desires or beliefs. Invisible-hand explanations of phenomena thus yield greater understanding than do explanations of them as brought about by design as the object of people’s intentions. It therefore is no surprise that they are more satisfying.

> An invisible-hand explanation explains what looks to be the product of someone’s intentional design, as not being brought about by anyone’s intentions. We might call the opposite sort of explanation a “hidden-hand explanation.” A hidden-hand explanation explains what looks to be merely a disconnected set of facts that (certainly) is not the product of intentional design, as the product of an individual’s or group’s intentional design(s). Some persons also find such explanations satisfying, as is evidenced by the popularity of conspiracy theories.

> Someone might so prize each type of explanation, invisible hand and hidden hand, that he might attempt the Sisyphean task of explaining each purported nondesigned or coincidental set of isolated facts as the product of intentional design, and each purported product of design as a nondesigned set of facts! It would be lovely to continue this iteration for a bit, even through only one complete cycle.

After listing dozens of invisible-hand explanations across different fields, including evolutionary biology, economics, sociology, and psychology, Nozick goes on to explain two types of invisible-hand explanations and their properties related to causality.

> We can mention here two types of invisible-hand processes by which a pattern P can be produced: filtering processes and equilibrium processes. Through filtering processes can pass only things fitting P, because processes or structures filter out all non-P’s; in equilibrium processes each component part responds or adjusts to “local” conditions, with each adjustment changing the local environment of others close by, so that the sum of the ripples of the local adjustments constitutes or realizes P. (Some processes of such rippling local adjustments don’t come to an equilibrium pattern, not even a moving one.) There are different ways an equilibrium process can help maintain a pattern, and there also might be a filter that eliminates deviations from the pattern that are too great to be brought back by the internal equilibrating mechanisms. Perhaps the most elegant form of explanation of this sort involves two equilibrium processes, each internally maintaining its pattern in the face of small deviations, and each being a filter to eliminate the large deviations occurring in the other.
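As a toy illustration of a filtering process (my own sketch, not Nozick’s): start with random “agents”, none of which aims at any pattern, and run them through a filter that destroys all non-P’s. The survivors fit the pattern P even though nothing had P “in mind”:

```python
import random

random.seed(0)

# Each agent gets a random "trait"; no agent aims at any pattern.
agents = [random.uniform(0, 1) for _ in range(10_000)]

# The filter: anything with trait < 0.8 fails to survive.
# Agents know nothing about the threshold; the filter just destroys non-P's.
def survives(trait, threshold=0.8):
    return trait >= threshold

survivors = [a for a in agents if survives(a)]

# The resulting pattern P ("all surviving Q's have trait >= 0.8")
# is explained by the filter, not by any agent's intention.
print(all(a >= 0.8 for a in survivors))  # True
```

The explanation of why all survivors fit P refers to the filter (the `survives` function), not to the conjunction of stories about each individual survivor.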

Here’s the most interesting part.

> We might note in passing that the notion of filtering processes enables us to understand one way in which the position in the philosophy of the social sciences known as methodological individualism might go wrong. If there is a filter that filters out (destroys) all non-P Q’s, then the explanation of why all Q’s are P’s (fit the pattern P) will refer to this filter. For each particular Q, there may be a particular explanation of why it is P, how it came to be P, what maintains it as P. But the explanation of why all Q’s are P will not be the conjunction of these individual explanations, even though these are all the Q’s there are, for that is part of what is to be explained. The explanation will refer to the filter. To make this clear, we might imagine that we have no explanation of why the individual Q’s are P’s. It just is an ultimate statistical law (so far as we can tell at any rate) that some Q’s are P; we even might be unable to discover any stable statistical regularity at all. In this case we would know why all Q’s are P’s (and know there are Q’s, and perhaps even know why there are Q’s) without knowing of any Q, why it is P! The methodological individualist position requires that there be no basic (unreduced) social filtering processes.

Roy

# Leveraged ETF – A Simulation

This post is a token of appreciation for Faisal Habib who taught us structured products this summer.

As is commonly known among people familiar with leveraged ETFs, the tracking error of these products tends to be larger than we intuitively expect. This phenomenon has been explored and explained by Avellaneda and Zhang (2009). More information can also be found here. In a nutshell, the discrepancy comes from the borrowing cost incurred in a replication portfolio and the extra realized variance in a continuous-time model. The resources linked above have the details; here I think it will be interesting simply to model it out and take a look at the simulated results.

To do this I simulated four time-series.

1. An underlying that follows GBM.
2. An imaginary product that provides perfect leverage simply by multiplying the underlying return by the leverage ratio.
3. A leveraged product that achieves leveraged return by constructing a static replication portfolio with risk-free rate r.
4. A leveraged ETF modeled in continuous time, i.e., drift term adjusted using Ito’s lemma.

Here we have the results after running it once with 2x leverage and daily frequency (delta_t = 1/252). I know it strikingly resembles the shape of the S&P 500, but I swear this is pure coincidence (I thought it was cool, too).

And annualized returns:

| underlying_process | simple_multiplication | static_replication | leveraged_etf |
|---|---|---|---|
| 12.0% | 25.4% | 23.6% | 22.3% |

A quick note. The annualized return of the simple multiplication process (#2) is well above 2 times that of the underlying because of the compounding effect in the raging bull market we just created. The static replication (#3) underperforms #2 by roughly the annual risk-free rate. The leveraged ETF (#4) underperforms #3 by the annual variance of the underlying. These results are consistent with previous studies.
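In continuous time this decomposition can be written out explicitly. Consistent with equation (10) in Avellaneda and Zhang (2009), the value $L_{t}$ of a $\beta$-times leveraged ETF relates to the underlying $S_{t}$ as:

$\frac{L_{t}}{L_{0}} = (\frac{S_{t}}{S_{0}})^{\beta} \exp(-((\beta - 1)r + \frac{\beta(\beta - 1)}{2}\sigma^{2})t)$

so relative to naive $\beta$-times multiplication, the log return falls short by $(\beta - 1)rt$ (the financing cost) plus $\frac{\beta(\beta - 1)}{2}\sigma^{2}t$ (the variance drag). With $\beta = 2$ the variance drag is exactly $\sigma^{2}t$, which is the gap between #3 and #4 above.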

There are a lot of ways to play with this so I’ll post the source code here for anyone who’s interested in trying it out (or let me know if I made any mistakes).

library(zoo)

LETF <- function(S_0 = 100,
                 r_f = 0.015,
                 borrow_cost = 0.01,
                 mu = 0.05,
                 sigma = 0.1,
                 end_t = 5,
                 delta_t = 1 / 252,
                 leverage = 2) {
  # create a vector of the underlying process
  n <- end_t / delta_t
  underlying_process <- zoo(0, 1:n)
  underlying_process[1] <- S_0

  # no stock-borrowing cost for long leveraged products
  if (leverage > 1) {
    borrow_cost <- 0
  }

  # create vectors of simple_multiplication, static_replication
  # and leveraged_etf without expense ratio
  simple_multiplication <- underlying_process
  static_replication <- underlying_process
  leveraged_etf <- underlying_process

  for (i in 2:n) {
    # the random component of the GBM model
    rdn <- rnorm(1)

    # model the underlying with GBM
    underlying_process[i] <- coredata(underlying_process[i - 1]) *
      exp((mu - sigma ^ 2 / 2) * delta_t + sigma * sqrt(delta_t) * rdn)
    underlying_log_return <- log(coredata(underlying_process[i]) /
                                   coredata(underlying_process[i - 1]))

    # model a leveraged process by simply multiplying underlying returns
    simple_multiplication_log_return <- leverage * underlying_log_return
    simple_multiplication[i] <- simple_multiplication[i - 1] *
      exp(simple_multiplication_log_return)

    # model static replication return with equations (1) and (2) in
    # Avellaneda and Zhang (2009)
    static_replication_log_return <- underlying_log_return * leverage -
      ((leverage - 1) * r_f - borrow_cost * leverage) * delta_t
    static_replication[i] <- coredata(static_replication[i - 1]) *
      exp(static_replication_log_return)

    # model leveraged ETF return with equation (10) in
    # Avellaneda and Zhang (2009)
    leveraged_etf[i] <- coredata(leveraged_etf[i - 1]) *
      exp(underlying_log_return) ^ leverage *
      exp(-((leverage - 1) * r_f +
              sigma ^ 2 * leverage * (leverage - 1) / 2) * delta_t)
  }

  return(merge(underlying_process, simple_multiplication,
               static_replication, leveraged_etf))
}


Roy

# When Noise Overwhelms Signal – Sorting out Sorts Review

In his 1998 paper, Jonathan Berk illustrated that by sorting stocks on a variable (e.g. the B/E ratio) correlated with a known variable (e.g. beta), the power of the known variable to predict expected returns within each group diminishes when tested with cross-sectional regression. This is very likely why Fama and French found that the explanatory power of beta disappeared (1992), and why Daniel and Titman discovered that stock characteristics matter more than covariances (1997). For researchers and data analysts, this is a perfect example of how a seemingly harmless manipulation of data can cause a meaningful loss of information. If not careful, such loss can lead to confusing or even completely wrong conclusions.

The intuition behind this issue is rather simple: when the data get divided into smaller groups and tested separately, the error of beta estimation becomes “louder” as the sample size shrinks. The error-minimizing advantage of using a large sample diminishes as the sample is divided into smaller groups, and the estimation error overwhelms the useful information in each group.

Getting the intuition is one thing, identifying where exactly the issue occurs and tracing it through the proof is a different story.

## Technical

Assume CAPM holds: $E[R_{i}] = r + \beta_{i}(E[R_{m}]-r)$, in which the systematic risk of stock $i$ is $\beta_{i}\sim\mathcal{N}(1,\sigma^{2})$. Realized return $\hat{R_{i}}$ is the same as expected return $E[R_{i}]$.

Scenario 1: CAPM is tested cross-sectionally on the full sample, with an infinite number of stocks and no estimation error between the theoretical beta and the estimated beta, i.e., $\hat{\beta_{i}} \equiv \beta_{i}$. The coefficient of this regression is:

$\frac{cov(\hat{R_{i}}-r, \hat{\beta_{i}})}{var(\hat{\beta_{i}})} = \frac{cov(E[R_{i}]-r, {\beta_{i}})}{var({\beta_{i}})} = \frac{\sigma^2}{\sigma^2 + 0}(E[R_{m}]-r) = 1 \cdot (E[R_{m}]-r)$

Interpretation: stock returns are perfectly linear (coef = 1) to their exposure to the market risk premium; beta is the perfect predictor of stock returns.

Scenario 2: there’s error in the estimated beta, i.e., $\hat{\beta_{i}} = \beta_{i} + \epsilon_{i}$, $\epsilon_{i} \sim\mathcal{N}(0,\theta^2)$. This is where the trouble originates. The existence of $\epsilon_{i}$ gives birth to the original noise $\theta$, which will get passed down through the rest of the test. As we can see, the coefficient of the same test is already contaminated:

$\frac{cov(\hat{R_{i}}-r, \hat{\beta_{i}})}{var(\hat{\beta_{i}})} = \frac{cov(E[R_{i}]-r, {\beta_{i}}+\epsilon_{i})}{var({\beta_{i}}+\epsilon_{i})} = \frac{\sigma^2}{\sigma^2 + \theta^2}(E[R_{m}]-r)$

* Assuming estimated and observed returns are the same for convenience.

Interpretation: since $\frac{\sigma^2}{\sigma^2 + \theta^2} < 1$, stock returns appear less sensitive to how much systematic risk they are bearing; beta becomes a less perfect predictor of stock returns.
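A quick numerical sanity check of this attenuation (a sketch of my own, not from Berk’s paper): simulate betas and noisy estimates, and the regression slope shrinks by roughly $\frac{\sigma^2}{\sigma^2+\theta^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
sigma2, theta2 = 0.25, 0.10     # var(beta) and var(estimation error)
premium = 0.05                  # E[R_m] - r

beta = rng.normal(1.0, np.sqrt(sigma2), n)
beta_hat = beta + rng.normal(0.0, np.sqrt(theta2), n)
excess = beta * premium         # CAPM excess returns, no idiosyncratic noise

# cross-sectional regression slope of excess returns on estimated beta
slope = np.cov(excess, beta_hat)[0, 1] / np.var(beta_hat, ddof=1)
print(slope)   # close to 0.25 / (0.25 + 0.10) * 0.05 ≈ 0.0357, not 0.05
```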

Scenario 3: now all stocks are sorted into N fractiles by a variable linearly correlated to beta. Within the jth fractile, the conditional variance of $\beta$ is now redefined:

$\sigma_{j}^{2} \equiv var(\beta_{i} \mid i\in j) = \sigma^{2}g(j)$,

where $g(j)$ is a concave-up function that “shrinks” $\sigma^{2}$, since the jth fractile contains only a slice of the full sample (a partial integral of the full distribution).

Run the regression test again and the coefficient is now:

$\frac{\sigma^{2}g(j)}{\sigma^{2}g(j) + \theta^2}(E[R_{m}]-r) = \frac{\sigma^{2}}{\sigma^{2} + \theta^{2}/g(j)}(E[R_{m}]-r)$

Interpretation: $g(j)$, a term born from the sorting process, now serves as a “noise amplifier”. As it gets smaller it inflates $\theta^{2}/g(j)$ and dampens the coefficient as a result. As a concave-up function, it gets smaller when N is larger and/or j moves closer to the middle among the groups. The graph below shows how the coefficient changes with $g(j)$ when $E[R_{m}]-r$ is fixed at 1, $\sigma^{2} = 0.10$ and $\theta^{2} = 0.05$.

To illustrate with actual data, 2,000 stock betas are randomly generated with mean 1 and standard deviation 0.50; 2,000 expected returns are calculated using these betas, market return 6.00% and risk-free rate 1.00%; estimated betas are calculated by adding 2,000 random errors with mean 0 and standard deviation 0.05. All estimated returns are ranked from low to high and this will be used as the basis for sorting. In summary:

• Number of stocks k = 2,000
• $\beta_{i}\sim\mathcal{N}(1,0.5^{2})$
• $E[R_{i}] = 0.01 + \beta_{i}(0.06 - 0.01) = 0.01 + \beta_{i}(0.05)$
• $\epsilon_{i}\sim\mathcal{N}(0,0.05^{2})$
• $\hat{\beta_{i}} = \beta_{i} + \epsilon_{i}$

Test for scenario 1. Run regression $E[R_{i}] = \alpha + \lambda \beta_{i} + \varepsilon_{i}$. We get $\alpha = 0.0100$, $\lambda = 0.0500$, $R^{2} = 1.00$. An essentially perfect fit.

Test for scenario 2. Run regression $E[R_{i}] = \alpha + \lambda \hat{\beta_{i}} + \varepsilon_{i}$. We get $\alpha = 0.01068$, $\lambda = 0.04943$, $R^{2} = 0.9902$.

Test for scenario 3. Run regression $E[R_{i}|i \in j] = \alpha_{j} + \lambda_{j} \hat{\beta_{i}} + \varepsilon_{ij}; j \in [1, N]$.

By setting N = 5, 10, 20, 50, respectively, the coefficients in each group are as follows:

The results are consistent with Berk’s findings. The more groups the stocks are sorted into, the less predictive power beta has on expected returns; the closer j moves to the center among all groups, the more pronounced this effect gets.
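For anyone who wants to replicate the sorting experiment, here is a minimal Python sketch (the analysis above was not published as code, so this is my own reconstruction of it):

```python
import numpy as np

rng = np.random.default_rng(42)

k = 2_000
beta = rng.normal(1.0, 0.50, k)
beta_hat = beta + rng.normal(0.0, 0.05, k)   # estimated betas with noise
exp_ret = 0.01 + beta * 0.05                 # CAPM: r_f = 1%, E[R_m] = 6%

def group_slopes(n_groups):
    """Sort stocks, split into fractiles, run the regression within each."""
    order = np.argsort(exp_ret)              # sorting variable tied to beta
    slopes = []
    for chunk in np.array_split(order, n_groups):
        x, y = beta_hat[chunk], exp_ret[chunk]
        slopes.append(np.cov(y, x)[0, 1] / np.var(x, ddof=1))
    return slopes

# average within-group coefficient falls as the number of groups rises
for n in (5, 10, 20, 50):
    print(n, round(float(np.mean(group_slopes(n))), 4))
```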

Roy

# Ex-ante and Ex-post Risk Model – An Empirical Test

Whenever constructing a quant portfolio or managing portfolio risk, the risk model is at the heart of the process. A risk model, usually estimated with a sample covariance matrix, has 3 typical issues.

1. Not positive definite (typically when the number of assets exceeds the number of observations), which means it’s not invertible.
2. Exposed to extreme values in the sample, which means it’s highly unstable through time and will be exploited by the optimizer.
3. Ex-post tracking error is always larger than the ex-ante tracking error, given a stochastic component in the holdings, which means investors will suffer from unexpected variances, either large or small.

Issue 1 is a pure math problem, but issues 2 and 3 are more subtle and more related to each other. A common technique called shrinkage has been devised to solve them. The idea behind it is to add more structure to the sample covariance matrix by taking a weighted average of itself and a more stable alternative (e.g. a single-factor model or a constant-correlation covariance matrix). Two main considerations are involved in the use of shrinkage: 1. what’s the shrinkage target, i.e. the alternative? 2. what’s the shrinkage intensity, i.e. the weight assigned to each matrix?
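As a sketch of the mechanics (my own illustration using a constant-correlation target; the intensity here is a fixed 50%, not the optimal intensity derived by Ledoit and Wolf):

```python
import numpy as np

def shrink_cov(returns, intensity=0.5):
    """Shrink the sample covariance toward a constant-correlation target."""
    sample = np.cov(returns, rowvar=False)
    sd = np.sqrt(np.diag(sample))
    corr = sample / np.outer(sd, sd)
    n = corr.shape[0]
    # target: same variances, all pairwise correlations set to the average
    avg_corr = (corr.sum() - n) / (n * (n - 1))
    target = avg_corr * np.outer(sd, sd)
    np.fill_diagonal(target, sd ** 2)
    return intensity * target + (1 - intensity) * sample

rng = np.random.default_rng(0)
rets = rng.normal(0, 0.02, size=(104, 10))   # 2 years of weekly data, 10 assets
shrunk = shrink_cov(rets, intensity=0.5)
print(np.linalg.eigvalsh(shrunk).min())      # smallest eigenvalue stays positive
```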

Links above provide details about these considerations. I did several tests to show how the differences between ex-ante and ex-post tracking errors vary when using different shrinkage targets and intensities. The test is done with 450 stocks that were both in the S&P500 by the end of Oct 2016 and had been listed for at least 10 years. An equal-weighted portfolio is formed using 2-year weekly data and is rebalanced every month.

The shrinkage intensity changes from 10% to 90% by 10% throughout the test. The spreads between the ex-ante and ex-post variances are recorded each week.

As shown above, the Ledoit-Wolf approach (single-factor target, optimal intensity as derived by L&W) produces the smallest estimation error of all the approaches tested. Interestingly, the sample covariance matrix approach shows higher ex-ante risk than ex-post, which violates the theory mentioned above. This is possibly because in this test the ex-ante variances stay constant for four weeks while the ex-post variances change every week, which amplifies the actual spread if we believe they should move together over time.

Roy

# A Quick Review – the Math Behind the Black Scholes Model

This is a high-level quick review of the derivation of the Black-Scholes model, i.e., I will not spend time putting down rigorous definitions or discussing the assumptions behind the equations. They are definitely important, but I’d rather focus on one thing at a time.

Suppose the change of a stock’s price $dS_{t}$ follows a geometric Brownian motion process:

$dS_{t} = \mu S_{t}dt + \sigma S_{t}dW_{t}$,        (1)

where $\mu$ is the drift constant and $\sigma$ is the volatility constant.

According to Ito’s Lemma (Newtonian calculus doesn’t work here because of the existence of the stochastic term $W_{t}$), the price of an option on this stock, which is a function $C$ of $S$ and $t$, must satisfy:

$dC(S,t) = (\mu S_{t} \frac{\partial C}{\partial S} + \frac{\partial C}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 C}{\partial S^2}) dt + \sigma S_{t} \frac{\partial C}{\partial S} dW_{t}$,        (2)

Now, theoretically, instead of buying an option, we could replicate its payoff by actively and continuously reallocating our money between a risk-free asset $B_{t}$ and the stock underlying the option. Given the no-arbitrage assumption, the value of this replicating portfolio, $P_{t}$, should be exactly the same as the option price. Therefore we have:

$P_{t} = a_{t}B_{t} + b_{t}S_{t}$,        (3)

$dP_{t} = a_{t}dB_{t} + b_{t}dS_{t}$

$= ra_{t}B_{t}dt + b_{t}(\mu S_{t}dt + \sigma S_{t}dW_{t})$        replace $dS_{t}$ with (1)

$= (ra_{t}B_{t} + b_{t}\mu S_{t})dt + b_{t}\sigma S_{t}dW_{t}$, and        (4)

$dP_{t} = dC_{t}$,        (5)

where $a$ and $b$ represent the portions of money allocated to each asset; $r$ is the risk-free rate.

Knowing (5), we can map some terms in (2) and (4) to get

$b_{t} = \frac{\partial C}{\partial S}$, and        (6)

$ra_{t}B_{t} = \frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2 S^2_{t}\frac{\partial^2 C}{\partial S^2}$        (7)

Feeding (6) and (7) into (3), we get the Black-Scholes partial differential equation:

$rC_{t} = rS_{t}\frac{\partial C}{\partial S} + \frac{\partial C}{\partial t} + \frac{1}{2}\sigma^2 S^2_{t}\frac{\partial^2 C}{\partial S^2}$        (8)

Apparently, a couple of Nobel Laureates solved this equation here, subject to the boundary conditions: if $K$ is the strike price, $C(S,T) = \max(S-K, 0)$, $C(0, t) = 0$ for all $t$, and $C(S, t)$ approaches $S$ as $S$ approaches infinity. The result is the Black-Scholes pricing model for European call options:

$C(S,t) = S_{t}\Phi(d_{1}) - e^{-r(T-t)}K\Phi(d_{2})$        (9)

where

$d_{1} = \frac{\ln{\frac{S_{t}}{K}} + (r + \frac{\sigma^2}{2}) (T-t)}{\sigma\sqrt{T-t}}$

$d_{2} = d_{1} - \sigma\sqrt{T-t}$

$\Phi{(.)}$ is the CDF of the standard normal distribution.
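Equation (9) translates directly into code; here is a quick Python sketch (no dividends, call option only):

```python
from math import exp, log, sqrt
from statistics import NormalDist

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call; tau = T - t in years."""
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    Phi = NormalDist().cdf   # standard normal CDF
    return S * Phi(d1) - exp(-r * tau) * K * Phi(d2)

# at-the-money call: S = K = 100, r = 5%, sigma = 20%, 1 year to expiry
print(round(bs_call(100, 100, 0.05, 0.2, 1.0), 4))   # 10.4506
```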

Using this general approach, we should be able to model any derivative as long as we have some basic assumptions about the underlying process. Of course, it’s pointless to use these models dogmatically, because almost none of the assumptions behind them hold exactly. An investment professional’s job is not to follow the textbook, blindly apply the formula in the real world, and hope it sticks; it’s to investigate the discrepancies between the theoretical model and the empirical evidence, figure out which assumptions are violated, and ask whether those violations can be translated into trading opportunities.

Roy

# An Empirical Mean Reversion Test on VIX Futures

The VIX mean reversion trade gets popular when the market experiences big ups and downs. You hear a lot of talk about how much money people make trading VXX, XIV and their leveraged equivalents. However, is VIX truly mean reverting, or does it merely seem more lucrative than it is because people only talk about it when they make money and keep quiet when they lose?

In this post I use daily returns of the S&P 500 VIX Short-Term Futures Index from December 2005 to August 2015 (2438 observations) to find out whether there’s empirical evidence supporting short-term VIX mean reversion (MR). It’s the most suitable vehicle for this test because there’s no instrument that tracks VIX spot, and the index is the benchmark for VXX and XIV.

The VIX ST Futures Index holds VIX 1-month and 2-month futures contracts and rolls them on a daily basis. Its performance suffers from the contango effect, as commodity futures ETFs do, but that’s an inevitable cost in this case.

To find out whether extreme VIX returns lead to a strong short-term rebound, I group all daily returns into deciles and summarize the distributions of the cumulative future returns of each group up to 5 trading days (1 week). If VIX is truly short-term mean reverting, we should see the future returns following the 1st group (lowest) significantly higher than 0 on average, and the returns following the 10th group (highest) significantly lower than 0 on average. Future returns that go beyond the sample time period are recorded as 0.
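The grouping and testing logic can be sketched as follows (my reconstruction; it runs on synthetic iid returns as a stand-in for the actual index series, so the t-stats here come out near zero by construction):

```python
import numpy as np

rng = np.random.default_rng(0)
ret = rng.normal(0, 0.03, 2438)   # stand-in for the daily index returns

# decile membership of each day's return (0 = lowest, 9 = highest)
cuts = np.quantile(ret, np.linspace(0.1, 0.9, 9))
deciles = np.searchsorted(cuts, ret)

def forward_cum_return(horizon):
    """Cumulative return over the next `horizon` days; 0 past the sample end."""
    fwd = np.zeros_like(ret)
    for t in range(len(ret)):
        fwd[t] = ret[t + 1 : t + 1 + horizon].sum()
    return fwd

def t_stat(x):
    """t-statistic for H0: mean equals 0."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

fwd5 = forward_cum_return(5)
low, high = fwd5[deciles == 0], fwd5[deciles == 9]
print(t_stat(low), t_stat(high))
```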

As shown above, group 1 and group 10 are the two groups we want to focus on. If someone can systematically make money by putting in MR trade on VIX, we should see the next day (or next 2, 3, maybe 5 days) returns following these two groups distributed like this:

The actual data look like this:

It’s hard to spot any major difference between groups 1 and 10 on the next day. However, by the 5th day, the cumulative returns in group 1 largely outperform group 10. To better illustrate, I perform t-tests on both groups from day 1 to day 5, as reported below (H0: average return equals 0).

As it turns out, future returns in Group 1 systematically outperform Group 10 within the next 5 days. The results are not blessed with overwhelmingly strong t-stats and p-values, but it’s hard to argue that we are looking at random noise here. Additionally, this test is done with end-of-day prices. Intraday movements, which possibly constitute the bulk of VIX MR trades, are completely ignored by this test. Therefore the results we see are likely a muted version of market reality.

Roy

# Constructing an Alpha Portfolio with Factor Tilts

In this post I’d like to show an example of constructing a monthly-rebalanced long-short portfolio to exploit alpha signals while controlling for factor exposures.

This example covers the time period between March 2005 and 2014. I use 477 stocks from the S&P500 universe (data source: Quandl) and the Fama-French 3 factors (data source: Kenneth French’s website) to conduct backtests. My alpha signal is simply the cross-sectional price level of all stocks – overweighting stocks that are low on price level and underweighting the ones that are high. By doing this I’m effectively targeting the liquidity factor, so it worked out pretty well during the 2008 crisis. But that’s beside the point, for this post is more about the process and techniques than a skyward PnL curve.

At each rebalance, I rank all stocks based on the dollar value of their shares, then assign weights to them inversely by rank, i.e., expensive stocks get lower weights and vice versa. This gives me naive exposure to my alpha signal. However, my strategy is probably exposed to common factors in the market. At the end of the day, I could have a working alpha idea and a bad performance driven by unintended factor bets at the same time. This situation calls for a technique that gives me control over factor exposures while still keeping the portfolio close to the naive alpha bets.

Good news: basic quadratic programming is just the tool for the job – its objective function can minimize the sum of squared weight differences from the naive portfolio, while the linear constraints pin factor exposures where we want them to be. For this study I backtested 3 scenarios: a naive alpha portfolio, a factor-neutral portfolio, and a portfolio that is neutral on the MKT and HML factors but tilts towards SMB (with a desired factor loading of 0.5). As an example, the chart below shows the expected factor loadings of each of the 3 backtests on the 50th rebalance (84 in total). Regression coefficients are estimated with 1-year weekly returns.
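With equality constraints only, that QP has a closed-form solution, which is enough to sketch the idea (the loadings B, weights and targets below are made up for illustration; the real backtest would use a QP solver once inequality constraints enter):

```python
import numpy as np

rng = np.random.default_rng(0)

n_stocks, n_factors = 100, 3
B = rng.normal(size=(n_stocks, n_factors))   # hypothetical factor loadings
w_naive = rng.normal(size=n_stocks)
w_naive /= np.abs(w_naive).sum()             # naive long-short alpha weights

# desired exposures, order (MKT, SMB, HML): neutral, tilt SMB to 0.5, neutral
target = np.array([0.0, 0.5, 0.0])

# minimize ||w - w_naive||^2  subject to  B.T @ w = target
# (closed-form KKT solution of the equality-constrained QP)
w = w_naive - B @ np.linalg.solve(B.T @ B, B.T @ w_naive - target)

print(B.T @ w)   # hits the target exposures exactly
```

The solution stays as close as possible to the naive alpha weights while moving factor exposures exactly where we want them.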

After the backtests, I got 3 time series of monthly returns for the 3 scenarios. The tables below show the results of regressing these returns on the MKT, SMB and HML factors. All three strategies yield similar monthly alpha, but the neutral portfolio reduced the factor loadings of the naive strategy significantly, while the size-tilt portfolio kept material exposure to the SMB factor.

Tables below summarize the annualized performance of these backtests. While the neutralized portfolio generates the lowest annualized alpha, it ranks the highest in terms of information ratio.

Interpretation: the naive and size-tilt portfolios get penalized for having more of their returns driven by factor exposures, whether unintended or intentional. The neutral portfolio, with slightly lower returns, earns a better information ratio by representing the “truer” performance of this particular alpha idea.

The idea above can be extended to multiple alpha strategies and dozens of factors, or even hundreds if the universe is large enough to make that feasible. The caveat is that there is such a thing as too many factors, and most of them don’t last in the long run (Hwang and Lu, 2007). It’s just not that easy to come across something that carries both statistical and economic significance.

Roy