The Power of Mass Deployment

The power of statistics emerges as the sample size grows. I know, it has been repeated multiple times in Stats 101, a bunch of youtube videos, or maybe Statistics for Dummies. But has it stopped people from making judgement calls on purely empirical basis? Statements like “I’ve seen it working x times, so it’s legit” or “it’s a bad indicator because I tried on several stocks and it didn’t work” don’t really make sense when you are living in a complex realm composed by incredible amount of data, multiple dimensions of reality and endless chain reactions such as public administration and stock investments.

To illustrate, I did a back-test using a simple combination of Bollinger Bands and MACD Indicator from 2004 to 2012. It’s an end-of-day, mean-reversion strategy with a price filter and a liquidity filter. After testing it on 30 random stocks listed on TSX, this is what I got.

Nope, not impressive. But it looks quite different if we deploy it to the entire market, which is about 2300 stocks listed on TSX (1400 after using survivor filter ).

Commission is not a concern. As shown below, most of the time the strategy only holds less than 2 stocks, not 200.

The real problem for implementing this strategy, for retail investors, is computing power. Gathering latest data, completing calculation and executing trades right before market closes every day precisely is very challenging for individuals. For big players, it’s liquidity. Because the strategy targets low liquidity segments, it can’t guarantee the trading volumes will be big enough for institutional traders.

Neural Network Algorithm

This post is a token of gratitude to Andrew Ng at Stanford University. I’ve learned a lot of fascinating concepts and practical skills from his Machine Learning course at Coursera. Among these algorithms neural network is my favourite so I decided to convert this class assignment built in Octave to functions in R. I will post my code below but if anyone who’d like to play with an exported package with documentations, please contact me by email, I will send you the tarball file. The code is also available on my github repository.

Comparing to the more comprehensive package “neuralnet” on CRAN, this is a lite version which consists all the basic components: single hidden layer, random initialization, sigmoid activation, regularized cost function, forward-propagation and backward-propagation. Originally it only works as a classifier, but I modified it a bit to create a similar algorithm to handle absolute values. I’m not sure if it’s the best practice for this kind of task but it worked as expected.

This is how the algorithm looks like, along with its source of inspiration, an actual human neural neuron. By mimicking the way a human brain (which is the most powerful computer in the universe) adjusts itself to recognize patterns and estimates the future, an algorithm should be able to solve similar tasks.

For the classifier, the cost function tracks the forward propagation process is defined below.

J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} \left [ -y_{k}^{(i)}log((h_{\Theta}(x^{(i)}))_{k})-(1-y_{k}^{(i)})log(1-(h(x^{(i)}))_{k}) \right ]

Where J is the total cost we want to minimize, m represents number of training examples, K represents number of labels (for classification). When training for the ith example with label k and weights \Theta , all corresponding correct output values will be converted to 1 and the rest will become 0, so -y_{k}^{(i)} and (1-y_{k}^{(i)}) works as switches for different scenarios. Function h_{\Theta}(x^{(i)}))_{k}=g(z) calculates the training output as z=\Theta^{T}x^{i} , then processes it with sigmoid function g(z)=\frac{1}{1+e^{-z}} in order to bound it between 0 to 1 as likelihood of being 1 (which is equivalent to probability of being true).

To avoid extreme bias caused by large weights, we add parameter regulation to the cost function. Defined as

\frac{\lambda}{2m}\left [ \sum_{j=1}^{h}\sum_{k=1}^{n}(\Theta_{j,k}^{(1)})^{2}+\sum_{j=1}^{o}\sum_{k=1}^{h}(\Theta_{j,k}^{(2)})^{2} \right ]

where n , h and o equal to numbers of nodes in the input layer, hidden layer and output layer respectively.

During the backward propagation process, we push the calculated outputs backwards through the algorithm and accumulate the gradient from each layer. Because there are only one hidden layer, we can calculate the gradient of the output layer as \delta^{3}=a_{k}^{3}-y_{k} and hidden layer as \delta^{2}=(\Theta^{2})^{T}\delta^{3}{\ast}g^{'}(z^{(2)}) , then accumulate it \Delta^{(l)}=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T} to finally get the total unregularized gradient \frac{\partial }{\partial \Theta^{(l)}_{ij}}J(\Theta)=\frac{1}{m}\Delta^{(l)}_{(ij)} . In the end, we just need to regularize it like we did to the cost function by adding \frac{\lambda}{m}\Theta^{(l)}_{ij} .

The absolute value algorithm is very similar. Instead of using a 0/1 (True/False) switch to train examples with same labels all at once, it trains one example at a time and calculate the cost by squared error of each prediction (y^{(i)}-h_{\Theta}(x^{(i)}))^{2} .

Now we can wrap up a cost function and a gradient function to test it by running optim() provided by default R package “stat”. In this case I choose L-BFGS-B method because it utilizes everything that we’ve got in hand and seems learns faster than most of the others.

The classifier first. Three blocks of data are randomly generated and labelled for training. Because we intentionally created them for testing purpose, it’s not necessary to divide them for cross-validation and out-of-sample testing contrast to using it in real world.

X <- rbind(matrix(rnorm(100, 10, 1), 20, 5),
           matrix(rnorm(200, 5, 2), 40, 5),
           matrix(rnorm(150, 1, 0.1), 30, 5));
y_cla <- c(rep(3, 20), rep(1, 40), rep(2, 30));
test_cla <- nnet.train.cla(X, y_cla, hidden_layer_size = 25,
                   lambda = 5, maxit = 50);
pred_cla <- nnet.predict.cla(test_cla$Theta1, test_cla$Theta2, X);
plot(y_cla, col = "green", pch = 15);
points(pred_cla, col = "red", pch = 3);
legend("topright", legend = c("Original", "Predicted"), pch = c(15, 3), col = c("green", "red"));

Then for absolute values. I reset y values for this test and tried several values for learning speed lamda and number of iterations to get results below.

y_abs <- c(rep(-3, 20), rep(1, 40), rep(2, 30));
test_abs <- nnet.train.abs(X, y_abs, hidden_layer_size = 25,
                   lambda = 0.0001, maxit = 400);
pred_abs <- nnet.predict.abs(test_abs$Theta1, test_abs$Theta2, X);
plot(sigmoid(y_abs), type = 'l', col = "green", ylim = c(0, 1));
lines(pred_abs, col = "red");
legend("topright", legend = c("Original", "Predicted"), col = c("green", "red"), lwd = c(1, 1));

Source code of the algorithm. Apparently it’s not very concise. Like said, package is provided upon request.

## =============== random initialization ===============
randInitializeWeights <- function(L_in, L_out) {
  theta <- matrix(0, L_out, 1 + L_in);
  epsilon_init <- 0.12;
  theta <- matrix(rnorm(L_out*(L_in + 1)),
              L_out, (L_in + 1))*2*epsilon_init - epsilon_init;
  return(theta);
}
## =============== sigmoid function ===============
sigmoid <- function(z) {
  g <- 1 / (1 + exp(-z));
  return(g);
}
## =============== sigmoid gradient function ===============
sigmoidGradient <- function(z) {
  g <- matrix(0, dim(z));
  g <- sigmoid(z);
  g <- g * (1 -g);
  return(g);
}


# ++++++++++++++++++ absolute value predictor functions ++++++++++++++++++
## =============== cost J and gradient ===============
nnCostFuncion.abs <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           X, y, lambda) {
  Theta1 <- matrix(nn_params[1:(hidden_layer_size *
                   (input_layer_size + 1))],
                   hidden_layer_size,
                   (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size *
                   (input_layer_size + 1))],
                   1,
                   (hidden_layer_size + 1));
  m <- dim(X)[1];
  J <- 0;
  Theta1_grad <- matrix(0, nrow(Theta1), ncol(Theta1));
  Theta2_grad <- matrix(0, nrow(Theta2), ncol(Theta2));

  for (i in 1:m) {
    y_sig <- sigmoid(y[i]);
  ## forward propagation
    # first feed
    a1 <- X[i, ];
    a1 <- c(1, a1);
    # first hidden layer
    z2 <- Theta1 %*% a1;
    a2 <- sigmoid(z2);
    a2 <- c(1, a2);
    # output layer
    z3 <- Theta2 %*% a2;
    a3 <- sigmoid(z3);
    # add to cost function
    J <- J + (a3 - y_sig)^2 / 2;
  ## backward propagation
    delta3 <- a3 * (1 - a3) * (a3 - y_sig);
    delta2 <- (t(Theta2) %*% delta3)[-1] * sigmoidGradient(z2);

    Theta1_grad <- Theta1_grad + (delta2 %*% a1);
    Theta2_grad <- Theta2_grad + (delta3 %*% a2);    
  }
  
  J <- J / m;
  Theta1_grad <- Theta1_grad / m;
  Theta2_grad <- Theta2_grad / m;

  # J regulization
  reg_theta1 <- Theta1[, -1];
  reg_theta2 <- Theta2[, -1];
  J <- J + (lambda/(2*m)) * (sum(reg_theta1^2) + sum(reg_theta2^2));
  # gradient regulization
  Theta1_grad[, -1] <- Theta1_grad[, -1] + (lambda/m) * Theta1[, -1];
  Theta2_grad[, -1] <- Theta2_grad[, -1] + (lambda/m) * Theta2[, -1];

  # unroll gradients
  grad <- c(c(Theta1_grad), c(Theta2_grad));

  return(list(J = J, grad = grad));
}
## =============== cost J function for optimization ===============
costFunction.abs <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           X, y, lambda) {
  costJ <- nnCostFuncion.abs(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          X = X, y = y, lambda = lambda)$J;
  return(costJ);
}
## ========== cost J gradient function for optimization ==========
gradFunction.abs <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           X, y, lambda) {
  grad <- nnCostFuncion.abs(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          X = X, y = y, lambda = lambda)$grad;
  return(grad);
}
#################### end of utility functions ####################

## ================ train nerual network ===============
nnet.train.abs <- function(X, y, hidden_layer_size = 25,
                 lambda = 1, maxit = 50) {
  m <- nrow(X);
  input_layer_size <- ncol(X);
  # ================ Initializing Pameters ================
  initial_Theta1 <- randInitializeWeights(input_layer_size, hidden_layer_size);
  initial_Theta2 <- randInitializeWeights(hidden_layer_size, 1);
  initial_nn_params = c(c(initial_Theta1), c(initial_Theta2));

  # =================== Training NN ===================
  train_results <- optim(par = initial_nn_params,
                       fn = costFunction.abs,
                       input_layer_size = input_layer_size,
                       hidden_layer_size = hidden_layer_size,
                       X = X, y = y, lambda = lambda,
                       gr = gradFunction.abs,
                       method = "L-BFGS-B",
                       control = list(maxit = maxit, trace = TRUE, REPORT = 1));
  nn_params <- train_results$par;
  Theta1 <- matrix(nn_params[1:(hidden_layer_size*(input_layer_size+1))],
                   hidden_layer_size, (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size*(input_layer_size+1))],
                   1, (hidden_layer_size + 1));
  ## ================= show accuracy =================
  pred = nnet.predict.abs(Theta1, Theta2, X);
  cat("\nLogistic Prediction Error: ", var(pred - sigmoid(y)), "\n", sep = "");
  ## =============== return thetas ===============
  return(list(Theta1 = Theta1, Theta2 = Theta2));
}

## =============== nerual network predict ===============
nnet.predict.abs <- function(Theta1, Theta2, X) {
  m <- nrow(X);
  if (is.null(m)) {
    m <- 1; X <- t(X);
  }
  p = rep(0, m);

  h1 <- sigmoid(cbind(rep(1, m), X) %*% t(Theta1));
  h2 <- sigmoid(cbind(rep(1, m), h1) %*% t(Theta2));
  return(h2);
}

# ++++++++++++++++++ classifier functions ++++++++++++++++++
## =============== cost J and gradient ===============
nnCostFunction.cla <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           num_labels,
                           X, y, lambda) {
  Theta1 <- matrix(nn_params[1:(hidden_layer_size *
                   (input_layer_size + 1))],
                   hidden_layer_size,
                   (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size *
                   (input_layer_size + 1))],
                   num_labels,
                   (hidden_layer_size + 1));
  m <- dim(X)[1];
  J <- 0;
  Theta1_grad <- matrix(0, nrow(Theta1), ncol(Theta1));
  Theta2_grad <- matrix(0, nrow(Theta2), ncol(Theta2));

  for (i in 1:m) {
    y_label <- rep(0, num_labels);
    y_label[y[i] == as.integer(levels(factor(y)))] <- 1;
  ## forward propagation
    # first feed
    a1 <- X[i, ];
    a1 <- c(1, a1);
    # first hidden layer
    z2 <- Theta1 %*% a1;
    a2 <- sigmoid(z2);
    a2 <- c(1, a2);
    # output layer
    z3 <- Theta2 %*% a2;
    a3 <- sigmoid(z3);
    # add to cost function
    J <- J + sum(-y_label * log(a3) - (1 - y_label) * log(1 - a3));
  ## backward propagation
    delta3 <- a3 - y_label;
    delta2 <- (t(Theta2) %*% delta3)[-1] * sigmoidGradient(z2);

    Theta1_grad <- Theta1_grad + (delta2 %*% a1);
    Theta2_grad <- Theta2_grad + (delta3 %*% a2);    
  }
  
  J <- J / m;
  Theta1_grad <- Theta1_grad / m;
  Theta2_grad <- Theta2_grad / m;

  # J regulization
  reg_theta1 <- Theta1[, -1];
  reg_theta2 <- Theta2[, -1];
  J <- J + (lambda/(2*m)) * (sum(reg_theta1^2) + sum(reg_theta2^2));
  # gradient regulization
  Theta1_grad[, -1] <- Theta1_grad[, -1] + (lambda/m) * Theta1[, -1];
  Theta2_grad[, -1] <- Theta2_grad[, -1] + (lambda/m) * Theta2[, -1];

  # unroll gradients
  grad <- c(c(Theta1_grad), c(Theta2_grad));

  return(list(J = J, grad = grad));
}
## =============== cost J function for optimization ===============
costFunction.cla <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           num_labels,
                           X, y, lambda) {
  costJ <- nnCostFunction.cla(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          num_labels = num_labels,
                          X = X, y = y, lambda = lambda)$J;
  return(costJ);
}
## ========== cost J gradient function for optimization ==========
gradFunction.cla <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           num_labels,
                           X, y, lambda) {
  grad <- nnCostFunction.cla(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          num_labels = num_labels,
                          X = X, y = y, lambda = lambda)$grad;
  return(grad);
}
#################### end of utility functions ####################

## ================ train nerual network ===============
nnet.train.cla <- function(X, y, hidden_layer_size = 25,
                 lambda = 1, maxit = 50) {
  m <- nrow(X);
  input_layer_size <- ncol(X);
  num_labels <- length(levels(factor(y)));
  # ================ Initializing Pameters ================
  initial_Theta1 <- randInitializeWeights(input_layer_size, hidden_layer_size);
  initial_Theta2 <- randInitializeWeights(hidden_layer_size, num_labels);
  initial_nn_params = c(c(initial_Theta1), c(initial_Theta2));

  # =================== Training NN ===================
  train_results <- optim(par = initial_nn_params,
                       fn = costFunction.cla,
                       input_layer_size = input_layer_size,
                       hidden_layer_size = hidden_layer_size,
                       num_labels = num_labels,
                       X = X, y = y, lambda = lambda,
                       gr = gradFunction.cla,
                       method = "L-BFGS-B",
                       control = list(maxit = maxit, trace = TRUE, REPORT = 1));
  nn_params <- train_results$par;
  Theta1 <- matrix(nn_params[1:(hidden_layer_size*(input_layer_size+1))],
                   hidden_layer_size, (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size*(input_layer_size+1))],
                   num_labels, (hidden_layer_size + 1));
  ## ================= show accuracy =================
  pred = nnet.predict.cla(Theta1, Theta2, X);
  cat("\nTraining Set Accuracy: ", sum(pred == y)/length(pred)*100,
      "%\n", sep = "");
  ## =============== return thetas ===============
  return(list(Theta1 = Theta1, Theta2 = Theta2));
}

## =============== nerual network predict ===============
nnet.predict.cla <- function(Theta1, Theta2, X) {
  num_labels <- nrow(Theta2);
  m <- nrow(X);
  if (is.null(m)) {
    m <- 1; X <- t(X);
  }
  p = rep(0, m);

  h1 <- sigmoid(cbind(rep(1, m), X) %*% t(Theta1));
  h2 <- sigmoid(cbind(rep(1, m), h1) %*% t(Theta2));
  p = apply(h2, 1, which.max);
  return(p);
}

A not-so-mathy Black-Litterman model

Black-Litterman model is a very handy tool to quantify and integrate different sources of information into a portfolio. But I know a fair number of people got scared away from it once they saw this:

E[R]= [(\tau\Sigma)^{-1}+P^{'}\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi+P^{'}\Omega^{-1}Q]

Or this

w_{k} =[\lambda\Sigma]^{-1}[(\tau\Sigma)^{-1}+ p_{k}^{'}\omega^{-1}p_{k}]^{-1} [(\tau\Sigma)^{-1}\Pi + p_{k}^{'}\omega^{-1}Q_{k}]

But the idea behind is really simple: we start to construct a portfolio with weights based on implied market equilibrium returns or market capitalization, then tune the weights based on our views. Each view, although seems subjective, actually only has 3 pieces of key information: which assets will outperform which? By how many basis points? How confident you are about this view? To illustrate, here’s an example produced by my R package “portopt”. (It’s on its final stage of development and I’ll talk more details about it in the future)

First of all, let’s create an optimizer object which contains all information we need for an optimization task, including historical returns of 5 hypothetical assets (normally distributed around 0; creatively named A, B, C, D and E), their market capitalization weights (20% each), scalar for views (tau), and so on. In this example, we have 2 views and they have already built in the optimizer. If we call function show.optimizer() and pass in the object as the only argument, these views will be listed below. And if we call function run.optimizer(), we can see how the portfolio weights shifted because of the views. As shown below:

Now let’s call function set.views() to change our views. The program will ask how many views you have, then loop you through each view with the 3 key questions. See below:

Now if we run optimization on this new optimizer, we’ll see corresponding weights change. We see some big shifts here because 5% and 10% are some very large numbers to a bunch of standard-normally distributed returns.

Put it all together:

This post is just an illustration of how the model is used without any details about how it works. There are many discussions and research about different components of this model such as how to set up \tau, how to calculate \Omega vector, how to treat confidence level, etc. An investor should look into theses topics before finally decide to put this thing in production.

Mean-Reversion Risk Tuning

This is just a test for fun. What if we buy everything that goes down to a certain point, say -%1 or -%5 at the end of the day everyday? There are probably a bunch of programs out there can do this but I prefer something home-made.

To get a list of all publicly traded stocks on NASDAQ, I went to EOData and grabbed a txt file for all NASDAQ tickers. But the trouble was even though the calculation I wanted to perform was simple, the volume was too large for my computer to handle. So I had to push everything to github, start an Amazon EC2 instance, configure it, pull from my own repository, run my script, then push everything back. The process itself is not complicated, just a lot of annoying details.

Anyhow, before everybody gets bored with my geeky life story, here are the results. The numbers in the legend represent mean-reversion signals. For instance, -0.12 means “what if we buy everything that goes down more than 12%?

It seems as we moveing the signal further away from 0, the strategy becomes more aggressive and performance improves gradually, except for -0.19 and -0.20, which might be two outliers or trend changers.

Source code in R.

## functions
EODMR <- function(returnsData, shockRates = seq(-0.01, -0.2, by = -0.01)) {
  output <- mrTest(returnsData, shockRate = 0)
  output <- calcRet(output)
  outputRet <- output
  annRet <- prod(1+output)^(250/length(output))-1
  annSd <- sd(output)*sqrt(250)
  Sharpe <- annRet/annSd
  output <- data.frame(shock0 = rbind(annRet, annSd, Sharpe))
  for (i in seq(along = shockRates)) {
	testData <- mrTest(returnsData, shockRate = shockRates[i])
	finalRet <- calcRet(testData)
        outputRet <- merge(outputRet, finalRet)
        names(outputRet)[i + 1] <- shockRates[i]
        annRet <- prod(1+finalRet)^(250/length(finalRet))-1
        annSd <- sd(finalRet)*sqrt(250)
        Sharpe <- annRet/annSd
        finalRet <- data.frame(rbind(annRet, annSd, Sharpe))
        names(finalRet) <- shockRates[i]
	output <- cbind(output, finalRet)
	print(paste(i, "out of ", length(shockRates), " finished"))
  }
  return(list(testStat = output, testReturns = outputRet))
}

# download and format returns data
getReturns <- function(tickers, start = Sys.Date() - 2520,
                       minVol = 100000) {  
  returnsData <- get.hist.quote("^GSPC", start = start,
                           quote = "AdjClose")
  returnsData <- merge(ROC(returnsData), Next(ROC(returnsData)))
  names(returnsData) <- paste("SPX", c("Ret0", "Ret1"), sep = ".")
  
  for (i in seq(along = tickers)) {
    nextTicker <- NULL
    nextTicker <- try(get.hist.quote(tickers[i], start = start,
                                     quote = c("AdjClose", "Volume")))
    
    if (!is.character(nextTicker[1])) {

      meanVol <- as.numeric(mean(nextTicker))
      if (meanVol >= minVol) {
        nextTicker <- nextTicker$Adj
      } else next

      nextTicker <- nextTicker[!duplicated(index(nextTicker))]
      nextTicker <- merge(ROC(nextTicker), Next(ROC(nextTicker)))
      names(nextTicker) <- paste(tickers[i], c("Ret0", "Ret1"), sep = ".")
      returnsData <- merge(retournsData, nextTicker)
    } else next
  print(paste(i, "out of", length(tickers), sep = " "))
  }
  return(returnsData)
}

# perform MR test
mrTest <- function(returnsData, shockRate = -0.05) {
  testRets <- returnsData[, 1:2]
  thisName <- gsub(".Ret0", "", names(testRets)[1])
  testRets <- ifelse(testRets[, 1] <= shockRate, testRets[, 2], 0)
  dim(testRets) <- c(length(testRets), 1)
  names(testRets) <- thisName
  
  secNum <- ncol(returnsData) / 2
  for (i in 2:secNum) {
    thisSec <- returnsData[, (i * 2 - 1):(i * 2)]
    thisName <- gsub(".Ret0", "", names(thisSec)[1])
    thisSec <- ifelse(thisSec[, 1] <= shockRate, thisSec[, 2], 0)
    dim(thisSec) <- c(length(thisSec), 1)
    names(thisSec) <- thisName
    testRets <- merge(testRets, thisSec)
    testRets[is.na(testRets)] <- 0
  }
  return(testRets)
}

# calculate strategy results
calcRet <- function(testRets) {
  timeLength <- nrow(testRets)
  finalRet <- rep(0, timeLength)
  for (i in 1:timeLength) {
    tmp <- as.numeric(testRets[i, ])
    tmp <- tmp[tmp != 0]
    finalRet[i] <- mean(tmp)
  }
  finalRet[is.na(finalRet)] <- 0
  finalRet <- zoo(finalRet, order.by = index(testRets))
  return(finalRet)
}

#### run code
# load packages
library(tseries)
library(quantmod)
library(PerformanceAnalytics)

# perform test
tickersNasdaq <- as.vector(read.table("~/R/EODMR_development/NASDAQ.txt",
                                      sep = "\t")[-1, 1])
testNasdaq <- EODMR(returnsNasdaq)

# visualize results
charts.PerformanceSummary(testNasdaq$testReturns, colorset = rainbow(21), ylog = 1)
chart.RiskReturnScatter(testNasdaq$testReturns, colorset = rainbow(21), symbolset = rep(3, 21))

NUCFLASH


Nucflash refers to detonation or possible detonation of a nuclear weapon which creates a risk of an outbreak of nuclear war.


Recently I was engaged in an interesting conversation about extreme risks and thought it would be fun to do a study on stocks’ group behavior during and after market shocks. Sort of like a fallout analysis.

Around 500 stocks listed on NASDAQ were sampled for this study. For the past 10 years, whenever S&P500 daily return dropped by more than 4 standard deviations from it’s mean, a shock was spotted and that day was defined as Day0. Of course, the day after Day0 would be Day1, then Day2, and so on.

Each stock’s average return and standard deviation from Day0 to Day4 were ploted to track their post-shock behaviors. Black lables in the plot below are their average return and standard deviation during the past 10 years (normal state).
In this demonstration, you can literalliy see the market got “nuked” at Day0, quickly bounced back at Day1, then gradually converged back to its normal state. Sharpe readers may also noticed how rapidaly the market swithced its state between MR and TF within such a short time window.

Here’s my R code for this test. Remember to put on your radiation suit before you try it in your basement.

## a function for on-line data retrieving and formatting
get.data <- function(sec, start = Sys.Date() - 5000, end = Sys.Date()) {
  library(tseries)
  tickers <- sec
  close <- get.hist.quote(sec[1], start = start, end = end,
                          quote = 'AdjClose')
  close <- close[!duplicated(index(close))]
  for (i in 2:length(sec)) {
    nextClose <- NULL
    nextClose <- try(get.hist.quote(sec[i], start = start,
                                    end = end, quote = 'AdjClose'))
    if (!is.character(nextClose[1])) {
      nextClose <- nextClose[!duplicated(index(nextClose))]
      close <- merge(close, nextClose)
    } else tickers <- tickers[-i]
  }
  names(close) <- tickers
  close
}

## prep data
library(quantmod)
library(PerformanceAnalytics)

sec <- read.table("NASDAQ.txt", header = FALSE, sep = "\t")[, 1]
sec <- as.vector(sec)
sec <- c("^GSPC", sec)
allData <- get.data(sec)
names(allData)[1] <- "SP500"
compData <- allData[, !is.na(allData[1, ])] # survivors only
allRet <- ROC(compData, na.pad = FALSE)
allRet[is.na(allRet)] <- 0

## identify shocks: drop by more than 4 sds
SPmean <- mean(allRet$SP500)
SPsd <- sd(allRet$SP500)
shocks <- SPmean - 4*SPsd
plot(allRet$SP500, main = "SP500 Daily Returns") # visualize shocks
segments(900000000, shocks, 1350000000, shocks, col = "red")
text(1100000000, -0.06, "Market Shocks", col = "red", cex = 1.5)

## calculate and plot test data
chart.Scatter(c(0, 0.5), c(-0.2, 0.1), col = "white",
              main = "Post-Shock Behavior")

for (i in 0:4) {
  thisDataName <- paste("Day", as.character(i), sep = "")
  thisTestName <- paste("test", as.character(i), sep = "")
  assign(thisDataName, allRet[Lag(allRet$SP500 < fallOut, i),])
  thisData <- get(thisDataName)
  testMean <- sapply(thisData, mean, 2)
  testSd <- sapply(thisData, sd, 2)
  assign(thisTestName, cbind(testSd, testMean))

  testCols = c("red", "orange", "yellow", "blue", "green")
  points(get(thisTestName), col = testCols[i+1], pch = 4)
}

normalMean <- sapply(allRet, mean, 2)
normalSd <- sapply(allRet, sd, 2)
normalState <- cbind(normalSd, normalMean)
points(normalState, pch = 4)
legend("topright", pch = 4, c("Day0","Day1","Day2","Day3","Day4","Normal"),
       col = c(testCols, "Black"), text.col = c(testCols, "black"))
## alarm dismissed

Calculus and statistics: different paradigms of thinking

One day I got a question from an academic star with almost perfect GPA in our university, “I did everything my professor asked us to do, I did regression on PE, PB ratio and stuff for prediction, why my target price for Goldman Sachs is still around $300? That’s too much off from everybody else in the market, I don’t understand.” I stared at him for more than 10 seconds, speechless.

As Arthur Benjamin proposed in his speech (link below), calculus has been on the top of the math pyramid for too long, now it’s time for statistics to take over. This matters because in general our mind is skewed too much towards the calculus-style, deterministic way of thinking. For one thing, it is much more intuitive for human mind to understand things that are either right or wrong; for another, people who claim they know everything and can “prove” that by making predictions always have much more audiences than those who say “this might be true, I could be wrong but it’s the closest we can get”. Also from my own experience, our education systems are mostly designed to reward people who get desired certainties, not people who comprehend things sensibly. Consequentially, students started to pick up the habit of “sacrifice reality for elegance” (Paul Wilmott) and the line between doing scientific research and confirming collective bias is blurred. Einstein was proven wrong about “God doesn’t play dice”, but regretfully that doesn’t stop ordinary people believing it’s possible to eliminate uncertainty in their own life.

On the opposite, not saying statistics is better, but it does focus more on observation and self-evaluation. Its purpose is not about to find the only perfect answer, instead it’s about seeing things from as many dimensions as possible (which is a very unnatural process to human brain). One accurate prediction doesn’t make a statistical model work; a long enough series of predictions under relatively bias free conditions with acceptable level of error does (now I kind of understand why people don’t like it…). Again, I don’t think it’s better than calculus, but I think this is the key to problems such as “if I did it right, how come I’m still losing money?”

Recommended Reading: Fooled by Randomness (N.Taleb)

Recommended Video: Arthur Benjamin’s formula for changing math education

Neutralizing Portfolio Beta (2)

After a quick-and-dirty test in the last post, I roughly illustrated the idea of equity neutral strategy. Here I’d like to do the same thing with same securities. Except this time all portfolios will be rebalanced on weekly basis, and both optimization and beta calculation will be based on 2-year weekly data.

library(tseries)
library(quantmod)

picks <- c('KO','WFC','AXP','PG','JNJ','WMT','COP','USB')
buffetWeights <- c(.216, .1956, .1228, .0983, .0306, .0426, .0343, .0351)
buffetWeights <- buffetWeights / sum(buffetWeights) # scale to 1

## download historical close prices
get.close <- function(picks,start = Sys.Date() - 5040,end = Sys.Date()) {
  histClose <- get.hist.quote(picks[1], start = start, quote = 'AdjClose')
  if(length(picks) > 1) {
    for (i in 2:length(picks)) {
      histClose <- cbind(histClose, get.hist.quote
                        (picks[i],start = start, quote = 'AdjClose'))
    }
  }
  names(histClose) <- picks
  histClose
}

buffetPicks <- get.close(picks)
SPX <- get.close('^GSPC')
## download historical close prices

buffetPicks <- na.locf(buffetPicks) # fill NA with the last observation
plot(buffetPicks) # same graph as in the last post
# monthly returns
buffetReturns <- ROC(buffetPicks[endpoints(buffetPicks, 'weeks'),], 
                     na.pad = FALSE)
SPXReturn <- ROC(SPX[endpoints(SPX, 'weeks'), ], na.pad = FALSE)
names(SPXReturn) <- 'SPX'

# long-only portfolios
eqlPort <- apply(buffetReturns, 1, mean)
buffetPort <- apply(buffetReturns * buffetWeights, 1, sum)

optWeights <- rollapply(buffetReturns, width = 104,
                        function(x) portfolio.optim(x, reslow = 
                          rep(0, 8), reshigh = rep(1, 8))$pw, 
                        by.column = FALSE,align = 'right')
optPort <- merge(buffetReturns, optWeights)
optPort[is.na(optPort)] <- 0
for (i in seq(along = picks)) {
  optPort[, i] <- optPort[, i] * Lag(optPort[, i + length(picks)])
}
optPort <- apply(optPort[, 1:8], 1, sum)

library(PerformanceAnalytics) # only load it now because this package 
                              # causes errors to function portfolio.optim
chart.CumReturns(merge(SPXReturn, eqlPort, buffetPort, optPort),colorset
                 = rainbow(4), main = 'Long-only Portfolios vs SPX', 
                 legend = 'topleft')

# neutralize beta
betaNeutral <- function(against=SPXReturn, port=eqlPort, period=104) {
  neutPort <- merge(against, port)
  beta <- rollapply(neutPort, width = period, 
                    function(x)coef(lm(x[, 2] ~ x[, 1],
                                       data = as.data.frame(x))), 
                    by.column = FALSE, align = 'right')[, 2]
  neutPort <- merge(neutPort, Lag(beta))
  neutPort[is.na(neutPort)] <- 0
  return(neutPort[, 2] - neutPort[, 1] * neutPort[, 3])
}

eqlNeutral <- betaNeutral()
buffetNeutral <- betaNeutral(port = buffetPort)
optNeutral <- betaNeutral(port = optPort)

# put all together
combined <- merge(SPXReturn, eqlPort, buffetPort, optPort, eqlNeutral,
                  buffetNeutral, optNeutral)
names(combined)[5:7] <- c('eqlNeutral', 'buffetNeutral', 'optNeutral')
combined[is.na(combined)] <- 0

# neutralized portfolios
chart.CumReturns(combined[, -2:-4], colorset = rainbow(4),
                 main = 'Neutralized Portfolios vs SPX', 
                 legend = 'topleft')

# performance summary
cols = c('red', rep('green', 3), rep('blue', 3))
charts.PerformanceSummary(combined, ylog = 1, 
                          main = 'Neutralized vs Long-only vs S&P500', 
                          cex.axis= 1,colorset = cols)

# risk-return relationship
chart.RiskReturnScatter(merge(combined, buffetReturns), 
                        colorset = c(cols, rep('yellow', 8)),
                        symbolset = rep(3, 16))

# correlation illustration
chart.Correlation(combined)

With less vague data and higher frequency of rebalance, market neutral strategy is more powerful. It’s also interesting to see even with the same equities, Mr. Buffet’s allocation strategy actually outperforms both equal-weighted allocation and optimal allocation strategy.

Neutralizing Portfolio Beta

For stock pickers, market neutral means guarding equity portfolios from market shocks by doing long/short  at the same time (for traders it probably means statistical arbitrage). A market neutral investor would hold long positions in his favourable stocks and short positions in those he doesn’t like to minimize impact from broad market movements. Both the concept and the math behind are not that complicated. Thus tt would be interesting to visualize how effective it is by running a quick test.

To keep it quick-and-dirty enough so that even not-so-sophisticated investors could try it at home, I decided to get lazy and skip the due diligence part of stock picking. Instead I went to the holy grail of stock picking and directly took his 8 top holdings  as their weights (KFT and WSC were dropped for insufficient data).

Out from these stocks I constructed three long-only, monthly-rebalanced portfolios by using equal weights, their original weights (scale to 1) and optimal weights (efficient portfolio) respectively.

We can see that the guru’s picks did quite well for the last 20 years cumulatively. But whenever the market encountered some headwinds, they usually plunged with it and kicked some investors out of the market despite they picked the “right” stocks (LTCM-style fail). To overcome this situation, I re-constructed three portfolios by running 12-month rolling linear-regression against the S&P500 for each of them and shorting the market by the portion of their beta.

Putting them together. The green lines are long-only portfolios and blue lines are neutralized portfolios. The neutralized portfolios didn’t add too much value in terms of risk-return efficiency but protected investors from severe market downturns as expected. The correlation chart shows the neutralized portfolios have extremely low correlation with the market, which well explains why it works (number size is scaled by it’s value so a value close to 0 will be too small to read).


Optimizing Ivy Portfolio (2)

Last time we attempted to optimize an Ivy Portfolio with very conservative rules to implement the principle of “winning by not losing” and it went quite well. Now just as Quantum Financier mentioned last time, what if we loose some constrains? Presumably, we should be rewarded by more return for taking extra risk.

To make it more observable, I started from the CVaR Ivy Portfolio in my last post and changed several rules below. All of these changes are essentially telling one thing to the model, “go out there to take more risks”.

1. Optimizer: Minimize CVaR -> Maximize Sharpe Ratio (minimizing variance and maximizing expected return at the same time)
2. Timing Signal: 12-month SMA -> 12-month EMA (focusing more on short-term momentum)
3. Weight Constrain: [0, 1] -> [-0.5, 1.5] (higher leverage)
4. Number of assets in the portfolio: 3 -> 6 (broader coverage)


Yep, the plan works. And as we expected, we exposed ourselves to higher drawdown risks at the same time.

Optimizing Ivy Portfolio

Early on I posted a simple live version of GTAA strategy. It demonstrated the effectiveness of the Ivy Portfolio (M.Faber, 2009) rationale in recent market with a small sample. Again, the rationale is very simple and powerful: screen a wide range of asset classes each week/month, then invest in those that have shown the strongest momentum. Last time I tracked 39 ETFs’ 9-month SMA and equally allocated portfolio assets to the top 8. Although I got pretty good results, the sample was relatively small and those ETFs are quite different in terms of time of inception, liquidity, tracking error, etc.

And above all, equal allocation seems a bit, for lack of a better word, boring. This time I want to use a more general sample to see how we can improve this by implementing some optimization strategies I’ve shown in my previous post Backtesting Portfolio Optimization Strategies.

Some equipment check before we launch the test.

Asset Classes:
1. SPX Index: S&P 500 LargeCap Index
2. MID Index: S&P 400 MidCap Index
3. SML Index: S&P 600 SmallCap Index
4. MXEA Index: MSCI EAFE Index
5. MXEF Index: MSCI Emerging Markets Index
6. LBUSTRUU Index: Barclays US Agg Total Return Value Unhedged USD (U.S. investment grade bond)
7. XAU Curncy: Gold/USD Spot
8. SPGSCI Index: Goldman Sachs Commodity Index
9. DJUSRE Index: Dow Jones U.S. Real Estate Index
10. GBP Curncy: GBP/USD Spot
11. EUR Curncy: EUR/USD Spot
12. JPY Curncy: JPY/USD Spot
13. HKD Curncy: HKD/USD Spot

Rules:
1. Rebalance monthly
2. Rank 12-month SMA; invest in the top 3
3. For each asset, minimum weight = 5%; maximum weight = 95%
4. Use CVaR optimization to construct the portfolio each month; confidence level = 1%


Fortunately, our test didn’t fall apart and crash into the Pacific Ocean. The CVaR model seems did a good job improving the original strategy. However, it has to be pointed out that not all optimization models are better than an equal-weighted one. As demonstrated below, the minimum-variance and maximum-sharpe ratio models didn’t make much difference.

roy

Follow

Get every new post delivered to your Inbox.

Join 130 other followers

%d bloggers like this: