Constructing an Alpha Portfolio with Factor Tilts

In this post I’d like to show an example of constructing a monthly-rebalanced long-short portfolio to exploit alpha signals while controlling for factor exposures.

This example covers the period between March 2005 and 2014. I use 477 stocks from the S&P 500 universe (data source: Quandl) and the Fama-French 3 factors (data source: Kenneth French's website) to conduct backtests. My alpha signal is simply the cross-sectional price level of all stocks: overweighting stocks whose price level is low and underweighting those whose price level is high. By doing this I'm effectively targeting the liquidity factor, so it worked out pretty well during the 2008 crisis. But that's beside the point, for this post is more about the process and techniques than a skyward PnL curve.

At each rebalance, I rank all stocks by the dollar value of their shares, then assign weights inversely to rank, i.e., expensive stocks get lower weights and vice versa; a toy sketch of this step follows below. This gives me naive exposure to my alpha signal. However, the strategy is probably also exposed to common factors in the market. At the end of the day, I could have a working alpha idea and, at the same time, bad performance driven by unintended factor bets. This calls for a technique that gives me control over factor exposures while keeping the portfolio close to the naive alpha bets.
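
A toy sketch of the rank-to-weight step (my own illustration in R; the exact scheme isn't specified beyond "inverse to rank", so treat the normalization as an assumption):

naive_weights <- function(price) {
  r <- rank(-price);             # most expensive stock gets rank 1
  w <- r - mean(r);              # cheap stocks positive, expensive stocks negative
  w / sum(abs(w)) * 2;           # scale to 100% long and 100% short
}
naive_weights(c(500, 120, 45, 10, 3));  # the cheapest name gets the largest weight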

Good news: the basic quadratic programming routine is just the tool for the job. Its objective function minimizes the sum of squared weight differences from the naive portfolio, while its linear constraints pin the factor exposures where we want them to be. For this study I backtested 3 scenarios: a naive alpha portfolio, a factor-neutral portfolio, and a portfolio that is neutral on the MKT and HML factors but tilts towards SMB (with a desired factor loading of 0.5). As an example, the chart below shows the expected factor loadings of each of the 3 backtests on the 50th rebalance (84 in total). Regression coefficients are estimated with 1 year of weekly returns.
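
With the quadprog package, the neutralization step could look like the sketch below (w_naive, B and target are placeholder names, not code from my backtest, and I assume a dollar-neutral book):

library(quadprog)

# w_naive: n-vector of naive rank-based weights
# B:       n x 3 matrix of estimated MKT / SMB / HML betas per stock
# target:  desired factor loadings, e.g. c(0, 0.5, 0) for the SMB tilt
neutralize <- function(w_naive, B, target) {
  n <- length(w_naive);
  Dmat <- diag(n);               # solve.QP minimizes 1/2 w'Dw - d'w, which here
  dvec <- w_naive;               # equals 1/2 ||w - w_naive||^2 plus a constant
  Amat <- cbind(rep(1, n), B);   # equality columns: sum(w) = 0 and B'w = target
  bvec <- c(0, target);
  solve.QP(Dmat, dvec, Amat, bvec, meq = length(bvec))$solution;
}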

[Figure: fct_expo — expected factor loadings of the three portfolios at the 50th rebalance]

After the backtests, I have 3 time series of monthly returns, one per scenario. The tables below show the results of regressing these returns on the MKT, SMB and HML factors. All three strategies yield similar monthly alpha, but the neutral portfolio significantly reduced the factor loadings of the naive strategy, while the size-tilt portfolio kept material exposure to the SMB factor.
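
For reference, the attribution regression itself is plain OLS. A toy version with simulated inputs (port, MKT, SMB and HML are made up here, one observation per rebalance):

set.seed(42);
n <- 84;
MKT <- rnorm(n, 0.005, 0.04); SMB <- rnorm(n, 0.002, 0.02); HML <- rnorm(n, 0.002, 0.02);
port <- 0.003 + 0.1 * MKT + rnorm(n, 0, 0.01);   # a mostly factor-neutral toy strategy
summary(lm(port ~ MKT + SMB + HML));              # the intercept is the monthly alpha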

[Table: Capture — regressions of each strategy's monthly returns on MKT, SMB and HML]

The tables below summarize the annualized performance of these backtests. While the neutralized portfolio generates the lowest annualized alpha, it ranks highest in terms of information ratio.

[Table: info_ratio — annualized performance and information ratios of the three backtests]

Interpretation: the naive and size-tilt portfolios get penalized for having more of their returns driven by factor exposures, whether unintended or intentional. The neutral portfolio, with slightly lower returns, earns a better information ratio because it represents the “truer” performance of this particular alpha idea.

The idea above can be extended to multiple alpha strategies and dozens of factors, or even hundreds if the universe is large enough to make that feasible. The caveat is that there is such a thing as too many factors, and most of them don't last in the long run (Hwang and Lu, 2007). It's just not that easy to come across something that carries both statistical and economic significance.

Roy

Non-linear Twists in Stock & Bond Correlation

Stocks and bonds are negatively correlated. Translation: they must move against each other most of the time. Intuitively, stocks bear higher risk than bonds, so investors go to stocks when they want to take more risk and flee to bonds when they feel a storm coming. The numbers tell the same story, too: the correlation coefficient between SPY and TLT from 2002 to 2015 is -0.42, and the year-by-year correlations of daily returns are:

[Figure: corr_yoy — year-by-year correlation of SPY and TLT daily returns]
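
For anyone reproducing these numbers, here's a sketch using quantmod and Yahoo data (which may differ slightly from my original data source):

library(quantmod)
getSymbols(c("SPY", "TLT"), from = "2002-01-01", to = "2015-12-31");
r <- na.omit(merge(dailyReturn(Ad(SPY)), dailyReturn(Ad(TLT))));
colnames(r) <- c("spy", "tlt");
cor(r$spy, r$tlt);                                # full-sample correlation
sapply(split(as.data.frame(r), format(index(r), "%Y")),
       function(x) cor(x$spy, x$tlt));            # year-by-year correlations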

However, this effect was very weak from 2004 to 2006. That makes sense: in a credit expansion like that one, it was hard for any asset class to go down (except for cash, of course).

But this observation suggests that the conventional stock & bond correlation might be conditional or even deceptive. One might ask: is the stock & bond relationship significantly different in bull and bear markets? Does it also depend on market returns? Or does it just depend on market direction?

To keep it simple, I will stick to SPY and TLT daily returns. I split the data into bull (2002-2007 & 2012-2015) and bear (2008-2011) periods, divide each of them into two groups (market up days & market down days), then dice each group by quantiles of market returns. The results are charted below.
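
The grouping logic is roughly the following (a sketch; spy and tlt stand for aligned vectors of daily returns within a given period):

corr_by_quantile <- function(spy, tlt, up = TRUE, n_q = 5) {
  i <- if (up) which(spy > 0) else which(spy < 0);     # pick a market direction
  q <- cut(spy[i], quantile(spy[i], seq(0, 1, length.out = n_q + 1)),
           include.lowest = TRUE);                     # bucket days by return size
  tapply(seq_along(i), q, function(j) cor(spy[i[j]], tlt[i[j]]));
}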

[Figures: corr_bull, corr_bear — SPY/TLT correlation by market direction and return quantile, bull vs. bear periods]

The graphs show that these two assets tend to move sharply against each other when the market makes an extreme move in either direction. The effect also seems more pronounced when a bull market is having an up day or a bear market is having a down day. But there's nothing significantly different between the bear and bull groups.

Next, instead of splitting the data into bear & bull periods, I just divide it by market direction, then by quantile of performance.

[Figure: cor1 — SPY/TLT correlation by market direction and return quantile, full sample]

This graph clearly shows that stocks & bonds mostly move against each other only when the market is having an extremely up or down day, whether in a bull or a bear market. Of course, one could argue that this is a self-fulfilling prophecy because big correlation coefficients feed on big inputs (large market movements), but in the chart the correlation coefficients do not change proportionally across quantiles, which confirms a non-linear relationship.

Roy

A Case of Ambiguous Definition

“Managers of government pension plans counter that they have longer investment horizons and can take greater risks. But most financial economists believe that the risks of stock investments grow, not shrink, with time.” – WSJ

This statement mentions “risks” twice, but the two mean different things. The second sentence is therefore correct by itself but cannot be used to reject the first one.

The first “risk” is timeless: the way it's calculated always scales it down to one time unit, the interval between any two data points in the sample.

Risk_1 = \sigma^2 = \frac{1}{N} \sum_{i=1}^{N}(R_i - \bar{R})^2

When “risk” is defined this way, the returns of a risky investment A and a less risky investment B look like this:

[Figure: figure_1 — simulated per-period returns of A (riskier) and B (less risky)]

The second “risk” is the same quantity scaled up to N time units. That's not how variance is defined, but people use it because it has a practical interpretation (adjusting risk for different time horizons).

Risk_2 = N * \sigma^2 = \sum_{i=1}^{N}(R_i - \bar{R})^2

Under this definition, the possible PnL paths for A and B look like this:

[Figures: figure_2, figure_3 — simulated PnL paths of A and B]

A's Monte Carlo result is wider than B's, and both A's and B's “risk” under the second definition increases through time, while under the first definition it never changes.
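
A quick simulation makes the point (toy numbers: A's per-period standard deviation is 0.02 and B's is 0.01):

set.seed(1);
N <- 250; paths <- 1000;
pnl_A <- apply(matrix(rnorm(N * paths, 0, 0.02), N, paths), 2, cumsum);
pnl_B <- apply(matrix(rnorm(N * paths, 0, 0.01), N, paths), 2, cumsum);
# Risk_1 (per-period variance) is flat, while Risk_2 (variance of the
# cumulative PnL) grows roughly linearly with the horizon:
var(pnl_A[N, ]) / var(pnl_A[1, ]);   # close to N
var(pnl_A[N, ]) / var(pnl_B[N, ]);   # A stays ~4x riskier at every horizon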

I have intentionally avoided mentioning time diversification because doing so would probably make things more confusing. For more details, please see Chung, Smith and Wu (2009).

Roy

A New Post

First of all, apologies to anyone who was expecting new posts or left a comment here but didn't get a reply from me. There were quite a few changes in my life, and I simply had to move my time and energy away from blogging. Now I'm trying to get back to it.

For various reasons, I will avoid writing about specific investment strategies, factor descriptors or particular stocks. I will write more about my thoughts (or thoughts stolen from people smarter than me) on generic techniques and theories. In an attempt to be rigorous (or, more realistically, less sloppy), I will try to stay on one main track: hypothesis -> logical explanation -> supporting data or observations. This time I will use math and programming to make sense of things instead of just getting results on paper.

Roy

The Power of Mass Deployment

The power of statistics emerges as the sample size grows. I know, this has been repeated many times in Stats 101, a bunch of YouTube videos, and maybe Statistics for Dummies. But has that stopped people from making judgement calls on a purely empirical basis? Statements like “I've seen it work x times, so it's legit” or “it's a bad indicator because I tried it on several stocks and it didn't work” don't really make sense when you live in a complex realm composed of incredible amounts of data, multiple dimensions of reality and endless chain reactions, whether in public administration or stock investing.

To illustrate, I ran a backtest using a simple combination of Bollinger Bands and the MACD indicator from 2004 to 2012. It's an end-of-day, mean-reversion strategy with a price filter and a liquidity filter; the core signal is sketched below. After testing it on 30 random stocks listed on the TSX, this is what I got.
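
A sketch of the signal logic, using the TTR package (the exact parameters plus the price and liquidity filters of the backtest aren't reproduced here, so treat this as illustrative):

library(TTR)

mean_rev_signal <- function(close) {
  bb <- BBands(close, n = 20, sd = 2);    # lower band / moving average / upper band
  md <- MACD(close, nFast = 12, nSlow = 26, nSig = 9);
  long <- close < bb[, "dn"] & md[, "macd"] > md[, "signal"];  # oversold, turning up
  exit <- close > bb[, "mavg"];           # price reverted to the mean, get out
  ifelse(long, 1, ifelse(exit, 0, NA));   # NA = carry the previous day's position
}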

Nope, not impressive. But it looks quite different if we deploy it to the entire market: about 2,300 stocks listed on the TSX (1,400 after a survivorship filter).

Commissions are not a concern. As shown below, most of the time the strategy holds fewer than 2 stocks, not 200.

The real problem with implementing this strategy, for retail investors, is computing power: gathering the latest data, completing the calculations and executing trades precisely right before the market closes every day is very challenging for an individual. For big players, the problem is liquidity: because the strategy targets low-liquidity segments, it can't guarantee trading volumes big enough for institutional traders.

Neural Network Algorithm

This post is a token of gratitude to Andrew Ng at Stanford University. I've learned a lot of fascinating concepts and practical skills from his Machine Learning course on Coursera. Among its algorithms, the neural network is my favourite, so I decided to convert this class assignment built in Octave into functions in R. I will post my code below, but if anyone would like to play with an exported package with documentation, please contact me by email and I will send you the tarball. The code is also available in my GitHub repository.

Compared to the more comprehensive package “neuralnet” on CRAN, this is a lite version that contains all the basic components: a single hidden layer, random initialization, sigmoid activation, a regularized cost function, forward propagation and backward propagation. Originally it only worked as a classifier, but I modified it a bit to create a similar algorithm that handles absolute values (continuous targets) instead of class labels. I'm not sure this is best practice for the task, but it worked as expected.

This is what the algorithm looks like, along with its source of inspiration, an actual human neuron. By mimicking the way a human brain (the most powerful computer in the universe) adjusts itself to recognize patterns and estimate the future, an algorithm should be able to solve similar tasks.

For the classifier, the cost function that tracks the forward propagation process is defined below.

J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} \left[ -y_{k}^{(i)}\log\left((h_{\Theta}(x^{(i)}))_{k}\right)-(1-y_{k}^{(i)})\log\left(1-(h_{\Theta}(x^{(i)}))_{k}\right) \right]

where J is the total cost we want to minimize, m is the number of training examples, and K is the number of labels (for classification). When training on the i-th example with label k and weights \Theta , all corresponding correct output values are converted to 1 and the rest become 0, so -y_{k}^{(i)} and (1-y_{k}^{(i)}) work as switches between the two scenarios. The hypothesis (h_{\Theta}(x^{(i)}))_{k}=g(z) calculates the training output as z=\Theta^{T}x^{(i)} , then processes it with the sigmoid function g(z)=\frac{1}{1+e^{-z}} to bound it between 0 and 1 as the likelihood of being 1 (equivalent to the probability of being true).

To avoid extreme bias caused by large weights, we add a regularization term to the cost function, defined as

\frac{\lambda}{2m}\left [ \sum_{j=1}^{h}\sum_{k=1}^{n}(\Theta_{j,k}^{(1)})^{2}+\sum_{j=1}^{o}\sum_{k=1}^{h}(\Theta_{j,k}^{(2)})^{2} \right ]

where n , h and o are the numbers of nodes in the input layer, hidden layer and output layer respectively.

During backward propagation, we push the calculated outputs backwards through the network and accumulate the gradient from each layer. Because there is only one hidden layer, we can calculate the error of the output layer as \delta^{(3)}=a^{(3)}-y and of the hidden layer as \delta^{(2)}=(\Theta^{(2)})^{T}\delta^{(3)}{\ast}g^{'}(z^{(2)}) , then accumulate \Delta^{(l)}=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T} to get the total unregularized gradient \frac{\partial }{\partial \Theta^{(l)}_{ij}}J(\Theta)=\frac{1}{m}\Delta^{(l)}_{ij} . In the end, we just need to regularize it like we did the cost function, by adding \frac{\lambda}{m}\Theta^{(l)}_{ij} (skipping the bias column).
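
A cheap way to verify the backprop math is a central-difference check (a generic sketch I'd suggest, not part of the original port; costFn stands for any cost function of the unrolled parameter vector, e.g. costFunction.abs defined below):

numericalGradient <- function(costFn, params, eps = 1e-4, ...) {
  g <- numeric(length(params));
  for (i in seq_along(params)) {
    e <- rep(0, length(params)); e[i] <- eps;
    g[i] <- (costFn(params + e, ...) - costFn(params - e, ...)) / (2 * eps);
  }
  g;  # should match the analytical gradient to several decimal places
}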

The absolute value algorithm is very similar. Instead of using a 0/1 (true/false) switch to train examples with the same labels all at once, it trains one example at a time and calculates the cost as the squared error of each prediction, (y^{(i)}-h_{\Theta}(x^{(i)}))^{2} .

Now we can wrap up a cost function and a gradient function and test them by running optim() from the default R package “stats”. In this case I chose the L-BFGS-B method because it utilizes everything we have in hand and seems to learn faster than most of the alternatives.

The classifier first. Three blocks of data are randomly generated and labelled for training. Because we intentionally created them for testing purposes, it's not necessary to split them for cross-validation and out-of-sample testing, unlike in real-world use.

X <- rbind(matrix(rnorm(100, 10, 1), 20, 5),
           matrix(rnorm(200, 5, 2), 40, 5),
           matrix(rnorm(150, 1, 0.1), 30, 5));
y_cla <- c(rep(3, 20), rep(1, 40), rep(2, 30));
test_cla <- nnet.train.cla(X, y_cla, hidden_layer_size = 25,
                   lambda = 5, maxit = 50);
pred_cla <- nnet.predict.cla(test_cla$Theta1, test_cla$Theta2, X);
plot(y_cla, col = "green", pch = 15);
points(pred_cla, col = "red", pch = 3);
legend("topright", legend = c("Original", "Predicted"), pch = c(15, 3), col = c("green", "red"));

Then for absolute values. I reset the y values for this test and tried several values of the regularization parameter lambda and the number of iterations to get the results below.

y_abs <- c(rep(-3, 20), rep(1, 40), rep(2, 30));
test_abs <- nnet.train.abs(X, y_abs, hidden_layer_size = 25,
                   lambda = 0.0001, maxit = 400);
pred_abs <- nnet.predict.abs(test_abs$Theta1, test_abs$Theta2, X);
plot(sigmoid(y_abs), type = 'l', col = "green", ylim = c(0, 1));
lines(pred_abs, col = "red");
legend("topright", legend = c("Original", "Predicted"), col = c("green", "red"), lwd = c(1, 1));

The source code of the algorithm follows. Admittedly it's not very concise. As mentioned, the package is available upon request.

## =============== random initialization ===============
# break symmetry by drawing initial weights uniformly from [-epsilon, epsilon]
randInitializeWeights <- function(L_in, L_out) {
  epsilon_init <- 0.12;
  theta <- matrix(runif(L_out * (L_in + 1)),
                  L_out, (L_in + 1)) * 2 * epsilon_init - epsilon_init;
  return(theta);
}
## =============== sigmoid function ===============
# logistic function: maps any real number into (0, 1)
sigmoid <- function(z) {
  g <- 1 / (1 + exp(-z));
  return(g);
}
## =============== sigmoid gradient function ===============
# derivative of the sigmoid: g'(z) = g(z) * (1 - g(z))
sigmoidGradient <- function(z) {
  g <- sigmoid(z);
  return(g * (1 - g));
}


# ++++++++++++++++++ absolute value predictor functions ++++++++++++++++++
## =============== cost J and gradient ===============
nnCostFunction.abs <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           X, y, lambda) {
  Theta1 <- matrix(nn_params[1:(hidden_layer_size *
                   (input_layer_size + 1))],
                   hidden_layer_size,
                   (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size *
                   (input_layer_size + 1))],
                   1,
                   (hidden_layer_size + 1));
  m <- dim(X)[1];
  J <- 0;
  Theta1_grad <- matrix(0, nrow(Theta1), ncol(Theta1));
  Theta2_grad <- matrix(0, nrow(Theta2), ncol(Theta2));

  for (i in 1:m) {
    # squash the continuous target through the sigmoid so the
    # sigmoid output layer can match it
    y_sig <- sigmoid(y[i]);
  ## forward propagation
    # first feed
    a1 <- X[i, ];
    a1 <- c(1, a1);
    # first hidden layer
    z2 <- Theta1 %*% a1;
    a2 <- sigmoid(z2);
    a2 <- c(1, a2);
    # output layer
    z3 <- Theta2 %*% a2;
    a3 <- sigmoid(z3);
    # add to cost function
    J <- J + (a3 - y_sig)^2 / 2;
  ## backward propagation
    delta3 <- a3 * (1 - a3) * (a3 - y_sig);
    delta2 <- (t(Theta2) %*% delta3)[-1] * sigmoidGradient(z2);

    Theta1_grad <- Theta1_grad + (delta2 %*% a1);
    Theta2_grad <- Theta2_grad + (delta3 %*% a2);    
  }
  
  J <- J / m;
  Theta1_grad <- Theta1_grad / m;
  Theta2_grad <- Theta2_grad / m;

  # J regularization
  reg_theta1 <- Theta1[, -1];
  reg_theta2 <- Theta2[, -1];
  J <- J + (lambda/(2*m)) * (sum(reg_theta1^2) + sum(reg_theta2^2));
  # gradient regularization
  Theta1_grad[, -1] <- Theta1_grad[, -1] + (lambda/m) * Theta1[, -1];
  Theta2_grad[, -1] <- Theta2_grad[, -1] + (lambda/m) * Theta2[, -1];

  # unroll gradients
  grad <- c(c(Theta1_grad), c(Theta2_grad));

  return(list(J = J, grad = grad));
}
## =============== cost J function for optimization ===============
costFunction.abs <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           X, y, lambda) {
  costJ <- nnCostFunction.abs(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          X = X, y = y, lambda = lambda)$J;
  return(costJ);
}
## ========== cost J gradient function for optimization ==========
gradFunction.abs <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           X, y, lambda) {
  grad <- nnCostFunction.abs(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          X = X, y = y, lambda = lambda)$grad;
  return(grad);
}
#################### end of utility functions ####################

## ================ train neural network ===============
nnet.train.abs <- function(X, y, hidden_layer_size = 25,
                 lambda = 1, maxit = 50) {
  m <- nrow(X);
  input_layer_size <- ncol(X);
  # ================ Initializing Parameters ================
  initial_Theta1 <- randInitializeWeights(input_layer_size, hidden_layer_size);
  initial_Theta2 <- randInitializeWeights(hidden_layer_size, 1);
  initial_nn_params <- c(c(initial_Theta1), c(initial_Theta2));

  # =================== Training NN ===================
  train_results <- optim(par = initial_nn_params,
                       fn = costFunction.abs,
                       input_layer_size = input_layer_size,
                       hidden_layer_size = hidden_layer_size,
                       X = X, y = y, lambda = lambda,
                       gr = gradFunction.abs,
                       method = "L-BFGS-B",
                       control = list(maxit = maxit, trace = TRUE, REPORT = 1));
  nn_params <- train_results$par;
  Theta1 <- matrix(nn_params[1:(hidden_layer_size*(input_layer_size+1))],
                   hidden_layer_size, (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size*(input_layer_size+1))],
                   1, (hidden_layer_size + 1));
  ## ================= show accuracy =================
  pred <- nnet.predict.abs(Theta1, Theta2, X);
  cat("\nLogistic Prediction Error: ", var(pred - sigmoid(y)), "\n", sep = "");
  ## =============== return thetas ===============
  return(list(Theta1 = Theta1, Theta2 = Theta2));
}

## =============== neural network predict ===============
nnet.predict.abs <- function(Theta1, Theta2, X) {
  m <- nrow(X);
  if (is.null(m)) {
    m <- 1; X <- t(X);
  }

  h1 <- sigmoid(cbind(rep(1, m), X) %*% t(Theta1));
  h2 <- sigmoid(cbind(rep(1, m), h1) %*% t(Theta2));
  return(h2);
}

# ++++++++++++++++++ classifier functions ++++++++++++++++++
## =============== cost J and gradient ===============
nnCostFunction.cla <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           num_labels,
                           X, y, lambda) {
  Theta1 <- matrix(nn_params[1:(hidden_layer_size *
                   (input_layer_size + 1))],
                   hidden_layer_size,
                   (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size *
                   (input_layer_size + 1))],
                   num_labels,
                   (hidden_layer_size + 1));
  m <- dim(X)[1];
  J <- 0;
  Theta1_grad <- matrix(0, nrow(Theta1), ncol(Theta1));
  Theta2_grad <- matrix(0, nrow(Theta2), ncol(Theta2));

  for (i in 1:m) {
    # one-hot encode example i's label against the sorted unique labels
    y_label <- rep(0, num_labels);
    y_label[y[i] == as.integer(levels(factor(y)))] <- 1;
  ## forward propagation
    # first feed
    a1 <- X[i, ];
    a1 <- c(1, a1);
    # first hidden layer
    z2 <- Theta1 %*% a1;
    a2 <- sigmoid(z2);
    a2 <- c(1, a2);
    # output layer
    z3 <- Theta2 %*% a2;
    a3 <- sigmoid(z3);
    # add to cost function
    J <- J + sum(-y_label * log(a3) - (1 - y_label) * log(1 - a3));
  ## backward propagation
    delta3 <- a3 - y_label;
    delta2 <- (t(Theta2) %*% delta3)[-1] * sigmoidGradient(z2);

    Theta1_grad <- Theta1_grad + (delta2 %*% a1);
    Theta2_grad <- Theta2_grad + (delta3 %*% a2);    
  }
  
  J <- J / m;
  Theta1_grad <- Theta1_grad / m;
  Theta2_grad <- Theta2_grad / m;

  # J regularization
  reg_theta1 <- Theta1[, -1];
  reg_theta2 <- Theta2[, -1];
  J <- J + (lambda/(2*m)) * (sum(reg_theta1^2) + sum(reg_theta2^2));
  # gradient regularization
  Theta1_grad[, -1] <- Theta1_grad[, -1] + (lambda/m) * Theta1[, -1];
  Theta2_grad[, -1] <- Theta2_grad[, -1] + (lambda/m) * Theta2[, -1];

  # unroll gradients
  grad <- c(c(Theta1_grad), c(Theta2_grad));

  return(list(J = J, grad = grad));
}
## =============== cost J function for optimization ===============
costFunction.cla <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           num_labels,
                           X, y, lambda) {
  costJ <- nnCostFunction.cla(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          num_labels = num_labels,
                          X = X, y = y, lambda = lambda)$J;
  return(costJ);
}
## ========== cost J gradient function for optimization ==========
gradFunction.cla <- function(nn_params,
                           input_layer_size,
                           hidden_layer_size,
                           num_labels,
                           X, y, lambda) {
  grad <- nnCostFunction.cla(nn_params = nn_params,
                          input_layer_size = input_layer_size,
                          hidden_layer_size = hidden_layer_size,
                          num_labels = num_labels,
                          X = X, y = y, lambda = lambda)$grad;
  return(grad);
}
#################### end of utility functions ####################

## ================ train neural network ===============
nnet.train.cla <- function(X, y, hidden_layer_size = 25,
                 lambda = 1, maxit = 50) {
  m <- nrow(X);
  input_layer_size <- ncol(X);
  num_labels <- length(levels(factor(y)));
  # ================ Initializing Parameters ================
  initial_Theta1 <- randInitializeWeights(input_layer_size, hidden_layer_size);
  initial_Theta2 <- randInitializeWeights(hidden_layer_size, num_labels);
  initial_nn_params <- c(c(initial_Theta1), c(initial_Theta2));

  # =================== Training NN ===================
  train_results <- optim(par = initial_nn_params,
                       fn = costFunction.cla,
                       input_layer_size = input_layer_size,
                       hidden_layer_size = hidden_layer_size,
                       num_labels = num_labels,
                       X = X, y = y, lambda = lambda,
                       gr = gradFunction.cla,
                       method = "L-BFGS-B",
                       control = list(maxit = maxit, trace = TRUE, REPORT = 1));
  nn_params <- train_results$par;
  Theta1 <- matrix(nn_params[1:(hidden_layer_size*(input_layer_size+1))],
                   hidden_layer_size, (input_layer_size + 1));
  Theta2 <- matrix(nn_params[-1:-(hidden_layer_size*(input_layer_size+1))],
                   num_labels, (hidden_layer_size + 1));
  ## ================= show accuracy =================
  pred <- nnet.predict.cla(Theta1, Theta2, X);
  cat("\nTraining Set Accuracy: ", sum(pred == y)/length(pred)*100,
      "%\n", sep = "");
  ## =============== return thetas ===============
  return(list(Theta1 = Theta1, Theta2 = Theta2));
}

## =============== neural network predict ===============
nnet.predict.cla <- function(Theta1, Theta2, X) {
  num_labels <- nrow(Theta2);
  m <- nrow(X);
  if (is.null(m)) {
    m <- 1; X <- t(X);
  }

  h1 <- sigmoid(cbind(rep(1, m), X) %*% t(Theta1));
  h2 <- sigmoid(cbind(rep(1, m), h1) %*% t(Theta2));
  p <- apply(h2, 1, which.max);   # pick the label with the largest output
  return(p);
}

A not-so-mathy Black-Litterman model

The Black-Litterman model is a very handy tool for quantifying and integrating different sources of information into a portfolio. But I know a fair number of people who got scared away from it once they saw this:

E[R]= [(\tau\Sigma)^{-1}+P^{'}\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi+P^{'}\Omega^{-1}Q]

Or this:

w_{k} =[\lambda\Sigma]^{-1}[(\tau\Sigma)^{-1}+ p_{k}^{'}\omega^{-1}p_{k}]^{-1} [(\tau\Sigma)^{-1}\Pi + p_{k}^{'}\omega^{-1}Q_{k}]
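
In fairness, the first expression is only a few lines of linear algebra. Here's a toy sketch with made-up inputs, in case it helps demystify it (this is not the portopt implementation):

# Sigma: prior covariance; Pi: implied equilibrium returns; P: pick matrix
# (one row per view); Q: view returns; Omega: diagonal view uncertainty
bl_posterior <- function(Pi, Sigma, P, Q, Omega, tau = 0.05) {
  ts_inv <- solve(tau * Sigma);
  A <- ts_inv + t(P) %*% solve(Omega) %*% P;
  b <- ts_inv %*% Pi + t(P) %*% solve(Omega) %*% Q;
  solve(A, b);   # E[R] from the formula above
}
Sigma <- diag(5) * 0.04; Pi <- rep(0.05, 5);
P <- rbind(c(1, -1, 0, 0, 0), c(0, 0, 1, 0, -1));  # view 1: A over B; view 2: C over E
Q <- c(0.02, 0.01); Omega <- diag(c(0.001, 0.0005));
bl_posterior(Pi, Sigma, P, Q, Omega);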

But the idea behind it is really simple: we start by constructing a portfolio with weights based on implied market equilibrium returns or market capitalization, then tune the weights based on our views. Each view, although it seems subjective, really carries only 3 pieces of key information: which assets will outperform which? By how many basis points? How confident are you about this view? To illustrate, here's an example produced by my R package “portopt”. (It's in the final stage of development and I'll write more about it in the future.)

First of all, let's create an optimizer object that contains all the information we need for an optimization task, including the historical returns of 5 hypothetical assets (normally distributed around 0 and creatively named A, B, C, D and E), their market capitalization weights (20% each), the scalar for views (tau), and so on. In this example we have 2 views, and they are already built into the optimizer. If we call show.optimizer() and pass in the object as the only argument, these views are listed. And if we call run.optimizer(), we can see how the portfolio weights shift because of the views. As shown below:

Now let's call set.views() to change our views. The program will ask how many views you have, then loop through each view with the 3 key questions. See below:

Now if we run the optimization on this new optimizer, we'll see the corresponding weight changes. We see some big shifts here because 5% and 10% are very large numbers for a bunch of standard-normally distributed returns.

Putting it all together:

This post is just an illustration of how the model is used, without any details about how it works. There is plenty of discussion and research about different components of this model, such as how to set \tau , how to calculate the \Omega matrix, and how to treat confidence levels. An investor should look into these topics before finally deciding to put this thing into production.