When Noise Overwhelms Signal – Sorting out Sorts Review

In his 1998 paper, Jonathan Berk showed that when stocks are sorted into groups on a variable (e.g. B/E ratio) that is correlated with a known variable (e.g. beta), the power of the known variable to predict expected returns within each group diminishes when tested with cross-sectional regressions. This is very likely why Fama and French (1992) found that the explanatory power of beta disappeared, and why Daniel and Titman (1997) concluded that stock characteristics matter more than covariances. For researchers and data analysts, this is a perfect example of how a seemingly harmless manipulation of data can cause a meaningful loss of information. Without care, such a loss can lead to confusing or even completely wrong conclusions.

The intuition behind this issue is rather simple: when data gets divided into smaller groups and tested separately, the error of beta estimation becomes “louder” as the sample size gets smaller. The error-minimizing advantage from using a large sample diminishes as the sample is divided into smaller groups, as the error of estimation overwhelms the useful information in each group.

Getting the intuition is one thing; identifying exactly where the issue occurs and tracing it through the proof is a different story.


Assume CAPM holds: E[R_{i}] = r + \beta_{i}(E[R_{m}]-r), in which the systematic risk of stock i is \beta_{i}\sim\mathcal{N}(1,\sigma^{2}). For convenience, the realized return \hat{R_{i}} is assumed to equal the expected return E[R_{i}].

Scenario 1: CAPM is tested cross-sectionally on a full sample with an infinite number of stocks, and there’s no estimation error between theoretical beta and estimated beta, i.e., \hat{\beta_{i}} \equiv \beta_{i}. The coefficient of this regression is:

\frac{cov(\hat{R_{i}}-r, \hat{\beta_{i}})}{var(\hat{\beta_{i}})} = \frac{cov(E[R_{i}]-r, {\beta_{i}})}{var({\beta_{i}})} = \frac{\sigma^2}{\sigma^2 + 0}(E[R_{m}]-r) = 1* (E[R_{m}]-r)

Interpretation: stock returns are perfectly linear in beta, and the slope recovers the full market risk premium (attenuation factor = 1); beta is the perfect predictor of stock returns.

Scenario 2: there’s error in estimated beta, i.e., \hat{\beta_{i}} = \beta_{i} + \epsilon_{i}, \epsilon_{i} \sim\mathcal{N}(0,\theta^2). This is where the trouble originates. The existence of \epsilon_{i} introduces the original noise \theta^2, which gets passed down through the rest of the test. As we can see, the coefficient of the same test is already contaminated:

\frac{cov(\hat{R_{i}}-r, \hat{\beta_{i}})}{var(\hat{\beta_{i}})} = \frac{cov(E[R_{i}]-r, {\beta_{i}}+\epsilon_{i})}{var({\beta_{i}}+\epsilon_{i})} = \frac{\sigma^2}{\sigma^2 + \theta^2}(E[R_{m}]-r)

* Assuming realized and expected returns are the same for convenience.

Interpretation: because \frac{\sigma^2}{\sigma^2 + \theta^2} < 1, the estimated slope understates the market risk premium. Stock returns appear less sensitive to the systematic risk they bear than they really are, and beta looks like a less-than-perfect predictor of stock returns.
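The step from the covariance ratio to the attenuation factor can be spelled out as follows. This is a sketch that assumes \epsilon_{i} is independent of \beta_{i}, which the setup above does not state explicitly but which the result requires:

```latex
cov(E[R_{i}]-r, \beta_{i}+\epsilon_{i})
  = (E[R_{m}]-r)\,cov(\beta_{i}, \beta_{i}+\epsilon_{i})
  = (E[R_{m}]-r)\,[var(\beta_{i}) + cov(\beta_{i}, \epsilon_{i})]
  = (E[R_{m}]-r)\,\sigma^{2}

var(\beta_{i}+\epsilon_{i})
  = var(\beta_{i}) + var(\epsilon_{i}) + 2\,cov(\beta_{i}, \epsilon_{i})
  = \sigma^{2} + \theta^{2}
```

Dividing the first line by the second gives \frac{\sigma^2}{\sigma^2 + \theta^2}(E[R_{m}]-r).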

Scenario 3: now all stocks are sorted into N fractiles by a variable linearly correlated to beta. Within the jth fractile, the conditional variance of \beta is now redefined:

\sigma_{j}^{2} \equiv var(\beta_{i}|i\in j) = \sigma^{2}g(j),

where g(j) is a concave-up (U-shaped) function of j that “shrinks” \sigma^{2}, because the stocks in the jth fractile cover only a slice of the full-sample distribution of beta (a partial integral of the full sample).
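To get a feel for the shape of g(j), the within-fractile variance can be estimated by simulation. The sketch below assumes the limiting case where the sort variable is beta itself (perfect correlation); the function name `empirical_g`, the number of draws and the seed are illustrative choices, not part of the original analysis.

```python
import numpy as np

def empirical_g(n_groups, n_draws=200_000, sigma=0.5, seed=0):
    """Estimate g(j) = var(beta | fractile j) / var(beta) for each fractile.

    Assumption: stocks are sorted on beta itself, the limiting case of a
    sort variable that is perfectly correlated with beta.
    """
    rng = np.random.default_rng(seed)
    beta = np.sort(rng.normal(1.0, sigma, n_draws))
    groups = np.array_split(beta, n_groups)  # equal-sized fractiles
    return np.array([grp.var() for grp in groups]) / beta.var()

g5 = empirical_g(5)
print(np.round(g5, 3))  # smallest in the middle fractile, largest at the ends
```

Under these assumptions every g(j) is below 1, and the middle fractiles are squeezed hardest because the normal density packs the most stocks into the narrowest range of beta there.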

Run the regression test again within the jth fractile; the coefficient is now:

\frac{\sigma^{2}g(j)}{\sigma^{2}g(j) + \theta^2}(E[R_{m}]-r) = \frac{\sigma^{2}}{\sigma^{2} + \theta^{2}/g(j)}(E[R_{m}]-r)

Interpretation: g(j), a term born from the sorting process, now serves as a “noise amplifier”. As g(j) gets smaller, the effective noise \theta^{2}/g(j) grows and the coefficient is dampened as a result. As a concave-up function, g(j) gets smaller when N is larger and/or j moves closer to the middle among groups. The graph below shows how the coefficient changes with g(j) when E[R_{m}]-r is fixed at 1, \sigma^{2} = 0.10 and \theta^{2} = 0.05.
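The same relationship can be tabulated numerically. This is a minimal sketch that simply evaluates the formula above at the stated parameter values; the function name `sorted_coefficient` and the grid of g values are illustrative.

```python
def sorted_coefficient(g, sigma2=0.10, theta2=0.05, premium=1.0):
    """Within-fractile slope: sigma^2 g / (sigma^2 g + theta^2) * premium."""
    if g <= 0:
        raise ValueError("g must be positive")
    return sigma2 * g / (sigma2 * g + theta2) * premium

for g in (1.0, 0.5, 0.25, 0.1):
    print(f"g = {g:4.2f} -> coefficient = {sorted_coefficient(g):.4f}")
# g = 1.00 -> coefficient = 0.6667
# g = 0.50 -> coefficient = 0.5000
# g = 0.25 -> coefficient = 0.3333
# g = 0.10 -> coefficient = 0.1667
```

With no sorting (g = 1) the slope is already down to two thirds of the premium because of \theta^{2}; shrinking g to 0.1 cuts it to one sixth.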



To illustrate with simulated data, 2,000 stock betas are randomly generated with mean 1 and standard deviation 0.50; 2,000 expected returns are calculated using these betas, a market return of 6.00% and a risk-free rate of 1.00%; estimated betas are calculated by adding 2,000 random errors with mean 0 and standard deviation 0.05. All estimated returns are ranked from low to high, and this ranking is used as the basis for sorting. In summary:

  • Number of stocks k = 2,000
  • \beta_{i}\sim\mathcal{N}(1,0.5^{2})
  • E[R_{i}] = 0.01 + \beta_{i}(0.06 - 0.01) = 0.01 + \beta_{i}(0.05)
  • \epsilon_{i}\sim\mathcal{N}(0,0.05^{2})
  • \hat{\beta_{i}} = \beta_{i} + \epsilon_{i}

Test for scenario 1. Run regression E[R_{i}] = \alpha + \lambda \beta_{i} + \varepsilon_{i}. We get \alpha = 0.0100, \lambda = 0.0500, R-squared = 1.00. Essentially perfect fit.

Test for scenario 2. Run regression E[R_{i}] = \alpha + \lambda \hat{\beta_{i}} + \varepsilon_{i}. We get \alpha = 0.01068, \lambda = 0.04943, R-squared = 0.9902.
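The first two tests can be reproduced roughly as follows. This is a sketch under the parameters listed above; the seed and the helper names `simulate` and `ols` are arbitrary, so the exact figures will differ slightly from the ones reported here.

```python
import numpy as np

def simulate(k=2000, beta_sd=0.5, err_sd=0.05, rf=0.01, rm=0.06, seed=0):
    """Draw true betas, CAPM expected returns and noisy estimated betas."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(1.0, beta_sd, k)
    exp_ret = rf + beta * (rm - rf)
    beta_hat = beta + rng.normal(0.0, err_sd, k)
    return beta, beta_hat, exp_ret

def ols(x, y):
    """Univariate OLS with intercept: returns (alpha, lambda, R-squared)."""
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return intercept, slope, r2

beta, beta_hat, exp_ret = simulate()
a1, l1, r2_1 = ols(beta, exp_ret)      # scenario 1: true beta
a2, l2, r2_2 = ols(beta_hat, exp_ret)  # scenario 2: estimated beta
print(f"scenario 1: alpha={a1:.4f}, lambda={l1:.4f}, R2={r2_1:.4f}")
print(f"scenario 2: alpha={a2:.5f}, lambda={l2:.5f}, R2={r2_2:.4f}")
```

For reference, the theoretical attenuation factor with these inputs is 0.25 / (0.25 + 0.0025) ≈ 0.990, so a scenario 2 slope near 0.0495 is what the formula predicts.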

Test for scenario 3. Run regression E[R_{i}|i \in j] = \alpha_{j} + \lambda_{j} \hat{\beta_{i}} + \varepsilon_{ij}; j \in [1, N].

By setting N = 5, 10, 20, 50, respectively, the coefficients in each group are as follows:



The results are consistent with Berk’s findings. The more groups the stocks are sorted into, the less predictive power beta has for expected returns; the closer j moves to the center among all groups, the more pronounced this effect gets.
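A sketch of the sorted test is given below. It assumes the sort key is the expected return, since realized and expected returns are treated as identical in this setup, and that each fractile holds an equal number of stocks; the function name `within_group_slopes` and the seed are illustrative, so the numbers will not match the original run exactly.

```python
import numpy as np

def within_group_slopes(n_groups, k=2000, seed=0):
    """Regress expected return on estimated beta inside each fractile."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(1.0, 0.5, k)
    exp_ret = 0.01 + 0.05 * beta
    beta_hat = beta + rng.normal(0.0, 0.05, k)

    # Assumption: stocks are ranked on expected return, low to high.
    order = np.argsort(exp_ret)
    slopes = []
    for idx in np.array_split(order, n_groups):
        slope, _ = np.polyfit(beta_hat[idx], exp_ret[idx], 1)
        slopes.append(slope)
    return np.array(slopes)

for n in (5, 10, 20, 50):
    s = within_group_slopes(n)
    print(f"N = {n:2d}: mean slope = {s.mean():.4f}, "
          f"middle-group slope = {s[n // 2]:.4f}")
```

The printed slopes can be compared against the true premium of 0.05 to see how much of it survives within each fractile.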