The Life of the Number-Crunching Analyst

There was a great post of visuals on the life of a data analyst. I’ve posted a few of my favorites below, but check out the original post by Elisabeth Fosslien (h/t the Freakonomics blog). I have certainly been able to relate to these!

Source: Elisabeth Fosslien


How oversold is enough? Investigations into the %above50dma indicator

I’ve long been a fan of Woodshedder’s event studies, so I thought I would do my own investigation into a couple of indicators using his visualization technique. Here is an example from his blog:

Source: System Trading with Woodshedder

As you can see, the technique graphs the average % profit/loss for the 50 days after the event happens. From that you can visually compare different levels of an indicator, or compare it to a baseline like the average forward return of the S&P 500. For our purposes I will be plotting the average return as an equity curve, which is slightly different but should give similar results.
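Conceptually, for each event date you grab the next 50 daily returns, average them across all events day-by-day, and compound those averages into an equity curve. Here’s a rough sketch of the idea in R (with hypothetical inputs: a daily return vector ret and a logical event vector event; this is just an illustration, not Woodshedder’s code):

# Sketch of an event-study average-return equity curve:
# average the forward returns across events day-by-day, then compound them.
event.study.curve = function(ret, event, forward.len = 50) {
  idx  = which(event)
  idx  = idx[idx + forward.len <= length(ret)]    # keep only events with a full forward window
  wins = sapply(idx, function(i) ret[(i + 1):(i + forward.len)])    # forward.len x n.events matrix
  cumprod(1 + rowMeans(wins, na.rm = TRUE))       # day-by-day average return, compounded
}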

Today I thought we could run a simple case of extremely oversold conditions (X < 20) for the % of stocks trading above the 50-day moving average (NYSE-All) indicator, to see how the subsequent returns plot against the average return of the S&P 500. Using the Systematic Investor Toolbox (SIT), let’s get started…

Let’s load the data….

# Load Systematic Investor Toolbox (SIT)
setInternet2(TRUE)
con = gzcon(url('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', 'rb'))
source(con)
close(con)

#*****************************************************************
# Load historical data
#******************************************************************
load.packages('quantmod,quadprog,lpSolve,kernlab')
require(XLConnect)
require(rJava)

#LOAD DATA
filename = "I:/2013/2013 backtesting/Breadth/SPYbreadthdata.csv"

#Stock Prices
data.SPY = getSymbols('^GSPC', src = 'yahoo', from = '1996-01-01',auto.assign = FALSE)

# Breadth data: % of stocks above their 50-day (and 20-day) moving averages, read from CSV into xts
breadth <- read.csv(filename, header = FALSE)
breadth <- make.xts(breadth[,-1], as.Date(as.character(breadth[,1]),'%Y%m%d'))
colnames(breadth) <- spl('50perc,20perc')[1:ncol(breadth)]


SPY.breadth = merge(data.SPY,breadth)
ret = SPY.breadth$GSPC.Close/mlag(SPY.breadth$GSPC.Close) - 1
ret = na.omit(ret)
SPY.breadth = merge(SPY.breadth,ret)
colnames(SPY.breadth) <- spl('Open,High,Low,Close,Volume,AdjC,p50,p20,ret')[1:ncol(SPY.breadth)]
SPY.breadth = SPY.breadth["1997::"]

Next let’s set the variables, calculate the average S&P 500 return, and create a plot:


#Set variables
forward.len = 50
threshold.50 = 20

#SP500 average returns for period
SPYonly = matrix(NA, nr=forward.len, nc=1)

# collect the next forward.len daily returns after every day in the sample,
# padding the last few (incomplete) windows with NA so each column lines up
for (i in 1:(nrow(SPY.breadth) - 1) ) {
  temp = coredata(SPY.breadth$ret[(i+1):min((i + forward.len), nrow(SPY.breadth))])
  length(temp) = forward.len
  SPYonly = cbind(SPYonly, temp)
}


SPYonly.avg = apply(SPYonly,1,mean,na.rm=TRUE)
SPYonly.avg.equity = cumprod(1 + SPYonly.avg)
plot(SPYonly.avg.equity, type = 'l',col = 'blue', main = 'SP500 average return', ylim = range(SPYonly.avg.equity),axes = TRUE,ann=FALSE)
title(xlab="# of Days trade is held")
title(ylab="Avg Equity (Day 0 = 1)")
title(main ="SP500 average return")

Here’s what it looks like:

Finally, let’s calculate the returns following the oversold signal and plot them. Note that I added the criterion that the indicator must have been above 20 on the previous day and at or below 20 on the current day for the event to trigger. You can play around with other variations too, e.g. requiring that it didn’t trigger in the previous X days; a rough sketch of that variant follows the code below.

#Oversold Indicator returns
results = matrix(NA, nr=forward.len,nc=1)

# event: the indicator crosses from above the threshold to at/below it
for (i in 2:(nrow(SPY.breadth) - 1) ) {
  if (SPY.breadth$p50[i] <= threshold.50 && SPY.breadth$p50[i-1] > threshold.50 ) {
    temp = coredata(SPY.breadth$ret[(i+1):min((i + forward.len), nrow(SPY.breadth))])
    length(temp) = forward.len             # pad incomplete windows near the end of the data with NA
    results = cbind(results, temp)
  }
}



results2 = results
results2.avg = apply(results2,1,mean,na.rm=TRUE)
results2.min = apply(results2,1,min,na.rm=TRUE)
results2.max = apply(results2,1,max,na.rm=TRUE)
results2.stdev = apply(results2,1,sd,na.rm=TRUE)
results2.1sdH = results2.avg + results2.stdev * 1
results2.1sdL = results2.avg - results2.stdev * 1

results2.avg.equity = cumprod(1 + results2.avg)
results2.1sdH.equity = cumprod(1 + results2.1sdH)
results2.1sdL.equity = cumprod(1 + results2.1sdL)

plot(results2.avg.equity, type = 'l',col = 'blue', main = 'SP500 average returns after %Stocks above 50dma <20', ylim = range(results2.avg.equity),axes = TRUE,ann=FALSE)
lines(SPYonly.avg.equity, type = 'l', col = 'red')
title(xlab="# of Days trade is held")
title(ylab="Avg Equity (Day 0 = 1)")
title(main ="SP500 average returns after %Stocks above 50dma <20")
legend(1,max(results2.avg.equity),c("%Stocks above 50dma <20","SP500 average return"),cex = 0.8,col=c("blue","red"),lty=1) 
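For example, a hypothetical version of the trigger that also requires no trigger in the previous X days (here X = 10, an arbitrary choice) might look like this:

# Hypothetical variant: also require the indicator to have been above the
# threshold on each of the previous gap.len days (i.e. no recent trigger)
gap.len = 10

for (i in (gap.len + 1):(nrow(SPY.breadth) - 1) ) {
  recent = SPY.breadth$p50[(i - gap.len):(i - 1)]
  if (SPY.breadth$p50[i] <= threshold.50 && isTRUE(all(recent > threshold.50))) {
    # ...collect forward returns into 'results' exactly as above...
  }
}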

As you can see, the oversold conditions lead to short-term underperformance and longer-term outperformance versus the index. Personally, I’m interested in finding the point where the short-term weakness disappears. We must also consider the shrinking sample size of trades as we narrow the criteria. Next time we’ll play around with different values for the threshold and maybe add some confidence intervals around the equity curves…
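For what it’s worth, the +/- 1 standard deviation equity curves are already computed above (results2.1sdH.equity and results2.1sdL.equity), so a rough first pass could simply overlay them. Note these are bands built from the return standard deviation, not true confidence intervals for the mean (which would use sd/sqrt(n)):

# Overlay the +/- 1 standard deviation equity curves computed earlier
plot(results2.avg.equity, type = 'l', col = 'blue',
     ylim = range(results2.1sdL.equity, results2.1sdH.equity),
     xlab = '# of Days trade is held', ylab = 'Avg Equity (Day 0 = 1)',
     main = 'SP500 average returns after %Stocks above 50dma <20, +/- 1 sd')
lines(results2.1sdH.equity, col = 'grey')
lines(results2.1sdL.equity, col = 'grey')
lines(SPYonly.avg.equity, col = 'red')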

Dashboard Upgrades: Fundamentals

I added a couple of features to the market dashboard presented earlier and seen below:
-A visual indicator showing whether the market is above or below its 50- and 200-day moving averages

In the Fundamentals section:
-The ECRI Weekly Leading Index growth rate with a multicolor scale
-A binary indicator for the ECRI WLI being above its 10-week moving average
-The ISM Purchasing Managers Index
-A binary indicator of the trend of the Prices Paid series for several regional manufacturing surveys as well as the ISM. This series is advanced and inverted; it is supposed to give an indication of the trend of future economic activity, in line with the theme that ‘inflation is the new fed funds rate’ in a zero-interest-rate environment. (A rough sketch of two of these indicators follows this list.)
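Here is a minimal sketch of how two of these indicators could be computed in R, assuming hypothetical xts series wli (the weekly ECRI WLI) and prices.paid (a monthly Prices Paid series); the 6-period advance is an arbitrary choice for illustration:

library(xts)
library(TTR)

# 1 when the WLI is above its 10-week simple moving average, else 0
wli.above.10wma = (wli > SMA(wli, 10)) * 1

# invert the Prices Paid series, then shift it 6 periods forward in time
# so its turning points line up with (and hopefully lead) future activity
prices.paid.signal = lag(-1 * prices.paid, 6)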

Chris Anderson, Author of Makers: The New Industrial Revolution, on EconTalk

There was a very interesting podcast recently on EconTalk, interviewing Chris Anderson, who explores how technology is transforming the manufacturing business. Anderson believes the same personalization of culture that happened in the digital realm is now going to happen in manufacturing, driven by the low costs and barriers to entry of customized manufacturing. With the democratization of manufacturing, good ideas will become the dominant trait of success and will no longer be limited to those who own the means of production. Innovation will speed up due to shorter production timelines. I highly recommend checking this podcast out; details below:

http://www.econtalk.org/archives/2012/12/chris_anderson_2.html

“Chris Anderson, author of Makers: The New Industrial Revolution, talks with EconTalk host Russ Roberts about his new book–the story of how technology is transforming the manufacturing business. Anderson argues that the plummeting prices of 3D printers and other tabletop design and manufacturing tools allows for individuals to enter manufacturing and for manufacturing to become customized in a way that was unimaginable until recently. Anderson explores how social networking interacts with this technology to create a new world of crowd-sourced design and production.”

From Financial Turbulence to Correlation Surprise

Systematic Investor did a great post using the Mahalanobis distance to calculate a measure of financial turbulence. It was based on the paper Skulls, Financial Turbulence, and Risk Management by Mark Kritzman and Yuanzhen Li.

According to Wikipedia:

In statistics, Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables by which different patterns can be identified and analyzed. It gauges similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant. In other words, it is a multivariate effect size.
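In formula terms, for an observation x with sample mean mu and sample covariance Sigma, the squared Mahalanobis distance is (x - mu)' Sigma^-1 (x - mu), which is exactly what R’s built-in mahalanobis() returns. A quick sanity check on toy data:

# Verify that mahalanobis() matches the textbook formula on a toy sample
set.seed(1)
m  = matrix(rnorm(500 * 3), nc = 3)                # 500 observations of 3 series
x  = m[1, ]                                        # one observation to score
d2 = mahalanobis(x, colMeans(m), cov(m))           # squared distance from the sample mean
d2.manual = t(x - colMeans(m)) %*% solve(cov(m)) %*% (x - colMeans(m))
all.equal(as.numeric(d2.manual), as.numeric(d2))   # TRUE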

Another useful turbulence measure can be calculated by decomposing the Mahalanobis distance into both a correlation part and magnitude part. This concept was the basis of the paper Correlation Surprise by Will Kinlaw and David Turkington. The authors go on to explain:

Kritzman and Li (2010) introduced what is perhaps the first measure to capture the degree of multivariate asset price “unusualness” through time. Their financial turbulence score spikes when asset prices “behave in an uncharacteristic fashion, including extreme price moves, decoupling of correlated assets, and convergence of uncorrelated assets.” We extend Kritzman and Li’s study by disentangling the volatility and correlation components of turbulence to derive a measure of correlation surprise.

Systematic Investor created the turbulence indicator for G10 currencies, so I’ll borrow that base code to get us started and make a few modifications along the way. Going back to the Correlation Surprise paper, the authors describe how to construct the indicators; I’ll also highlight the specific R code that performs each calculation:

To review, we compute the following quantities to calculate correlation surprise:
1. Magnitude surprise: a “correlation-blind” turbulence score in which all off-diagonals in the covariance matrix are set to zero.

magnitude[i] = mahalanobis(ret[i,], colMeans(temp), diag(cov(temp))*diag(n))

2. Turbulence score: the degree of statistical unusualness across assets on a given day, as given in Equation 1.

turbulence[i] = mahalanobis(ret[i,], colMeans(temp), cov(temp))

3. Correlation surprise: the ratio of turbulence to magnitude surprise, using the above quantities (2) and (1), respectively.

correlation = turbulence / magnitude

The full code is below:

###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)

#*****************************************************************
# Load historical data
#******************************************************************
load.packages('quantmod')

fx = get.G10()
nperiods = nrow(fx)
n = ncol(fx)

#*****************************************************************
# Rolling estimate of the Correlation Surprise for G10 Currencies
#******************************************************************
turbulence = fx[,1] * NA
magnitude = fx[,1] * NA
correlation = fx[,1] * NA
ret = coredata(fx / mlag(fx) - 1)

look.back = 252

for( i in (look.back+1) : (nperiods) ) {
  temp = ret[(i - look.back + 1):(i-1), ]

  # measure turbulence and magnitude surprise for the current observation
  turbulence[i] = mahalanobis(ret[i,], colMeans(temp), cov(temp))
  magnitude[i] = mahalanobis(ret[i,], colMeans(temp), diag(cov(temp))*diag(n))

  if( i %% 200 == 0) cat(i, 'out of', nperiods, '\n')
}

correlation = turbulence / magnitude

Next, we’ll create some charts to visualize a 20-day exponential moving average of each indicator.

layout(c(1,2))
plota(EMA(correlation, 20), type = 'l',col = 'red',main='Correlation Surprise')
plota(EMA(magnitude, 20), type = 'l',col = 'blue', main='Magnitude Surprise')

Perhaps in a future post we’ll look at backtesting this analysis technique to determine its merit in trading.
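Until then, here’s a toy sketch (my own arbitrary rule, not the paper’s methodology) of what such a backtest might look like: hold an equal-weight G10 basket only when the previous day’s smoothed correlation surprise is below its full-sample median. The full-sample median introduces look-ahead bias, so treat this purely as an illustration of the mechanics:

# Toy backtest: hold an equal-weight G10 basket when the prior day's smoothed
# correlation surprise is below its full-sample median (illustrative only)
corr.ema = EMA(ifna(correlation, 1), 20)                    # fill leading NAs with a neutral ratio of 1
signal   = coredata(corr.ema) < median(coredata(corr.ema), na.rm = TRUE)
signal   = ifna(mlag(signal), FALSE)                        # trade with a one-day lag
basket   = rowMeans(ifna(coredata(fx / mlag(fx) - 1), 0))   # equal-weight G10 daily return
equity   = cumprod(1 + basket * as.numeric(signal))
plot(equity, type = 'l', col = 'blue', xlab = 'Day', ylab = 'Equity',
     main = 'Toy backtest: basket held when correlation surprise is low')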