Hm, after a bit of poking around, I find Pearson’s correlation coefficient might be interesting.
If the data sets have a perfect linear correlation, the coefficient is 1.0. (Perfect negative correlation is -1.0.) 0 means no linear correlation at all. “Perfect correlation” does not mean “equal”: subtracting the mean (x[i] - xMean) eliminates DC offset, and dividing by the product of the standard deviations accounts for amplitude differences.
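(
// Pearson's correlation coefficient; x and y should be the same size
f = { |x, y|
    var xMean = x.mean,
    yMean = y.mean,
    momentSum = 0, xDev = 0, yDev = 0;
    x.size.do { |i|
        var xDiff = (x[i] - xMean),
        yDiff = (y[i] - yMean);
        momentSum = momentSum + (xDiff * yDiff);  // numerator: sum of products of deviations
        xDev = xDev + xDiff.squared;
        yDev = yDev + yDiff.squared;
    };
    // sample standard deviations (n - 1 in the denominator)
    xDev = (xDev / (x.size - 1)).sqrt;
    yDev = (yDev / (x.size - 1)).sqrt;  // if y has extra items, the loop ignores them (but y.mean still counts them)
    momentSum / ((x.size - 1) * xDev * yDev)
};
)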
f.(x, x); // equal datasets
-> 1.0
f.(x, x * 3 + 0.5); // in phase but different amplitude + DC
-> 1.0
f.(x, x.rotate(20)); // out of phase
-> 0.5555702330196
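The out-of-phase number can be sanity-checked: for a pure sinusoid against a phase-shifted copy of itself, the coefficient should come out as the cosine of the phase difference. The sketch below assumes a test signal of one sine cycle over 128 samples. That is a guess (the names `sine`, `n` and `shift` are made up for illustration, and the `x` used above isn't shown here), though it would be consistent with the 0.5556 result. It needs `f` from the block above to have been evaluated first.

```supercollider
(
// ASSUMPTION: one full sine cycle over n samples; not necessarily the x used above
var n = 128, shift = 20;
var sine = Array.fill(n, { |i| sin(2pi * i / n) });

// correlation between the sine and a copy rotated by 'shift' samples
f.(sine, sine.rotate(shift)).postln;

// cosine of the phase offset, for comparison
cos(2pi * shift / n).postln;
)
```

If the assumption holds, both lines should post about the same value, so the coefficient can be read as "how far out of phase" as well as "how similar."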
Edit: The formula is sigma[i = 0…n-1]((x[i] - xMean) * (y[i] - yMean)) / ((n-1) * stdev(x) * stdev(y)) … (x[i] - xMean) matches x - xMean in my original attempt, meaning that the top of the fraction is actually a sum(x * y) with an adjustment for DC. So I’d guessed that part right. What we were missing was the correction for amplitude, which Pearson provides with the product of standard deviations.
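For anyone who prefers conventional notation, here is a sketch of the same formula and of why the amplitude and DC corrections work. The symbols are my own shorthand, not from any particular reference: n is the number of samples, x̄ and ȳ are the means, s_x and s_y are the sample standard deviations as computed in f, and a, b are an arbitrary gain and offset.

```latex
% Pearson's coefficient as computed in f
r = \frac{\sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y},
\qquad
s_x = \sqrt{\frac{\sum_{i=0}^{n-1} (x_i - \bar{x})^2}{n-1}}

% Substituting s_x and s_y, the (n-1) factors cancel
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \; \sum_i (y_i - \bar{y})^2}}

% Scaled and offset copy: y_i = a x_i + b with a > 0,
% so \bar{y} = a\bar{x} + b and y_i - \bar{y} = a (x_i - \bar{x})
r = \frac{a \sum_i (x_i - \bar{x})^2}
         {\sqrt{a^2 \left( \sum_i (x_i - \bar{x})^2 \right)^2}}
  = \frac{a}{|a|} = 1
```

The offset b drops out when the means are subtracted, and the gain a cancels between numerator and denominator, which is why f.(x, x * 3 + 0.5) gives 1.0. A negative a would give -1.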
hjh