In this post, I start exploring the RSRuby interface and its facilities for handling some simple statistical operations. At the moment I am not interested in many of the options available for the different procedures, but just want to understand how to pass data to R from Ruby, and do some useful computations.
Given a list of readings, such as [4, 2, 35, 10, 17, 3, 6, 8], we can compute the mean, median, variance, and standard deviation as follows:

    irb(main):052:0> require 'RSRuby'
    irb(main):053:0> r = RSRuby.instance
    irb(main):054:0> a = [4, 2, 35, 10, 17, 3, 6, 8]
    irb(main):055:0> r.mean(a)
    => 10.625
    irb(main):056:0> r.median(a)
    => 7.0
    irb(main):057:0> r.var(a)
    => 119.982142857143
    irb(main):058:0> r.sd(a)
    => 10.9536360564491

This is straightforward. The RSRuby bridge converts the Ruby array stored in the variable 'a' into an 'R' object, and calls the R function with the same name as the Ruby method, as illustrated.
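As a cross-check on the values above, the same four statistics can be sketched in plain Ruby, without R. The helper names (`mean`, `median`, `variance`, `sd`) are my own and are not part of RSRuby. The detail worth noticing is that R's `var` and `sd` divide by n - 1 (the sample variance), not by n.

```ruby
# Plain-Ruby versions of the four statistics, for cross-checking R's results.
# These helpers are illustrative only; they are not part of RSRuby.

def mean(xs)
  xs.sum.to_f / xs.size
end

def median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  # An even-sized list takes the average of its two middle values.
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Sample variance: divides by n - 1, as R's var() does.
def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1)
end

def sd(xs)
  Math.sqrt(variance(xs))
end

a = [4, 2, 35, 10, 17, 3, 6, 8]
puts mean(a)
puts median(a)
puts variance(a)
puts sd(a)
```

Up to the last printed digit or so, these should agree with the values returned through RSRuby.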
Often in modelling experiments we have a set of observed data and a set of predicted data, and we would like to measure how well the model's predictions fit the observations. Several statistics can help us with this.
The covariance is one measure of how two variables change together. The covariance is zero if the two variables have no linear relation (in particular, if they are unrelated); positive if the variables tend to change in the same direction; and negative if they tend to change in opposite directions. Covariances are computed easily in RSRuby, passing the data as two arrays:

    irb(main):074:0> a = [1,2,3,4,5,6]
    => [1, 2, 3, 4, 5, 6]
    irb(main):075:0> b = [2,4,6,8,10,12]
    => [2, 4, 6, 8, 10, 12]
    irb(main):076:0> c = [6,5,4,3,2,1]
    => [6, 5, 4, 3, 2, 1]
    irb(main):077:0> d=[2,2,2,2,2,2]
    => [2, 2, 2, 2, 2, 2]
    irb(main):078:0> r.cov(a,b)
    => 7.0
    irb(main):079:0> r.cov(a,c)
    => -3.5
    irb(main):080:0> r.cov(a,d)
    => 0.0

Notice that 'b' changes in the same direction as 'a', 'c' changes in the opposite direction to 'a', and 'd' is constant, so it does not vary with 'a' at all.
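To see where these numbers come from, here is a minimal plain-Ruby sketch of the sample covariance: the sum of the products of paired deviations from the means, divided by n - 1. The helper name `covariance` is mine, not an RSRuby method.

```ruby
# Sample covariance in plain Ruby (n - 1 denominator, as in R's cov()).
# Illustrative helper only; not part of RSRuby.
def covariance(xs, ys)
  raise ArgumentError, "arrays must have the same length" unless xs.size == ys.size

  mx = xs.sum.to_f / xs.size
  my = ys.sum.to_f / ys.size
  # Sum of products of paired deviations from the two means.
  xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } / (xs.size - 1)
end

a = [1, 2, 3, 4, 5, 6]
puts covariance(a, [2, 4, 6, 8, 10, 12])  # same direction as a
puts covariance(a, [6, 5, 4, 3, 2, 1])    # opposite direction to a
puts covariance(a, [2, 2, 2, 2, 2, 2])    # constant
```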
The covariance carries units: the product of the units of the two variables it is computed over. By contrast, correlation is a dimensionless measure, indicating the strength of the relation between the two variables (a linear relation, in the case of Pearson's coefficient). Correlations produce values in the range [-1, 1], and so are comparable between experiments.
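To make the point about units concrete: rescaling one variable, say from metres to centimetres, multiplies the covariance by the same factor but leaves Pearson's correlation unchanged. The sketch below is plain Ruby with made-up data; the helper names and the variable names are assumptions of mine, chosen only for illustration.

```ruby
# Illustrative helpers only; not part of RSRuby.
def mean(xs)
  xs.sum.to_f / xs.size
end

def covariance(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } / (xs.size - 1)
end

# Pearson's r: the covariance divided by the product of the standard deviations.
def pearson(xs, ys)
  covariance(xs, ys) / Math.sqrt(covariance(xs, xs) * covariance(ys, ys))
end

metres      = [1.0, 2.0, 3.0, 4.0]          # made-up readings
weights     = [2.0, 1.0, 4.0, 3.0]          # made-up readings
centimetres = metres.map { |m| m * 100 }    # same data, different unit

puts covariance(metres, weights)       # depends on the unit used
puts covariance(centimetres, weights)  # 100 times larger
puts pearson(metres, weights)          # unit-free
puts pearson(centimetres, weights)     # same value as the line above
```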
R provides a 'cor' function, to compute the correlation between two variables. Three forms of correlation may be computed: Pearson's, Kendall's tau, and Spearman's rho. As can be seen in the sample below, there are clear differences between them:

    irb(main):082:0> r.cor([1,2,3],[2,1,4], :method => "pearson")
    => 0.654653670707977
    irb(main):083:0> r.cor([1,2,3],[2,1,4], :method => "spearman")
    => 0.5
    irb(main):084:0> r.cor([1,2,3],[2,1,4], :method => "kendall")
    => 0.333333333333333

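The three results differ because the methods measure different things: Pearson's works on the raw values, Spearman's applies Pearson's formula to the ranks of the values, and Kendall's compares the number of concordant and discordant pairs. Here is a rough plain-Ruby sketch for data without ties. The helper names are mine, and this simplified version ignores the tie handling that R's `cor` performs.

```ruby
# Illustrative helpers only; not part of RSRuby. Assumes no tied values.

def pearson(xs, ys)
  mx = xs.sum.to_f / xs.size
  my = ys.sum.to_f / ys.size
  sxy = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sxx = xs.sum { |x| (x - mx)**2 }
  syy = ys.sum { |y| (y - my)**2 }
  sxy / Math.sqrt(sxx * syy)
end

# Rank of each value, where 1 is the smallest.
def ranks(xs)
  sorted = xs.sort
  xs.map { |x| sorted.index(x) + 1 }
end

# Spearman's rho: Pearson's r computed on the ranks.
def spearman(xs, ys)
  pearson(ranks(xs), ranks(ys))
end

# Kendall's tau without ties: (concordant - discordant) / number of pairs.
def kendall(xs, ys)
  pairs = xs.zip(ys).combination(2).to_a
  score = pairs.sum do |p, q|
    # +1 if both variables move the same way between the two points, -1 if not.
    ((q[0] - p[0]) * (q[1] - p[1])) <=> 0
  end
  score.to_f / pairs.size
end

x = [1, 2, 3]
y = [2, 1, 4]
puts pearson(x, y)
puts spearman(x, y)
puts kendall(x, y)
```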
Note the way these functions are called in RSRuby. The part `:method =>
"pearson"` uses Ruby's notation for passing keyword arguments into a method to
provide values for optional named arguments in the R syntax. For instance, the
equivalent of

    r.cor([1,2,3],[2,1,4], :method => "kendall")

in R is:

    > cor(c(1,2,3), c(2,1,4), method="kendall")
    [1] 0.3333333
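To illustrate the mapping from the Ruby side, here is a small sketch that splits a call into positional and named arguments and renders the equivalent R call as text. It is my own illustration and not how RSRuby is implemented: `r_call` and `r_literal` are hypothetical names, and only numbers, strings, and flat arrays are handled.

```ruby
# Render a Ruby value as R source text.
# Hypothetical helper: handles numbers, strings, and flat arrays only.
def r_literal(value)
  case value
  when Array  then "c(#{value.map { |v| r_literal(v) }.join(',')})"
  when String then value.inspect
  else value.to_s
  end
end

# Hypothetical helper: build the text of an R call from Ruby arguments.
# Positional arguments stay positional; `:key => value` pairs become key=value.
def r_call(name, *args, **named)
  parts = args.map { |a| r_literal(a) }
  parts += named.map { |key, value| "#{key}=#{r_literal(value)}" }
  "#{name}(#{parts.join(', ')})"
end

puts r_call(:cor, [1, 2, 3], [2, 1, 4], :method => "kendall")
# prints: cor(c(1,2,3), c(2,1,4), method="kendall")
```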