2009-05-13: Plotting Simple Graphs with RSRuby

R has a powerful set of plotting and graphics tools for visualising data. I shall look at some of the options for generating graphs, fitting a line to data, and plotting data that is stored in a text file. If you are familiar with R, the method used here may seem a bit odd, as I call R from Ruby code as a standalone program, generating results with minimal user input. Working with R is often more of an interactive process, where graphs are built up piece by piece and shown on the screen. Instead, I shall save each graph straight to a file, and then open it in an image viewer program.

Plotting Points on a Graph

The simplest graph is obtained by plotting data points and drawing lines between those points. For instance, the graph below was obtained from the list of x,y coordinates: (1,10) (2,5) (3,2) (4,5) (5,10) using a simple call to RSRuby.

(IMAGE MISSING)

The calling code goes through the following steps:

  1. Creates an instance @@r of the R interpreter.
  2. Instructs the interpreter to use the named PNG file for its graphical output.
  3. Plots the graph.
  4. Tells the interpreter that plotting is over, and the PNG file can be written.

The plot command can take a wide range of parameters. The basic ones are the list of x coordinates and the list of y coordinates. The RSRuby bridge is kind to us here, as the lists are simply Ruby arrays. The points (1,10) (2,5) (3,2) (4,5) (5,10) are passed to the interpreter as two arrays: first the x-coordinates [1,2,3,4,5] and second the y-coordinates [10,5,2,5,10]. The rest of the arguments use Ruby's hash notation to give the effect of keyword arguments. The :xlab and :ylab keywords specify the labels for the x and y axes respectively. :type specifies how the points and lines are plotted: "n" shows nothing, "p" shows the points only, "l" draws lines between the points, "b" plots both points and lines, and "c" draws the lines but leaves gaps where the points would be. :main sets the title of the graph, and :col sets the colour of the points and lines plotted.

require 'rsruby'

@@r = RSRuby.instance

@@r.png("sample.png")
@@r.plot([1,2,3,4,5], [10,5,2,5,10], 
    :xlab => "x label", :ylab => "y label",
    :type => "b", :main => "Simple graph",
    :col => "blue")
@@r.eval_R("dev.off()")
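
If the data starts out as a list of (x,y) pairs rather than as two separate arrays, plain Ruby can split it into the form that plot expects. The following is a minimal sketch that does not touch R at all; the names points, xs, ys and options are my own choices for illustration.

```ruby
# Hypothetical starting point: the points as [x, y] pairs.
points = [[1, 10], [2, 5], [3, 2], [4, 5], [5, 10]]

# transpose turns a list of pairs into a pair of lists.
xs, ys = points.transpose

# The keyword-style arguments are an ordinary Ruby hash,
# so they can be built up separately before the call.
options = { :xlab => "x label", :ylab => "y label",
            :type => "b", :main => "Simple graph", :col => "blue" }

p xs   # [1, 2, 3, 4, 5]
p ys   # [10, 5, 2, 5, 10]
p options.keys
```

In principle the call then becomes @@r.plot(xs, ys, options), assuming RSRuby treats a trailing hash variable the same way as the inline keyword arguments used above.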

Fitting a Line to Data

R provides the 'lm' function for computing linear models. This function appears very powerful, and relies on the caller to provide it with a formula, or linear model, to fit against. Consider the following data, which fall around the line y=x:

(1, 1.2) (1.9, 2.1) (3.1, 2.8) (3.8, 4.1) (4.7, 5.2)

In order to fit a linear model to these data, we place the x and y coordinates in variables called, perhaps, x and y. Then, in R, we call lm with a formula such as "x ~ y" to define the linear model. In R's formula notation the variable to the left of the ~ is the response, so "x ~ y" models x as a linear function of y; to model y as a function of x, which is what a plot of y against x calls for, the formula would be "y ~ x", and the slope would then be found under the key "x". Below, I give a rather clunky way to do this, which involves assigning variables called x and y in the R interpreter - I hope to improve this in later posts. What is returned is a Ruby object defining the line of fit, and we can print out some summary information about the computed linear model. By supplying the intercept and slope of our fitted line to the 'abline' method, a line showing the fit is added to our graph.

@@r.png("sample2.png")
x = [1, 1.9, 3.1, 3.8, 4.7]
y = [1.2, 2.1, 2.8, 4.1, 5.2]
@@r.assign('x', x)
@@r.assign('y', y)
fit = @@r.lm('x ~ y')
@@r.plot(x, y)
@@r.abline(fit["coefficients"]["(Intercept)"], fit["coefficients"]["y"])
puts fit["coefficients"]
@@r.eval_R("dev.off()")

Running the script prints the fitted coefficients:

$ ruby sample-1.rb
{"(Intercept)"=>0.0875346260387804, "y"=>0.913138108428967}
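
As a cross-check on what lm is computing, the least-squares line for a single predictor can be worked out in a few lines of plain Ruby. This is only a sketch of the textbook formula, not of how R implements lm, and the function name least_squares is my own. It fits the data in both directions, because the side of the ~ on which a variable appears decides whether it is the response or the predictor.

```ruby
# Ordinary least squares for a single predictor, in plain Ruby.
# Returns [intercept, slope] for: response = intercept + slope * predictor.
def least_squares(predictor, response)
  n = predictor.size.to_f
  mean_p = predictor.sum / n
  mean_r = response.sum / n
  # Sum of cross-deviations, and of squared deviations of the predictor.
  sxy = predictor.zip(response).sum { |a, b| (a - mean_p) * (b - mean_r) }
  sxx = predictor.sum { |a| (a - mean_p)**2 }
  slope = sxy / sxx
  [mean_r - slope * mean_p, slope]
end

x = [1, 1.9, 3.1, 3.8, 4.7]
y = [1.2, 2.1, 2.8, 4.1, 5.2]

# Same direction as the formula 'x ~ y': x is the response, y the predictor.
intercept, slope = least_squares(y, x)
printf("x ~ y: intercept %.4f, slope %.4f\n", intercept, slope)

# The other direction, 'y ~ x': y is the response, x the predictor.
intercept, slope = least_squares(x, y)
printf("y ~ x: intercept %.4f, slope %.4f\n", intercept, slope)
```

The first line of output agrees with the coefficients printed by the script above (about 0.0875 and 0.9131). The second gives an intercept of about 0.0033 and a slope of about 1.0609, which is the line to draw with abline when y is plotted against x.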

(IMAGE MISSING)

Histograms from a Text File

To finish this post, I want to show how to plot a different kind of graph, using data from a text file. The text file I shall use is an example from UC Irvine's Machine Learning Repository ( http://archive.ics.uci.edu/ml/ ), and I grabbed the 'Image Segmentation' dataset for illustration. The UCI data is usually stored as comma-separated data, one instance per row, with some header information at the top. Let's plot a histogram of the values in the 11th column of data. (Don't ask why - I just want to see how to do it!)

First, we store the data in a file, call it "image.txt". Second, we open the file in Ruby, skip the first five lines, which contain header information, and then begin working for real. We read each line, split it up using the comma separator character, and store the 11th item of the resulting array as a floating point number. The code is:

data = []
File.open("image.txt", "r") do |file|  # open the file for reading
  5.times { file.gets }                # ignore the header
  while (line = file.gets)             # read all remaining lines
    data << line.split(",")[10].to_f   # add 11th item to 'data' as a float
  end
end                                    # the file is closed when the block ends
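
The same reading steps can be wrapped in a small function, which makes it easy to pick a different column or header length. This is a sketch only: read_column is a hypothetical helper of my own, and the data it is tried on here is made up and written to a temporary file. Unlike the loop above, it skips blank or short lines rather than recording them as 0.0.

```ruby
require 'tempfile'

# Hypothetical helper: return one column of a comma-separated file as floats.
# 'column' counts from zero; 'skip' is the number of header lines to drop.
def read_column(path, column, skip)
  values = []
  File.foreach(path).with_index do |line, index|
    next if index < skip             # still in the header
    fields = line.chomp.split(",")
    next if fields[column].nil?      # blank or short line: nothing to read
    values << fields[column].to_f
  end
  values
end

# Made-up data: two header lines, two rows of three fields, one blank line.
Tempfile.create("sample") do |file|
  file.write("header one\nheader two\nA,1.5,2.5\nB,3.5,4.5\n\n")
  file.flush
  p read_column(file.path, 2, 2)     # [2.5, 4.5]
end
```

With this helper, the data for the histogram would be read as read_column("image.txt", 10, 5).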

Finally, generating a histogram from this is fairly simple:

@@r.png("sample3.png")
@@r.hist(data, :xlab => "Some data", :main => "Histogram of some data")
@@r.eval_R("dev.off()")
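
Before handing the data to R, it can be useful to check in plain Ruby roughly what shape the histogram should have. The sketch below counts values in equal-width bins and prints a row of stars per bin. It is a simplification under my own assumptions: bin_counts is a hypothetical helper, the sample values are made up, the input must not be empty, and R's hist chooses its own break points, so its bars will generally not match these counts exactly.

```ruby
# Count how many values fall into each of 'bins' equal-width intervals
# spanning the range of the data. Assumes 'values' is not empty.
def bin_counts(values, bins)
  low, high = values.minmax
  width = (high - low) / bins.to_f
  counts = Array.new(bins, 0)
  values.each do |v|
    # The largest value would fall just past the last bin, so clamp it.
    # If all values are equal (width zero), everything goes in bin 0.
    index = width.zero? ? 0 : [((v - low) / width).floor, bins - 1].min
    counts[index] += 1
  end
  counts
end

# Made-up values, standing in for the 'data' array read from the file.
sample = [0.0, 0.2, 0.4, 1.0, 1.5, 2.5, 3.0, 4.0]
counts = bin_counts(sample, 4)
p counts                             # [3, 2, 1, 2]
counts.each_with_index do |count, i|
  puts "bin #{i}: #{'*' * count}"
end
```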

(IMAGE MISSING)


Page from Peter's Scrapbook, output from a VimWiki on 2024-01-29.