Some examples/links/resources for doing machine learning or data mining in Scheme.

1. Adaptive-Resonance Theory: Scheme Implementation

Adaptive Resonance Theory is a neural-network theory developed by Gail Carpenter and Stephen Grossberg. Back in 1995, when I was still a student, I created an implementation of ART1, ART2 and ARTMAP in C, along with a set of experiments.

This version is written in Scheme, with variants for R5RS, R7RS and Chez Scheme.

The implementation is contained in the single library file "(lib adaptive-resonance-theory)", whose exported symbols are documented below. Some examples are provided, demonstrating how to build and test the nets.

The R7RS version has been tested with Gauche:

> gosh -I. .\examples\art-1-example.sps
ART-1 information:
 - 3 inputs
 - 2 max categories
 - 0.6 rho
 - 0.5 beta
Number of patterns: 3
Committed categories:
Category 0: 1 0 1
Category 1: 0 1 1

Optionally run the tests:

> gosh -I. run-tests.sps

The "mushrooms" example is more extensive and, for R7RS Scheme, requires r7rs-libs to be installed (see https://peterlane.codeberg.page/r7rs-libs).

1.1. API

[procedure] make-art1-net
Definition

(make-art1-net num-inputs max-categories max-input-bits rho beta [do-force])

Description
  • num-inputs - the number of features in an input instance

  • max-categories - the maximum number of categories permitted in memory

  • max-input-bits - the maximum number of set bits that may occur in the input

  • rho - similarity measure for matching instances to categories

  • beta - regularisation measure

  • do-force - optional parameter; if #t, sets the initial bottom-up weights to a small value

Creates an instance of an ART 1 net using the given parameters.
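
The role of rho can be illustrated with the standard ART 1 vigilance test, in which an input matches a category when the fraction of the input's set bits shared with the category is at least rho. The following is a minimal sketch of that test in R7RS Scheme, not the library's own code: the names count-bits, vector-and, match-ratio and passes-vigilance? are hypothetical, and the library may compute the match differently. The sketch assumes the input has at least one set bit.

```scheme
(import (scheme base) (scheme write))

;; Count the set bits in a binary vector of 0s and 1s.
(define (count-bits v)
  (let loop ((i 0) (n 0))
    (if (= i (vector-length v))
        n
        (loop (+ i 1) (+ n (vector-ref v i))))))

;; Component-wise AND of two binary vectors of equal length.
(define (vector-and a b)
  (vector-map (lambda (x y) (if (and (= x 1) (= y 1)) 1 0)) a b))

;; Fraction of the input's set bits that the category also has.
;; Assumes the input has at least one set bit.
(define (match-ratio input category)
  (/ (count-bits (vector-and input category))
     (count-bits input)))

;; The vigilance test: accept the category when the ratio reaches rho.
(define (passes-vigilance? input category rho)
  (>= (match-ratio input category) rho))

(display (match-ratio (vector 1 0 1) (vector 1 1 0)))
(newline)
(display (passes-vigilance? (vector 1 0 1) (vector 1 1 0) 0.6))
(newline)
```

With rho at 0.6, the input 1 0 1 shares only one of its two set bits with the category 1 1 0, so the ratio of 1/2 fails the test; a higher rho therefore demands closer matches and tends to produce more categories.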

[procedure] make-art2a-net
Definition

(make-art2a-net num-inputs max-categories rho beta theta)

Description
  • num-inputs - the number of features in an input instance

  • max-categories - the maximum number of categories permitted in memory

  • rho - similarity measure for matching instances to categories

  • beta - regularisation measure

  • theta - controls the smallest feature value which will be used

Creates an instance of an ART 2a net using the given parameters.
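
To show what theta controls, here is a minimal sketch of the input preprocessing commonly described for ART 2-A: normalise the input to unit length, zero any component that does not exceed theta, then normalise again. This is an illustration under that assumption, not the library's own code; normalise, suppress-noise and preprocess are hypothetical names.

```scheme
(import (scheme base) (scheme inexact) (scheme write))

;; Scale a vector to unit Euclidean length (left unchanged if all zeros).
(define (normalise v)
  (let ((len (sqrt (apply + (map (lambda (x) (* x x)) (vector->list v))))))
    (if (zero? len)
        v
        (vector-map (lambda (x) (/ x len)) v))))

;; Zero every component that does not exceed theta.
(define (suppress-noise v theta)
  (vector-map (lambda (x) (if (> x theta) x 0.0)) v))

;; Normalise, remove small components, then normalise again.
(define (preprocess v theta)
  (normalise (suppress-noise (normalise v) theta)))

(display (preprocess (vector 3.0 4.0 0.1) 0.1))
(newline)
```

In this example the third feature falls below theta after the first normalisation, so it is dropped and the remaining two features are rescaled to roughly 0.6 and 0.8.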

[procedure] make-artmap-net
Definition

(make-artmap-net art-a art-b)

Description
  • art-a - an instance of either an ART 1 or ART 2a net

  • art-b - an instance of either an ART 1 or ART 2a net

Creates an instance of an ARTMAP net using the two given nets as the input and output layers, respectively.

[procedure] net-build
Definition

(net-build net data-set)

Description
  • net - an ART net instance

  • data-set - a list of data instances

For an ART 1 or ART 2a net, the data-set should be a list of vectors, each with the same length as the number of inputs specified when creating the net. For an ARTMAP net, the data-set should be a list of dotted pairs, each pair containing two vectors representing the input and output.

Error

If net is not a recognised net type.
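
The two data-set shapes described above can be sketched with standard Scheme data alone. The patterns here are made up for illustration, and valid-data-set? is a hypothetical helper, not part of the library; the net-build calls appear only as comments because they need the library and nets created beforehand.

```scheme
(import (scheme base) (scheme write))

;; Data for an ART 1 or ART 2a net created with 3 inputs: a list of vectors.
(define art-data
  (list (vector 1 0 1)
        (vector 0 1 1)
        (vector 1 0 1)))

;; Data for an ARTMAP net: a list of (input . output) dotted pairs.
(define artmap-data
  (list (cons (vector 1 0 1) (vector 1 0))
        (cons (vector 0 1 1) (vector 0 1))))

;; Hypothetical helper: check that every item is a vector of the
;; expected length before handing the data to net-build.
(define (valid-data-set? data-set num-inputs)
  (let loop ((rest data-set))
    (cond ((null? rest) #t)
          ((and (vector? (car rest))
                (= (vector-length (car rest)) num-inputs))
           (loop (cdr rest)))
          (else #f))))

(display (valid-data-set? art-data 3))
(newline)
(display (valid-data-set? (map car artmap-data) 3))
(newline)

;; With the library imported and nets already created, the calls
;; would then take the form:
;;   (net-build art-net art-data)
;;   (net-build artmap-net artmap-data)
```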

[procedure] net-display
Definition

(net-display net)

Description
  • net - an ART net instance

Displays basic information about the net, such as its parameter values.

Error

If net is not a recognised net type.

[procedure] net-display-categories
Definition

(net-display-categories net [layout])

Description
  • net - an ART net instance

  • layout - optional parameter, a number n: a newline is inserted after every n elements

Displays the committed categories in the net.

Error

If net is not a recognised net type.
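
The effect of a layout value can be pictured with a small stand-alone sketch, useful when a category represents a grid such as an image. This is not the library's code: display-with-layout is a hypothetical helper, and the exact spacing the library prints is an assumption.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper: print the elements of v separated by spaces,
;; starting a new line after every `layout` elements.
(define (display-with-layout v layout)
  (let loop ((i 0))
    (when (< i (vector-length v))
      (display (vector-ref v i))
      (if (= 0 (modulo (+ i 1) layout))
          (newline)
          (display " "))
      (loop (+ i 1)))))

;; A 6-element category shown as two rows of three.
(display-with-layout (vector 1 0 1 0 1 1) 3)
```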

[procedure] classify-instance
Definition

(classify-instance net instance)

Description
  • net - an ARTMAP network

  • instance - an instance, as a vector

Matches the given instance to an input category, and returns the definition of the linked output category, or #f if no output category is found.

2. JVM

It’s almost cheating to use Kawa and all the standard JVM libraries, but this is the easiest way to use Scheme for machine learning.

2.1. Examples

Some small examples, from my Scheme notes:

  • Shows how to use Apache Commons libraries to parse a CSV file, analyse data, and apply k-NN clustering.

  • Shows how to use the Weka library to load, pre-process, and classify text documents.

Apache Commons

Various libraries, with some support for machine-learning / data-analysis.

Weka

A collection of machine learning algorithms for data mining tasks.

3. Wren

A pure R6RS-Scheme machine-learning/data-mining library.

Tested against:

Library Name                      Description

(wren adaptive-resonance-theory)  Implementation of some simple Adaptive Resonance Theory neural networks.
(wren confusion-matrix)           Supports incremental construction of a confusion matrix, and calculation of common quantitative statistics.
(wren csv)                        Read/write csv files, according to RFC4180.
(wren k-means)                    Implementation of k-means++ clustering algorithm.
(wren random-numbers)             Wrapper around implementation-specific random-number generators.
(wren simulated-annealing)        Stochastic optimisation algorithm.
(wren statistics)                 Descriptive statistical functions.
(wren utils)                      General utility functions for testing, matrices, etc.

4. MIT Licence

Copyright (c) 2022-23, Peter Lane

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.