General


This package contains fast C++ implementation of online clustering algorithm Growing Neural Gas wrapped as R package using Rcpp. It produces topological graph, that you can easily convert to igraph, or you can dump your model to optimized binary file and load it later on.

This algorithms is widely used for dynamic clustering problem. Package is designed with focus on big datasets. It is already possible to cluster dataset without making its copy, with different dataset types (bagging, sequential or probability sampling). In the near future it will be possible to stream dataset from csv file.

Comparision of performance of original implementation and with improvements suggested by D. Fiser, J. Faigl, M. Kulich

Daniel Fiser, Jan Faigl, Miroslav Kulich optimization paper Growing neural gas efficiently

Examples


MNIST dataset clustering (raw images)

Usage

For more detailed usage see code in demo folder, or in tests/testthat folder. You can also refer to R package documentation (pdf version gng.pdf).

Cluster wine dataset

In this example we will construct a clustering of UCI wine dataset using offline GNG.

library("GrowingNeuralGas")

# Load data
data(wine, package="rattle")
scaled_wine <- scale(wine[-1])

# Train in an offline manner
gng <- GNG(scaled_wine, labels=wine$Type, max_nodes=20)

# Find closest node to vector [1,1,1]
predict(gng, c(1,1,1))

# Find mean error
meanError(gng)

# Plot with first 2 coordinates as position
plot(gng, mode=gng.plot.2d.errors, vertex.color=gng.plot.color.cluster,
     layout=gng.plot.layout.v2d)

Reconstruction of the Buddha figure from Standford Repositories

List of functions

This is not a full documentation. Please refer to R package documentation (pdf version gng.pdf ).

  • GNG(…) - constructor for GNG object. Parameters:

    • beta - Decrease the error variables of all node nodes by this fraction. Forgetting rate. Default 0.99

    • alpha - Alpha coefficient. Decrease the error variables of the nodes neighboring to the newly inserted node by this fraction. Default 0.5

    • lambda - how often to add new vertices. Default is 200

    • max.node - Maximum number of nodes (after reaching this size it will continue running, but won’t add new nodes)

    • eps.n - Default 0.05

    • eps.v - Default 0.0006

    • max.edge.age - Maximum age of edge, after which it is deleted. Decrease if your graph is not following changes of the dataset.

    • k - Utility parameter. Defaults to NULL. Controls speed of removal of obsolete nodes, a common value is 1.3.

  • OptimizedGNG(…) - constructor for simplified and optimized (linear) GNG object. Parameters:

    • beta - Decrease the error variables of all node nodes by this fraction. Forgetting rate. Default 0.99

    • alpha - Alpha coefficient. Decrease the error variables of the nodes neighboring to the newly inserted node by this fraction. Default 0.5

    • lambda - how often to add new vertices. Default is 200

    • max.node - Maximum number of nodes (after reaching this size it will continue running, but won’t add new nodes)

    • eps.n - Default 0.05

    • eps.v - Default 0.0006

    • max.edge.age - Maximum age of edge, after which it is deleted. Decrease if your graph is not following changes of the dataset.

    • value.range - Defaults to c(0,1). All example features have to be in this range.

  • run(gng), pause(gng), terminate(gng) - execution control

  • node(gng, gng_index) - returns node given index

  • save(gng, filename) - save gng

  • meanError(gng) - mean error in the graph

  • numberNodes(gng) - returns number of nodes

  • errorStatistics(gng) - vector of errors every second

  • plot(gng, mode, layout, vertex.color start_s) - plots gng using one of the presets (gng.plot.rgl3d, gng.plot.2d, gng.plot.2derrors). If plotting erros you can specify second from it will plot the errors.

  • centroids(gng) - using igraph algorithm GNG will write centroids of found clusters (community centers)

  • convertToGraph(gng) - converts GNG to igraph

  • predict(gng, x) - return gng_index of the closest node in the graph to given example

  • insertExamples(gng, M) - inefficient adding examples to the graph