: Class SpectralClusterer

weka.clusterers
Class SpectralClusterer

weka.clusterers.SpectralClusterer

public class SpectralClusterer

Spectral clusterer class. For more information see:

Shi, J., and J. Malik (1997) "Normalized Cuts and Image Segmentation", in Proc. of IEEE Conf. on Comp. Vision and Pattern Recognition, Puerto Rico
Kannan, R., S. Vempala, and A. Vetta (2000) "On Clusterings - Good, Bad and Spectral", Tech. Report, CS Dept., Yale University.
Ng, A. Y., M. I. Jordan, and Y. Weiss (2001) "On Spectral Clustering: Analysis and an algorithm", in Advances in Neural Information Processing Systems 14.

This implementation assumes that the data instances are points in an Euclidean space. The algorithm is based on the concept of similarity between points instead of distance. Given two points x and y and their Euclidean distance d(x, y), their similarity is defined as exp(- d(x, y)^2 / (2 * sigma^2)), where sigma is a scaling factor (its default value is 1.0).

There is a distance cut factor r, if the distance d(x, y) between two points is greater than r then their similarity is set to 0. This parameter combined with the use of sparse matrices can improve the performances w.r.t. the memory.

To classify a new instance w.r.t. the partitions found this implementation applies a naive min-distance algorithm that assigns the instance to the cluster that contains the nearest point. Since the distance to similarity function is bijective and monotone the nearest point is also the most similar one.

Valid options are:

-A <0-1>
Specifies the alpha star factor. The algorithm stops the recursive partitioning when it does not find a cut that has a value below this factor.
Use this argument to limit the number of clusters.
-S <positive number>
Specifies the value of the scaling factor sigma.
-R <-1 or a positive number>
Specifies the distance cut factor. -1 is equivalent to the positive infinity.
-M
Requires the use of sparse representation of similarity matrices.

This implementation relies on the COLT numeric package for Java written by Wolfgang Hoschek. For other information about COLT see its home page at http://nicewww.cern.ch/~hoschek/colt/index.htm.

According to the license, the copyright notice is reported below:

   Written by Wolfgang Hoschek. Check the Colt home page for more info.
   Copyright © 1999 CERN - European Organization for Nuclear Research.
   Permission to use, copy, modify, distribute and sell this software and
   its documentation for any purpose is hereby granted without fee,
   provided that the above copyright notice appear in all copies and that
   both that copyright notice and this permission notice appear in
   supporting documentation. CERN makes no representations about
   the suitability of this software for any purpose. It is provided "as is"
   without expressed or implied warranty.

This software is issued under GNU General Public License.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Author:: Luigi Dragone (luigi@luigidragone.com)
See Also:: COLT

Field Summary

protected double alpha_star
          The alpha star parameter value

protected int[] cluster
          The class number of each point in the dataset

protected int numOfClusters
          The number of different clusters found

protected static java.util.Vector options


protected double r
          The distance cut factor

protected double sigma
          The sigma scaling factor

protected boolean useSparseMatrix
          The using sparse matrix flag

protected cern.colt.matrix.DoubleMatrix2D v
          The points of the dataset

Constructor Summary

SpectralClusterer()
          Constructor.

Method Summary

java.lang.String alphaStarTipText()
          Returns the tip text for this property

protected static double asso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)
          Computes the association degree between two partitions of a graph.
The association degree is defined as the sum of the weights of all the edges between points of the two partitions.

protected static int[][] bestCut(cern.colt.matrix.DoubleMatrix2D W)
          Returns the best cut of a graph w.r.t. the degree of dissimilarity between points of different partitions and the degree of similarity between points of the same partition.

void buildClusterer(weka.clusterers.Instances data)
          Generates a clusterer by the mean of spectral clustering algorithm.

int clusterInstance(weka.clusterers.Instance instance)
          Classifies an instance w.r.t. the partitions found.

protected static double distnorm2(cern.colt.matrix.DoubleMatrix1D x, cern.colt.matrix.DoubleMatrix1D y)
          Returns the Euclidean distance between two points.

double getAlphaStar()
          Returns the current value of the alpha star factor.

java.lang.String[] getOptions()
          Gets the current settings of the options.

double getR()
          Returns the current value of the r distance cur parameter.

double getSigma()
          Returns the current value of the sigma factor.

boolean getUseSparseMatrix()
          Returns the status of using sparse matrix flag.

java.lang.String globalInfo()
          Returns a string describing this clusterer

java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.

protected static int[] merge(int[] a, int[] b)
          Merges two sets of points represented as integer vectors.

protected static double Nasso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)
          Returns the normalized association degree between two partitions of a graph.

protected static double Nasso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b, int[] v)
          Returns the normalized association degree between two partitions of a graph w.r.t. a given subgraph.

protected static double Ncut(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)
          Returns the normalized dissimilarity degree (or cut) between two partitions of a graph.

protected static double Ncut(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b, int[] v)
          Returns the normalized dissimilarity degree (or cut) between two partitions of a graph w.r.t. a given subgraph.

int numberOfClusters()
          Returns the number of clusters found.

protected static int[][] partition(cern.colt.matrix.DoubleMatrix2D W, double alpha_star)
          Splits recursively the points of the graph while the value of the best cut found is less of a specified limit (the alpha star factor).

java.lang.String rTipText()
          Returns the tip text for this property.

void setAlphaStar(double alpha_star)
          Sets the new value of the alpha star factor.

void setOptions(java.lang.String[] options)
          Parses a given list of options.

void setR(double r)
          Sets the new value of the r distance cut parameter.

void setSigma(double sigma)
          Sets the new value of the sigma scaling factor.

void setUseSparseMatrix(boolean useSparseMatrix)
          Sets the use of sparse representation for similarity matrix.

java.lang.String sigmaTipText()
          Returns the tip text for this property.

java.lang.String useSparseMatrixTipText()
          Returns the tip text for this property.

Field Detail

v

protected cern.colt.matrix.DoubleMatrix2D v

The points of the dataset

cluster

protected int[] cluster

The class number of each point in the dataset

numOfClusters

protected int numOfClusters

The number of different clusters found

alpha_star

protected double alpha_star

The alpha star parameter value

r

protected double r

The distance cut factor

sigma

protected double sigma

The sigma scaling factor

useSparseMatrix

protected boolean useSparseMatrix

The using sparse matrix flag

options

protected static java.util.Vector options

Constructor Detail

SpectralClusterer

public SpectralClusterer()

Constructor.

Method Detail

distnorm2

protected static double distnorm2(cern.colt.matrix.DoubleMatrix1D x,
                                  cern.colt.matrix.DoubleMatrix1D y)

Returns the Euclidean distance between two points. It is used to compute the similarity degree of these ones.

Parameters:: x - the first point; y - the second point
Returns:: the Euclidean distance between the points

merge

protected static int[] merge(int[] a,
                             int[] b)

Merges two sets of points represented as integer vectors. The sets are not overlapped.

Parameters:: a - the first set of points; b - the second set of points
Returns:: the union of the two sets

asso

protected static double asso(cern.colt.matrix.DoubleMatrix2D W,
                             int[] a,
                             int[] b)

Computes the association degree between two partitions of a graph.
The association degree is defined as the sum of the weights of all the edges between points of the two partitions.

Parameters:: W - the weight matrix of the graph; a - the points of the first partition; b - the points of the second partition
Returns:: the association degree

Nasso

protected static double Nasso(cern.colt.matrix.DoubleMatrix2D W,
                              int[] a,
                              int[] b)

Returns the normalized association degree between two partitions of a graph.

Parameters:: W - the weight matrix of the graph; a - the points of the first partition; b - the points of the second partition
Returns:: the normalized association degree

Nasso

protected static double Nasso(cern.colt.matrix.DoubleMatrix2D W,
                              int[] a,
                              int[] b,
                              int[] v)

Returns the normalized association degree between two partitions of a graph w.r.t. a given subgraph.

Parameters:: W - the weight matrix of the graph; a - the points of the first partition; b - the points of the second partition; v - the points of the subgraph
Returns:: the normalized association degree

Ncut

protected static double Ncut(cern.colt.matrix.DoubleMatrix2D W,
                             int[] a,
                             int[] b)

Returns the normalized dissimilarity degree (or cut) between two partitions of a graph.

Parameters:: W - the weight matrix of the graph; a - the points of the first partition; b - the points of the second partition
Returns:: the normalized cut

Ncut

protected static double Ncut(cern.colt.matrix.DoubleMatrix2D W,
                             int[] a,
                             int[] b,
                             int[] v)

Returns the normalized dissimilarity degree (or cut) between two partitions of a graph w.r.t. a given subgraph.

Parameters:: W - the weight matrix of the graph; a - the points of the first partition; b - the points of the second partition; v - the points of the subgraph.
Returns:: the normalized cut

bestCut

protected static int[][] bestCut(cern.colt.matrix.DoubleMatrix2D W)

Returns the best cut of a graph w.r.t. the degree of dissimilarity between points of different partitions and the degree of similarity between points of the same partition.

Parameters:: W - the weight matrix of the graph
Returns:: an array of two elements, each of these contains the points of a partition

partition

protected static int[][] partition(cern.colt.matrix.DoubleMatrix2D W,
                                   double alpha_star)

Splits recursively the points of the graph while the value of the best cut found is less of a specified limit (the alpha star factor).

Parameters:: W - the weight matrix of the graph; alpha_star - the alpha star factor
Returns:: an array of sets of points (partitions)

numberOfClusters

public int numberOfClusters()
                     throws java.lang.Exception

Returns the number of clusters found.

Returns:: the number of clusters

clusterInstance

public int clusterInstance(weka.clusterers.Instance instance)
                    throws java.lang.Exception

Classifies an instance w.r.t. the partitions found. It applies a naive min-distance algorithm.

Parameters:: instance - the instance to classify
Returns:: the cluster that contains the nearest point to the instance

buildClusterer

public void buildClusterer(weka.clusterers.Instances data)
                    throws java.lang.Exception

Generates a clusterer by the mean of spectral clustering algorithm.

Parameters:: data - set of instances serving as training data
Throws:: java.lang.Exception - if the clusterer has not been generated successfully

globalInfo

public java.lang.String globalInfo()

Returns a string describing this clusterer

Returns:: a description of the evaluator suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()

Returns an enumeration describing the available options.

Returns:: an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception

Parses a given list of options.

Parameters:: options - the list of options as an array of strings
Throws:: java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()

Gets the current settings of the options.

Returns:: an array of strings suitable for passing to setOptions()

setAlphaStar

public void setAlphaStar(double alpha_star)
                  throws java.lang.Exception

Sets the new value of the alpha star factor.

Parameters:: alpah_star - the new value (0 < alpha_star < 1)
Throws:: java.lang.Exception - if alpha_star is not between 0 and 1

getAlphaStar

public double getAlphaStar()

Returns the current value of the alpha star factor.

Returns:: the alpha star factor

alphaStarTipText

public java.lang.String alphaStarTipText()

Returns the tip text for this property

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setSigma

public void setSigma(double sigma)
              throws java.lang.Exception

Sets the new value of the sigma scaling factor.

Parameters:: sigma - the new value (sigma > 0)
Throws:: java.lang.Exception - if sigma is not strictly greater than 0

getSigma

public double getSigma()

Returns the current value of the sigma factor.

Returns:: the sigma factor

sigmaTipText

public java.lang.String sigmaTipText()

Returns the tip text for this property.

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setR

public void setR(double r)
          throws java.lang.Exception

Sets the new value of the r distance cut parameter.

Parameters:: r - the new value (r > 0 || r == -1)
Throws:: java.lang.Exception - if r is not -1 and r is not a positive number

getR

public double getR()

Returns the current value of the r distance cur parameter.

Returns:: the r distance cut parameter

rTipText

public java.lang.String rTipText()

Returns the tip text for this property.

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

setUseSparseMatrix

public void setUseSparseMatrix(boolean useSparseMatrix)

Sets the use of sparse representation for similarity matrix.

Parameters:: useSparseMatrix - true for sparse matrix representation

getUseSparseMatrix

public boolean getUseSparseMatrix()

Returns the status of using sparse matrix flag.

Returns:: the status of using sparse matrix flag

useSparseMatrixTipText

public java.lang.String useSparseMatrixTipText()

Returns the tip text for this property.

Returns:: tip text for this property suitable for displaying in the explorer/experimenter gui

Field Summary
`protected double`	`alpha_star` The alpha star parameter value
`protected int[]`	`cluster` The class number of each point in the dataset
`protected int`	`numOfClusters` The number of different clusters found
`protected static java.util.Vector`	`options`
`protected double`	`r` The distance cut factor
`protected double`	`sigma` The sigma scaling factor
`protected boolean`	`useSparseMatrix` The using sparse matrix flag
`protected cern.colt.matrix.DoubleMatrix2D`	`v` The points of the dataset

Method Summary
`java.lang.String`	`alphaStarTipText()` Returns the tip text for this property
`protected static double`	`asso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)` Computes the association degree between two partitions of a graph. The association degree is defined as the sum of the weights of all the edges between points of the two partitions.
`protected static int[][]`	`bestCut(cern.colt.matrix.DoubleMatrix2D W)` Returns the best cut of a graph w.r.t. the degree of dissimilarity between points of different partitions and the degree of similarity between points of the same partition.
`void`	`buildClusterer(weka.clusterers.Instances data)` Generates a clusterer by the mean of spectral clustering algorithm.
`int`	`clusterInstance(weka.clusterers.Instance instance)` Classifies an instance w.r.t. the partitions found.
`protected static double`	`distnorm2(cern.colt.matrix.DoubleMatrix1D x, cern.colt.matrix.DoubleMatrix1D y)` Returns the Euclidean distance between two points.
`double`	`getAlphaStar()` Returns the current value of the alpha star factor.
`java.lang.String[]`	`getOptions()` Gets the current settings of the options.
`double`	`getR()` Returns the current value of the r distance cur parameter.
`double`	`getSigma()` Returns the current value of the sigma factor.
`boolean`	`getUseSparseMatrix()` Returns the status of using sparse matrix flag.
`java.lang.String`	`globalInfo()` Returns a string describing this clusterer
`java.util.Enumeration`	`listOptions()` Returns an enumeration describing the available options.
`protected static int[]`	`merge(int[] a, int[] b)` Merges two sets of points represented as integer vectors.
`protected static double`	`Nasso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)` Returns the normalized association degree between two partitions of a graph.
`protected static double`	`Nasso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b, int[] v)` Returns the normalized association degree between two partitions of a graph w.r.t. a given subgraph.
`protected static double`	`Ncut(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)` Returns the normalized dissimilarity degree (or cut) between two partitions of a graph.
`protected static double`	`Ncut(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b, int[] v)` Returns the normalized dissimilarity degree (or cut) between two partitions of a graph w.r.t. a given subgraph.
`int`	`numberOfClusters()` Returns the number of clusters found.
`protected static int[][]`	`partition(cern.colt.matrix.DoubleMatrix2D W, double alpha_star)` Splits recursively the points of the graph while the value of the best cut found is less of a specified limit (the alpha star factor).
`java.lang.String`	`rTipText()` Returns the tip text for this property.
`void`	`setAlphaStar(double alpha_star)` Sets the new value of the alpha star factor.
`void`	`setOptions(java.lang.String[] options)` Parses a given list of options.
`void`	`setR(double r)` Sets the new value of the r distance cut parameter.
`void`	`setSigma(double sigma)` Sets the new value of the sigma scaling factor.
`void`	`setUseSparseMatrix(boolean useSparseMatrix)` Sets the use of sparse representation for similarity matrix.
`java.lang.String`	`sigmaTipText()` Returns the tip text for this property.
`java.lang.String`	`useSparseMatrixTipText()` Returns the tip text for this property.

weka.clusterers Class SpectralClusterer

v

cluster

numOfClusters

alpha_star

r

sigma

useSparseMatrix

options

SpectralClusterer

distnorm2

merge

asso

Nasso

Nasso

Ncut

Ncut

bestCut

partition

numberOfClusters

clusterInstance

buildClusterer

globalInfo

listOptions

setOptions

getOptions

setAlphaStar

getAlphaStar

alphaStarTipText

setSigma

getSigma

sigmaTipText

setR

getR

rTipText

setUseSparseMatrix

getUseSparseMatrix

useSparseMatrixTipText

weka.clusterers
Class SpectralClusterer