weka.clusterers.SpectralClusterer
Spectral clusterer class. For more information see:
This implementation assumes that the data instances are points in an Euclidean space. The algorithm is based on the concept of similarity between points instead of distance. Given two points x and y and their Euclidean distance d(x, y), their similarity is defined as exp(- d(x, y)^2 / (2 * sigma^2)), where sigma is a scaling factor (its default value is 1.0).
There is a distance cut factor r, if the distance d(x, y) between two points is greater than r then their similarity is set to 0. This parameter combined with the use of sparse matrices can improve the performances w.r.t. the memory.
To classify a new instance w.r.t. the partitions found this implementation applies a naive min-distance algorithm that assigns the instance to the cluster that contains the nearest point. Since the distance to similarity function is bijective and monotone the nearest point is also the most similar one.
Valid options are:
This implementation relies on the COLT numeric package for Java written by Wolfgang Hoschek. For other information about COLT see its home page at http://nicewww.cern.ch/~hoschek/colt/index.htm.
According to the license, the copyright notice is reported below:Written by Wolfgang Hoschek. Check the Colt home page for more info. Copyright © 1999 CERN - European Organization for Nuclear Research. Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. CERN makes no representations about the suitability of this software for any purpose. It is provided "as is" without expressed or implied warranty.
This software is issued under GNU General Public License.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Field Summary | |
protected double |
alpha_star
The alpha star parameter value |
protected int[] |
cluster
The class number of each point in the dataset |
protected int |
numOfClusters
The number of different clusters found |
protected static java.util.Vector |
options
|
protected double |
r
The distance cut factor |
protected double |
sigma
The sigma scaling factor |
protected boolean |
useSparseMatrix
The using sparse matrix flag |
protected cern.colt.matrix.DoubleMatrix2D |
v
The points of the dataset |
Constructor Summary | |
SpectralClusterer()
Constructor. |
Method Summary | |
java.lang.String |
alphaStarTipText()
Returns the tip text for this property |
protected static double |
asso(cern.colt.matrix.DoubleMatrix2D W,
int[] a,
int[] b)
Computes the association degree between two partitions of a graph. The association degree is defined as the sum of the weights of all the edges between points of the two partitions. |
protected static int[][] |
bestCut(cern.colt.matrix.DoubleMatrix2D W)
Returns the best cut of a graph w.r.t. the degree of dissimilarity between points of different partitions and the degree of similarity between points of the same partition. |
void |
buildClusterer(weka.clusterers.Instances data)
Generates a clusterer by the mean of spectral clustering algorithm. |
int |
clusterInstance(weka.clusterers.Instance instance)
Classifies an instance w.r.t. the partitions found. |
protected static double |
distnorm2(cern.colt.matrix.DoubleMatrix1D x,
cern.colt.matrix.DoubleMatrix1D y)
Returns the Euclidean distance between two points. |
double |
getAlphaStar()
Returns the current value of the alpha star factor. |
java.lang.String[] |
getOptions()
Gets the current settings of the options. |
double |
getR()
Returns the current value of the r distance cur parameter. |
double |
getSigma()
Returns the current value of the sigma factor. |
boolean |
getUseSparseMatrix()
Returns the status of using sparse matrix flag. |
java.lang.String |
globalInfo()
Returns a string describing this clusterer |
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options. |
protected static int[] |
merge(int[] a,
int[] b)
Merges two sets of points represented as integer vectors. |
protected static double |
Nasso(cern.colt.matrix.DoubleMatrix2D W,
int[] a,
int[] b)
Returns the normalized association degree between two partitions of a graph. |
protected static double |
Nasso(cern.colt.matrix.DoubleMatrix2D W,
int[] a,
int[] b,
int[] v)
Returns the normalized association degree between two partitions of a graph w.r.t. a given subgraph. |
protected static double |
Ncut(cern.colt.matrix.DoubleMatrix2D W,
int[] a,
int[] b)
Returns the normalized dissimilarity degree (or cut) between two partitions of a graph. |
protected static double |
Ncut(cern.colt.matrix.DoubleMatrix2D W,
int[] a,
int[] b,
int[] v)
Returns the normalized dissimilarity degree (or cut) between two partitions of a graph w.r.t. a given subgraph. |
int |
numberOfClusters()
Returns the number of clusters found. |
protected static int[][] |
partition(cern.colt.matrix.DoubleMatrix2D W,
double alpha_star)
Splits recursively the points of the graph while the value of the best cut found is less of a specified limit (the alpha star factor). |
java.lang.String |
rTipText()
Returns the tip text for this property. |
void |
setAlphaStar(double alpha_star)
Sets the new value of the alpha star factor. |
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
setR(double r)
Sets the new value of the r distance cut parameter. |
void |
setSigma(double sigma)
Sets the new value of the sigma scaling factor. |
void |
setUseSparseMatrix(boolean useSparseMatrix)
Sets the use of sparse representation for similarity matrix. |
java.lang.String |
sigmaTipText()
Returns the tip text for this property. |
java.lang.String |
useSparseMatrixTipText()
Returns the tip text for this property. |
Field Detail |
protected cern.colt.matrix.DoubleMatrix2D v
protected int[] cluster
protected int numOfClusters
protected double alpha_star
protected double r
protected double sigma
protected boolean useSparseMatrix
protected static java.util.Vector options
Constructor Detail |
public SpectralClusterer()
Method Detail |
protected static double distnorm2(cern.colt.matrix.DoubleMatrix1D x, cern.colt.matrix.DoubleMatrix1D y)
x
- the first pointy
- the second pointprotected static int[] merge(int[] a, int[] b)
a
- the first set of pointsb
- the second set of pointsprotected static double asso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)
W
- the weight matrix of the grapha
- the points of the first partitionb
- the points of the second partitionprotected static double Nasso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)
W
- the weight matrix of the grapha
- the points of the first partitionb
- the points of the second partitionprotected static double Nasso(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b, int[] v)
W
- the weight matrix of the grapha
- the points of the first partitionb
- the points of the second partitionv
- the points of the subgraphprotected static double Ncut(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b)
W
- the weight matrix of the grapha
- the points of the first partitionb
- the points of the second partitionprotected static double Ncut(cern.colt.matrix.DoubleMatrix2D W, int[] a, int[] b, int[] v)
W
- the weight matrix of the grapha
- the points of the first partitionb
- the points of the second partitionv
- the points of the subgraph.protected static int[][] bestCut(cern.colt.matrix.DoubleMatrix2D W)
W
- the weight matrix of the graphprotected static int[][] partition(cern.colt.matrix.DoubleMatrix2D W, double alpha_star)
W
- the weight matrix of the graphalpha_star
- the alpha star factorpublic int numberOfClusters() throws java.lang.Exception
public int clusterInstance(weka.clusterers.Instance instance) throws java.lang.Exception
instance
- the instance to classifypublic void buildClusterer(weka.clusterers.Instances data) throws java.lang.Exception
data
- set of instances serving as training datajava.lang.Exception
- if the clusterer has not been generated successfullypublic java.lang.String globalInfo()
public java.util.Enumeration listOptions()
public void setOptions(java.lang.String[] options) throws java.lang.Exception
options
- the list of options as an array of stringsjava.lang.Exception
- if an option is not supportedpublic java.lang.String[] getOptions()
public void setAlphaStar(double alpha_star) throws java.lang.Exception
alpah_star
- the new value (0 < alpha_star < 1)java.lang.Exception
- if alpha_star is not between 0 and 1public double getAlphaStar()
public java.lang.String alphaStarTipText()
public void setSigma(double sigma) throws java.lang.Exception
sigma
- the new value (sigma > 0)java.lang.Exception
- if sigma is not strictly greater than 0public double getSigma()
public java.lang.String sigmaTipText()
public void setR(double r) throws java.lang.Exception
r
- the new value (r > 0 || r == -1)java.lang.Exception
- if r is not -1 and r is not a positive numberpublic double getR()
public java.lang.String rTipText()
public void setUseSparseMatrix(boolean useSparseMatrix)
useSparseMatrix
- true for sparse matrix representationpublic boolean getUseSparseMatrix()
public java.lang.String useSparseMatrixTipText()