| Interface Summary | |
|---|---|
| Cluster | Implementations of this interface have a printable representation and certain attributes that are common across all clustering implementations. |
| GaussianAccumulator | |
| Model<O> | A model is a probability distribution over observed data points and allows the probability of any data point to be computed. |
| ModelDistribution<O> | A model distribution allows us to sample a model from its prior distribution. |

| Class Summary | |
|---|---|
| AbstractCluster | |
| ClusteringUtils | |
| OnlineGaussianAccumulator | An online Gaussian statistics accumulator based on the algorithm given by Knuth (who cites Welford), which is described as numerically stable. |
| RunningSumsGaussianAccumulator | An online Gaussian accumulator that uses a running power sums approach, as described at http://en.wikipedia.org/wiki/Standard_deviation. It suffers from overflow, underflow and roundoff error but has minimal observe-time overhead. |
| UncommonDistributions | |
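
The two accumulator strategies named in the table can be contrasted with a minimal scalar sketch. The class and method names below (`AccumulatorSketch`, `WelfordAccumulator`, `RunningSumsAccumulator`, `observe`, `variance`) are hypothetical and chosen for illustration only; this is not the code of `OnlineGaussianAccumulator` or `RunningSumsGaussianAccumulator`, and it handles plain `double` values, not vectors.

```java
final class AccumulatorSketch {

    /** Welford-style online update (hypothetical name, scalar only). */
    static final class WelfordAccumulator {
        private long n;
        private double mean;
        private double m2; // running sum of squared deviations from the mean

        void observe(double x) {
            n++;
            double delta = x - mean;   // deviation from the old mean
            mean += delta / n;         // update the mean incrementally
            m2 += delta * (x - mean);  // uses both the old and the new mean
        }

        double mean() { return mean; }

        /** Population variance; 0 when fewer than two samples were seen. */
        double variance() { return n > 1 ? m2 / n : 0.0; }
    }

    /** Running power sums s0, s1, s2 (hypothetical name, scalar only). */
    static final class RunningSumsAccumulator {
        private long s0;   // number of samples
        private double s1; // sum of x
        private double s2; // sum of x squared

        void observe(double x) {
            s0++;
            s1 += x;
            s2 += x * x;
        }

        double mean() { return s0 > 0 ? s1 / s0 : 0.0; }

        /** Population variance as E[x^2] - E[x]^2; prone to cancellation. */
        double variance() {
            if (s0 < 2) {
                return 0.0;
            }
            double m = s1 / s0;
            return s2 / s0 - m * m;
        }
    }

    public static void main(String[] args) {
        double[] samples = {2, 4, 4, 4, 5, 5, 7, 9};
        double offset = 1e9; // a large offset makes roundoff visible

        WelfordAccumulator welford = new WelfordAccumulator();
        RunningSumsAccumulator sums = new RunningSumsAccumulator();
        for (double x : samples) {
            welford.observe(x + offset);
            sums.observe(x + offset);
        }
        // Shifting the data does not change its true variance.
        System.out.println("Welford variance:      " + welford.variance());
        System.out.println("Running sums variance: " + sums.variance());
    }
}
```

The running sums version does a constant amount of cheap work per observation, but it subtracts two large, nearly equal numbers when the variance is read out. The Welford-style update pays for one division per observation and avoids that subtraction.
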

This package provides several clustering algorithm implementations. Clustering partitions a set of objects into groups of similar items. The definition of similarity is usually up to you; for text documents, cosine distance (or cosine similarity) is recommended. Mahout also provides other distance measures, such as Euclidean distance. The input to each clustering algorithm is a set of vectors representing your items. For text, these are typically TF-IDF or bag-of-words representations of the documents.
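
To make the choice of distance measure concrete, here is a small sketch that compares cosine and Euclidean distance on term-count vectors. It uses plain `double[]` arrays, not Mahout's vector and distance measure classes, and every name in it (`DistanceSketch`, `cosineDistance`, `euclideanDistance`, the three-word vocabulary) is an assumption made for illustration. Returning 1.0 for a zero vector is a convention of this sketch only.

```java
final class DistanceSketch {

    /** Cosine distance: 1 - (a . b) / (|a| * |b|). */
    static double cosineDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // convention chosen for this sketch only
        }
        // Clamp at 0 so that roundoff cannot produce a negative distance.
        return Math.max(0.0, 1.0 - dot / Math.sqrt(normA * normB));
    }

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclideanDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
    }

    public static void main(String[] args) {
        // Term counts over a hypothetical vocabulary {cluster, vector, model}.
        double[] shortDoc = {2, 1, 0};
        double[] longDoc = {4, 2, 0};  // same word proportions, twice as long
        double[] otherDoc = {0, 0, 3}; // shares no terms with the others

        System.out.println("cosine(short, long)     = " + cosineDistance(shortDoc, longDoc));
        System.out.println("euclidean(short, long)  = " + euclideanDistance(shortDoc, longDoc));
        System.out.println("cosine(short, other)    = " + cosineDistance(shortDoc, otherDoc));
    }
}
```

Cosine distance depends only on the direction of the vectors, so two documents with the same word proportions but different lengths are treated as identical, while Euclidean distance keeps them apart. This is one reason cosine distance suits text.
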
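The output of each clustering algorithm is either a hard assignment, in which each item belongs to exactly one cluster, or a soft assignment, in which each item belongs to every cluster with some weight.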
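
The difference between the two kinds of output can be sketched for a single item, given its distance to each cluster center. The names `AssignmentSketch`, `hardAssignment` and `softAssignment` are hypothetical, and the exponential weighting used for the soft case is an assumption picked for illustration; it is not the formula of any particular algorithm in this package.

```java
import java.util.Arrays;

final class AssignmentSketch {

    /** Hard assignment: index of the closest cluster. */
    static int hardAssignment(double[] distances) {
        if (distances.length == 0) {
            throw new IllegalArgumentException("no clusters");
        }
        int best = 0;
        for (int i = 1; i < distances.length; i++) {
            if (distances[i] < distances[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Soft assignment: one weight per cluster, summing to 1.
     * The weight exp(-distance) is an illustrative choice only.
     */
    static double[] softAssignment(double[] distances) {
        // Subtracting the smallest distance keeps exp() from underflowing.
        double min = distances[hardAssignment(distances)];
        double[] weights = new double[distances.length];
        double total = 0.0;
        for (int i = 0; i < distances.length; i++) {
            weights[i] = Math.exp(-(distances[i] - min));
            total += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= total; // normalize so the weights sum to 1
        }
        return weights;
    }

    public static void main(String[] args) {
        // Distances from one item to three hypothetical cluster centers.
        double[] distances = {1.0, 3.0, 0.5};
        System.out.println("hard: cluster " + hardAssignment(distances));
        System.out.println("soft: " + Arrays.toString(softAssignment(distances)));
    }
}
```

A hard assignment discards how close the runner-up clusters were, while a soft assignment keeps that information as a weight per cluster.
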