java.lang.Object
  org.apache.mahout.classifier.df.mapreduce.Builder
public abstract class Builder
Base class for MapReduce DecisionForest builders. Takes care of storing the parameters common to the MapReduce
implementations.
Child classes must implement at least the two abstract methods of this class: configureJob(Job) and parseOutput(Job).
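The subclassing contract can be illustrated with a minimal sketch. It does not use the Mahout or Hadoop classes: SketchBuilder, SketchJob and EchoBuilder are hypothetical stand-ins, and the call order inside build (configure, run, parse) is an assumption inferred from the method descriptions, not taken from the Mahout source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Builder; NOT the Mahout class.
abstract class SketchBuilder {

    /** Hypothetical stand-in for org.apache.hadoop.mapreduce.Job. */
    static final class SketchJob {
        final List<String> settings = new ArrayList<>();
        int nbTrees;
    }

    /** Subclasses add their own settings before the job is launched. */
    protected abstract void configureJob(SketchJob job);

    /** Subclasses turn the job output into a forest (here, a list of tree labels). */
    protected abstract List<String> parseOutput(SketchJob job);

    /** Execution hook; a sequential variant could override it to simulate the run. */
    protected boolean runJob(SketchJob job) {
        return true;
    }

    /** Assumed call order: configure, run, then parse. */
    public List<String> build(int nbTrees) {
        SketchJob job = new SketchJob();
        job.nbTrees = nbTrees;
        configureJob(job);
        if (!runJob(job)) {
            throw new IllegalStateException("job failed");
        }
        return parseOutput(job);
    }
}

// Hypothetical concrete builder that implements only the two abstract methods.
final class EchoBuilder extends SketchBuilder {

    @Override
    protected void configureJob(SketchJob job) {
        job.settings.add("mapper=echo");
    }

    @Override
    protected List<String> parseOutput(SketchJob job) {
        List<String> trees = new ArrayList<>();
        for (int i = 0; i < job.nbTrees; i++) {
            trees.add("tree-" + i);
        }
        return trees;
    }

    public static void main(String[] args) {
        System.out.println(new EchoBuilder().build(3));
    }
}
```

The base class owns the overall flow, so a concrete builder only supplies the job-specific configuration and the output parsing.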
| Constructor Summary | |
|---|---|
| protected | Builder(TreeBuilder treeBuilder, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath, Long seed, org.apache.hadoop.conf.Configuration conf) |
| Method Summary | |
|---|---|
| DecisionForest | build(int nbTrees) |
| protected abstract void | configureJob(org.apache.hadoop.mapreduce.Job job): Used by the inheriting classes to configure the job. |
| protected org.apache.hadoop.fs.Path | getDataPath() |
| static org.apache.hadoop.fs.Path | getDistributedCacheFile(org.apache.hadoop.conf.Configuration conf, int index): Helper method. |
| static int | getNbTrees(org.apache.hadoop.conf.Configuration conf): Get the number of trees for the map-reduce job. |
| static int | getNumMaps(org.apache.hadoop.conf.Configuration conf): Return the value of "mapred.map.tasks". |
| protected org.apache.hadoop.fs.Path | getOutputPath(org.apache.hadoop.conf.Configuration conf): Output directory name. |
| static Long | getRandomSeed(org.apache.hadoop.conf.Configuration conf): Returns the random seed. |
| static TreeBuilder | getTreeBuilder(org.apache.hadoop.conf.Configuration conf) |
| protected static boolean | isOutput(org.apache.hadoop.conf.Configuration conf): Used only for DEBUG purposes. |
| static Dataset | loadDataset(org.apache.hadoop.conf.Configuration conf): Helper method. |
| protected abstract DecisionForest | parseOutput(org.apache.hadoop.mapreduce.Job job): Parse the output files to extract the trees and pass the predictions to the callback. |
| protected boolean | runJob(org.apache.hadoop.mapreduce.Job job): Sequential implementations should override this method to simulate the job execution. |
| static void | setNbTrees(org.apache.hadoop.conf.Configuration conf, int nbTrees): Set the number of trees to grow for the map-reduce job. |
| void | setOutputDirName(String name): Sets the output directory name; the directory will be created in the working directory. |
| static void | sortSplits(org.apache.hadoop.mapreduce.InputSplit[] splits): Sort the splits by size so that the biggest go first. This is the same code used by Hadoop's JobClient. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
protected Builder(TreeBuilder treeBuilder,
org.apache.hadoop.fs.Path dataPath,
org.apache.hadoop.fs.Path datasetPath,
Long seed,
org.apache.hadoop.conf.Configuration conf)
| Method Detail |
|---|
protected org.apache.hadoop.fs.Path getDataPath()
public static int getNumMaps(org.apache.hadoop.conf.Configuration conf)
conf - configuration
protected static boolean isOutput(org.apache.hadoop.conf.Configuration conf)
conf - configuration
public static Long getRandomSeed(org.apache.hadoop.conf.Configuration conf)
conf - configuration
public static TreeBuilder getTreeBuilder(org.apache.hadoop.conf.Configuration conf)
public static int getNbTrees(org.apache.hadoop.conf.Configuration conf)
conf - configuration
public static void setNbTrees(org.apache.hadoop.conf.Configuration conf,
int nbTrees)
conf - configuration
nbTrees - number of trees to build
IllegalArgumentException - if (nbTrees <= 0)
public void setOutputDirName(String name)
name - output dir. name
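The getNbTrees and setNbTrees pair documented above stores the tree count in the job configuration and rejects non-positive values. A minimal sketch of that contract follows, using a plain Map in place of Hadoop's Configuration. The key name "sketch.nbtrees" and the -1 default for an unset value are assumptions made for this illustration, not values taken from Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration.
final class NbTreesSketch {

    // Hypothetical key name, chosen for this sketch only.
    static final String NB_TREES_KEY = "sketch.nbtrees";

    private NbTreesSketch() {
    }

    /** Stores the tree count, rejecting nbTrees <= 0 as the documentation states. */
    static void setNbTrees(Map<String, String> conf, int nbTrees) {
        if (nbTrees <= 0) {
            throw new IllegalArgumentException("nbTrees should be greater than 0");
        }
        conf.put(NB_TREES_KEY, Integer.toString(nbTrees));
    }

    /** Reads the tree count; returns -1 when unset (an assumption of this sketch). */
    static int getNbTrees(Map<String, String> conf) {
        return Integer.parseInt(conf.getOrDefault(NB_TREES_KEY, "-1"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setNbTrees(conf, 10);
        System.out.println(getNbTrees(conf));
    }
}
```

Validating in the setter means every task that later reads the value can rely on it being positive.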
protected org.apache.hadoop.fs.Path getOutputPath(org.apache.hadoop.conf.Configuration conf)
throws IOException
conf - configuration
IOException - if we cannot get the default FileSystem
public static org.apache.hadoop.fs.Path getDistributedCacheFile(org.apache.hadoop.conf.Configuration conf,
int index)
throws IOException
conf - configuration
index - index of the path in the DistributedCache files
IOException - if no path is found
public static Dataset loadDataset(org.apache.hadoop.conf.Configuration conf)
throws IOException
conf - configuration
IOException - if we cannot retrieve the Dataset path from the DistributedCache, or the Dataset could not be
loaded
protected abstract void configureJob(org.apache.hadoop.mapreduce.Job job)
throws IOException
job - Hadoop's Job
IOException - if anything goes wrong while configuring the job
protected boolean runJob(org.apache.hadoop.mapreduce.Job job)
throws ClassNotFoundException,
IOException,
InterruptedException
job - Hadoop's job
ClassNotFoundException
IOException
InterruptedException
protected abstract DecisionForest parseOutput(org.apache.hadoop.mapreduce.Job job)
throws IOException
job - Hadoop's job
IOException - if anything goes wrong while parsing the output
public DecisionForest build(int nbTrees)
throws IOException,
ClassNotFoundException,
InterruptedException
IOException
ClassNotFoundException
InterruptedException
public static void sortSplits(org.apache.hadoop.mapreduce.InputSplit[] splits)
splits - input splits
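The ordering described for sortSplits (largest split first) can be sketched with the standard library alone. SketchSplit is a hypothetical stand-in for Hadoop's InputSplit that exposes only a length. The real InputSplit.getLength() can throw checked exceptions, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; SketchSplit stands in for org.apache.hadoop.mapreduce.InputSplit.
final class SortSplitsSketch {

    static final class SketchSplit {
        private final String name;
        private final long length;

        SketchSplit(String name, long length) {
            this.name = name;
            this.length = length;
        }

        String getName() {
            return name;
        }

        long getLength() {
            return length;
        }
    }

    private SortSplitsSketch() {
    }

    /** Sorts the array in place by descending length, so the biggest splits come first. */
    static void sortSplits(SketchSplit[] splits) {
        Arrays.sort(splits, Comparator.comparingLong(SketchSplit::getLength).reversed());
    }

    public static void main(String[] args) {
        SketchSplit[] splits = {
            new SketchSplit("a", 10L),
            new SketchSplit("b", 300L),
            new SketchSplit("c", 42L)
        };
        sortSplits(splits);
        for (SketchSplit split : splits) {
            System.out.println(split.getName() + " " + split.getLength());
        }
    }
}
```

Scheduling the largest splits first tends to keep one long task from starting late and dominating the job's total running time.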