public class GISTrainer extends AbstractEventTrainer
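An implementation of Generalized Iterative Scaling (GIS). The reference paper for this implementation is available at
ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z.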
The slack parameter used in the above implementation has been removed from the computation by default,
and a method for updating with Gaussian smoothing has been added, following
"Investigating GIS and Smoothing for Maximum Entropy Taggers", Curran and Clark (2003):
http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf
The slack parameter can be used by setting useSlackParameter to true.
Gaussian smoothing can be used by setting useGaussianSmoothing to true.
A prior can be used to train models which converge to the distribution which minimizes the relative entropy between the distribution specified by the empirical constraints of the training data and the specified prior. By default, the uniform distribution is used as the prior.
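
The update described above can be illustrated with a minimal, self-contained sketch. It is not the OpenNLP implementation: it assumes binary (predicate, outcome) features, a uniform prior (all weights start at zero), no slack parameter, and no smoothing. The class and method names (`GisSketch`, `eval`, `train`) are hypothetical and exist only for this illustration.

```java
/** Toy GIS trainer over binary (predicate, outcome) features. All names are hypothetical. */
final class GisSketch {

    /** Returns P(outcome | context), where context lists the active predicate ids. */
    static double[] eval(int[] context, double[][] weights, int numOutcomes) {
        double[] p = new double[numOutcomes];
        // Sum the weights of every active (predicate, outcome) feature.
        for (int pred : context) {
            for (int o = 0; o < numOutcomes; o++) {
                p[o] += weights[pred][o];
            }
        }
        // Exponentiate and normalize so the outcomes sum to one.
        double z = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            p[o] = Math.exp(p[o]);
            z += p[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            p[o] /= z;
        }
        return p;
    }

    /** Runs a fixed number of GIS iterations without a slack feature. */
    static double[][] train(int[][] contexts, int[] outcomes,
                            int numPreds, int numOutcomes, int iterations) {
        // Correction constant: the largest number of active predicates in any event.
        int c = 1;
        for (int[] ctx : contexts) {
            c = Math.max(c, ctx.length);
        }
        // Empirical feature counts taken from the training events.
        double[][] observed = new double[numPreds][numOutcomes];
        for (int i = 0; i < contexts.length; i++) {
            for (int pred : contexts[i]) {
                observed[pred][outcomes[i]] += 1.0;
            }
        }
        // Assumption: zero initial weights correspond to a uniform prior.
        double[][] weights = new double[numPreds][numOutcomes];
        for (int it = 0; it < iterations; it++) {
            // Feature counts expected under the current model.
            double[][] expected = new double[numPreds][numOutcomes];
            for (int[] ctx : contexts) {
                double[] p = eval(ctx, weights, numOutcomes);
                for (int pred : ctx) {
                    for (int o = 0; o < numOutcomes; o++) {
                        expected[pred][o] += p[o];
                    }
                }
            }
            // GIS update; features never observed keep a weight of zero.
            for (int pred = 0; pred < numPreds; pred++) {
                for (int o = 0; o < numOutcomes; o++) {
                    if (observed[pred][o] > 0) {
                        weights[pred][o] += Math.log(observed[pred][o] / expected[pred][o]) / c;
                    }
                }
            }
        }
        return weights;
    }
}
```

With a single predicate seen three times with outcome 0 and once with outcome 1, `GisSketch.train` followed by `GisSketch.eval` should assign a probability of about 0.75 to outcome 0, which matches the empirical distribution of that toy data.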
| Modifier and Type | Field and Description |
|---|---|
| static double | LOG_LIKELIHOOD_THRESHOLD_DEFAULT |
| static java.lang.String | LOG_LIKELIHOOD_THRESHOLD_PARAM |
| static java.lang.String | MAXENT_VALUE |
| static java.lang.String | OLD_LL_THRESHOLD_PARAM Deprecated. |

Fields inherited from class AbstractEventTrainer: DATA_INDEXER_ONE_PASS_REAL_VALUE, DATA_INDEXER_ONE_PASS_VALUE, DATA_INDEXER_PARAM, DATA_INDEXER_TWO_PASS_VALUE
Fields inherited from class AbstractTrainer: ALGORITHM_PARAM, CUTOFF_DEFAULT, CUTOFF_PARAM, ITERATIONS_DEFAULT, ITERATIONS_PARAM, TRAINER_TYPE_PARAM, VERBOSE_DEFAULT, VERBOSE_PARAM
Fields inherited from interface EventTrainer: EVENT_VALUE

| Constructor and Description |
|---|
| GISTrainer() Creates a new GISTrainer instance which does not print progress messages about training to STDOUT. |

| Modifier and Type | Method and Description |
|---|---|
| MaxentModel | doTrain(DataIndexer indexer) |
| void | init(TrainingParameters trainingParameters, java.util.Map<java.lang.String,java.lang.String> reportMap) |
| boolean | isSortAndMerge() |
| void | setGaussianSigma(double sigmaValue) Sets the sigma value used when training with Gaussian smoothing. |
| void | setSmoothing(boolean smooth) Sets whether this trainer will use smoothing while training the model. |
| void | setSmoothingObservation(double timesSeen) Sets the "number" of times the trainer should imagine it saw a feature that it actually did not see, for use when smoothing. |
| GISModel | trainModel(int iterations, DataIndexer di) Trains a model using the GIS algorithm. |
| GISModel | trainModel(int iterations, DataIndexer di, int threads) Trains a model using the GIS algorithm. |
| GISModel | trainModel(int iterations, DataIndexer di, Prior modelPrior, int threads) Trains a model using the GIS algorithm. |
| GISModel | trainModel(ObjectStream<Event> eventStream) Trains a model using the GIS algorithm, assuming 100 iterations and no cutoff. |
| GISModel | trainModel(ObjectStream<Event> eventStream, int iterations, int cutoff) Trains a GIS model on the events in the specified event stream, using the specified number of iterations and the specified count cutoff. |

Methods inherited from class AbstractEventTrainer: getDataIndexer, isValid, train, train, validate
Methods inherited from class AbstractTrainer: getAlgorithm, getCutoff, getIterations, init
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface EventTrainer: init

@Deprecated public static final java.lang.String OLD_LL_THRESHOLD_PARAM
public static final java.lang.String LOG_LIKELIHOOD_THRESHOLD_PARAM
public static final double LOG_LIKELIHOOD_THRESHOLD_DEFAULT
public static final java.lang.String MAXENT_VALUE

public GISTrainer()
Creates a new GISTrainer instance which does not print progress messages about training to STDOUT.

public boolean isSortAndMerge()
isSortAndMerge in class AbstractEventTrainer

public void init(TrainingParameters trainingParameters, java.util.Map<java.lang.String,java.lang.String> reportMap)
init in interface EventTrainer
init in class AbstractTrainer

public MaxentModel doTrain(DataIndexer indexer) throws java.io.IOException
doTrain in class AbstractEventTrainer
Throws: java.io.IOException

public void setSmoothing(boolean smooth)
Parameters:
smooth - true if smoothing is desired, false if not

public void setSmoothingObservation(double timesSeen)
Parameters:
timesSeen - the "number" of times we want the trainer to imagine it saw a feature that it actually didn't see

public void setGaussianSigma(double sigmaValue)

public GISModel trainModel(ObjectStream<Event> eventStream) throws java.io.IOException
Parameters:
eventStream - The EventStream holding the data on which this model will be trained.
Throws: java.io.IOException

public GISModel trainModel(ObjectStream<Event> eventStream, int iterations, int cutoff) throws java.io.IOException
Parameters:
eventStream - A stream of all events.
iterations - The number of iterations to use for GIS.
cutoff - The number of times a feature must occur to be included.
Throws: java.io.IOException

public GISModel trainModel(int iterations, DataIndexer di)
Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.

public GISModel trainModel(int iterations, DataIndexer di, int threads)
Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.
threads -

public GISModel trainModel(int iterations, DataIndexer di, Prior modelPrior, int threads)
Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.
modelPrior - The prior distribution used to train this model.

Copyright © 2010 - 2023 Adobe. All Rights Reserved