General Unconstrained Minimization¶
Modeling¶
FirstOrderFunction¶
-
class
FirstOrderFunction¶ Instances of FirstOrderFunction implement the evaluation of a function and its gradient.
class FirstOrderFunction {
 public:
  virtual ~FirstOrderFunction() {}
  virtual bool Evaluate(const double* const parameters,
                        double* cost,
                        double* gradient) const = 0;
  virtual int NumParameters() const = 0;
};
-
bool
FirstOrderFunction::Evaluate(const double *const parameters, double *cost, double *gradient) const¶ Evaluate the cost/value of the function. If gradient is not NULL then evaluate the gradient too. If the evaluation is successful return true, else return false. cost is guaranteed to never be NULL; gradient can be NULL.
-
int
FirstOrderFunction::NumParameters() const¶ Number of parameters in the domain of the function.
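For illustration, a minimal sketch of a FirstOrderFunction for the two-dimensional Rosenbrock function is shown below. The class name Rosenbrock, the included header and the starting assumptions are illustrative, not part of the interface above.
#include "ceres/gradient_problem.h"

// A sketch of a FirstOrderFunction implementing the two-dimensional
// Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
class Rosenbrock : public ceres::FirstOrderFunction {
 public:
  virtual ~Rosenbrock() {}

  virtual bool Evaluate(const double* const parameters,
                        double* cost,
                        double* gradient) const {
    const double x = parameters[0];
    const double y = parameters[1];
    cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
    if (gradient != NULL) {
      gradient[0] = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x);
      gradient[1] = 200.0 * (y - x * x);
    }
    return true;
  }

  virtual int NumParameters() const { return 2; }
};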
GradientProblem¶
-
class
GradientProblem¶
class GradientProblem {
 public:
  explicit GradientProblem(FirstOrderFunction* function);
  GradientProblem(FirstOrderFunction* function,
                  LocalParameterization* parameterization);
  int NumParameters() const;
  int NumLocalParameters() const;
  bool Evaluate(const double* parameters, double* cost, double* gradient) const;
  bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
};
Instances of GradientProblem represent general non-linear
optimization problems that must be solved using just the value of the
objective function and its gradient. Unlike the Problem
class, which can only be used to model non-linear least squares
problems, instances of GradientProblem are not restricted in the
form of the objective function.
Structurally GradientProblem is a composition of a
FirstOrderFunction and optionally a
LocalParameterization.
The FirstOrderFunction is responsible for evaluating the cost
and gradient of the objective function.
The LocalParameterization is responsible for going back and
forth between the ambient space and the local tangent space. When a
LocalParameterization is not provided, the tangent space is
assumed to coincide with the ambient Euclidean space in which the
gradient vector lives.
The constructor takes ownership of the FirstOrderFunction and
LocalParameterization objects passed to it.
-
void
Solve(const GradientProblemSolver::Options &options, const GradientProblem &problem, double *parameters, GradientProblemSolver::Summary *summary)¶ Solve the given
GradientProblem using the values in parameters as the initial guess of the solution.
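Putting the pieces together, a solve might look like the following sketch. It reuses the hypothetical Rosenbrock class from the FirstOrderFunction example above; the starting point and the included headers are illustrative assumptions.
#include <iostream>
#include "ceres/gradient_problem.h"
#include "ceres/gradient_problem_solver.h"

int main() {
  double parameters[2] = {-1.2, 1.0};  // arbitrary starting point

  // The GradientProblem takes ownership of the FirstOrderFunction.
  ceres::GradientProblem problem(new Rosenbrock());

  ceres::GradientProblemSolver::Options options;
  ceres::GradientProblemSolver::Summary summary;
  ceres::Solve(options, problem, parameters, &summary);

  std::cout << summary.FullReport() << "\n";
  std::cout << "x: " << parameters[0] << " y: " << parameters[1] << "\n";
  return 0;
}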
Solving¶
GradientProblemSolver::Options¶
-
class
GradientProblemSolver::Options¶ GradientProblemSolver::Options controls the overall behavior of the solver. We list the various settings and their default values below.
-
bool
GradientProblemSolver::Options::IsValid(string *error) const¶ Validates the values in the options struct and returns true on success. If there is a problem, the method returns false with error containing a textual description of the cause.
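For instance, a handful of these settings can be configured and checked before solving. The sketch below is illustrative only; the specific values chosen are examples, not recommendations.
#include <iostream>
#include <string>
#include "ceres/gradient_problem_solver.h"

int main() {
  ceres::GradientProblemSolver::Options options;
  options.line_search_direction_type = ceres::LBFGS;  // default
  options.line_search_type = ceres::WOLFE;            // default
  options.max_num_iterations = 100;
  options.function_tolerance = 1e-8;

  // Validate the settings before handing them to the solver.
  std::string error;
  if (!options.IsValid(&error)) {
    std::cout << "Invalid options: " << error << "\n";
    return 1;
  }
  return 0;
}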
-
LineSearchDirectionType
GradientProblemSolver::Options::line_search_direction_type¶ Default:
LBFGS
Choices are STEEPEST_DESCENT, NONLINEAR_CONJUGATE_GRADIENT, BFGS and LBFGS.
-
LineSearchType
GradientProblemSolver::Options::line_search_type¶ Default:
WOLFE
Choices are ARMIJO and WOLFE (strong Wolfe conditions). Note that in order for the assumptions underlying the BFGS and LBFGS line search direction algorithms to be guaranteed to be satisfied, the WOLFE line search should be used.
-
NonlinearConjugateGradientType
GradientProblemSolver::Options::nonlinear_conjugate_gradient_type¶ Default:
FLETCHER_REEVES
Choices are FLETCHER_REEVES, POLAK_RIBIERE and HESTENES_STIEFEL.
-
int
GradientProblemSolver::Options::max_lbfgs_rank¶ Default: 20
The L-BFGS Hessian approximation is a low rank approximation to the inverse of the Hessian matrix. The rank of the approximation determines (linearly) the space and time complexity of using the approximation. The higher the rank, the better the quality of the approximation. The increase in quality is however bounded for a number of reasons.
- The method only uses secant information and not actual derivatives.
- The Hessian approximation is constrained to be positive definite.
So increasing this rank to a large number will cost time and space complexity without a corresponding increase in solution quality. There are no hard and fast rules for choosing the maximum rank; the best choice usually requires some problem-specific experimentation.
-
bool
GradientProblemSolver::Options::use_approximate_eigenvalue_bfgs_scaling¶ Default:
false
As part of the BFGS update step / LBFGS right-multiply step, the initial inverse Hessian approximation is taken to be the Identity. However, [Oren] showed that using instead \(I * \gamma\), where \(\gamma\) is a scalar chosen to approximate an eigenvalue of the true inverse Hessian, can result in improved convergence in a wide variety of cases. Setting use_approximate_eigenvalue_bfgs_scaling to true enables this scaling in BFGS (before first iteration) and LBFGS (at each iteration).
Precisely, approximate eigenvalue scaling equates to
\[\gamma = \frac{y_k' s_k}{y_k' y_k}\]
With:
\[y_k = \nabla f_{k+1} - \nabla f_k\]
\[s_k = x_{k+1} - x_k\]
Where \(f()\) is the line search objective and \(x\) the vector of parameter values [NocedalWright].
It is important to note that approximate eigenvalue scaling does not always improve convergence, and that it can in fact significantly degrade performance for certain classes of problem, which is why it is disabled by default. In particular it can degrade performance when the sensitivity of the problem to different parameters varies significantly, as in this case a single scalar factor fails to capture this variation and detrimentally downscales parts of the Jacobian approximation which correspond to low-sensitivity parameters. It can also reduce the robustness of the solution to errors in the Jacobians.
-
LineSearchInterpolationType
GradientProblemSolver::Options::line_search_interpolation_type¶ Default:
CUBIC
Degree of the polynomial used to approximate the objective function. Valid values are BISECTION, QUADRATIC and CUBIC.
-
double
GradientProblemSolver::Options::min_line_search_step_size¶ The line search terminates if:
\[\|\Delta x_k\|_\infty < \text{min_line_search_step_size}\]
where \(\|\cdot\|_\infty\) refers to the max norm, and \(\Delta x_k\) is the step change in the parameter values at the \(k\)-th iteration.
-
double
GradientProblemSolver::Options::line_search_sufficient_function_decrease¶ Default:
1e-4
Solving the line search problem exactly is computationally prohibitive. Fortunately, line search based optimization algorithms can still guarantee convergence if instead of an exact solution, the line search algorithm returns a solution which decreases the value of the objective function sufficiently. More precisely, we are looking for a step size s.t.
\[f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]\]
This condition is known as the Armijo condition.
-
double
GradientProblemSolver::Options::max_line_search_step_contraction¶ Default:
1e-3
In each iteration of the line search,
\[\text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}\]
Note that by definition, for contraction:
\[0 < \text{max_step_contraction} < \text{min_step_contraction} < 1\]
-
double
GradientProblemSolver::Options::min_line_search_step_contraction¶ Default:
0.6
In each iteration of the line search,
\[\text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}\]
Note that by definition, for contraction:
\[0 < \text{max_step_contraction} < \text{min_step_contraction} < 1\]
-
int
GradientProblemSolver::Options::max_num_line_search_step_size_iterations¶ Default:
20
Maximum number of trial step size iterations during each line search. If a step size satisfying the search conditions cannot be found within this number of trials, the line search will stop.
As this is an ‘artificial’ constraint (one imposed by the user, not the underlying math), if
WOLFE line search is being used, and points satisfying the Armijo sufficient (function) decrease condition have been found during the current search (in \(\leq\) max_num_line_search_step_size_iterations), then the step size with the lowest function value which satisfies the Armijo condition will be returned as the new valid step, even though it does not satisfy the strong Wolfe conditions. This behaviour protects against early termination of the optimizer at a sub-optimal point.
-
int
GradientProblemSolver::Options::max_num_line_search_direction_restarts¶ Default:
5
Maximum number of restarts of the line search direction algorithm before terminating the optimization. Restarts of the line search direction algorithm occur when the current algorithm fails to produce a new descent direction. This typically indicates a numerical failure, or a breakdown in the validity of the approximations used.
-
double
GradientProblemSolver::Options::line_search_sufficient_curvature_decrease¶ Default:
0.9
The strong Wolfe conditions consist of the Armijo sufficient decrease condition, and an additional requirement that the step size be chosen s.t. the magnitude (‘strong’ Wolfe conditions) of the gradient along the search direction decreases sufficiently. Precisely, this second condition is that we seek a step size s.t.
\[\|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|\]
Where \(f()\) is the line search objective and \(f'()\) is the derivative of \(f\) with respect to the step size: \(\frac{d f}{d~\text{step size}}\).
-
double
GradientProblemSolver::Options::max_line_search_step_expansion¶ Default:
10.0
During the bracketing phase of a Wolfe line search, the step size is increased until either a point satisfying the Wolfe conditions is found, or an upper bound for a bracket containing a point satisfying the conditions is found. Precisely, at each iteration of the expansion:
\[\text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}\]
By definition for expansion
\[\text{max_step_expansion} > 1.0\]
-
int
GradientProblemSolver::Options::max_num_iterations¶ Default:
50
Maximum number of iterations for which the solver should run.
-
double
GradientProblemSolver::Options::max_solver_time_in_seconds¶ Default:
1e6
Maximum amount of time for which the solver should run.
-
double
GradientProblemSolver::Options::function_tolerance¶ Default:
1e-6
Solver terminates if
\[\frac{|\Delta \text{cost}|}{\text{cost}} \leq \text{function_tolerance}\]
where \(\Delta \text{cost}\) is the change in objective function value (up or down) in the current iteration of the line search.
-
double
GradientProblemSolver::Options::gradient_tolerance¶ Default:
1e-10
Solver terminates if
\[\|x - \Pi \boxplus(x, -g(x))\|_\infty \leq \text{gradient_tolerance}\]
where \(\|\cdot\|_\infty\) refers to the max norm, \(\Pi\) is projection onto the bounds constraints and \(\boxplus\) is the Plus operation for the overall local parameterization associated with the parameter vector.
-
double
GradientProblemSolver::Options::parameter_tolerance¶ Default:
1e-8
Solver terminates if
\[\|\Delta x\| \leq (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}\]
where \(\Delta x\) is the step computed in the current iteration of the line search.
-
LoggingType
GradientProblemSolver::Options::logging_type¶ Default:
PER_MINIMIZER_ITERATION
-
bool
GradientProblemSolver::Options::minimizer_progress_to_stdout¶ Default:
false
By default the Minimizer progress is logged to STDERR depending on the vlog level. If this flag is set to true, and GradientProblemSolver::Options::logging_type is not SILENT, the logging output is sent to STDOUT.
The progress display looks like
0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01
Here
- f is the value of the objective function.
- d is the change in the value of the objective function if the step computed in this iteration is accepted.
- g is the max norm of the gradient.
- h is the change in the parameter vector.
- s is the optimal step length computed by the line search.
- it is the time taken by the current iteration.
- tt is the total time taken by the minimizer.
-
vector<IterationCallback*>
GradientProblemSolver::Options::callbacks¶ Callbacks that are executed at the end of each iteration of the
Minimizer. They are executed in the order that they are specified in this vector. By default, parameter blocks are updated only at the end of the optimization, i.e., when the Minimizer terminates. This behavior is controlled by GradientProblemSolver::Options::update_state_every_iteration. If the user wishes to have access to the updated parameter blocks when his/her callbacks are executed, then set GradientProblemSolver::Options::update_state_every_iteration to true.
The solver does NOT take ownership of these pointers.
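As an illustration, the sketch below logs the cost at each iteration. It assumes the ceres::IterationCallback and ceres::IterationSummary types shared with the rest of the Ceres minimizer interface; the class name LoggingCallback and the registration shown in the trailing comments are illustrative.
#include <cstdio>
#include "ceres/iteration_callback.h"

// A sketch of a simple per-iteration logging callback.
class LoggingCallback : public ceres::IterationCallback {
 public:
  virtual ceres::CallbackReturnType operator()(
      const ceres::IterationSummary& summary) {
    std::printf("iteration: %d cost: %e\n", summary.iteration, summary.cost);
    // Returning SOLVER_CONTINUE lets the minimizer keep iterating.
    return ceres::SOLVER_CONTINUE;
  }
};

// Registering the callback (the solver does not take ownership):
//   LoggingCallback logging_callback;
//   options.callbacks.push_back(&logging_callback);
//   options.update_state_every_iteration = true;  // if updated parameters are needed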
-
bool
GradientProblemSolver::Options::update_state_every_iteration¶ Default:
false
Normally the parameter vector is only updated when the solver terminates. Setting this to true updates it every iteration. This setting is useful when building an interactive application using Ceres and using an
IterationCallback.
GradientProblemSolver::Summary¶
-
class
GradientProblemSolver::Summary¶ Summary of the various stages of the solver after termination.
-
string
GradientProblemSolver::Summary::BriefReport() const¶ A brief one line description of the state of the solver after termination.
-
string
GradientProblemSolver::Summary::FullReport() const¶ A full multiline description of the state of the solver after termination.
-
bool
GradientProblemSolver::Summary::IsSolutionUsable() const¶ Whether the solution returned by the optimization algorithm can be relied on to be numerically sane. This will be the case if GradientProblemSolver::Summary::termination_type is set to CONVERGENCE, USER_SUCCESS or NO_CONVERGENCE, i.e., either the solver converged by meeting one of the convergence tolerances or because the user indicated that it had converged or it ran to the maximum number of iterations or time.
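For example, a small post-solve helper might use these accessors as in the sketch below; the function name ReportOutcome is an assumption, and the summary object is assumed to come from a prior call to Solve.
#include <iostream>
#include "ceres/gradient_problem_solver.h"

// A sketch of a helper that reports the outcome of a solve.
void ReportOutcome(const ceres::GradientProblemSolver::Summary& summary) {
  if (summary.IsSolutionUsable()) {
    std::cout << summary.BriefReport() << "\n";
  } else {
    // message explains why the solver terminated without a usable solution.
    std::cout << "Solve failed: " << summary.message << "\n";
  }
}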
-
TerminationType
GradientProblemSolver::Summary::termination_type¶ The cause of the minimizer terminating.
-
string
GradientProblemSolver::Summary::message¶ Reason why the solver terminated.
-
double
GradientProblemSolver::Summary::initial_cost¶ Cost of the problem (value of the objective function) before the optimization.
-
double
GradientProblemSolver::Summary::final_cost¶ Cost of the problem (value of the objective function) after the optimization.
-
vector<IterationSummary>
GradientProblemSolver::Summary::iterations¶ IterationSummary for each minimizer iteration in order.
-
int
GradientProblemSolver::Summary::num_cost_evaluations¶ Number of times the cost (and not the gradient) was evaluated.
-
int
GradientProblemSolver::Summary::num_gradient_evaluations¶ Number of times the gradient (and the cost) were evaluated.
-
double
GradientProblemSolver::Summary::total_time_in_seconds¶ Time (in seconds) spent in the solver.
-
double
GradientProblemSolver::Summary::cost_evaluation_time_in_seconds¶ Time (in seconds) spent evaluating the cost vector.
-
double
GradientProblemSolver::Summary::gradient_evaluation_time_in_seconds¶ Time (in seconds) spent evaluating the gradient vector.
-
int
GradientProblemSolver::Summary::num_parameters¶ Number of parameters in the problem.
-
int
GradientProblemSolver::Summary::num_local_parameters¶ Dimension of the tangent space of the problem. This is different from
GradientProblemSolver::Summary::num_parameters if a LocalParameterization object is used.
-
LineSearchDirectionType
GradientProblemSolver::Summary::line_search_direction_type¶ Type of line search direction used.
-
LineSearchType
GradientProblemSolver::Summary::line_search_type¶ Type of the line search algorithm used.
-
LineSearchInterpolationType
GradientProblemSolver::Summary::line_search_interpolation_type¶ When performing line search, the degree of the polynomial used to approximate the objective function.
-
NonlinearConjugateGradientType
GradientProblemSolver::Summary::nonlinear_conjugate_gradient_type¶ If the line search direction is NONLINEAR_CONJUGATE_GRADIENT, then this indicates the particular variant of non-linear conjugate gradient used.
-
int
GradientProblemSolver::Summary::max_lbfgs_rank¶ If the type of the line search direction is LBFGS, then this indicates the rank of the Hessian approximation.