Individuals and organizations are often faced with making tradeoffs between different factors in order to achieve desirable outcomes. Choosing these tradeoffs in the "best" way is the essence of the optimization problem. Mathematical algorithms for search and optimization play a large role in finding the best options in many problems in engineering, business, medicine, and the natural and social sciences. Such algorithms start with an initial "guess" at a solution, and this estimated solution is updated on an iteration-by-iteration basis with the aim of improving the performance measure (objective function). In most practical problems, the solution depends on more than one quantity, leading to the multivariate optimization problem of minimizing or maximizing an objective function that depends on multiple variables. See the introductory article on stochastic optimization for a general introduction to issues in optimization in a stochastic setting.

There has recently been much interest in recursive optimization algorithms that rely on measurements of only the objective function to be optimized, not on direct measurements of the gradient (derivative) of the objective function. Such algorithms have the advantage of not requiring detailed modeling information describing the relationship between the parameters to be optimized and the objective function. For example, many systems involving human beings or computer simulations are difficult to treat analytically and could potentially benefit from such an optimization approach. We now summarize an algorithm that is especially suited to such problems.

SPSA Overview

One optimization method that has attracted considerable international attention is the simultaneous perturbation stochastic approximation (SPSA) method. As motivated above, and like methods such as simulated annealing or genetic algorithms, SPSA uses only objective function measurements.
This contrasts with algorithms requiring direct measurements of the gradient of the objective function (which are often difficult or impossible to obtain). Further, SPSA is especially efficient in high-dimensional problems in terms of providing a good solution for a relatively small number of measurements of the objective function. The essential feature of SPSA, which provides its power and relative ease of use in difficult multivariate optimization problems, is the underlying gradient approximation that requires only two objective function measurements per iteration regardless of the dimension of the optimization problem. These two measurements are made by simultaneously varying in a "proper" random fashion all of the variables in the problem (the "simultaneous perturbation"). This contrasts with the classical ("finite-difference") method where the variables are varied one at a time. If the number of variables being optimized is p, then the finite-difference method takes 2p measurements of the objective function at each iteration (to form one gradient approximation) while SPSA takes only two measurements.

A fundamental result on relative efficiency then follows: Under reasonably general conditions, SPSA and the standard finite-difference SA method achieve the same level of statistical accuracy for a given number of iterations even though SPSA uses p times fewer measurements of the objective function at each iteration (since each gradient approximation uses only 1/p the number of function measurements). This indicates that SPSA will converge to the optimal solution within a given level of accuracy with p times fewer measurements of the objective function than the standard method. An equivalent way of interpreting the statement above is the following: One properly generated simultaneous random change of all p variables in the problem contains as much information for optimization as a full set of p one-at-a-time changes of each variable.
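The contrast between the two gradient approximations can be sketched in a few lines of code. The quadratic loss, the perturbation size, and the seed below are illustrative choices, not part of any SPSA specification; the point is only the per-iteration evaluation count: 2p loss measurements for the classical finite-difference estimate versus 2 for the simultaneous perturbation estimate, regardless of p.

```python
import numpy as np

def loss(theta):
    # Toy quadratic loss for illustration; its true gradient is 2 * theta.
    return float(np.sum(theta * theta))

def fd_gradient(loss, theta, c=1e-3):
    """Classical two-sided finite differences: 2p loss evaluations,
    perturbing one variable at a time."""
    p = theta.size
    g = np.empty(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = c
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * c)
    return g

def sp_gradient(loss, theta, c=1e-3, rng=None):
    """Simultaneous perturbation: 2 loss evaluations total,
    perturbing all p variables at once by random +/-1 amounts."""
    if rng is None:
        rng = np.random.default_rng(0)
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    return (loss(theta + c * delta) - loss(theta - c * delta)) / (2 * c * delta)
```

A single simultaneous perturbation estimate is noisy, but it is correct in expectation: averaging many independent `sp_gradient` estimates recovers the same gradient that `fd_gradient` computes, which is one concrete reading of the "same information" statement above.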
Some of the general areas for application of SPSA include statistical parameter estimation, simulation-based optimization, pattern recognition, nonlinear regression, signal processing, neural network training, adaptive feedback control, and experimental design. Specific system applications are represented in the list of references.
SPSA has several features that make it attractive for many practical applications, such as the ones mentioned above.
The introductory article on SPSA and the list of references provide a more-detailed sense of the performance of SPSA and the type of problems for which SPSA is
appropriate.

There are a large number of methods for numerical optimization in multivariate problems. Hence, a user with a challenging optimization problem faces the daunting task of determining which algorithm is appropriate for a given problem. This choice is made more difficult by the large amount of "hype" and dubious claims associated with some popular algorithms. An inappropriate approach may lead to a large waste of resources, both through wasted effort in implementation and through the resulting suboptimal solution to the optimization problem of interest.

No search algorithm is uniformly better than all other algorithms across all possible problems. It is clear, however, that some algorithms may work better than others on certain classes of problems as a consequence of being able to exploit the problem structure. For example, traditional nonlinear programming methods (e.g., constrained conjugate gradient) are well suited to deterministic optimization problems with exact knowledge of the gradient of the objective function. Methods involving a population of candidate solutions, such as genetic algorithms, may be useful for a broad search over the domain of the parameters being optimized and subsequent initialization of more powerful local search algorithms. (Simple random search may also be useful for such a "crude" search if it is not desirable or feasible to work with a population of solutions.) Standard backpropagation may be effective in certain neural network applications when it is possible to calculate the required gradient of the objective function. More generally, stochastic gradient methods (i.e., Robbins-Monro stochastic approximation) are effective if one has direct (unbiased) measurements of the gradient of the objective function.

SPSA is generally used in nonlinear problems having many variables where the objective function gradient is difficult or impossible to obtain. ("Many" here is a relative concept.
In some problems, this may represent five or ten variables; in other problems, this may represent thousands of variables or more.) As a stochastic approximation algorithm, SPSA may be rigorously applied when noisy measurements of the objective function are all that are available. (Noisy measurements can have a profound degrading effect on algorithms that are not designed to cope with noise. Noise may prevent convergence or may dramatically decrease efficiency in such algorithms.) There have also been many successful applications of SPSA in settings where perfect (noise-free) measurements of the loss function are available. In
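As an illustration of that noisy-measurement setting, the sketch below runs the basic SPSA recursion on a toy quadratic whose measurements are corrupted by Gaussian noise. The gain-sequence forms and the exponents 0.602 and 0.101 follow common practical guidelines from the SPSA literature, but the specific constants, the loss function, and the noise level are all illustrative assumptions rather than prescribed values.

```python
import numpy as np

def spsa_minimize(noisy_loss, theta0, n_iter=1000,
                  a=0.1, A=100, alpha=0.602, c=0.1, gamma=0.101, seed=0):
    """Basic SPSA recursion: exactly two (possibly noisy) loss
    measurements per iteration, whatever the dimension of theta."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(n_iter):
        ak = a / (k + 1 + A) ** alpha      # decaying step-size sequence
        ck = c / (k + 1) ** gamma          # decaying perturbation size
        # Simultaneous Bernoulli +/-1 perturbation of all variables.
        delta = rng.choice([-1.0, 1.0], size=theta.size)
        ghat = (noisy_loss(theta + ck * delta)
                - noisy_loss(theta - ck * delta)) / (2.0 * ck * delta)
        theta = theta - ak * ghat          # gradient-descent-style update
    return theta
```

Because each gradient estimate is built from differences of noisy measurements, individual iterations can move in poor directions; the decaying gains average this noise out over the run, which is what distinguishes a stochastic approximation method from an algorithm that assumes exact function values.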
summary, SPSA is a powerful method for optimization in challenging nonlinear problems. It has a strong theoretical foundation and is often more efficient and easier to implement than well-known methods such as simulated annealing or genetic algorithms. Many examples of the use of SPSA are given in the list of references.

Key words related to SPSA: Optimization; parameter estimation; stochastic approximation; root-finding; random directions; Kiefer-Wolfowitz stochastic approximation; Robbins-Monro stochastic approximation; simultaneous perturbation; adaptive estimation; stochastic gradient; stochastic programming; stochastic optimization.