Newton’s method solves quadratic optimization problems in one step because the derivative of a parabola is a straight line, so every linear fit provides an exact approximation. To make things a little more interesting, this visualization combines a trigonometric cosine function in such a way that the global minimizer occurs at x = 0 regardless of the weight of the cosine term. The extra cosine term provides a convenient expression that creates nonlinear structure in the derivative, to better reveal some of the interesting properties of the algorithm.

The upper plot shows the objective function as a blue line with arbitrary units on the y-axis. Background colors indicate the percentage of failures to converge within 10 steps in the neighborhood of a given initial position. Green indicates low failure percentage, while dark red indicates high failure percentage. The lower plot shows the objective function’s derivative, a straight line combined with a trigonometric sine function.

Clicking either plot selects the position of the initial position and starts the algorithm running, providing a step-by-step visualization of each iteration. The local quadratic approximation at each position momentarily appear along with the minimum of the local quadratic approximation, before the animation moving the point kicks in.

The gray lines show where the “undamped” algorithm would move, and the red lines show where the “damped” algorithm moves. When damping = 1 (default), gray and red lines coincide; otherwise, the red line’s position will be proportionally scaled to reduce the standard step size by some fractional amount. Adjusting the top slider on the right affects the convergence parameters for the algorithm. The top-most slider sets the zero-tolerance for the magnitude of the derivative (i.e. norm of the gradient). If the gradient norm goes below this tolerance of zero, the algorithm converges.

The middle slider sets the damping coefficient for the step length. The damped Newton’s method is perhaps the simplest possible modification to the original method that can increase its radius of convergence for objective functions having nonlinear derivatives.

Moving the bottom slider changes the weight of the cosine term. Notice that with the slider all the way left, the cosine term disappears, and the algorithm never fails to converge (all green), as expected for purely quadratic objectives.

If you want to read more, here’s the full post on LinkedIn.