Frank-Wolfe and other projection-free algorithms
Frank-Wolfe
The Frank-Wolfe (FW) or conditional gradient algorithm [J2003], [P2018], [PANJ2018] is a method for constrained optimization. It can solve problems of the form
\[\argmin_{\bs{x} \in \mathcal{D}} f(\bs{x})~,\]
where \(f\) is a differentiable function whose gradient we can evaluate and \(\mathcal{D}\) is a compact convex set for which we have access to a linear minimization oracle (LMO). This is a routine that, given a vector \(\bs{u}\), returns a solution to
\[\argmin_{\bs{z} \in \mathcal{D}} \langle \bs{u}, \bs{z}\rangle~.\]
Unlike other constrained optimization algorithms such as projected gradient descent, the Frank-Wolfe algorithm does not require access to a projection, which is why it is sometimes called a projection-free algorithm. It instead relies exclusively on the linear minimization oracle described above.
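As an illustration of what a linear minimization oracle computes, here is a minimal sketch for the \(\ell_1\) ball of radius \(\alpha\). The function name lmo_l1_ball and its signature are hypothetical and chosen for this example only; they are not the interface expected by this library. A linear function over the \(\ell_1\) ball attains its minimum at a signed, scaled coordinate vector, so the oracle only needs to find the largest entry of \(\bs{u}\) in absolute value.

```python
def lmo_l1_ball(u, alpha=1.0):
    """Return a minimizer of <u, z> over {z : sum_i |z_i| <= alpha}.

    Hypothetical helper for illustration; not part of the copt API.
    """
    # Coordinate where the linear function changes fastest.
    i = max(range(len(u)), key=lambda j: abs(u[j]))
    s = [0.0] * len(u)
    # Move against the sign of u[i] to make <u, s> as small as possible.
    s[i] = -alpha if u[i] > 0 else alpha
    return s


print(lmo_l1_ball([1.0, -3.0, 2.0], alpha=2.0))  # -> [0.0, 2.0, 0.0]
```

The cost is a single pass over \(\bs{u}\), which is typically cheaper than projecting onto the same set.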
The Frank-Wolfe algorithm is implemented in this library as copt.minimize_frank_wolfe(). Like most other methods, it takes as argument an objective function to minimize but, unlike most other methods, it also requires a linear minimization oracle for the domain \(\mathcal{D}\).
At each iteration, the Frank-Wolfe algorithm selects the vertex \(\boldsymbol{s} = \argmin_{\bs{z} \in \mathcal{D}}\, \langle \nabla f(\bs{x}), \bs{z}\rangle\) using the linear minimization oracle.
Warning
TODO: describe the API of the LMO.
The vertex \(\boldsymbol{s}\) is the vertex of the domain that correlates the most with the negative gradient. The next iterate \(\boldsymbol{x}^+\) is then constructed as a convex combination of the current iterate \(\boldsymbol{x}\) and the newly acquired vertex \(\boldsymbol{s}\):
\[\boldsymbol{x}^+ = (1 - \gamma)\boldsymbol{x} + \gamma \boldsymbol{s}~.\]
The step-size \(\gamma\) can be chosen by different strategies:
- Inexact line-search. This is the default option and corresponds to the keyword argument step_size="adaptive". This is typically the fastest and simplest method; if unsure, use this option.
- Demyanov-Rubinov step-size. This is a step-size of the form
\[\gamma = \min\left\{\frac{\langle \nabla f(\bs{x}), \bs{x} - \bs{s}\rangle}{L \|\bs{s} - \bs{x}\|^2}, 1\right\}~.\]
This step-size typically performs well but has the drawback that it requires knowledge of the Lipschitz constant \(L\) of \(\nabla f\). It can be used with the keyword argument step_size="DR". In this case the Lipschitz constant \(L\) needs to be specified through the keyword argument lipschitz. For example, if the Lipschitz constant is 0.1, then the call should include step_size="DR", lipschitz=0.1.
- Oblivious step-size. This is the very simple step-size of the form
\[\gamma = \frac{2}{t+2}~,\]
where \(t\) is the number of iterations performed so far. This step-size is oblivious since it does not use any information about the objective. It typically performs worse than the alternatives, but it is simple to implement and can be competitive in the case of noisy objectives.
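The iteration and two of the step-size rules can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names (frank_wolfe, lmo_l1_ball, dot) are hypothetical, the domain is taken to be the \(\ell_1\) ball, the objective is the toy quadratic \(f(\bs{x}) = \frac{1}{2}\|\bs{x} - \bs{b}\|^2\) (whose gradient has Lipschitz constant 1), and the Demyanov-Rubinov step is clipped to 1 so that the iterate stays in the domain. The adaptive line-search is not shown.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def lmo_l1_ball(u, alpha):
    # Minimizer of <u, z> over the l1 ball: a signed, scaled coordinate vector.
    i = max(range(len(u)), key=lambda j: abs(u[j]))
    s = [0.0] * len(u)
    s[i] = -alpha if u[i] > 0 else alpha
    return s


def frank_wolfe(grad, lmo, x0, step_size="oblivious", lipschitz=None,
                max_iter=200, tol=1e-8):
    """Hypothetical sketch of the Frank-Wolfe loop (not the copt API)."""
    x = list(x0)
    for t in range(max_iter):
        g = grad(x)
        s = lmo(g)                              # vertex from the oracle
        d = [si - xi for si, xi in zip(s, x)]   # update direction s - x
        gap = -dot(g, d)                        # <grad f(x), x - s> >= 0
        if gap <= tol:                          # Frank-Wolfe gap as stopping test
            break
        if step_size == "DR":
            # Demyanov-Rubinov step, clipped to 1 to remain feasible.
            gamma = min(gap / (lipschitz * dot(d, d)), 1.0)
        else:
            gamma = 2.0 / (t + 2.0)             # oblivious step
        # x + gamma * (s - x) is the convex combination (1 - gamma) x + gamma s.
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x


# Toy problem: minimize 0.5 * ||x - b||^2 over the l1 ball of radius 1.
b = [1.0, 0.8]


def f(x):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def grad(x):
    return [xi - bi for xi, bi in zip(x, b)]


def lmo(u):
    return lmo_l1_ball(u, alpha=1.0)


x_oblivious = frank_wolfe(grad, lmo, [0.0, 0.0])
x_dr = frank_wolfe(grad, lmo, [0.0, 0.0], step_size="DR", lipschitz=1.0)
```

For this problem the constrained minimizer is \((0.6, 0.4)\) with objective value 0.16, which gives a reference point for comparing how close each step-size rule gets in the same number of iterations.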
Below is an illustration of the iterates generated by the Frank-Wolfe algorithm on a toy 2-dimensional problem, in which the triangle is the domain \(\mathcal{D}\) and the level curves represent values of the objective function \(f\).

Figure: Frank-Wolfe algorithm.
Pairwise Frank-Wolfe
Like the Frank-Wolfe algorithm, the Pairwise Frank-Wolfe algorithm [LJ] solves problems of the form
\[\argmin_{\bs{x} \in \mathcal{D}} f(\bs{x})~,\]
where \(f\) is differentiable and the domain \(\mathcal{D}\) is a convex and compact set.
Although the algorithm is more broadly applicable, this library’s implementation, copt.minimize_pairwise_frank_wolfe(), assumes that the domain \(\mathcal{D}\) is the \(\ell_1\) ball, that is, \(\mathcal{D} = \{x : \sum_i |x_i| \leq \alpha\}\), where \(\alpha\) is a user-defined parameter.
Figure: Pairwise FW on the L1 ball.
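To make the pairwise update concrete, the following is a minimal sketch of the scheme described in [LJ], specialized to the \(\ell_1\) ball. It is not the library's implementation, and every name in it (pairwise_fw_l1, vertex, the lipschitz argument) is hypothetical. The iterate is stored as a convex combination of the \(2n\) vertices \(\pm\alpha \bs{e}_i\); at each step, weight is moved from the active vertex that correlates most with the gradient (the away vertex) to the vertex returned by the linear minimization oracle. The step-size used here is a Demyanov-Rubinov-style step clipped to the weight of the away vertex, which is an assumption of this sketch.

```python
def pairwise_fw_l1(grad, n, alpha, lipschitz, max_iter=1000, tol=1e-10):
    """Hypothetical sketch of Pairwise Frank-Wolfe on the l1 ball."""

    def vertex(k):
        # Vertex k < n is +alpha * e_k, vertex k >= n is -alpha * e_{k-n}.
        v = [0.0] * n
        v[k % n] = alpha if k < n else -alpha
        return v

    w = [0.0] * (2 * n)   # convex weights over the 2n vertices
    w[0] = 1.0            # start at the vertex +alpha * e_0
    x = vertex(0)
    for _ in range(max_iter):
        g = grad(x)
        # score[k] = <g, vertex(k)>
        score = [alpha * gi for gi in g] + [-alpha * gi for gi in g]
        s = min(range(2 * n), key=lambda k: score[k])      # Frank-Wolfe vertex
        a = max((k for k in range(2 * n) if w[k] > 0),
                key=lambda k: score[k])                    # away vertex
        gap = score[a] - score[s]                          # <-g, v_s - v_a> >= 0
        if gap <= tol:
            break
        d = [vs - va for vs, va in zip(vertex(s), vertex(a))]
        # Step clipped so the weight of the away vertex stays nonnegative.
        gamma = min(gap / (lipschitz * sum(di * di for di in d)), w[a])
        w[s] += gamma
        w[a] -= gamma
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x, w


# Toy problem: minimize 0.5 * ||x - b||^2 over the l1 ball of radius 1.
b = [1.0, 0.8]


def grad(x):
    return [xi - bi for xi, bi in zip(x, b)]


x, w = pairwise_fw_l1(grad, n=2, alpha=1.0, lipschitz=1.0)
```

Because only two weights change per iteration, the iterate stays a sparse combination of vertices, and the clipping at the away vertex's weight is what allows a vertex to be dropped from the active set entirely.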
References:
- J2003
Jaggi, Martin. “Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization.” ICML 2013.
- P2018
Pedregosa, Fabian. “Notes on the Frank-Wolfe Algorithm.” 2018.
- PANJ2018
Pedregosa, Fabian, Armin Askari, Geoffrey Negiar, and Martin Jaggi. “Step-Size Adaptivity in Projection-Free Optimization.” arXiv:1806.05123 (2018).
- LJ
Lacoste-Julien, Simon, and Martin Jaggi. “On the global linear convergence of Frank-Wolfe optimization variants.” Advances in Neural Information Processing Systems. 2015.