Safemotion Lib

Public Member Functions
    __init__ (self, params, lr=required, momentum=0, dampening=0, weight_decay=0, nesterov=False)
    __setstate__ (self, state)
    step (self, closure=None)
Implements stochastic gradient descent (optionally with momentum).
Nesterov momentum is based on the formula from
`On the importance of initialization and momentum in deep learning`__.
Args:
params (iterable): iterable of parameters to optimize or dicts defining
parameter groups
lr (float): learning rate
momentum (float, optional): momentum factor (default: 0)
weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
dampening (float, optional): dampening for momentum (default: 0)
nesterov (bool, optional): enables Nesterov momentum (default: False)
Example:
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
__ http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf
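Since ``params`` also accepts dicts defining parameter groups, the following sketch shows one way to set a per-group ``weight_decay`` (the two-layer ``torch.nn.Sequential`` model and the hyperparameter values are assumptions made for the example, not part of this class):
>>> import torch
>>> model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Linear(10, 2))
>>> # first group gets weight decay, second does not; lr and momentum apply to both
>>> optimizer = torch.optim.SGD(
...     [{'params': model[0].parameters(), 'weight_decay': 1e-4},
...      {'params': model[1].parameters()}],
...     lr=0.1, momentum=0.9)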
.. note::
The implementation of SGD with Momentum/Nesterov subtly differs from
Sutskever et al. and implementations in some other frameworks.
Considering the specific case of Momentum, the update can be written as
.. math::
\begin{aligned}
v_{t+1} & = \mu * v_{t} + g_{t+1}, \\
p_{t+1} & = p_{t} - \text{lr} * v_{t+1},
\end{aligned}
where :math:`p`, :math:`g`, :math:`v` and :math:`\mu` denote the
parameters, gradient, velocity, and momentum respectively.
This is in contrast to Sutskever et al. and
other frameworks, which employ an update of the form
.. math::
\begin{aligned}
v_{t+1} & = \mu * v_{t} + \text{lr} * g_{t+1}, \\
p_{t+1} & = p_{t} - v_{t+1}.
\end{aligned}
The Nesterov version is analogously modified.
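As a plain-Python illustration of the note above (the gradient sequence and hyperparameter values below are made up for the example), the two update rules can be stepped side by side; with a constant learning rate they yield identical parameters, and differences appear only once the learning rate changes between steps:
    lr, mu = 0.1, 0.9                 # assumed example values
    v_doc = v_sutskever = 0.0         # velocities for each formulation
    p_doc = p_sutskever = 1.0         # a single scalar "parameter"
    for g in [0.5, 0.3, 0.2]:         # made-up gradient sequence
        # form used in this docstring: lr scales the accumulated velocity at update time
        v_doc = mu * v_doc + g
        p_doc = p_doc - lr * v_doc
        # Sutskever et al. form: lr is folded into the velocity itself
        v_sutskever = mu * v_sutskever + lr * g
        p_sutskever = p_sutskever - v_sutskever
    print(p_doc, p_sutskever)         # equal while lr stays constant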
fastreid.solver.optim.sgd.SGD.__init__ (self, params, lr=required, momentum=0, dampening=0, weight_decay=0, nesterov=False)
Definition at line 44 of file sgd.py.
fastreid.solver.optim.sgd.SGD.__setstate__ (self, state)
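``__setstate__`` is the hook invoked when an optimizer instance is restored by ``pickle``; a minimal round-trip sketch (the tensor and hyperparameter values below are placeholders, not fastreid code):
>>> import pickle, torch
>>> w = torch.zeros(3, requires_grad=True)
>>> opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)
>>> restored = pickle.loads(pickle.dumps(opt))  # unpickling calls __setstate__
>>> restored.defaults['lr']
0.1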
fastreid.solver.optim.sgd.SGD.step (self, closure=None)
Performs a single optimization step.
Arguments:
closure (callable, optional): A closure that reevaluates the model
and returns the loss.
Definition at line 65 of file sgd.py.
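When the loss must be re-evaluated inside the optimizer, ``step`` accepts a closure; a sketch reusing the names from the example above (``model``, ``input``, ``target``, ``loss_fn`` and ``optimizer`` are assumed to already exist):
>>> def closure():
...     optimizer.zero_grad()                  # clear stale gradients
...     loss = loss_fn(model(input), target)   # re-evaluate the model
...     loss.backward()                        # recompute gradients
...     return loss
>>> loss = optimizer.step(closure)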