BFGS(1)

  • 算法特征:
    利用函数$f(vec{x})$的1阶信息, 构造其近似的二阶Hessian矩阵. 结合Armijo Rule, 在最优化过程中达到超线性收敛的目的.
  • 算法推导:
    为书写方便, 引入如下两个符号$B$、$D$分别表示近似Hessian矩阵及其逆矩阵:
    egin{equation}label{eq_1}
    B approx H; quad D approx H^{-1}
    end{equation}
    注意, $B$与$D$均为对称矩阵.
    考虑第$k$步迭代, 将其与第$k-1$步迭代的优化变量及梯度之差分别记为$vec{s}_k$、$vec{y}_k$:
    egin{equation}label{eq_2}
    vec{s}_k = vec{x}_k - vec{x}_{k-1} \
    vec{y}_k = abla{f(vec{x}_k)} - abla{f(vec{x}_{k-1})}
    end{equation}
    引入割线条件(类似于微分中值定理):
    egin{equation}label{eq_3}
    vec{y}_k = B_k cdot vec{s}_k
    end{equation}
    因此有:
    egin{equation}label{eq_4}
    D_k cdot vec{y}_k = vec{s}_k
    end{equation}
    在相邻两步迭代过程中, 采用Frobenius范数衡量矩阵之变化. 为增强近似Hessian矩阵的稳定性, 考虑近似Hessian矩阵遵循保守变化, 即在满足约束条件的情况下, 相邻两步迭代过程中$B$或$D$的变化越小越好. 由此抽象出一个优化问题如下(以下忽略矢量符号):
    egin{equation}label{eq_5}
    minquadleft | D_k - D_{k-1} ight |_F^2\
    s.t.quad D_k cdot y_k = s_k\
    quadquad D_k is symmetric
    end{equation}
    其中, $D_k$为优化变量, 代表第$k$步迭代近似Hessian矩阵的逆矩阵. 该优化问题的最优解形式如下(笔者也不太确定, 大家有能力的可以验证一下):
    egin{equation}label{eq_6}
    D_k = (I - ho_ks_ky_k^T)D_{k-1}(I - ho_ky_ks_k^T) + ho_ks_ks_k^T
    end{equation}
    其中, $ ho_k = (y_k^Ts_k)^{-1}$.
  • 代码实现:
      1 # BFGS算法实现
      2 # 通过Armijo Rule确定迭代步长
      3 
      4 import numpy
      5 from matplotlib import pyplot as plt
      6 
      7 
      8 # 目标函数0阶信息
      9 def func(x1, x2):
     10     funcVal = 5 * x1 ** 2 + 2 * x2 ** 2 + 3 * x1 - 10 * x2 + 4
     11     return funcVal
     12 # 目标函数1阶信息
     13 def grad(x1, x2):
     14     gradVal = numpy.array([[10 * x1 + 3], [4 * x2 - 10]])
     15     return gradVal
     16     
     17 
     18 class BFGS(object):
     19     
     20     def __init__(self, seed=None, epsilon=1.e-6, maxIter=300):
     21         self.__seed = seed                         # 迭代起点
     22         self.__epsilon = epsilon                   # 计算精度
     23         self.__maxIter = maxIter                   # 最大迭代次数
     24         
     25         self.__xPath = list()                      # 记录优化变量之路径
     26         self.__fPath = list()                      # 记录目标函数值之路径
     27         
     28         
     29     def solve(self):
     30         self.__init_path()
     31         
     32         xCurr = self.__init_seed(self.__seed)
     33         fCurr = func(xCurr[0, 0], xCurr[1, 0])
     34         gCurr = grad(xCurr[0, 0], xCurr[1, 0])
     35         DCurr = self.__init_D(xCurr.shape[0])      # 矩阵D初始化 ~ 此处采用单位矩阵
     36         self.__save_path(xCurr, fCurr)
     37         
     38         for i in range(self.__maxIter):
     39             if self.__is_converged(gCurr):
     40                 self.__print_MSG()
     41                 break
     42             
     43             dCurr = -numpy.matmul(DCurr, gCurr)
     44             alpha = self.__calc_alpha_by_ArmijoRule(xCurr, fCurr, gCurr, dCurr)
     45             
     46             xNext = xCurr + alpha * dCurr
     47             fNext = func(xNext[0, 0], xNext[1, 0])
     48             gNext = grad(xNext[0, 0], xNext[1, 0])
     49             DNext = self.__update_D_by_BFGS(xCurr, gCurr, xNext, gNext, DCurr)
     50             
     51             xCurr, fCurr, gCurr, DCurr = xNext, fNext, gNext, DNext
     52             self.__save_path(xCurr, fCurr)
     53         else:
     54             if self.__is_converged(gCurr):
     55                 self.__print_MSG()
     56             else:
     57                 print("BFGS not converged after {} steps!".format(self.__maxIter))
     58                 
     59     
     60     def show(self):
     61         if not self.__xPath:
     62             self.solve()
     63             
     64         fig = plt.figure(figsize=(10, 4))
     65         ax1 = plt.subplot(1, 2, 1)
     66         ax2 = plt.subplot(1, 2, 2)
     67         
     68         ax1.plot(numpy.arange(len(self.__fPath)), self.__fPath, "k.")
     69         ax1.plot(0, self.__fPath[0], "go", label="starting point")
     70         ax1.plot(len(self.__fPath) - 1, self.__fPath[-1], "r*", label="solution")
     71         ax1.set(xlabel="$iterCnt$", ylabel="$iterVal$")
     72         ax1.legend()
     73         
     74         x1 = numpy.linspace(-100, 100, 500)
     75         x2 = numpy.linspace(-100, 100, 500)
     76         x1, x2 = numpy.meshgrid(x1, x2)
     77         f = func(x1, x2)
     78         ax2.contour(x1, x2, f, levels=36)
     79         x1Path = list(item[0] for item in self.__xPath)
     80         x2Path = list(item[1] for item in self.__xPath)
     81         ax2.plot(x1Path, x2Path, "k--", lw=2)
     82         ax2.plot(x1Path[0], x2Path[0], "go", label="starting point")
     83         ax2.plot(x1Path[-1], x2Path[-1], "r*", label="solution")
     84         ax2.set(xlabel="$x_1$", ylabel="$x_2$")
     85         ax2.legend()
     86         
     87         fig.tight_layout()
     88         fig.savefig("bfgs.png", dpi=300)
     89         plt.close()
     90     
     91     
     92     def __print_MSG(self):
     93         print("Iteration steps: {}".format(len(self.__xPath) - 1))
     94         print("Seed: {}".format(self.__xPath[0].reshape(-1)))
     95         print("Solution: {}".format(self.__xPath[-1].reshape(-1)))
     96         
     97             
     98     def __is_converged(self, gCurr):
     99         if numpy.linalg.norm(gCurr) <= self.__epsilon:
    100             return True
    101         return False
    102         
    103             
    104     def __update_D_by_BFGS(self, xCurr, gCurr, xNext, gNext, DCurr):
    105         sk = xNext - xCurr
    106         yk = gNext - gCurr
    107         rk = 1 / numpy.matmul(yk.T, sk)[0, 0]
    108         
    109         term1 = rk * numpy.matmul(sk, yk.T)
    110         term2 = rk * numpy.matmul(yk, sk.T)
    111         I = numpy.identity(term1.shape[0])
    112         term3 = numpy.matmul(I - term1, DCurr)
    113         term4 = numpy.matmul(term3, I - term2)
    114         term5 = rk * numpy.matmul(sk, sk.T)
    115         
    116         Dk = term4 + term5
    117         return Dk
    118             
    119     
    120     def __calc_alpha_by_ArmijoRule(self, xCurr, fCurr, gCurr, dCurr, c=1.e-4, v=0.5):
    121         i = 0
    122         alpha = v ** i
    123         xNext = xCurr + alpha * dCurr
    124         fNext = func(xNext[0, 0], xNext[1, 0])
    125         
    126         while True:
    127             if fNext <= fCurr + c * alpha * numpy.matmul(dCurr.T, gCurr)[0, 0]: break
    128             i += 1
    129             alpha = v ** i
    130             xNext = xCurr + alpha * dCurr
    131             fNext = func(xNext[0, 0], xNext[1, 0])
    132         
    133         return alpha
    134     
    135     
    136     def __init_D(self, n):
    137         D = numpy.identity(n)
    138         return D        
    139         
    140         
    141     def __init_seed(self, seed):
    142         if seed is None:
    143             seed = numpy.random.uniform(-100, 100, 2)
    144             
    145         seed = numpy.array(seed).reshape((2, 1))
    146         return seed
    147 
    148         
    149     def __init_path(self):
    150         self.__xPath.clear()
    151         self.__fPath.clear()
    152 
    153     
    154     def __save_path(self, xCurr, fCurr):
    155         self.__xPath.append(xCurr)
    156         self.__fPath.append(fCurr)
    157         
    158 
    159 
    160 if __name__ == "__main__":
    161     obj = BFGS()
    162     obj.solve()
    163     obj.show()
    View Code

笔者所用示例函数为:
egin{equation}label{eq_7}
f(x_1, x_2) = 5x_1^2 + 2x_2^2 + 3x_1 - 10x_2 + 4
end{equation}

  • 结果展示:
    BFGS(1)
  • 使用建议:
    1. BFGS作为一种拟牛顿法, 主要用于无法直接获取目标函数准确Hessian矩阵的情形. 此时退而求其次, 采用BFGS的更新方式获取目标函数的近似Hessian矩阵.
    2. 需要注意的是, Hessian矩阵反映函数本身的曲率信息, 因此对函数响应的噪声部分非常敏感. 如果函数响应自带噪声, 则需要采用一定的光滑手段处理之.
  • 参考文档:
    http://aria42.com/blog/2014/12/understanding-lbfgs
    https://blog.csdn.net/itplus/article/details/21897443