30. Gradient descent has been run for 15 iterations with learning rate α = 0.3, and the loss function J(θ) is computed after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this observation, which one of the following conclusions seems most plausible?
(A) Rather than using the current value of α, use a larger value (say α = 1.0)
(B) Rather than using the current value of α, use a smaller value (say α = 0.1)
(C) α = 0.3 is an effective choice of learning rate
(D) Overfitting: rather than using the current definition of J, a better loss function should be chosen
(E) None of the above
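The behavior the question describes can be reproduced on a toy problem. The sketch below (a hypothetical setup, not part of the original question) runs gradient descent on the 1-D quadratic J(θ) = θ², recording the loss after each of 15 iterations with α = 0.3:

```python
def gradient_descent(alpha, theta0=10.0, iters=15):
    """Run gradient descent on J(theta) = theta^2 and record the loss."""
    theta = theta0
    losses = []
    for _ in range(iters):
        grad = 2 * theta           # dJ/dtheta for J(theta) = theta^2
        theta -= alpha * grad      # update step with learning rate alpha
        losses.append(theta ** 2)  # J(theta) after this iteration
    return losses

losses = gradient_descent(alpha=0.3)
# The recorded losses shrink rapidly at first and then flatten out near
# zero: exactly the "decreases quickly and then levels off" curve the
# question describes, the signature of a well-chosen learning rate.
```

With a much larger step (e.g. α = 1.0 here, where the update becomes θ ← −θ) the loss would oscillate or diverge instead of leveling off, while a much smaller step would make J(θ) decrease slowly rather than quickly.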
