30. Gradient descent has been run for 15 iterations with learning rate α = 0.3, and the loss function J(θ) is computed after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this observation, which one of the following conclusions seems most plausible?
(A) Rather than using the current value of α, use a larger value (say α = 1.0)
(B) Rather than using the current value of α, use a smaller value (say α = 0.1)
(C) α = 0.3 is an effective choice of learning rate
(D) Overfitting: rather than using the current definition of J, a better loss function should be chosen
(E) None of the above
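The behavior the question describes can be reproduced on a toy problem. The sketch below (a hypothetical setup, not part of the original question) runs gradient descent on the 1-D quadratic J(θ) = θ², recording the loss after each of 15 iterations with α = 0.3:

```python
def gradient_descent(alpha, theta0=10.0, iters=15):
    """Run gradient descent on J(theta) = theta^2 and record the loss."""
    theta = theta0
    losses = []
    for _ in range(iters):
        grad = 2 * theta           # dJ/dtheta for J(theta) = theta^2
        theta -= alpha * grad      # update step with learning rate alpha
        losses.append(theta ** 2)  # J(theta) after this iteration
    return losses

losses = gradient_descent(alpha=0.3)
# The recorded losses shrink rapidly at first and then flatten out near
# zero: exactly the "decreases quickly and then levels off" curve the
# question describes, the signature of a well-chosen learning rate.
```

With a much larger step (e.g. α = 1.0 here, where the update becomes θ ← −θ) the loss would oscillate or diverge instead of leveling off, while a much smaller step would make J(θ) decrease slowly rather than quickly.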
