Logistic regression is for classification!
For the theoretical derivation and the cross-entropy error, see https://blog.csdn.net/weixin_39445556/article/details/83930186
For an input $x$, the output is $h_{\theta}(x) = g(\theta^{T}x)$,
where $g(z)=\frac{1}{1+e^{-z}}$ is the sigmoid function.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
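A quick numeric sanity check (toy values, not from the original notes): the sigmoid maps $0$ to exactly $0.5$ and stays strictly inside $(0, 1)$:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, 0.0, 10.0])
p = sigmoid(z)
print(p[1])                                   # 0.5 exactly
print(bool((p > 0).all() and (p < 1).all()))  # True: bounded in (0, 1)
```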
Loss function
Suppose there are $m$ training examples, i.e. $X$ has $m$ rows (one example per row).
$$ J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)}\log\big( h_\theta(x^{(i)})\big)-(1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\big]$$
Vectorized loss function (matrix form)
$$ J(\theta) = -\frac{1}{m}\big(\log\big(g(X\theta)\big)^{T}y+\log\big(1-g(X\theta)\big)^{T}(1-y)\big)$$
```python
def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta))
    J = -(1.0 / m) * (np.log(h).T.dot(y) + np.log(1 - h).T.dot(1 - y))
    if np.isnan(J[0]):
        return np.inf
    return J[0]
```
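As a hypothetical check on `costFunction` (toy data of my own, assuming the `sigmoid` defined above): at $\theta = 0$ every prediction is $0.5$, so the loss equals $\log 2 \approx 0.693$ regardless of the labels:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta))
    J = -(1.0 / m) * (np.log(h).T.dot(y) + np.log(1 - h).T.dot(1 - y))
    return np.inf if np.isnan(J[0]) else J[0]

X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.3]])  # intercept column first
y = np.array([[1.0], [0.0], [1.0], [0.0]])                        # labels as a column vector
J = costFunction(np.zeros(2), X, y)
print(J)  # log(2) ≈ 0.6931
```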
Partial derivatives (the gradient)
$$ \frac{\partial J(\theta)}{\partial\theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} \big( h_\theta (x^{(i)})-y^{(i)}\big)x^{(i)}_{j} $$
Vectorized gradient
$$ \nabla J(\theta) = \frac{1}{m} X^{T}\big(g(X\theta)-y\big)$$
```python
def gradient(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1, 1)))
    grad = (1.0 / m) * X.T.dot(h - y)
    return grad.flatten()
```
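One way to validate `gradient` is to compare it against central finite differences of the cost. This is a sketch with made-up data; the finite-difference helper is my own addition, not part of the notes:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1, 1)))
    return (-(1.0 / m) * (np.log(h).T.dot(y) + np.log(1 - h).T.dot(1 - y))).item()

def gradient(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1, 1)))
    return ((1.0 / m) * X.T.dot(h - y)).flatten()

rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]  # intercept column + 2 features
y = (rng.random((20, 1)) > 0.5).astype(float)
theta = rng.normal(size=3)

# central finite differences, one coordinate direction at a time
eps = 1e-6
numeric = np.array([(cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
err = np.max(np.abs(numeric - gradient(theta, X, y)))
print(err)  # tiny: the analytic and numeric gradients agree
```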
Minimizing the loss function
```python
from scipy.optimize import minimize

res = minimize(costFunction, initial_theta, args=(X, y),
               jac=gradient, options={'maxiter': 400})
res
```
`minimize` is a function from `scipy.optimize` (see the scipy notes for details).
Prediction
```python
def predict(theta, X, threshold=0.5):
    p = sigmoid(X.dot(theta.T)) >= threshold
    return p.astype('int')
```
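Putting the pieces together on synthetic data (the toy clusters and this end-to-end pipeline are my own illustration, not from the notes):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta))
    J = -(1.0 / m) * (np.log(h).T.dot(y) + np.log(1 - h).T.dot(1 - y))
    return np.inf if np.isnan(J[0]) else J[0]

def gradient(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1, 1)))
    return ((1.0 / m) * X.T.dot(h - y)).flatten()

def predict(theta, X, threshold=0.5):
    return (sigmoid(X.dot(theta.T)) >= threshold).astype('int')

rng = np.random.default_rng(1)
n = 50
x0 = rng.normal(-2, 1, n)                      # class 0 cluster
x1 = rng.normal(2, 1, n)                       # class 1 cluster
X = np.c_[np.ones(2 * n), np.r_[x0, x1]]       # intercept column + 1 feature
y = np.r_[np.zeros((n, 1)), np.ones((n, 1))]

res = minimize(costFunction, np.zeros(2), args=(X, y),
               jac=gradient, options={'maxiter': 400})
acc = (predict(res.x, X) == y.ravel()).mean()
print(acc)  # well-separated clusters, so training accuracy is close to 1
```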
Regularization
Loss function
$$ J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)}\log\big( h_\theta(x^{(i)})\big)-(1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$$
Vectorized loss function (matrix form)
$$ J(\theta) = -\frac{1}{m}\big(\log\big(g(X\theta)\big)^{T}y+\log\big(1-g(X\theta)\big)^{T}(1-y)\big) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$$
```python
def costFunctionReg(theta, reg, XX, y):
    m = y.size
    h = sigmoid(XX.dot(theta))
    J = -(1.0 / m) * (np.log(h).T.dot(y) + np.log(1 - h).T.dot(1 - y)) \
        + (reg / (2.0 * m)) * np.sum(np.square(theta[1:]))
    if np.isnan(J[0]):
        return np.inf
    return J[0]
```
Gradient
$$ \frac{\partial J(\theta)}{\partial\theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} \big( h_\theta (x^{(i)})-y^{(i)}\big)x^{(i)}_{j} + \frac{\lambda}{m}\theta_{j}$$
Vectorized gradient
$$ \nabla J(\theta) = \frac{1}{m} X^{T}\big(g(X\theta)-y\big) + \frac{\lambda}{m}\theta$$
Note that the intercept parameter $\theta_{0}$ (the bias column we add ourselves) must not be regularized.
```python
def gradientReg(theta, reg, XX, y):
    m = y.size
    h = sigmoid(XX.dot(theta.reshape(-1, 1)))
    # the prepended 0 zeroes out the penalty on theta_0, so the intercept is not regularized
    grad = (1.0 / m) * XX.T.dot(h - y) + (reg / m) * np.r_[[[0]], theta[1:].reshape(-1, 1)]
    return grad.flatten()
```
Minimizing the loss function
```python
reg = 1.0  # example regularization strength
res = minimize(costFunctionReg, initial_theta, args=(reg, XX, y),
               jac=gradientReg, options={'maxiter': 400})
res
```
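To see the effect of $\lambda$ end-to-end, fit the same data twice and compare the size of the non-intercept weights. The synthetic data and the two $\lambda$ values below are my own illustration:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunctionReg(theta, reg, XX, y):
    m = y.size
    h = sigmoid(XX.dot(theta))
    J = -(1.0 / m) * (np.log(h).T.dot(y) + np.log(1 - h).T.dot(1 - y)) \
        + (reg / (2.0 * m)) * np.sum(np.square(theta[1:]))
    return np.inf if np.isnan(J[0]) else J[0]

def gradientReg(theta, reg, XX, y):
    m = y.size
    h = sigmoid(XX.dot(theta.reshape(-1, 1)))
    grad = (1.0 / m) * XX.T.dot(h - y) + (reg / m) * np.r_[[[0]], theta[1:].reshape(-1, 1)]
    return grad.flatten()

rng = np.random.default_rng(2)
XX = np.c_[np.ones(80), rng.normal(size=(80, 3))]              # intercept + 3 features
y = (XX[:, 1:2] + rng.normal(size=(80, 1)) > 0).astype(float)  # label driven by feature 1

norms = []
for reg in (0.01, 100.0):
    res = minimize(costFunctionReg, np.zeros(4), args=(reg, XX, y),
                   jac=gradientReg, options={'maxiter': 400})
    norms.append(np.linalg.norm(res.x[1:]))
print(norms)  # the larger lambda yields a much smaller weight norm
```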
Prediction
```python
def predict(theta, X, threshold=0.5):
    p = sigmoid(X.dot(theta.T)) >= threshold
    return p.astype('int')
```