Logistic regression is for classification!!

For the theoretical derivation and the cross-entropy error, see https://blog.csdn.net/weixin_39445556/article/details/83930186

For an input $x$, the output is $ h_{\theta}(x) = g(\theta^{T}x) $

where $ g(z)=\frac{1}{1+e^{-z}} $ is the sigmoid function.

```python
import numpy as np

# Sigmoid function: squashes any real number into (0, 1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
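
A quick sanity check (a minimal sketch; the test values are arbitrary): sigmoid(0) must be exactly 0.5, and large |z| should saturate toward 0 or 1.

```python
z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))  # ≈ [4.54e-05, 0.5, 0.99995]
```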

Loss function

Assume there are $m$ training samples; $X$ stacks them as rows.

$$ J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)}\,\log\big(h_\theta(x^{(i)})\big)-(1-y^{(i)})\,\log\big(1-h_\theta(x^{(i)})\big)\big]$$

Vectorized loss function (matrix form)

$$ J(\theta) = -\frac{1}{m}\big(\log(g(X\theta))^{T}y+\log(1-g(X\theta))^{T}(1-y)\big)$$

```python
# Loss function (unregularized)
def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta))

    J = -1.0*(1.0/m)*(np.log(h).T.dot(y) + np.log(1-h).T.dot(1-y))

    # log(0) yields nan; report an infinite loss instead
    if np.isnan(J[0]):
        return np.inf
    return J[0]
```
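
A minimal usage sketch on made-up toy data (X_toy, y_toy, and theta0 are hypothetical names; X_toy already carries the intercept column, and y is a column vector):

```python
X_toy = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])  # 3 samples: intercept + 1 feature
y_toy = np.array([[1], [0], [0]])                        # column vector of labels
theta0 = np.zeros(2)
print(costFunction(theta0, X_toy, y_toy))  # ln 2 ≈ 0.6931 at theta = 0
```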

Partial derivatives (gradient)

$$ \frac{\partial J(\theta)}{\partial\theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} \big( h_\theta (x^{(i)})-y^{(i)}\big)x^{(i)}_{j} $$

Vectorized gradient

$$ \nabla J(\theta) = \frac{1}{m} X^{T}\big(g(X\theta)-y\big)$$

```python
# Gradient of the loss
def gradient(theta, X, y):
    m = y.size
    h = sigmoid(X.dot(theta.reshape(-1,1)))

    grad = (1.0/m)*X.T.dot(h-y)

    return grad.flatten()
```
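
The analytic gradient can be checked against centered finite differences; a minimal sketch, reusing the toy data above:

```python
def numerical_gradient(theta, X, y, eps=1e-6):
    # Perturb one coordinate at a time: (J(theta+e) - J(theta-e)) / (2*eps)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (costFunction(theta + e, X, y) - costFunction(theta - e, X, y)) / (2*eps)
    return grad

print(gradient(theta0, X_toy, y_toy))            # analytic
print(numerical_gradient(theta0, X_toy, y_toy))  # should match closely
```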

Minimizing the loss function

```python
from scipy.optimize import minimize

res = minimize(costFunction, initial_theta, args=(X,y), jac=gradient, options={'maxiter':400})
res
```

minimize comes from scipy.optimize (see the scipy notes for details).
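
An end-to-end sketch on the toy data above (initial_theta is just a zero vector here; with a real dataset, X must already contain the intercept column):

```python
initial_theta = np.zeros(X_toy.shape[1])
res = minimize(costFunction, initial_theta, args=(X_toy, y_toy),
               jac=gradient, options={'maxiter': 400})
print(res.success, res.x)  # res.x is the fitted theta
```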

Prediction

```python
# Predict class 1 when the estimated probability reaches the threshold
def predict(theta, X, threshold=0.5):
    p = sigmoid(X.dot(theta.T)) >= threshold
    return p.astype('int')
```
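
Usage sketch, continuing with the fitted res.x from above; training-set accuracy serves as a rough check:

```python
p = predict(res.x, X_toy)
print('train accuracy: {:.0%}'.format(np.mean(p == y_toy.ravel())))
```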

Regularization

Loss function

$$ J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big[-y^{(i)}\,\log\big(h_\theta(x^{(i)})\big)-(1-y^{(i)})\,\log\big(1-h_\theta(x^{(i)})\big)\big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$$

Vectorized loss function (matrix form)

$$ J(\theta) = -\frac{1}{m}\big(\log(g(X\theta))^{T}y+\log(1-g(X\theta))^{T}(1-y)\big) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$$

```python
# Regularized loss function; reg is the regularization strength lambda
def costFunctionReg(theta, reg, *args):
    XX, y = args
    m = y.size
    h = sigmoid(XX.dot(theta))

    # theta[1:] leaves the intercept theta_0 out of the penalty
    J = -1.0*(1.0/m)*(np.log(h).T.dot(y) + np.log(1-h).T.dot(1-y)) + (reg/(2.0*m))*np.sum(np.square(theta[1:]))

    if np.isnan(J[0]):
        return np.inf
    return J[0]
```
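
XX here is the design matrix with the intercept column already prepended (often with mapped polynomial features as well, which is when regularization matters). A minimal sketch of building it from a hypothetical raw feature column X_raw:

```python
X_raw = np.array([[0.5], [-1.5], [2.0]])  # hypothetical raw feature
# Intercept column of ones plus a squared term for a nonlinear decision boundary
XX = np.c_[np.ones((X_raw.shape[0], 1)), X_raw, X_raw**2]
```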

Partial derivatives (gradient)

$$ \frac{\partial J(\theta)}{\partial\theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} \big( h_\theta (x^{(i)})-y^{(i)}\big)x^{(i)}_{j} + \frac{\lambda}{m}\theta_{j}$$

Vectorized gradient

$$ \nabla J(\theta) = \frac{1}{m} X^{T}\big(g(X\theta)-y\big) + \frac{\lambda}{m}\theta$$

Note: the intercept parameter $\theta_{0}$ that we added ourselves must not be regularized (its entry in the penalty term is set to 0).
```python
# Regularized gradient; the leading 0 keeps the intercept unpenalized
def gradientReg(theta, reg, *args):
    XX, y = args
    m = y.size
    h = sigmoid(XX.dot(theta.reshape(-1,1)))

    grad = (1.0/m)*XX.T.dot(h-y) + (reg/m)*np.r_[[[0]], theta[1:].reshape(-1,1)]

    return grad.flatten()
```

Minimizing the loss function

```python
reg = 1.0  # regularization strength lambda
res = minimize(costFunctionReg, initial_theta, args=(reg, XX, y), jac=gradientReg, options={'maxiter':400})
res
```
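
λ trades training fit against weight shrinkage; a sketch comparing a few arbitrary values on the toy XX above:

```python
initial_theta = np.zeros(XX.shape[1])
for reg in [1.0, 10.0, 100.0]:  # arbitrary grid of lambda values
    res = minimize(costFunctionReg, initial_theta, args=(reg, XX, y_toy),
                   jac=gradientReg, options={'maxiter': 400})
    print(reg, res.fun)         # larger lambda -> larger optimal objective
```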

Prediction

Prediction is unchanged; the same predict function defined above applies.