
Learning Large Models with DeepSeek 06: Feedforward Neural Networks (FFNN)
Prompt to deepseek.com: derive the objective function, loss function, and gradient descent for a feedforward neural network in scalar and matrix form; give runnable PyTorch code with the model graph shown in TensorBoard; analyze application scenarios, strengths and weaknesses, and derive the improvement methods that address those weaknesses.
Feedforward Neural Networks (FFNN): A Complete Walkthrough
I. Mathematical Derivation (Scalar and Matrix Forms)
1. Forward Propagation
Taking a 3-layer network (input layer → hidden layer → output layer) as an example, define the notation:
- Input data: $X \in \mathbb{R}^{d \times N}$ ($d$ is the feature dimension, $N$ the number of samples)
- Weight matrices: $W_1 \in \mathbb{R}^{h \times d}$, $W_2 \in \mathbb{R}^{c \times h}$
- Bias vectors: $b_1 \in \mathbb{R}^h$, $b_2 \in \mathbb{R}^c$
- Activation function: $\sigma(\cdot)$ (e.g. ReLU)
Scalar form (single sample $x_i$):

$$
\begin{aligned}
z^{(1)}_i &= W_1 x_i + b_1 \\
a^{(1)}_i &= \sigma(z^{(1)}_i) \\
z^{(2)}_i &= W_2 a^{(1)}_i + b_2 \\
\hat{y}_i &= \text{softmax}(z^{(2)}_i) \quad (\text{classification})
\end{aligned}
$$
Matrix form (batched data):

$$
\begin{aligned}
Z_1 &= W_1 X + b_1 \mathbf{1}_N^T \\
A_1 &= \sigma(Z_1) \\
Z_2 &= W_2 A_1 + b_2 \mathbf{1}_N^T \\
\hat{Y} &= \text{softmax}(Z_2)
\end{aligned}
$$

(where $\mathbf{1}_N$ is the all-ones vector of length $N$, so $b \mathbf{1}_N^T$ broadcasts each bias across the batch)
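As a quick sanity check on these shapes, here is a minimal PyTorch sketch of the matrix-form forward pass (the dimensions d=4, h=8, c=3, N=5 are arbitrary, chosen only for illustration):

```python
import torch

# Hypothetical dimensions: d features, h hidden units, c classes, N samples
d, h, c, N = 4, 8, 3, 5
X = torch.randn(d, N)                      # columns are samples, as in the derivation
W1, b1 = torch.randn(h, d), torch.randn(h, 1)
W2, b2 = torch.randn(c, h), torch.randn(c, 1)

Z1 = W1 @ X + b1                           # b1 1_N^T realized via broadcasting
A1 = torch.relu(Z1)                        # sigma = ReLU
Z2 = W2 @ A1 + b2
Y_hat = torch.softmax(Z2, dim=0)           # softmax over the class dimension

print(Z1.shape, A1.shape, Z2.shape, Y_hat.shape)  # (h,N) (h,N) (c,N) (c,N)
print(Y_hat.sum(dim=0))                    # each column sums to 1
```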
2. Objective and Loss Functions
- Cross-entropy loss (classification):

$$
L = -\frac{1}{N} \sum_{i=1}^N \sum_{k=1}^c y_{i,k} \log \hat{y}_{i,k} \quad \text{(scalar form)}
$$

$$
L = -\frac{1}{N} \operatorname{Tr}(Y^T \log \hat{Y}) \quad \text{(matrix form)}
$$

  ($Y$ is the one-hot encoded label matrix)

- Mean squared error (regression):

$$
L = \frac{1}{2N} \sum_{i=1}^N \| y_i - \hat{y}_i \|^2 \quad \text{(scalar form)}
$$

$$
L = \frac{1}{2N} \| Y - \hat{Y} \|_F^2 \quad \text{(matrix form)}
$$
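As a consistency check (not part of the original derivation), the scalar cross-entropy formula can be compared against PyTorch's built-in `nn.CrossEntropyLoss`, which fuses softmax and the averaged negative log-likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, c = 5, 3
logits = torch.randn(N, c)                 # Z2 with samples in rows (PyTorch convention)
labels = torch.randint(0, c, (N,))

# Manual computation: softmax, then averaged negative log-likelihood
y_hat = F.softmax(logits, dim=1)
manual = -torch.log(y_hat[torch.arange(N), labels]).mean()

builtin = nn.CrossEntropyLoss()(logits, labels)
print(torch.allclose(manual, builtin))     # True, up to numerical precision
```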
3. Gradient Descent Derivation
Backpropagation (matrix form):
- Output-layer gradients:

$$
\frac{\partial L}{\partial Z_2} = \frac{1}{N} (\hat{Y} - Y) \quad \text{(cross-entropy loss)}
$$

$$
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial Z_2} A_1^T, \quad \frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial Z_2} \mathbf{1}_N
$$

- Hidden-layer gradients:

$$
\frac{\partial L}{\partial Z_1} = \left( W_2^T \frac{\partial L}{\partial Z_2} \right) \odot \sigma'(Z_1)
$$

$$
\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial Z_1} X^T, \quad \frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial Z_1} \mathbf{1}_N
$$
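These formulas can be verified numerically against autograd. The sketch below (biases omitted for brevity; dimensions are arbitrary assumptions) builds the forward pass, lets PyTorch compute the gradients, and compares them with the manual matrix-form expressions:

```python
import torch

d, h, c, N = 4, 8, 3, 6
X = torch.randn(d, N)
Y = torch.eye(c)[torch.randint(0, c, (N,))].T          # one-hot labels, shape (c, N)
W1 = torch.randn(h, d, requires_grad=True)
W2 = torch.randn(c, h, requires_grad=True)

Z1 = W1 @ X
A1 = torch.relu(Z1)
Z2 = W2 @ A1
Y_hat = torch.softmax(Z2, dim=0)
L = -(Y * torch.log(Y_hat)).sum() / N                  # cross-entropy loss
L.backward()

# Manual gradients from the derivation above
dZ2 = (Y_hat.detach() - Y) / N
dW2 = dZ2 @ A1.detach().T
dZ1 = (W2.detach().T @ dZ2) * (Z1.detach() > 0).float()  # ReLU'(Z1) = 1[Z1 > 0]
dW1 = dZ1 @ X.T

print(torch.allclose(dW2, W2.grad, atol=1e-5))          # True
print(torch.allclose(dW1, W1.grad, atol=1e-5))          # True
```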
II. PyTorch Implementation (with TensorBoard Visualization)
1. Data Preparation and Model Definition
```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
from sklearn.datasets import make_moons

# Generate a nonlinear dataset
X, y = make_moons(n_samples=1000, noise=0.1)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Define the model
class FFNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 64)
        self.fc2 = nn.Linear(64, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        return self.fc2(x)

model = FFNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```
2. Training with TensorBoard Logging

```python
writer = SummaryWriter('runs/ffnn_experiment')

# Log the model graph
dummy_input = torch.randn(1, 2)  # example input
writer.add_graph(model, dummy_input)

# Training loop (full-batch for simplicity)
for epoch in range(1000):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    # Log the loss
    writer.add_scalar('Loss/train', loss.item(), epoch)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')

writer.close()
```
Then run in a terminal:

```bash
tensorboard --logdir=runs
```
3. Visualization Output
TensorBoard will display:
- the model's computation graph
- the training loss curve
III. Application Scenarios and Trade-offs
Application scenarios

| Scenario type | Typical task | Data example |
|---|---|---|
| Tabular data classification | Customer churn prediction | Structured CSV data |
| Simple image classification | MNIST handwritten digit recognition | 28×28 grayscale images |
| Numeric prediction | Stock price forecasting | Time-series data |
Pros and cons

| Pros | Cons |
|---|---|
| ① Simple structure, easy to implement | ① Fully connected layers have many parameters and high compute cost |
| ② Broadly applicable (universal approximation theorem) | ② Poor at spatial data (a CNN is needed) |
| ③ Some interpretability (feature weights can be visualized) | ③ Vanishing/exploding gradient problems |
IV. Improvement Methods with Derivations
1. Weight Regularization (L2)
Derivation:
- Modified loss function:

$$
L_{\text{reg}} = L + \frac{\lambda}{2} \left( \|W_1\|_F^2 + \|W_2\|_F^2 \right)
$$

- Gradient update:

$$
\frac{\partial L_{\text{reg}}}{\partial W_l} = \frac{\partial L}{\partial W_l} + \lambda W_l
$$

- PyTorch implementation:

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```
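To confirm that `weight_decay` implements exactly the update rule above, here is a one-step sketch with a hypothetical single-parameter model: one path uses `weight_decay`, the other adds $\frac{\lambda}{2}\|w\|^2$ to the loss by hand.

```python
import torch

w1 = torch.nn.Parameter(torch.tensor([2.0]))
w2 = torch.nn.Parameter(torch.tensor([2.0]))
lr, lam = 0.1, 0.01

# Route A: weight_decay adds lambda * w to the gradient inside the optimizer
opt = torch.optim.SGD([w1], lr=lr, weight_decay=lam)
loss = (w1 ** 2).sum()          # stand-in for the data loss L
loss.backward()
opt.step()

# Route B: add (lambda/2) * ||w||^2 to the loss explicitly
loss = (w2 ** 2).sum() + (lam / 2) * (w2 ** 2).sum()
loss.backward()
with torch.no_grad():
    w2 -= lr * w2.grad

print(torch.allclose(w1, w2))   # True: the two updates coincide
```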
2. Batch Normalization (BatchNorm)
Mathematical form:

$$
\hat{z}^{(l)} = \frac{z^{(l)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad y^{(l)} = \gamma \hat{z}^{(l)} + \beta
$$

(where $\mu_B$ and $\sigma_B^2$ are the mean and variance over the current mini-batch, and $\gamma$, $\beta$ are learned)

- PyTorch implementation:

```python
self.bn = nn.BatchNorm1d(64)
```
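As a sketch of where the layer would sit in the earlier `FFNN` class (one common convention is to normalize the pre-activations, i.e. between the linear layer and the ReLU):

```python
import torch.nn as nn

class FFNNWithBN(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(2, hidden_dim)
        self.bn = nn.BatchNorm1d(hidden_dim)   # learns gamma, beta per feature
        self.fc2 = nn.Linear(hidden_dim, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        # normalize the pre-activations, then apply the nonlinearity
        return self.fc2(self.relu(self.bn(self.fc1(x))))
```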
3. Residual Connections (the ResNet idea)
Mathematical form:

$$
a^{(l+1)} = \sigma(W_l a^{(l)} + b_l) + a^{(l)}
$$

- PyTorch implementation:

```python
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.bn = nn.BatchNorm1d(dim)

    def forward(self, x):
        # identity shortcut: the input is added back to the transformed output
        return x + self.bn(self.fc(x))
```
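For example, several such blocks can be stacked into a deeper network while keeping gradient flow intact; a sketch reusing `ResidualBlock` (the 2-in/2-out head matches the moons task above, and the depth is an arbitrary assumption):

```python
import torch.nn as nn

class DeepFFNN(nn.Module):
    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.inp = nn.Linear(2, dim)
        # the identity shortcut in each block keeps gradients flowing through depth
        self.blocks = nn.Sequential(*[ResidualBlock(dim) for _ in range(depth)])
        self.out = nn.Linear(dim, 2)

    def forward(self, x):
        return self.out(self.blocks(self.inp(x)))
```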
V. Solutions to Core Problems

| Problem | Improvement | Mathematical principle | PyTorch example |
|---|---|---|---|
| Overfitting | Dropout | Randomly masks neurons during training | `nn.Dropout(0.5)` |
| Vanishing gradients | Residual connections | Identity mapping preserves the original signal | see `ResidualBlock` above |
| Unstable training | Batch normalization | Standardizes each layer's input distribution | `nn.BatchNorm1d()` |
| Local optima | Learning-rate decay | Dynamically shrinks the search step size | `torch.optim.lr_scheduler` |
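As an illustration, here is a sketch combining two rows of the table: Dropout against overfitting plus a step learning-rate schedule (the decay interval and factor are illustrative assumptions, not tuned values):

```python
import torch
import torch.nn as nn

# a hypothetical model combining Dropout with the earlier architecture
model = nn.Sequential(
    nn.Linear(2, 64),
    nn.ReLU(),
    nn.Dropout(0.5),   # randomly zeroes 50% of activations, in training mode only
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# halve the learning rate every 200 epochs (illustrative schedule)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

for epoch in range(1000):
    # ... forward / loss / backward / optimizer.step() as in the loop above ...
    scheduler.step()   # called once per epoch, after optimizer.step()
```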
VI. Suggested Extension Experiments
Hyperparameter tuning:

```python
# Automatic hyperparameter search with Optuna
import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-4, 1e-2)
    hidden_dim = trial.suggest_int('hidden_dim', 32, 256)
    model = FFNN(hidden_dim)  # assumes FFNN is modified to accept hidden_dim
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # ... training loop ...
    return test_loss
```
Model compression:

```python
# Model pruning
from torch.nn.utils import prune

parameters_to_prune = [(model.fc1, 'weight'), (model.fc2, 'weight')]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,  # prune 20% of weights globally, by smallest L1 magnitude
)
```
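A possible follow-up (a sketch reusing the `parameters_to_prune` list above): inspect the sparsity the pruning induced, then make it permanent with `prune.remove`:

```python
# Inspect the resulting sparsity, then bake the masks into the weights
for module, name in parameters_to_prune:
    weight = getattr(module, name)                      # masked weight tensor
    sparsity = float((weight == 0).sum()) / weight.numel()
    print(f'{module}: {sparsity:.1%} zeros')
    prune.remove(module, name)  # drops the mask/reparametrization, keeps zeros
```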
With the material above, you should have a full picture of the core theory and engineering practice of feedforward networks, and be able to optimize them for real problems.
Appendix: MNIST Image Recognition
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# Define the MLP model
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)  # input layer -> hidden layer
        self.fc2 = nn.Linear(128, 10)     # hidden layer -> output layer

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # activation
        x = self.fc2(x)
        return x

# Hyperparameters
batch_size = 64
epochs = 5
learning_rate = 0.01

# Prepare the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# Initialize the model, loss function, and optimizer
model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
writer = SummaryWriter('runs/mnist')

# Log the model graph
dummy_input = torch.randn(1, 784)  # example input
writer.add_graph(model, dummy_input)

# Train the model
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)  # flatten the input
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # use a global step so per-batch losses are not overwritten within an epoch
        writer.add_scalar('Loss/train', loss.item(), epoch * len(train_loader) + batch_idx)
    print(f'Epoch {epoch+1}/{epochs}, Loss: {running_loss/len(train_loader):.4f}')

# Evaluate the model
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        data = data.view(-1, 28*28)
        output = model(data)
        test_loss += criterion(output, target).item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

test_loss /= len(test_loader)  # criterion already averages within each batch
accuracy = correct / len(test_loader.dataset)
writer.close()
print(f'Test set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({100. * accuracy:.2f}%)')

# git clone https://github.com/knamdar/data
```