Topic: the objective function, loss function, and gradient descent of feedforward neural networks, derived in scalar and matrix form; runnable PyTorch code with the model graph displayed in TensorBoard; analysis of application scenarios, strengths and weaknesses; and improvement methods with supporting derivations.

Feedforward Neural Networks (FFNN): A Complete Walkthrough


I. Mathematical Derivations (Scalar and Matrix Forms)

1. Forward Propagation

Take a three-layer network (input layer → hidden layer → output layer) as the running example, with the following notation:

  • Input data: $X \in \mathbb{R}^{d \times N}$ ($d$ is the feature dimension, $N$ the number of samples)
  • Weight matrices: $W_1 \in \mathbb{R}^{h \times d}$, $W_2 \in \mathbb{R}^{c \times h}$
  • Bias vectors: $b_1 \in \mathbb{R}^h$, $b_2 \in \mathbb{R}^c$
  • Activation function: $\sigma(\cdot)$ (e.g., ReLU)

Scalar form (single sample $x_i$)
$$
\begin{aligned}
z_i^{(1)} &= W_1 x_i + b_1 \\
a_i^{(1)} &= \sigma(z_i^{(1)}) \\
z_i^{(2)} &= W_2 a_i^{(1)} + b_2 \\
\hat{y}_i &= \text{softmax}(z_i^{(2)}) \quad \text{(classification)}
\end{aligned}
$$

Matrix form (batched data)
$$
\begin{aligned}
Z_1 &= W_1 X + b_1 \mathbf{1}_N^T \\
A_1 &= \sigma(Z_1) \\
Z_2 &= W_2 A_1 + b_2 \mathbf{1}_N^T \\
\hat{Y} &= \text{softmax}(Z_2)
\end{aligned}
$$
(where $\mathbf{1}_N$ is the all-ones vector of length $N$)
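As a sanity check, the batched forward pass is short enough to write out directly. Below is a minimal NumPy sketch; all dimensions are illustrative assumptions:

import numpy as np

# Illustrative dimensions: d features, h hidden units, c classes, N samples
d, h, c, N = 4, 16, 3, 5
X = np.random.randn(d, N)                       # one column per sample
W1, b1 = np.random.randn(h, d), np.random.randn(h, 1)
W2, b2 = np.random.randn(c, h), np.random.randn(c, 1)

Z1 = W1 @ X + b1                                # broadcasting plays the role of b1 1_N^T
A1 = np.maximum(Z1, 0)                          # ReLU
Z2 = W2 @ A1 + b2
Z2 = Z2 - Z2.max(axis=0, keepdims=True)         # shift logits for numerical stability
Y_hat = np.exp(Z2) / np.exp(Z2).sum(axis=0, keepdims=True)  # column-wise softmax
print(Y_hat.sum(axis=0))                        # each column sums to 1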


2. Objective and Loss Functions
  • Cross-entropy loss (classification), checked numerically in the sketch after this list:
    $$L = -\frac{1}{N} \sum_{i=1}^N \sum_{k=1}^c y_{i,k} \log \hat{y}_{i,k} \quad \text{(scalar form)}$$
    $$L = -\frac{1}{N} \operatorname{Tr}\left( Y^T \log \hat{Y} \right) \quad \text{(matrix form)}$$
    (where $Y$ is the one-hot label matrix)

  • Mean squared error (regression):
    $$L = \frac{1}{2N} \sum_{i=1}^N \left\| y_i - \hat{y}_i \right\|^2 \quad \text{(scalar form)}$$
    $$L = \frac{1}{2N} \left\| Y - \hat{Y} \right\|_F^2 \quad \text{(matrix form)}$$
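A quick numerical check of the cross-entropy formula against PyTorch's built-in loss; a minimal sketch (the (N, c) logit layout follows PyTorch's convention rather than the column convention above):

import torch
import torch.nn.functional as F

N, c = 8, 3
logits = torch.randn(N, c)                  # unnormalized scores
labels = torch.randint(0, c, (N,))

# Manual form: L = -(1/N) * sum_i sum_k y_ik * log(y_hat_ik)
y_hat = F.softmax(logits, dim=1)
y_onehot = F.one_hot(labels, num_classes=c).float()
manual = -(y_onehot * y_hat.log()).sum() / N

built_in = F.cross_entropy(logits, labels)  # fused log-softmax + NLL
print(torch.allclose(manual, built_in))     # True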


3. Gradient Descent Derivation

Backpropagation (matrix form)

  1. Output-layer gradients
    $$\frac{\partial L}{\partial Z_2} = \frac{1}{N} \left( \hat{Y} - Y \right) \quad \text{(cross-entropy loss)}$$
    $$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial Z_2} A_1^T, \qquad \frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial Z_2} \mathbf{1}_N$$

  2. Hidden-layer gradients
    $$\frac{\partial L}{\partial Z_1} = \left( W_2^T \frac{\partial L}{\partial Z_2} \right) \odot \sigma'(Z_1)$$
    $$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial Z_1} X^T, \qquad \frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial Z_1} \mathbf{1}_N$$
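These formulas can be verified against autograd. A minimal sketch for the output layer (random dimensions, same column convention as the forward pass above):

import torch

h, c, N = 16, 3, 5
Y = torch.eye(c)[:, torch.randint(0, c, (N,))]      # one-hot labels, shape (c, N)
W2 = torch.randn(c, h, requires_grad=True)
b2 = torch.randn(c, 1, requires_grad=True)
A1 = torch.relu(torch.randn(h, N))                  # stand-in for hidden activations

Z2 = W2 @ A1 + b2
L = -(Y * torch.log_softmax(Z2, dim=0)).sum() / N   # cross-entropy loss
L.backward()

dZ2 = (torch.softmax(Z2, dim=0) - Y) / N            # (1/N)(Y_hat - Y)
print(torch.allclose(W2.grad, dZ2 @ A1.T))          # dL/dW2 = dL/dZ2 · A1^T -> True
print(torch.allclose(b2.grad, dZ2 @ torch.ones(N, 1)))  # dL/db2 = dL/dZ2 · 1_N -> True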


II. PyTorch Implementation (with TensorBoard Visualization)

1. Data Preparation and Model Definition
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
from sklearn.datasets import make_moons

# Generate nonlinear (two-moons) data
X, y = make_moons(n_samples=1000, noise=0.1)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Define the model
class FFNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 64)
        self.fc2 = nn.Linear(64, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        return self.fc2(x)

model = FFNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

2. Training with TensorBoard Logging
writer = SummaryWriter('runs/ffnn_experiment')

# Log the model graph
dummy_input = torch.randn(1, 2)  # example input
writer.add_graph(model, dummy_input)

# Training loop (full-batch, so one optimizer step per epoch)
for epoch in range(1000):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()

    # Log the loss
    writer.add_scalar('Loss/train', loss.item(), epoch)
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')

writer.close()

Then run in a terminal:

tensorboard --logdir=runs

3. Visualization Output

TensorBoard will display:

  • The model computation graph
  • The training loss curve

III. Application Scenarios and Trade-offs

Application scenarios

| Scenario Type | Typical Task | Example Data |
|---|---|---|
| Tabular-data classification | Customer churn prediction | Structured CSV data |
| Simple image classification | MNIST handwritten-digit recognition | 28×28 grayscale images |
| Numeric prediction | Stock price forecasting | Time-series data |

Strengths vs. weaknesses

| Strengths | Weaknesses |
|---|---|
| ① Simple structure, easy to implement | ① Fully connected layers have large parameter counts and high compute cost |
| ② Highly general (universal approximation theorem) | ② Ill-suited to spatial data (CNNs handle it better) |
| ③ Some interpretability (feature weights can be visualized) | ③ Prone to vanishing/exploding gradients |

IV. Improvement Methods, with Derivations

1. Weight Regularization (L2)

Derivation

  • Modified loss function:
    $$L_{\text{reg}} = L + \frac{\lambda}{2} \left( \|W_1\|_F^2 + \|W_2\|_F^2 \right)$$
  • Gradient update:
    $$\frac{\partial L_{\text{reg}}}{\partial W_l} = \frac{\partial L}{\partial W_l} + \lambda W_l$$
  • PyTorch implementation:
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
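  The same penalty can also be added to the loss by hand. A minimal sketch reusing the names from the training loop in section II (weight_decay above instead folds λW_l directly into the gradient, which is equivalent):

    # Manual L2 penalty (matches the lambda/2 convention in the formula)
    lam = 1e-4
    l2 = model.fc1.weight.pow(2).sum() + model.fc2.weight.pow(2).sum()
    loss = criterion(outputs, y) + (lam / 2) * l2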
    
2. Batch Normalization (BatchNorm)

Mathematical form
$$\hat{z}^{(l)} = \frac{z^{(l)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y^{(l)} = \gamma \hat{z}^{(l)} + \beta$$
(where $\mu_B$, $\sigma_B^2$ are the mini-batch mean and variance, and $\gamma$, $\beta$ are learnable)

  • PyTorch implementation (see the model sketch below):
    self.bn = nn.BatchNorm1d(64)
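  For example, inserted between the first linear layer and the activation of the FFNN from section II (a sketch):

    class FFNNWithBN(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(2, 64)
            self.bn = nn.BatchNorm1d(64)   # normalizes the pre-activation z^(1)
            self.fc2 = nn.Linear(64, 2)

        def forward(self, x):
            return self.fc2(torch.relu(self.bn(self.fc1(x))))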
    
3. Residual Connections (the ResNet idea)

Mathematical form
$$a^{(l+1)} = \sigma\left( W_l a^{(l)} + b_l \right) + a^{(l)}$$

  • PyTorch implementation:
    class ResidualBlock(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.fc = nn.Linear(dim, dim)
            self.bn = nn.BatchNorm1d(dim)

        def forward(self, x):
            # Identity shortcut; the activation matches a^(l+1) = sigma(W a + b) + a
            return x + torch.relu(self.bn(self.fc(x)))
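  Blocks of this kind can be stacked into a deeper network; a sketch (the depth of 8 is arbitrary):

    # The identity path gives gradients an unimpeded route through depth
    deep_model = nn.Sequential(
        nn.Linear(2, 64),
        *[ResidualBlock(64) for _ in range(8)],
        nn.Linear(64, 2),
    )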
    

V. Solutions to Core Problems
| Problem | Fix | Mathematical Principle | PyTorch Example |
|---|---|---|---|
| Overfitting | Dropout | Randomly masks neurons during training | nn.Dropout(0.5) |
| Vanishing gradients | Residual connections | Identity mapping preserves the original signal | ResidualBlock code above |
| Unstable training | Batch normalization | Standardizes each layer's input distribution | nn.BatchNorm1d() |
| Local optima | Learning-rate decay | Dynamically adjusts the step size | torch.optim.lr_scheduler (snippet below) |
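The last row of the table in code form; a minimal sketch of a step-decay schedule (step_size and gamma are illustrative values):

# Halve the learning rate every 30 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
for epoch in range(100):
    # ...one epoch of training...
    scheduler.step()  # advance the schedule once per epoch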

VI. Suggested Follow-up Experiments
  1. Hyperparameter tuning

    # Automatic hyperparameter search with Optuna
    import optuna

    def objective(trial):
        lr = trial.suggest_float('lr', 1e-4, 1e-2)
        hidden_dim = trial.suggest_int('hidden_dim', 32, 256)
        # The FFNN class above has a fixed width, so build the model inline
        model = nn.Sequential(nn.Linear(2, hidden_dim), nn.ReLU(),
                              nn.Linear(hidden_dim, 2))
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        # ...training loop...
        return test_loss
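  Running the search (n_trials is an arbitrary choice):

    study = optuna.create_study(direction='minimize')
    study.optimize(objective, n_trials=50)
    print(study.best_params)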
    
  2. Model compression

    # Magnitude-based global pruning
    from torch.nn.utils import prune
    parameters_to_prune = [(model.fc1, 'weight'), (model.fc2, 'weight')]
    prune.global_unstructured(parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2)
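  Pruning attaches masks rather than deleting weights; to make it permanent, strip the re-parametrization:

    # Bake the masks into the weights and drop the pruning hooks
    for module, name in parameters_to_prune:
        prune.remove(module, name)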
    

The material above covers the core theory and engineering practice of feedforward networks, along with the main levers for adapting and improving them on real problems.

Appendix: MNIST Image Classification

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
# Define the MLP model
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)  # input layer -> hidden layer
        self.fc2 = nn.Linear(128, 10)     # hidden layer -> output layer

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # ReLU activation
        x = self.fc2(x)
        return x

# Hyperparameters
batch_size = 64
epochs = 5
learning_rate = 0.01

# Prepare the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# Initialize the model, optimizer, and loss function
model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

writer = SummaryWriter('runs/mnist')

# Log the model graph
dummy_input = torch.randn(1, 784)  # example input
writer.add_graph(model, dummy_input)

# Train the model
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)  # flatten the images
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # Use a global step so each batch gets its own point on the loss curve
        writer.add_scalar('Loss/train', loss.item(), epoch * len(train_loader) + batch_idx)
    print(f'Epoch {epoch+1}/{epochs}, Loss: {running_loss/len(train_loader):.4f}')

# Evaluate on the test set
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        data = data.view(-1, 28*28)
        output = model(data)
        test_loss += criterion(output, target).item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

test_loss /= len(test_loader)  # average per-batch loss (criterion returns batch means)
accuracy = correct / len(test_loader.dataset)
writer.close()
print(f'Test set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({100. * accuracy:.2f}%)')
