Dive into Deep Learning - Fundamentals - 2.4 Dropout


Dive into Deep Learning: get hands-on and learn deep learning.

Contents

Chapter 1: Introduction

1.1 Introduction
1.2 Common Datasets
1.3 A First Example

Chapter 2: Fundamentals

2.1 The Dense Layer
2.2 Activation Functions
2.3 Normalization
2.4 Dropout
2.5 Optimizers
2.6 Loss Functions
2.7 Convolution
2.8 LeNet

Chapter 3: Advanced Topics

3.1 Deep Networks: AlexNet
3.2 Deep Networks: ResNet
3.3 Deep Networks: EfficientNet (to be continued)

2.4 Dropout

Dropout is applied by adding a Dropout layer with a specified drop probability. During training, the Dropout layer randomly zeroes elements of the previous layer's output with that probability; at test time, the Dropout layer has no effect and simply passes its input through unchanged.

Figure 1: Schematic of dropout

Figure 1 is taken from Dive into Deep Learning, Figure 3.5, "Multilayer perceptron with a hidden layer that uses dropout", https://zh-v1.d2l.ai/chapter_deep-learning-basics/dropout.html
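To make the training/inference distinction concrete, here is a minimal sketch (assuming TensorFlow 2.x) of how tf.keras.layers.Dropout behaves; the toy input is hypothetical. Note that Keras uses inverted dropout, so surviving elements are scaled by 1/(1 - rate):

import tensorflow as tf

# Hypothetical toy input: a single batch of eight ones
x = tf.ones((1, 8))
dropout = tf.keras.layers.Dropout(rate=0.5)

# Training mode: each element is zeroed with probability 0.5,
# and survivors are scaled by 1/(1 - 0.5) = 2.0 (inverted dropout)
print(dropout(x, training=True).numpy())

# Inference mode: the layer is an identity map; all ones pass through
print(dropout(x, training=False).numpy())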

2.4.1 Neural Network

#! -*- coding: utf-8 -*-
import os

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import *
# Install scikit-learn:
# pip install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple
from sklearn import manifold, datasets
import matplotlib.pyplot as plt

# ====================================
# Neural network
# ====================================
def build_sequence_network(dim, feature_dim=100, num_classes=10,
                           dropout_rate=0.1, normalization=None):
    # Input layer, optional normalization, dropout, then two Dense layers
    if normalization == 'batch':
        model = tf.keras.models.Sequential([
            tf.keras.Input(shape=(dim,), name='input_layer'),
            BatchNormalization(name='batch_norm'),
            Dropout(rate=dropout_rate),
            Dense(feature_dim, activation='tanh', name='feature_layer'),
            Dense(num_classes, activation='softmax', name='task_layer'),
        ])
    elif normalization == 'layer':
        model = tf.keras.models.Sequential([
            tf.keras.Input(shape=(dim,), name='input_layer'),
            LayerNormalization(name='layer_norm'),
            Dropout(rate=dropout_rate),
            Dense(feature_dim, activation='tanh', name='feature_layer'),
            Dense(num_classes, activation='softmax', name='task_layer'),
        ])
    else:
        # No normalization and no dropout: a plain two-layer baseline
        model = tf.keras.models.Sequential([
            tf.keras.Input(shape=(dim,), name='input_layer'),
            Dense(feature_dim, activation='tanh', name='feature_layer'),
            Dense(num_classes, activation='softmax', name='task_layer'),
        ])

    # Optimizer, loss, and accuracy metric
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name='Accuracy')])
    # Print the model summary
    model.summary()
    return model
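The function returns the compiled model. A quick sanity check of the builder (the argument values below are hypothetical; flattened MNIST images are 784-dimensional):

model = build_sequence_network(dim=784, dropout_rate=0.3, normalization='layer')
# Expect a 10-way softmax output: (None, 10)
print(model.output_shape)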

2.4.2 Example

# Number of training epochs
epochs = 10
# Batch size
batch_size = 128

# Load MNIST
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Reshape from (60000, 28, 28) to (60000, 784)
x_train = np.reshape(x_train, (x_train.shape[0], -1))
x_test = np.reshape(x_test, (x_test.shape[0], -1))

# ==============================
# Model training
# ==============================
dropout_rates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
accuracy_list = []
val_accuracy_list = []
for rate in dropout_rates:
    model = build_sequence_network(dim=x_train.shape[1],
                                   normalization='layer',
                                   dropout_rate=rate)
    history = model.fit(x_train, y_train,
                        shuffle=True,
                        epochs=epochs,
                        batch_size=batch_size,
                        validation_data=(x_test, y_test))
    accuracy_list.append(history.history['Accuracy'])
    val_accuracy_list.append(history.history['val_Accuracy'])

# ===========================================
# Plot training-set and validation-set accuracy
# ===========================================
plt.clf()
fig, ax = plt.subplots(figsize=(8, 5))
for rate, acc, val_acc in zip(dropout_rates, accuracy_list, val_accuracy_list):
    # Training-set accuracy for this dropout rate
    x = np.arange(epochs)
    ax.plot(x, acc, label='Dropout_{}'.format(rate))
# Axis labels and title
ax.set_xlabel('Epochs')
ax.set_ylabel('Accuracy')
ax.set_title('Dropout')
ax.legend()
figure_path = './figures'
if not os.path.exists(figure_path):
    os.makedirs(figure_path)
plt.savefig(os.path.join(figure_path, 'network_dropout_mnist_accuracy.png'))

plt.clf()
fig, ax = plt.subplots(figsize=(8, 5))
for rate, acc, val_acc in zip(dropout_rates, accuracy_list, val_accuracy_list):
    # Validation-set accuracy for this dropout rate
    x = np.arange(epochs)
    ax.plot(x, val_acc, label='Dropout_{}'.format(rate))
ax.set_xlabel('Epochs')
ax.set_ylabel('Accuracy')
ax.set_title('Dropout')
ax.legend()
plt.savefig(os.path.join(figure_path, 'network_dropout_mnist_val_accuracy.png'))

2.4.3 Analysis

Figure 2: Training-set accuracy
Figure 3: Validation-set accuracy

Figure 2 shows that training-set accuracy falls as the dropout rate increases, and the decline accelerates once the rate passes 0.5.

Figure 3 shows the same trend on the validation set: accuracy decreases as the dropout rate increases, with the drop accelerating beyond 0.5.

For this network, a dropout rate below 0.5 is therefore likely a reasonable choice.
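One way to see why accuracy degrades faster at high rates: Keras implements inverted dropout, scaling surviving activations by 1/(1 - rate) so that their expected value is unchanged, but at rate 0.9 only about one unit in ten carries any signal on a given step. A minimal sketch of this trade-off (assuming TensorFlow 2.x; the input tensor is hypothetical):

import tensorflow as tf

x = tf.ones((10000, 1))
for rate in [0.1, 0.5, 0.9]:
    y = tf.keras.layers.Dropout(rate=rate)(x, training=True)
    surviving = float(tf.reduce_mean(tf.cast(y > 0, tf.float32)))
    # The mean stays near 1.0 because of the 1/(1 - rate) scaling,
    # but the fraction of units carrying any signal shrinks to about (1 - rate)
    print(rate, float(tf.reduce_mean(y)), surviving)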

References

[1] Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
[2] Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. Layer Normalization. https://arxiv.org/abs/1607.06450
