真实数据集

目录：引言 · 数据准备 · 数据加载 · 数据预处理 · 数据洗牌 · 批次（Batches） · 训练（Training） · 到目前为止的全部代码

引言

在实践中，深度学习通常涉及庞大的数据集（通常以TB甚至更多为单位），模型的训练可能需要数天、数周甚至数月。这就是为什么到目前为止我们一直使用程序生成的数据集——既能让学习过程更易管理并保持快速，又能同时学习深度学习的数学和其他相关知识。本书的主要目标是教授神经网络的工作原理，而不是深度学习在各种问题中的应用。话虽如此，现在我们将探索一个更实际的数据集，因为这会带来一些我们尚未考虑过的深度学习新挑战。

如果你在阅读本书之前已经接触过深度学习，你可能已经熟悉（也可能已经厌倦）MNIST数据集——一个包含手写数字（0到9）的图像数据集，每张图像的分辨率为28x28像素。它是一个相对较小、对模型来说也相对容易学习的数据集。这个数据集曾是深度学习的“Hello World”，并且一度是机器学习算法的基准。然而它的问题在于：获得99%以上的准确率极其容易，因此它无法提供足够的空间来学习各种参数如何影响模型的学习过程。2017年，一家名为Zalando的公司发布了名为Fashion MNIST的数据集（https://arxiv.org/abs/1708.07747），它是MNIST数据集的直接替代品（https://github.com/zalandoresearch/fashion-mnist）。

Fashion MNIST数据集包含60,000个训练样本和10,000个测试样本，这些样本是28x28像素的图像，涵盖了10种不同的服装类别（例如鞋子、靴子、衬衫、包等）。我们稍后会看到一些示例，但首先我们需要获取实际的数据。由于原始数据集由包含特定格式编码图像数据的二进制文件组成，为了本书的使用，我们已经准备并托管了一个预处理数据集，其中包含以.png格式保存的图像。通常对于图像来说，使用无损压缩是明智的，因为有损压缩（例如JPEG）会更改图像数据、从而影响图像。这些图像还按标签分组，并被分成训练组和测试组。样本是服装物品的图像，而标签是分类信息。以下是数字标签及其对应的描述：

标签 0：T恤/上衣（T-shirt/top）
标签 1：裤子（Trouser）
标签 2：套头衫（Pullover）
标签 3：连衣裙（Dress）
标签 4：外套（Coat）
标签 5：凉鞋（Sandal）
标签 6：衬衫（Shirt）
标签 7：运动鞋（Sneaker）
标签 8：包（Bag）
标签 9：短靴（Ankle boot）

数据准备

首先，我们将从nnfs.io网站获取数据。让我们定义数据集的URL、本地保存的文件名以及解压图像的文件夹：

URL = 'https://nnfs.io/datasets/fashion_mnist_images.zip'
FILE = 'fashion_mnist_images.zip'
FOLDER = 'fashion_mnist_images'

接下来，使用Python的标准库urllib下载压缩数据（如果指定路径下的文件还不存在）：

import os
import urllib
import urllib.request

if not os.path.isfile(FILE):
    print(f'Downloading {URL} and saving as {FILE}...')
    urllib.request.urlretrieve(URL, FILE)

接下来，我们将使用另一个标准的Python库zipfile来解压文件。我们会使用上下文管理器（即with关键字，它会为我们打开和关闭文件）来获取压缩文件的句柄，并使用.extractall方法和指定的FOLDER提取所有包含的文件：

from zipfile import ZipFile

print('Unzipping images...')
with ZipFile(FILE) as zip_images:
    zip_images.extractall(FOLDER)

检索数据的完整代码：

from zipfile import ZipFile
import os
import urllib
import urllib.request

URL = 'https://nnfs.io/datasets/fashion_mnist_images.zip'
FILE = 'fashion_mnist_images.zip'
FOLDER = 'fashion_mnist_images'

if not os.path.isfile(FILE):
    print(f'Downloading {URL} and saving as {FILE}...')
    urllib.request.urlretrieve(URL, FILE)

print('Unzipping images...')
with ZipFile(FILE) as zip_images:
    zip_images.extractall(FOLDER)

print('Done!')

运行：

Downloading https://nnfs.io/datasets/fashion_mnist_images.zip and saving as fashion_mnist_images.zip... Unzipping images... 
Done!现在你应该有一个名为fashion_mnist_images的目录其中包含test和train目录以及数据许可文件。在test和train目录中各有10个子目录编号从0到9。这些数字是与其中图像对应的分类。例如如果我们打开目录0可以看到这些是短袖或无袖衬衫的图像。例如 在目录 7 中我们有非靴子鞋或本数据集创建者分类的运动鞋。例如 将图像转换为灰度图即将每像素的三通道RGB值转换为单一的黑白范围像素值为0到255是一种常见的做法不过这些图像已经是灰度图。另外将图像调整大小以规范其尺寸也是一种常见的做法但同样地Fashion MNIST数据集已经经过处理所有图像的尺寸都相同28x28。 数据加载 接下来我们需要将这些图像读入Python并将图像像素数据与相应的标签关联起来。我们可以通过以下方式访问这些目录 import oslabels os.listdir(fashion_mnist_images/train) print(labels)[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]由于子目录名称本身就是标签我们可以通过查看每个编号子目录中的文件来引用每个类别的单个样本 files os.listdir(fashion_mnist_images/train/0) print(files[:10]) print(len(files))[0000.png, 0001.png, 0002.png, 0003.png, 0004.png, 0005.png, 0006.png, 0007.png, 0008.png, 0009.png] 6000如你所见我们有6,000个类别0的样本。总共我们有60,000个样本每个类别6,000个。这意味着我们的数据集已经是平衡的每个类别出现的频率相同。如果数据集未平衡神经网络可能会倾向于预测包含最多图像的类别。这是因为神经网络本质上会寻找最陡峭且最快的梯度下降以减少损失这可能导致陷入局部最小值使模型无法找到全局损失的最小值。我们这里总共有10个类别因此在一个平衡的数据集中随机预测的准确率大约为10%。 然而假设数据集中类别的不平衡程度为类别0占64%而类别1到9分别仅占4%。神经网络可能会很快学会始终预测类别0。尽管模型的损失最初会迅速降低但它可能会一直停留在预测类别0上准确率接近64%。在这种情况下我们最好通过削减高频类别的样本数量使每个类别的样本数量相同。 另一个选择是使用类别权重在计算损失时为频率较高的类别赋予小于1的权重。然而在实践中我们几乎没有见过这种方法效果很好。对于图像数据另一个选择是通过裁剪、旋转、水平或垂直翻转等操作来扩充样本。在应用这些变换之前需确保它们会生成符合目标的有效样本。幸运的是我们无需担心这一点因为Fashion MNIST数据集已经完全平衡。现在我们将通过查看单个样本来探索数据。为处理图像数据我们将使用包含OpenCV的Python包即cv2库你可以通过pip/pip3安装它 pip3 install opencv-python并加载图像数据 import cv2 image_data cv2.imread(fashion_mnist_images/train/7/0002.png, cv2.IMREAD_UNCHANGED) print(image_data)我们使用cv2.imread()读取图像其中第一个参数是图像的路径。参数cv2.IMREAD_UNCHANGED通知cv2包我们希望以图像保存时的格式读取它们在这种情况下是灰度图。默认情况下即使是灰度图OpenCV也会将这些图像转换为使用所有三个颜色通道。因此我们得到的是一个二维数组——灰度像素值。如果我们在打印之前使用以下代码行格式化这个杂乱的数组NumPy会知道打印更多的字符在一行中因为加载的图像是一个NumPy数组对象 import numpy as np np.set_printoptions(linewidth200)我们仍然可能能够识别出主题 [[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 49 135 182 150 59 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 255 220 212 219 255 246 191 155 87 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 57 206 215 203 191 203 212 216 217 220 211 15 0][ 0 0 0 0 0 0 0 0 0 0 1 0 0 0 58 231 220 210 199 209 218 218 217 208 200 215 56 0][ 0 0 0 0 1 2 0 0 4 0 0 0 0 145 213 207 199 187 203 210 216 217 215 215 206 215 130 0][ 0 0 0 0 1 2 4 0 0 0 3 105 225 205 190 201 210 214 213 215 215 212 211 208 205 207 218 0][ 1 5 7 0 0 0 0 0 52 162 217 189 174 157 187 198 202 217 220 223 224 222 217 211 217 201 247 65][ 0 0 0 0 0 0 21 72 185 189 171 171 185 203 200 207 208 209 214 219 222 222 224 215 218 211 212 148][ 0 70 114 129 145 159 179 196 172 176 185 196 199 206 201 210 212 213 216 218 219 217 212 207 208 200 198 173][ 0 122 158 184 194 192 193 196 203 209 211 211 215 218 221 222 226 227 227 226 226 223 222 216 211 208 216 185][ 21 0 0 12 48 82 123 152 170 184 195 211 225 232 233 237 242 242 240 240 238 236 222 209 200 193 185 106][ 26 47 54 18 5 0 0 0 0 0 0 0 0 0 2 4 6 9 9 8 9 6 6 4 2 0 0 0][ 0 10 27 45 55 59 57 50 44 51 58 62 65 56 54 57 59 61 60 63 68 67 66 73 77 74 65 39][ 0 0 0 0 4 9 18 23 26 25 23 25 29 37 38 37 39 36 29 31 33 34 28 24 20 14 7 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]在这种情况下这是一个运动鞋。与其通过格式化原始值来查看图像我们可以使用Matplotlib来可视化它。例如 import matplotlib.pyplot as plt plt.imshow(image_data) plt.show()我们可以检查另一个样本 import matplotlib.pyplot as plt image_data cv2.imread(fashion_mnist_images/train/4/0011.png, cv2.IMREAD_UNCHANGED) plt.imshow(image_data) plt.show()看起来像是一件夹克。如果我们查看之前的表格类别4是“外套”。你可能会对奇怪的颜色感到疑惑但这只是因为Matplotlib默认不期望灰度图像。我们可以在调用plt.imshow()时通过指定cmap颜色映射来通知Matplotlib这是灰度图像 import matplotlib.pyplot as plt image_data cv2.imread(fashion_mnist_images/train/4/0011.png, cv2.IMREAD_UNCHANGED) plt.imshow(image_data, cmapgray) plt.show()现在我们可以遍历所有样本将它们加载到输入数据 X X X和目标 y y y列表中。首先我们扫描训练文件夹正如之前提到的该文件夹包含从0到9命名的子文件夹这些子文件夹同时也充当样本标签。我们遍历这些文件夹及其中的图像将图像添加到一个列表变量命名为 X X X中并将其对应的标签添加到另一个列表变量命名为 y y y中从而形成我们的样本和真实标签目标标签 # Scan all the directories and create a list of labels labels os.listdir(fashion_mnist_images/train) # Create lists for samples and labels X [] y [] # For each label folder for label in labels:# And for each image in given folderfor file in os.listdir(os.path.join(fashion_mnist_images, train, label)):# Read the imageimage cv2.imread(os.path.join(fashion_mnist_images/train, label, file), cv2.IMREAD_UNCHANGED)# And append it and a label to the listsX.append(image)y.append(label)我们需要对测试数据和训练数据执行相同的操作。幸运的是它们已经为我们很好地分开了。很多时候你需要自己将数据分成训练组和测试组。我们将把上述代码转换为一个函数以避免为训练和测试目录重复编写代码。这个函数将接收一个数据集类型训练或测试作为参数以及这些数据集所在路径 import numpy as np import cv2 import os# Loads a MNIST dataset def load_mnist_dataset(dataset, path):# Scan all the directories and create a list of labelslabels os.listdir(os.path.join(path, dataset))# Create lists for samples and labelsX []y []# For each label folderfor label in labels:# And for each image in given folderfor file in os.listdir(os.path.join(path, dataset, label)):# Read the imageimage cv2.imread(os.path.join(path, dataset, label, file), cv2.IMREAD_UNCHANGED)# And append it and a label to the listsX.append(image)y.append(label)# Convert the data to proper numpy arrays and returnreturn np.array(X), np.array(y).astype(uint8)由于 X X X被定义为一个列表并且我们将以NumPy数组形式表示的图像添加到这个列表中因此我们会在最后调用np.array()将 X X X从列表转换为一个正式的NumPy数组。对于标签 y y y也会执行相同的操作因为它们是一个数字列表我们还需要告知NumPy这些标签是整数而非浮点数值。 然后我们可以编写一个函数用于创建并返回我们的训练和测试数据 # MNIST dataset (train test) def create_data_mnist(path):# Load both sets separatelyX, y load_mnist_dataset(train, path)X_test, y_test load_mnist_dataset(test, path)# And return all the datareturn X, y, X_test, y_test到目前为止针对我们新数据的代码 import numpy as np import cv2 import os# Loads a MNIST dataset def load_mnist_dataset(dataset, path):# Scan all the directories and create a list of labelslabels os.listdir(os.path.join(path, dataset))# Create lists for samples and labelsX []y []# For each label folderfor label in labels:# And for each image in given folderfor file in os.listdir(os.path.join(path, dataset, label)):# Read the imageimage cv2.imread(os.path.join(path, dataset, label, file), cv2.IMREAD_UNCHANGED)# And append it and a label to the listsX.append(image)y.append(label)# Convert the data to proper numpy arrays and returnreturn np.array(X), np.array(y).astype(uint8)# MNIST dataset (train test) def create_data_mnist(path):# Load both sets separatelyX, y load_mnist_dataset(train, path)X_test, y_test load_mnist_dataset(test, path)# And return all the datareturn X, y, X_test, y_test有了这个函数我们就可以通过以下操作加载数据 # Create dataset X, y, X_test, y_test create_data_mnist(fashion_mnist_images)数据预处理 
接下来我们将对数据进行缩放不是对图像本身而是表示它们的数字。神经网络在数据范围为0到1或-1到1时通常表现最佳。在这里图像数据的范围是0到255。我们需要决定如何对这些数据进行缩放。通常这一过程需要一些实验和反复试验。例如我们可以将图像缩放到-1到1的范围通过对每个像素值减去所有像素值的最大值的一半即 255 / 2 127.5 255/2 127.5 255/2127.5然后除以这一半从而生成一个范围为-1到1的值。我们也可以通过简单地将数据除以255最大值将其缩放到0到1的范围。首先我们选择将数据缩放到-1到1的范围。在执行这一操作之前我们需要更改NumPy数组的数据类型当前的数据类型是uint8无符号整数范围为0到255的整数值。如果我们不更改NumPy会将其转换为float64数据类型而我们的目的是使用float3232位浮点值。可以通过在NumPy数组对象上调用.astype(np.float32)实现。标签将保持不变 # Create dataset X, y, X_test, y_test create_data_mnist(fashion_mnist_images) # Scale features X (X.astype(np.float32) - 127.5) / 127.5 X_test (X_test.astype(np.float32) - 127.5) / 127.5确保使用相同的方法对训练数据和测试数据进行缩放。稍后在进行预测时你还需要对用于推断的输入数据进行缩放。在不同的地方忘记对数据进行缩放是很常见的错误。同时你需要确保任何预处理操作例如缩放仅基于训练数据集的信息。在这个例子中我们知道最小值min和最大值max分别为0和255并执行了线性缩放。然而通常你需要首先查询数据集的最小值和最大值以用于缩放。如果你的数据集中存在极端异常值最小/最大值方法可能效果不佳。在这种情况下你可以使用平均值和标准差的某种组合来创建缩放方法。 缩放时一个常见的错误是允许测试数据集影响对训练数据集所做的变换。对此规则唯一的例外是当数据以线性方式缩放例如通过提到的除以常数的方式。如果使用的是非线性缩放函数可能会将测试或验证数据的信息泄露到训练数据中。任何预处理规则都应该在不了解测试数据集的情况下得出但随后应用于测试数据集。例如你的整个数据集可能最小值为0最大值为125而训练数据集的最小值为0最大值为100。在这种情况下你仍然会使用100作为缩放测试数据集的值。这意味着你的测试数据集在缩放后可能不会完全适合-1到1的范围但这通常不是问题。如果差异较大你可以通过将数据线性缩放再除以某个数值来进行额外的缩放。 回到我们的数据让我们检查一下数据是否已经缩放 print(X.min(), X.max())-1.0 1.0接下来我们检查输入数据的形状 print(X.shape)(60000, 28, 28)我们的Dense层处理的是一维向量的批量数据无法直接操作形状为28x28的二维数组图像。我们需要将这些28x28的图像“展平”这意味着将图像数组的每一行依次附加到数组的第一行从而将图像的二维数组转换为一维数组即向量。换句话说这可以看作是将二维数组中的数字展开为类似列表的形式。有一种叫做卷积神经网络的模型可以直接处理二维图像数据但像我们这里的全连接神经网络Dense网络需要一维的样本数据。即使在卷积神经网络中你通常也需要在将数据传递到输出层或Dense层之前对数据进行展平。 在NumPy中展平数组可以使用reshape方法并将第一个维度设置为-1表示“根据实际元素数量决定”从而将所有元素放在第一维度中形成一个一维数组。以下是这种概念的一个示例 example np.array([[1,2],[3,4]]) flattened example.reshape(-1) print(example) print(example.shape) print(flattened) print(flattened.shape)[[1 2][3 4]] (2, 2) [1 2 3 4] (4,)我们也可以使用np.flatten()方法但当处理一批样本时我们的意图有所不同。在样本的情况下我们希望保留所有60,000个样本因此我们需要将训练数据的形状调整为(60000, -1)。这将通知NumPy我们希望保留60,000个样本第一维度但将其余的部分展平-1作为第二维度意味着我们希望将所有样本数据放入这个单一维度中形成一维数组。这将创建60,000个样本每个样本包含784个特征。这784个特征是28·28的结果。为此我们将分别使用训练数据X.shape[0]和测试数据X_test.shape[0]的样本数量并对它们进行reshape操作 # Reshape to vectors X X.reshape(X.shape[0], -1) X_test X_test.reshape(X_test.shape[0], -1)你也可以通过显式定义形状来实现相同的结果而不是依赖NumPy的推断 .reshape(X.shape[0], X.shape[1]*X.shape[2])这样会更明确但我们认为这样不太清晰。 数据洗牌 我们当前的数据集由样本及其目标分类组成按顺序从0到9排列。为了说明这一点我们可以在不同位置查询 y y y数据。前6000个样本的标签都将是0。例如 print(y[0:10])[0 0 0 0 0 0 0 0 0 0]如果我们稍后再进行查询 print(y[6000:6010])[1 1 1 1 1 1 1 1 1 1]如果我们按这种顺序训练网络会导致问题原因与数据集不平衡的问题类似。 在训练前6,000个样本时模型会学到最快减少损失的方法是始终预测为0因为它会看到多个仅包含类别0的数据批次。然后在6,000到12,000之间损失会因为标签的变化而最初上升而模型仍然会错误地预测标签为0。随后模型可能会学到现在需要始终预测类别1因为它在优化过程中看到的标签批次全是类别1。模型会在当前批次中重复的标签附近循环于局部最小值并且很可能永远找不到全局最小值。这一过程会一直持续直到我们完成所有样本并重复我们选择的训练轮数epochs。 理想情况下每次拟合的样本中应该包含多个类别最好每个类别都有一些以防止模型因为最近看到某个类别较多而对该类别产生偏向。因此我们通常会随机打乱数据。在之前的训练数据例如螺旋数据中我们不需要打乱数据因为我们是一次性对整个数据集进行训练而不是分批次训练。但对于这个更大的数据集我们是以批次进行训练的因此需要打乱数据因为目前数据是按每个标签的6,000个样本块顺序排列的。 在打乱数据时我们需要确保样本数组和目标数组同步打乱否则标签将不再与样本匹配导致模型非常混乱在大多数情况下结果也非常错误。因此我们不能简单地分别对它们调用shuffle()方法。有许多方法可以实现这一点但我们的方法是获取所有的“键”在这里是样本和目标数组的索引然后对这些键进行打乱。 在这个例子中这些键的值范围是从0到59999。 keys np.array(range(X.shape[0])) print(keys[:10])array([0 1 2 3 4 5 6 7 8 9])然后我们就可以对这些密钥进行洗牌 import nnfs nnfs.init() np.random.shuffle(keys) print(keys[:10])[ 3048 19563 58303 8870 40228 31488 21860 56864 845 25770]现在这基本上就是新的索引顺序我们可以通过操作来应用它 X X[keys] y y[keys]这告诉NumPy按照给定的索引返回对应的值就像我们通常对NumPy数组进行索引一样但这里我们使用的是一个包含随机顺序索引的变量。然后我们可以检查目标数据的一部分切片 print(y[:15])[0 3 9 1 6 5 3 9 0 4 8 9 0 6 6]它们似乎被洗过。我们也可以检查个别样本 import matplotlib.pyplot as plt plt.imshow((X[4].reshape(28, 28))) # Reshape as image is a vector already plt.show()洗牌之后随机的shirt图片 然后我们就可以在同一索引下检查该类 
print(y[4])6类别6确实是“衬衫”因此这些数据看起来已经正确地打乱了。你可以手动再检查一些数据以确保数据符合预期。如果模型无法训练或表现异常你需要仔细检查数据的预处理过程。 批次Batches 到目前为止我们通过将整个数据集作为一个单一的“批次”传递给模型来训练我们的模型。我们在第2章中讨论过为什么一次处理多个样本是更优的但是否存在一个过大的批次大小呢我们的数据集足够小可以一次性传递整个数据集但真实世界中的数据集通常可能有TB或更多的规模这对于大多数计算机来说无法作为一个单一批次处理。 批次是数据的一个固定大小的切片。当我们使用批次进行训练时我们一次以一个数据块或“批次”来迭代数据集依次执行前向传播、损失计算、反向传播和优化。如果数据已经被打乱并且每个批次足够大且能在一定程度上代表整个数据集那么可以合理地假设每个批次的梯度方向是朝向全局最小值的良好近似。如果批次太小梯度下降的方向可能会在不同批次之间波动过大导致模型训练耗时较长。 常见的批次大小范围是32到128个样本。如果你的内存不足可以使用更小的批次如果想让训练更快可以使用更大的批次但通常这个范围是典型的批次大小范围。通过将批次大小从2增加到8或者从8增加到32通常可以看到准确率和损失的改善。然而继续增大批次大小时关于准确率和损失的提升会逐渐减少。此外与较小的批次相比使用大批次进行训练会变得更慢——就像我们之前使用螺旋数据的例子需要1万轮训练在神经网络中很多时候需要针对具体的数据和模型进行大量的试验和调整。 例如假设我们选择批次大小为128并选择进行10轮训练。这意味着在每一轮训练中我们会遍历数据集每次拟合128个样本来训练模型。每次训练的批次被称为一个“步骤”。我们可以通过将样本数量除以批次大小来计算步骤的数量 steps X.shape[0] // BATCH_SIZE我们使用整数除法运算符//而不是浮点除法运算符/来返回整数因为步骤的数量不能包含小数部分。这是我们在每一轮训练中循环执行的迭代次数。如果有一些剩余的样本无法被整除我们可以通过简单地增加一步来将它们包含进去 if steps * BATCH_SIZE X.shape[0]:steps 1我们可以通过一个简单的例子来说明为什么要添加这个 1 batch_size 2 X [1, 2, 3, 4] print(len(X) // batch_size)2X [1, 2, 3, 4, 5] print(len(X) // batch_size)2整数除法会向下取整因此如果有剩余的样本我们会加1以形成包含余数的最后一个批次。 以下是使用批次训练模型的代码示例 import nnfs from nnfs.datasets import spiral_datannfs.init()# Create dataset X, y spiral_data(samples100, classes3)EPOCHS 10 # Train 10 times BATCH_SIZE 128 # We take 128 samples at once# Calculate number of steps steps X.shape[0] // BATCH_SIZE # Dividing rounds down. If there are some remaining data, # but not a full batch, this wont include it. # Add 1 to include the remaining samples in 1 more step. if steps * BATCH_SIZE X.shape[0]:steps 1 for epoch in range(EPOCHS):for step in range(steps):batch_X X[step*BATCH_SIZE:(step1)*BATCH_SIZE]batch_y y[step*BATCH_SIZE:(step1)*BATCH_SIZE]# Now we perform forward pass, loss calculation,# backward pass and update parameters我们加载了数据集定义了训练轮数epochs和批次大小batch size然后计算了步骤的数量。接下来我们有两个循环——一个是遍历训练轮数的外循环另一个是遍历步骤的内循环。在每一轮的每一步中我们会选择训练数据的一个切片。现在我们已经知道如何以批次方式训练模型我们希望了解每一步和每一轮的训练损失和准确率。 到目前为止我们仅计算了每次拟合的损失但要记住之前我们是一次性针对整个数据集进行拟合的。现在我们对批次统计和轮次统计都感兴趣。对于总体损失和准确率我们希望计算样本级别的平均值。为此我们将在每轮结束时累积所有批次的损失总和和样本数量并计算平均值。 我们将在常见的Loss类的calculate方法中添加以下内容 # Add accumulated sum of losses and sample countself.accumulated_sum np.sum(sample_losses)self.accumulated_count len(sample_losses)这里不着急后面会有整体代码展示。在这里我们只需要懂得是什么意思就好。 在Loss类中完整实现calculate方法 # Calculates the data and regularization losses# given model output and ground truth valuesdef calculate(self, output, y, *, include_regularizationFalse):# Calculate sample lossessample_losses self.forward(output, y)# Calculate mean lossdata_loss np.mean(sample_losses)# Add accumulated sum of losses and sample countself.accumulated_sum np.sum(sample_losses)self.accumulated_count len(sample_losses)# If just data loss - return itif not include_regularization:return data_loss# Return the data and regularization lossesreturn data_loss, self.regularization_loss() 我们保存了总和和计数以便可以随时计算平均值。为此我们将在Loss类中添加一个名为calculate_accumulated的新方法 # Calculates accumulated lossdef calculate_accumulated(self, *, include_regularizationFalse):# Calculate mean lossdata_loss self.accumulated_sum / self.accumulated_count# If just data loss - return itif not include_regularization:return data_loss# Return the data and regularization lossesreturn data_loss, self.regularization_loss()此方法还可以在include_regularization设置为True时返回正则化损失。正则化损失不需要累积因为它是根据当前的层参数状态计算的在调用时直接生成。我们将在训练过程中使用这一功能但在评估和预测时不会使用我们稍后会对此进行更详细的讨论。 最后为了在新一轮训练中重置总和和计数值我们将添加最后一个方法 # Reset variables for accumulated lossdef new_pass(self):self.accumulated_sum 0self.accumulated_count 0完整实现我们的通用Loss类 # Common loss class class Loss:# 
Regularization loss calculationdef regularization_loss(self): # 0 by defaultregularization_loss 0# Calculate regularization loss# iterate all trainable layersfor layer in self.trainable_layers:# L1 regularization - weights# calculate only when factor greater than 0if layer.weight_regularizer_l1 0:regularization_loss layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))# L2 regularization - weightsif layer.weight_regularizer_l2 0:regularization_loss layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)# L1 regularization - biases# calculate only when factor greater than 0if layer.bias_regularizer_l1 0:regularization_loss layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))# L2 regularization - biasesif layer.bias_regularizer_l2 0:regularization_loss layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)return regularization_loss# Set/remember trainable layersdef remember_trainable_layers(self, trainable_layers):self.trainable_layers trainable_layers# Calculates the data and regularization losses# given model output and ground truth valuesdef calculate(self, output, y, *, include_regularizationFalse):# Calculate sample lossessample_losses self.forward(output, y)# Calculate mean lossdata_loss np.mean(sample_losses)# Add accumulated sum of losses and sample countself.accumulated_sum np.sum(sample_losses)self.accumulated_count len(sample_losses)# If just data loss - return itif not include_regularization:return data_loss# Return the data and regularization lossesreturn data_loss, self.regularization_loss() # Calculates accumulated lossdef calculate_accumulated(self, *, include_regularizationFalse):# Calculate mean lossdata_loss self.accumulated_sum / self.accumulated_count# If just data loss - return itif not include_regularization:return data_loss# Return the data and regularization lossesreturn data_loss, self.regularization_loss()# Reset variables for accumulated lossdef new_pass(self):self.accumulated_sum 0self.accumulated_count 0现在我们需要为Accuracy类实现相同的功能 # Common accuracy class class Accuracy:# Calculates an accuracy# given predictions and ground truth valuesdef calculate(self, predictions, y):# Get comparison resultscomparisons self.compare(predictions, y)# Calculate an accuracyaccuracy np.mean(comparisons)# Add accumulated sum of matching values and sample countself.accumulated_sum np.sum(comparisons)self.accumulated_count len(comparisons)# Return accuracyreturn accuracy # Calculates accumulated accuracydef calculate_accumulated(self):# Calculate an accuracyaccuracy self.accumulated_sum / self.accumulated_count# Return the data and regularization lossesreturn accuracy# Reset variables for accumulated accuracydef new_pass(self):self.accumulated_sum 0self.accumulated_count 0在这里我们在calculate方法中添加了设置accumulated_sum和accumulated_count属性的功能用于计算每轮训练的准确率添加了一个新的calculate_accumulated方法来返回累积的准确率最后添加了一个new_pass方法用于在每轮训练开始时重置accumulated_sum和accumulated_count的值。 现在我们将修改Model类的train方法。首先我们将添加一个名为batch_size的新参数这里不着急后面会有整体代码展示。在这里我们只需要懂得是什么意思就好。 def train(self, X, y, *, epochs1, batch_sizeNone, print_every1, validation_dataNone):我们将默认将此参数设置为None这意味着使用整个数据集作为一个批次。在这种情况下每轮训练只需要一步这一步包括将所有数据一次性传递通过网络。 # Default value if batch size is not settrain_steps 1# If there is validation data passed,# set default number of steps for validation as wellif validation_data is not None:validation_steps 1# For better readabilityX_val, y_val validation_data如前所述大多数“现实生活”中的数据集需要的批次大小小于所有样本的数量。我们将使用之前描述的方法来处理这一点对样本总数进行整数除法得到批次数量并最终加1以包括未组成完整批次的剩余样本我们将对训练数据和验证数据都这样处理 # Calculate number of stepsif 
batch_size is not None:train_steps len(X) // batch_size# Dividing rounds down. If there are some remaining# data, but not a full batch, this wont include it# Add 1 to include this not full batchif train_steps * batch_size len(X):train_steps 1if validation_data is not None:validation_steps len(X_val) // batch_size# Dividing rounds down. If there are some remaining# data, but nor full batch, this wont include it# Add 1 to include this not full batchif validation_steps * batch_size len(X_val):validation_steps 1接下来从顶部开始我们将修改遍历训练轮数的循环打印轮数编号并重置累积的轮次损失和准确率值。然后在其中添加一个新循环用于遍历当前轮次中的各个步骤。 # Main training loopfor epoch in range(1, epochs1):# Print epoch numberprint(fepoch: {epoch})# Reset accumulated values in loss and accuracy objectsself.loss.new_pass()self.accuracy.new_pass()# Iterate over stepsfor step in range(train_steps):在每一步中我们需要获取用于训练的数据批次——如果batch_size参数仍然是默认值None则使用整个数据集否则获取大小为batch_size的切片数据 # If batch size is not set -# train using one step and full datasetif batch_size is None:batch_X Xbatch_y y# Otherwise slice a batchelse:batch_X X[step*batch_size:(step1)*batch_size]batch_y y[step*batch_size:(step1)*batch_size]对于每个批次我们进行拟合并打印信息类似于之前按轮次拟合的方式。不同之处在于现在使用的是batch_X而不是X以及batch_y而不是y。另一个变化是用于打印摘要的if语句现在是基于步骤而不是轮次 # Perform the forward passoutput self.forward(batch_X, trainingTrue)# Calculate lossdata_loss, regularization_loss self.loss.calculate(output, batch_y,include_regularizationTrue)loss data_loss regularization_loss# Get predictions and calculate an accuracypredictions self.output_layer_activation.predictions(output)accuracy self.accuracy.calculate(predictions, batch_y)# Perform backward passself.backward(output, batch_y)# Optimize (update parameters)self.optimizer.pre_update_params()for layer in self.trainable_layers:self.optimizer.update_params(layer)self.optimizer.post_update_params()# Print a summaryif not step % print_every or step train_steps - 1:print(fstep: {step}, facc: {accuracy:.3f}, floss: {loss:.3f} ( fdata_loss: {data_loss:.3f}, freg_loss: {regularization_loss:.3f}), flr: {self.optimizer.current_learning_rate})然后我们需要打印每轮的准确率和损失等信息 # Get and print epoch loss and accuracyepoch_data_loss, epoch_regularization_loss self.loss.calculate_accumulated(include_regularizationTrue)epoch_loss epoch_data_loss epoch_regularization_lossepoch_accuracy self.accuracy.calculate_accumulated()print(ftraining, facc: {epoch_accuracy:.3f}, floss: {epoch_loss:.3f} ( fdata_loss: {epoch_data_loss:.3f}, freg_loss: {epoch_regularization_loss:.3f}), flr: {self.optimizer.current_learning_rate})如果设置了批次大小验证数据很可能大于这个批次大小因此我们也需要为验证数据添加批处理功能 # If there is the validation dataif validation_data is not None:# Reset accumulated values in loss# and accuracy objectsself.loss.new_pass()self.accuracy.new_pass()# Iterate over stepsfor step in range(validation_steps):# If batch size is not set -# train using one step and full datasetif batch_size is None:batch_X X_valbatch_y y_val# Otherwise slice a batchelse:batch_X X_val[step*batch_size:(step1)*batch_size]batch_y y_val[step*batch_size:(step1)*batch_size]# Perform the forward passoutput self.forward(batch_X, trainingFalse)# Calculate the lossself.loss.calculate(output, batch_y)# Get predictions and calculate an accuracypredictions self.output_layer_activation.predictions(output)self.accuracy.calculate(predictions, batch_y)# Get and print validation loss and accuracyvalidation_loss self.loss.calculate_accumulated()validation_accuracy self.accuracy.calculate_accumulated()print(fvalidation, facc: {validation_accuracy:.3f}, floss: 
{validation_loss:.3f})与当前代码相比我们添加了对损失和准确率对象的new_pass方法的调用该方法会重置训练步骤中累积的值。接着我们引入了批次处理一个遍历步骤的循环并移除了对损失计算结果的捕获在验证过程中我们只关注最终的整体损失而不是每个批次的损失。最后的步骤是处理整体验证损失并将X_val替换为batch_X将y_val替换为batch_y以匹配对训练代码所做的更改。 这构成了我们Model类完整的train方法 # Train the model# def train(self, X, y, *, epochs1, print_every1, validation_dataNone):def train(self, X, y, *, epochs1, batch_sizeNone, print_every1, validation_dataNone):# Initialize accuracy objectself.accuracy.init(y)# Default value if batch size is not being settrain_steps 1# If there is validation data passed,# set default number of steps for validation as wellif validation_data is not None:validation_steps 1# For better readabilityX_val, y_val validation_data# Calculate number of stepsif batch_size is not None:train_steps len(X) // batch_size# Dividing rounds down. If there are some remaining# data, but not a full batch, this wont include it# Add 1 to include this not full batchif train_steps * batch_size len(X):train_steps 1if validation_data is not None:validation_steps len(X_val) // batch_size# Dividing rounds down. If there are some remaining# data, but nor full batch, this wont include it# Add 1 to include this not full batchif validation_steps * batch_size len(X_val):validation_steps 1# Main training loopfor epoch in range(1, epochs1):# Print epoch numberprint(fepoch: {epoch})# Reset accumulated values in loss and accuracy objectsself.loss.new_pass()self.accuracy.new_pass()# Iterate over stepsfor step in range(train_steps):# If batch size is not set -# train using one step and full datasetif batch_size is None:batch_X Xbatch_y y# Otherwise slice a batchelse:batch_X X[step*batch_size:(step1)*batch_size]batch_y y[step*batch_size:(step1)*batch_size]# Perform the forward passoutput self.forward(batch_X, trainingTrue)# Calculate lossdata_loss, regularization_loss self.loss.calculate(output, batch_y, include_regularizationTrue)loss data_loss regularization_loss# Get predictions and calculate an accuracypredictions self.output_layer_activation.predictions(output)accuracy self.accuracy.calculate(predictions, batch_y)# Perform backward passself.backward(output, batch_y)# Optimize (update parameters)self.optimizer.pre_update_params()for layer in self.trainable_layers:self.optimizer.update_params(layer)self.optimizer.post_update_params()# Print a summaryif not step % print_every or step train_steps - 1:print(fstep: {step}, facc: {accuracy:.3f}, floss: {loss:.3f} ( fdata_loss: {data_loss:.3f}, freg_loss: {regularization_loss:.3f}), flr: {self.optimizer.current_learning_rate})# Get and print epoch loss and accuracyepoch_data_loss, epoch_regularization_loss self.loss.calculate_accumulated(include_regularizationTrue)epoch_loss epoch_data_loss epoch_regularization_lossepoch_accuracy self.accuracy.calculate_accumulated()print(ftraining, facc: {epoch_accuracy:.3f}, floss: {epoch_loss:.3f} ( fdata_loss: {epoch_data_loss:.3f}, freg_loss: {epoch_regularization_loss:.3f}), flr: {self.optimizer.current_learning_rate})# If there is the validation dataif validation_data is not None:# Reset accumulated values in loss# and accuracy objectsself.loss.new_pass()self.accuracy.new_pass()# Iterate over stepsfor step in range(validation_steps):# If batch size is not set -# train using one step and full datasetif batch_size is None:batch_X X_valbatch_y y_val# Otherwise slice a batchelse:batch_X X_val[step*batch_size:(step1)*batch_size]batch_y y_val[step*batch_size:(step1)*batch_size]# Perform the forward passoutput self.forward(batch_X, trainingFalse)# Calculate the 
lossself.loss.calculate(output, batch_y)# Get predictions and calculate an accuracypredictions self.output_layer_activation.predictions(output)self.accuracy.calculate(predictions, batch_y)# Get and print validation loss and accuracyvalidation_loss self.loss.calculate_accumulated()validation_accuracy self.accuracy.calculate_accumulated()# Print a summaryprint(fvalidation, facc: {validation_accuracy:.3f}, floss: {validation_loss:.3f})训练Training 此时我们已经准备好使用批次和新数据集进行训练。提醒一下我们通过以下方式创建数据 # Create dataset X, y, X_test, y_test create_data_mnist(fashion_mnist_images)然后洗牌shuffle # Shuffle the training dataset keys np.array(range(X.shape[0])) np.random.shuffle(keys) X X[keys] y y[keys]然后平移样本缩放至 -1 至 1 的范围 # Scale and reshape samples X (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5 X_test (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5然后构建我们的模型包括两个使用ReLU激活的隐藏层一个使用softmax激活的输出层因为我们正在构建一个分类模型交叉熵损失函数Adam优化器以及分类准确率 # Instantiate the model model Model()# Add layers model.add(Layer_Dense(X.shape[1], 64)) model.add(Activation_ReLU()) model.add(Layer_Dense(64, 64)) model.add(Activation_ReLU()) model.add(Layer_Dense(64, 10)) model.add(Activation_Softmax())设置损耗、优化器和精度对象 # Set loss, optimizer and accuracy objects model.set(lossLoss_CategoricalCrossentropy(),optimizerOptimizer_Adam(decay5e-5),accuracyAccuracy_Categorical())最后我们最后确定并进行培训 # Finalize the model model.finalize()# Train the model model.train(X, y, validation_data(X_test, y_test), epochs5, batch_size128, print_every100)epoch: 1 step: 0, acc: 0.078, loss: 2.473 (data_loss: 2.473, reg_loss: 0.000), lr: 0.001 step: 100, acc: 0.766, loss: 0.527 (data_loss: 0.527, reg_loss: 0.000), lr: 0.0009950248756218907 step: 200, acc: 0.852, loss: 0.417 (data_loss: 0.417, reg_loss: 0.000), lr: 0.0009900990099009901 step: 300, acc: 0.797, loss: 0.511 (data_loss: 0.511, reg_loss: 0.000), lr: 0.0009852216748768474 step: 400, acc: 0.828, loss: 0.434 (data_loss: 0.434, reg_loss: 0.000), lr: 0.000980392156862745 step: 468, acc: 0.865, loss: 0.305 (data_loss: 0.305, reg_loss: 0.000), lr: 0.0009771350400625367 training, acc: 0.797, loss: 0.568 (data_loss: 0.568, reg_loss: 0.000), lr: 0.0009771350400625367 validation, acc: 0.843, loss: 0.445 epoch: 2 step: 0, acc: 0.859, loss: 0.377 (data_loss: 0.377, reg_loss: 0.000), lr: 0.0009770873027505008 step: 100, acc: 0.820, loss: 0.434 (data_loss: 0.434, reg_loss: 0.000), lr: 0.000972337012008362 step: 200, acc: 0.867, loss: 0.318 (data_loss: 0.318, reg_loss: 0.000), lr: 0.0009676326866321544 step: 300, acc: 0.867, loss: 0.430 (data_loss: 0.430, reg_loss: 0.000), lr: 0.0009629736626703259 step: 400, acc: 0.836, loss: 0.398 (data_loss: 0.398, reg_loss: 0.000), lr: 0.0009583592888974076 step: 468, acc: 0.906, loss: 0.248 (data_loss: 0.248, reg_loss: 0.000), lr: 0.0009552466924583273 training, acc: 0.857, loss: 0.397 (data_loss: 0.397, reg_loss: 0.000), lr: 0.0009552466924583273 validation, acc: 0.856, loss: 0.400 epoch: 3 step: 0, acc: 0.883, loss: 0.317 (data_loss: 0.317, reg_loss: 0.000), lr: 0.0009552010698251983 step: 100, acc: 0.852, loss: 0.355 (data_loss: 0.355, reg_loss: 0.000), lr: 0.0009506607091928891 step: 200, acc: 0.883, loss: 0.288 (data_loss: 0.288, reg_loss: 0.000), lr: 0.0009461633077869241 step: 300, acc: 0.898, loss: 0.391 (data_loss: 0.391, reg_loss: 0.000), lr: 0.0009417082587814295 step: 400, acc: 0.859, loss: 0.367 (data_loss: 0.367, reg_loss: 0.000), lr: 0.0009372949667260287 step: 468, acc: 0.906, loss: 0.219 (data_loss: 0.219, reg_loss: 0.000), lr: 0.000934317481080071 
training, acc: 0.871, loss: 0.355 (data_loss: 0.355, reg_loss: 0.000), lr: 0.000934317481080071 validation, acc: 0.863, loss: 0.380 epoch: 4 step: 0, acc: 0.867, loss: 0.294 (data_loss: 0.294, reg_loss: 0.000), lr: 0.0009342738356612324 step: 100, acc: 0.852, loss: 0.323 (data_loss: 0.323, reg_loss: 0.000), lr: 0.0009299297903008323 step: 200, acc: 0.883, loss: 0.271 (data_loss: 0.271, reg_loss: 0.000), lr: 0.0009256259545517657 step: 300, acc: 0.914, loss: 0.369 (data_loss: 0.369, reg_loss: 0.000), lr: 0.0009213617727000506 step: 400, acc: 0.875, loss: 0.349 (data_loss: 0.349, reg_loss: 0.000), lr: 0.0009171366992250195 step: 468, acc: 0.906, loss: 0.197 (data_loss: 0.197, reg_loss: 0.000), lr: 0.0009142857142857143 training, acc: 0.878, loss: 0.331 (data_loss: 0.331, reg_loss: 0.000), lr: 0.0009142857142857143 validation, acc: 0.867, loss: 0.368 epoch: 5 step: 0, acc: 0.867, loss: 0.278 (data_loss: 0.278, reg_loss: 0.000), lr: 0.0009142439202779302 step: 100, acc: 0.867, loss: 0.297 (data_loss: 0.297, reg_loss: 0.000), lr: 0.0009100837277029487 step: 200, acc: 0.891, loss: 0.258 (data_loss: 0.258, reg_loss: 0.000), lr: 0.0009059612248595759 step: 300, acc: 0.891, loss: 0.340 (data_loss: 0.340, reg_loss: 0.000), lr: 0.0009018759018759019 step: 400, acc: 0.875, loss: 0.335 (data_loss: 0.335, reg_loss: 0.000), lr: 0.0008978272580355541 step: 468, acc: 0.917, loss: 0.186 (data_loss: 0.186, reg_loss: 0.000), lr: 0.0008950948800572861 training, acc: 0.885, loss: 0.312 (data_loss: 0.312, reg_loss: 0.000), lr: 0.0008950948800572861 validation, acc: 0.871, loss: 0.359模型成功训练并取得了相当不错的准确率。这是在一个新的、真实的、更具挑战性的数据集上完成的并且仅用了5个训练轮次而不是之前螺旋数据的10000轮。同时训练速度也比我们之前对螺旋数据一次性拟合整个数据集时更快。 到目前为止我们仅提到打乱训练数据的重要性以及如果尝试在未打乱的数据上进行训练可能会发生什么。现在是一个很好的时机来示例当我们不打乱数据时会发生什么。我们可以将打乱数据的代码注释掉 # # Shuffle the training dataset # keys np.array(range(X.shape[0])) # np.random.shuffle(keys) # X X[keys] # y y[keys]再次运行我们可以看到 epoch: 1 step: 0, acc: 0.000, loss: 2.320 (data_loss: 2.320, reg_loss: 0.000), lr: 0.001 step: 100, acc: 0.000, loss: 3.763 (data_loss: 3.763, reg_loss: 0.000), lr: 0.0009950248756218907 step: 200, acc: 0.000, loss: 2.677 (data_loss: 2.677, reg_loss: 0.000), lr: 0.0009900990099009901 step: 300, acc: 1.000, loss: 0.421 (data_loss: 0.421, reg_loss: 0.000), lr: 0.0009852216748768474 step: 400, acc: 1.000, loss: 0.023 (data_loss: 0.023, reg_loss: 0.000), lr: 0.000980392156862745 step: 468, acc: 1.000, loss: 0.004 (data_loss: 0.004, reg_loss: 0.000), lr: 0.0009771350400625367 training, acc: 0.657, loss: 1.930 (data_loss: 1.930, reg_loss: 0.000), lr: 0.0009771350400625367 validation, acc: 0.109, loss: 5.800 epoch: 2 step: 0, acc: 0.000, loss: 3.527 (data_loss: 3.527, reg_loss: 0.000), lr: 0.0009770873027505008 step: 100, acc: 0.000, loss: 3.722 (data_loss: 3.722, reg_loss: 0.000), lr: 0.000972337012008362 step: 200, acc: 0.531, loss: 1.189 (data_loss: 1.189, reg_loss: 0.000), lr: 0.0009676326866321544 step: 300, acc: 0.961, loss: 0.504 (data_loss: 0.504, reg_loss: 0.000), lr: 0.0009629736626703259 step: 400, acc: 0.984, loss: 0.063 (data_loss: 0.063, reg_loss: 0.000), lr: 0.0009583592888974076 step: 468, acc: 1.000, loss: 0.004 (data_loss: 0.004, reg_loss: 0.000), lr: 0.0009552466924583273 training, acc: 0.746, loss: 1.066 (data_loss: 1.066, reg_loss: 0.000), lr: 0.0009552466924583273 validation, acc: 0.110, loss: 5.365 epoch: 3 step: 0, acc: 0.000, loss: 4.172 (data_loss: 4.172, reg_loss: 0.000), lr: 0.0009552010698251983 step: 100, acc: 0.336, loss: 1.288 (data_loss: 1.288, reg_loss: 0.000), lr: 0.0009506607091928891 
step: 200, acc: 0.680, loss: 1.366 (data_loss: 1.366, reg_loss: 0.000), lr: 0.0009461633077869241 step: 300, acc: 1.000, loss: 0.017 (data_loss: 0.017, reg_loss: 0.000), lr: 0.0009417082587814295 step: 400, acc: 0.984, loss: 0.128 (data_loss: 0.128, reg_loss: 0.000), lr: 0.0009372949667260287 step: 468, acc: 1.000, loss: 0.001 (data_loss: 0.001, reg_loss: 0.000), lr: 0.000934317481080071 training, acc: 0.782, loss: 0.838 (data_loss: 0.838, reg_loss: 0.000), lr: 0.000934317481080071 validation, acc: 0.209, loss: 4.189 epoch: 4 step: 0, acc: 0.000, loss: 2.829 (data_loss: 2.829, reg_loss: 0.000), lr: 0.0009342738356612324 step: 100, acc: 0.031, loss: 2.136 (data_loss: 2.136, reg_loss: 0.000), lr: 0.0009299297903008323 step: 200, acc: 0.609, loss: 1.109 (data_loss: 1.109, reg_loss: 0.000), lr: 0.0009256259545517657 step: 300, acc: 0.984, loss: 0.097 (data_loss: 0.097, reg_loss: 0.000), lr: 0.0009213617727000506 step: 400, acc: 0.984, loss: 0.034 (data_loss: 0.034, reg_loss: 0.000), lr: 0.0009171366992250195 step: 468, acc: 1.000, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.0009142857142857143 training, acc: 0.813, loss: 0.719 (data_loss: 0.719, reg_loss: 0.000), lr: 0.0009142857142857143 validation, acc: 0.164, loss: 7.111 epoch: 5 step: 0, acc: 0.000, loss: 5.931 (data_loss: 5.931, reg_loss: 0.000), lr: 0.0009142439202779302 step: 100, acc: 0.781, loss: 0.784 (data_loss: 0.784, reg_loss: 0.000), lr: 0.0009100837277029487 step: 200, acc: 0.750, loss: 0.808 (data_loss: 0.808, reg_loss: 0.000), lr: 0.0009059612248595759 step: 300, acc: 0.984, loss: 0.133 (data_loss: 0.133, reg_loss: 0.000), lr: 0.0009018759018759019 step: 400, acc: 0.961, loss: 0.091 (data_loss: 0.091, reg_loss: 0.000), lr: 0.0008978272580355541 step: 468, acc: 1.000, loss: 0.002 (data_loss: 0.002, reg_loss: 0.000), lr: 0.0008950948800572861 training, acc: 0.860, loss: 0.544 (data_loss: 0.544, reg_loss: 0.000), lr: 0.0008950948800572861 validation, acc: 0.224, loss: 4.844正如我们所见这种方式效果非常差。我们可以观察到模型在训练过程中接近完美的准确率1但轮次准确率仍然很差验证准确率证明模型并未真正学习。训练准确率快速升高因为模型学会了只预测一个标签因为它反复只看到那个标签。当训练数据中的标签改变时模型很快学会了只预测新的标签因为它在每个批次中看到的全是那个标签。这一过程在一个轮次结束后重复在所有轮次中循环。 轮次准确率较低因为模型在切换标签后需要一段时间来学习新标签在此期间准确率较低。验证准确率是在特定轮次的训练结束后计算的正如我们所记得的模型只学会预测一个标签。在验证过程中模型预测的标签是它最后看到的标签——准确率接近 1 / 10 1/10 1/10因为我们的训练数据集包含10个类别。 重新启用数据打乱功能然后可以调整模型看看是否能进一步改进结果。以下是一个示例使用更大的模型、更高的学习率衰减以及两倍的训练轮次 # Add layers model.add(Layer_Dense(X.shape[1], 128)) model.add(Activation_ReLU()) model.add(Layer_Dense(128, 128)) model.add(Activation_ReLU()) model.add(Layer_Dense(128, 10)) model.add(Activation_Softmax())# Set loss, optimizer and accuracy objects model.set(lossLoss_CategoricalCrossentropy(),optimizerOptimizer_Adam(decay1e-3),accuracyAccuracy_Categorical())# Finalize the model model.finalize()# Train the model model.train(X, y, validation_data(X_test, y_test), epochs10, batch_size128, print_every100)epoch: 1 step: 0, acc: 0.031, loss: 3.608 (data_loss: 3.608, reg_loss: 0.000), lr: 0.001 step: 100, acc: 0.773, loss: 0.542 (data_loss: 0.542, reg_loss: 0.000), lr: 0.0009090909090909091 step: 200, acc: 0.883, loss: 0.325 (data_loss: 0.325, reg_loss: 0.000), lr: 0.0008333333333333334 step: 300, acc: 0.875, loss: 0.439 (data_loss: 0.439, reg_loss: 0.000), lr: 0.0007692307692307692 step: 400, acc: 0.836, loss: 0.422 (data_loss: 0.422, reg_loss: 0.000), lr: 0.0007142857142857143 step: 468, acc: 0.875, loss: 0.291 (data_loss: 0.291, reg_loss: 0.000), lr: 0.000681198910081744 training, acc: 0.813, loss: 0.519 (data_loss: 0.519, reg_loss: 0.000), lr: 0.000681198910081744 validation, acc: 
0.843, loss: 0.429 epoch: 2 step: 0, acc: 0.836, loss: 0.373 (data_loss: 0.373, reg_loss: 0.000), lr: 0.0006807351940095304 step: 100, acc: 0.812, loss: 0.387 (data_loss: 0.387, reg_loss: 0.000), lr: 0.0006373486297004461 step: 200, acc: 0.867, loss: 0.285 (data_loss: 0.285, reg_loss: 0.000), lr: 0.0005991611743559018 step: 300, acc: 0.891, loss: 0.384 (data_loss: 0.384, reg_loss: 0.000), lr: 0.0005652911249293386 step: 400, acc: 0.844, loss: 0.381 (data_loss: 0.381, reg_loss: 0.000), lr: 0.0005350454788657037 step: 468, acc: 0.896, loss: 0.214 (data_loss: 0.214, reg_loss: 0.000), lr: 0.0005162622612287042 training, acc: 0.866, loss: 0.370 (data_loss: 0.370, reg_loss: 0.000), lr: 0.0005162622612287042 validation, acc: 0.859, loss: 0.384 epoch: 3 step: 0, acc: 0.844, loss: 0.330 (data_loss: 0.330, reg_loss: 0.000), lr: 0.0005159958720330237 step: 100, acc: 0.852, loss: 0.328 (data_loss: 0.328, reg_loss: 0.000), lr: 0.0004906771344455348 step: 200, acc: 0.898, loss: 0.251 (data_loss: 0.251, reg_loss: 0.000), lr: 0.0004677268475210477 step: 300, acc: 0.883, loss: 0.345 (data_loss: 0.345, reg_loss: 0.000), lr: 0.00044682752457551384 step: 400, acc: 0.859, loss: 0.352 (data_loss: 0.352, reg_loss: 0.000), lr: 0.00042771599657827206 step: 468, acc: 0.917, loss: 0.185 (data_loss: 0.185, reg_loss: 0.000), lr: 0.0004156275976724854 training, acc: 0.880, loss: 0.329 (data_loss: 0.329, reg_loss: 0.000), lr: 0.0004156275976724854 validation, acc: 0.866, loss: 0.364 epoch: 4 step: 0, acc: 0.852, loss: 0.302 (data_loss: 0.302, reg_loss: 0.000), lr: 0.0004154549231408392 step: 100, acc: 0.875, loss: 0.278 (data_loss: 0.278, reg_loss: 0.000), lr: 0.00039888312724371757 step: 200, acc: 0.930, loss: 0.232 (data_loss: 0.232, reg_loss: 0.000), lr: 0.0003835826620636747 step: 300, acc: 0.898, loss: 0.310 (data_loss: 0.310, reg_loss: 0.000), lr: 0.0003694126339120798 step: 400, acc: 0.867, loss: 0.336 (data_loss: 0.336, reg_loss: 0.000), lr: 0.0003562522265764161 step: 468, acc: 0.917, loss: 0.177 (data_loss: 0.177, reg_loss: 0.000), lr: 0.00034782608695652176 training, acc: 0.890, loss: 0.304 (data_loss: 0.304, reg_loss: 0.000), lr: 0.00034782608695652176 validation, acc: 0.872, loss: 0.352 epoch: 5 step: 0, acc: 0.883, loss: 0.274 (data_loss: 0.274, reg_loss: 0.000), lr: 0.0003477051460361613 step: 100, acc: 0.898, loss: 0.256 (data_loss: 0.256, reg_loss: 0.000), lr: 0.00033602150537634406 step: 200, acc: 0.922, loss: 0.220 (data_loss: 0.220, reg_loss: 0.000), lr: 0.00032509752925877764 step: 300, acc: 0.914, loss: 0.283 (data_loss: 0.283, reg_loss: 0.000), lr: 0.00031486146095717883 step: 400, acc: 0.867, loss: 0.329 (data_loss: 0.329, reg_loss: 0.000), lr: 0.00030525030525030525 step: 468, acc: 0.917, loss: 0.171 (data_loss: 0.171, reg_loss: 0.000), lr: 0.0002990430622009569 training, acc: 0.896, loss: 0.287 (data_loss: 0.287, reg_loss: 0.000), lr: 0.0002990430622009569 validation, acc: 0.874, loss: 0.347 epoch: 6 step: 0, acc: 0.891, loss: 0.254 (data_loss: 0.254, reg_loss: 0.000), lr: 0.0002989536621823617 step: 100, acc: 0.914, loss: 0.241 (data_loss: 0.241, reg_loss: 0.000), lr: 0.00029027576197387516 step: 200, acc: 0.914, loss: 0.209 (data_loss: 0.209, reg_loss: 0.000), lr: 0.0002820874471086037 step: 300, acc: 0.922, loss: 0.267 (data_loss: 0.267, reg_loss: 0.000), lr: 0.00027434842249657066 step: 400, acc: 0.883, loss: 0.321 (data_loss: 0.321, reg_loss: 0.000), lr: 0.000267022696929239 step: 468, acc: 0.927, loss: 0.163 (data_loss: 0.163, reg_loss: 0.000), lr: 0.00026226068712300026 training, acc: 
0.901, loss: 0.273 (data_loss: 0.273, reg_loss: 0.000), lr: 0.00026226068712300026 validation, acc: 0.877, loss: 0.343 epoch: 7 step: 0, acc: 0.898, loss: 0.237 (data_loss: 0.237, reg_loss: 0.000), lr: 0.00026219192448872575 step: 100, acc: 0.922, loss: 0.225 (data_loss: 0.225, reg_loss: 0.000), lr: 0.00025549310168625444 step: 200, acc: 0.930, loss: 0.201 (data_loss: 0.201, reg_loss: 0.000), lr: 0.00024912805181863477 step: 300, acc: 0.922, loss: 0.259 (data_loss: 0.259, reg_loss: 0.000), lr: 0.0002430724355858046 step: 400, acc: 0.883, loss: 0.311 (data_loss: 0.311, reg_loss: 0.000), lr: 0.00023730422401518745 step: 468, acc: 0.927, loss: 0.159 (data_loss: 0.159, reg_loss: 0.000), lr: 0.00023353573096683791 training, acc: 0.906, loss: 0.262 (data_loss: 0.262, reg_loss: 0.000), lr: 0.00023353573096683791 validation, acc: 0.878, loss: 0.340 epoch: 8 step: 0, acc: 0.906, loss: 0.224 (data_loss: 0.224, reg_loss: 0.000), lr: 0.00023348120476301658 step: 100, acc: 0.906, loss: 0.214 (data_loss: 0.214, reg_loss: 0.000), lr: 0.00022815423226100847 step: 200, acc: 0.930, loss: 0.191 (data_loss: 0.191, reg_loss: 0.000), lr: 0.0002230649118893598 step: 300, acc: 0.922, loss: 0.253 (data_loss: 0.253, reg_loss: 0.000), lr: 0.00021819768710451667 step: 400, acc: 0.898, loss: 0.307 (data_loss: 0.307, reg_loss: 0.000), lr: 0.00021353833013025838 step: 468, acc: 0.927, loss: 0.156 (data_loss: 0.156, reg_loss: 0.000), lr: 0.00021048200378867611 training, acc: 0.909, loss: 0.252 (data_loss: 0.252, reg_loss: 0.000), lr: 0.00021048200378867611 validation, acc: 0.878, loss: 0.336 epoch: 9 step: 0, acc: 0.906, loss: 0.209 (data_loss: 0.209, reg_loss: 0.000), lr: 0.0002104377104377104 step: 100, acc: 0.922, loss: 0.202 (data_loss: 0.202, reg_loss: 0.000), lr: 0.0002061005770816158 step: 200, acc: 0.922, loss: 0.181 (data_loss: 0.181, reg_loss: 0.000), lr: 0.00020193861066235866 step: 300, acc: 0.922, loss: 0.250 (data_loss: 0.250, reg_loss: 0.000), lr: 0.0001979414093428345 step: 400, acc: 0.898, loss: 0.303 (data_loss: 0.303, reg_loss: 0.000), lr: 0.0001940993788819876 step: 468, acc: 0.938, loss: 0.151 (data_loss: 0.151, reg_loss: 0.000), lr: 0.00019157088122605365 training, acc: 0.912, loss: 0.244 (data_loss: 0.244, reg_loss: 0.000), lr: 0.00019157088122605365 validation, acc: 0.881, loss: 0.335 epoch: 10 step: 0, acc: 0.906, loss: 0.198 (data_loss: 0.198, reg_loss: 0.000), lr: 0.0001915341888527102 step: 100, acc: 0.930, loss: 0.193 (data_loss: 0.193, reg_loss: 0.000), lr: 0.00018793459875963167 step: 200, acc: 0.922, loss: 0.175 (data_loss: 0.175, reg_loss: 0.000), lr: 0.00018446781036709093 step: 300, acc: 0.922, loss: 0.245 (data_loss: 0.245, reg_loss: 0.000), lr: 0.00018112660749864155 step: 400, acc: 0.898, loss: 0.303 (data_loss: 0.303, reg_loss: 0.000), lr: 0.00017790428749332856 step: 468, acc: 0.938, loss: 0.144 (data_loss: 0.144, reg_loss: 0.000), lr: 0.00017577781683951485 training, acc: 0.915, loss: 0.237 (data_loss: 0.237, reg_loss: 0.000), lr: 0.00017577781683951485 validation, acc: 0.881, loss: 0.334通过简单地增加模型规模、衰减和训练轮次我们提高了准确率并降低了损失。 到目前为止的全部代码 import numpy as np import cv2 import os# Loads a MNIST dataset def load_mnist_dataset(dataset, path):# Scan all the directories and create a list of labelslabels os.listdir(os.path.join(path, dataset))# Create lists for samples and labelsX []y []# For each label folderfor label in labels:# And for each image in given folderfor file in os.listdir(os.path.join(path, dataset, label)):# Read the imageimage cv2.imread(os.path.join(path, dataset, label, 
file), cv2.IMREAD_UNCHANGED)# And append it and a label to the listsX.append(image)y.append(label)# Convert the data to proper numpy arrays and returnreturn np.array(X), np.array(y).astype(uint8)# MNIST dataset (train test) def create_data_mnist(path):# Load both sets separatelyX, y load_mnist_dataset(train, path)X_test, y_test load_mnist_dataset(test, path)# And return all the datareturn X, y, X_test, y_testimport numpy as np import nnfs from nnfs.datasets import sine_data, spiral_data import sysnnfs.init()# Dense layer class Layer_Dense:# Layer initializationdef __init__(self, n_inputs, n_neurons,weight_regularizer_l10, weight_regularizer_l20,bias_regularizer_l10, bias_regularizer_l20):# Initialize weights and biases# self.weights 0.01 * np.random.randn(n_inputs, n_neurons)self.weights 0.1 * np.random.randn(n_inputs, n_neurons)self.biases np.zeros((1, n_neurons))# Set regularization strengthself.weight_regularizer_l1 weight_regularizer_l1self.weight_regularizer_l2 weight_regularizer_l2self.bias_regularizer_l1 bias_regularizer_l1self.bias_regularizer_l2 bias_regularizer_l2# Forward passdef forward(self, inputs, training):# Remember input valuesself.inputs inputs# Calculate output values from inputs, weights and biasesself.output np.dot(inputs, self.weights) self.biases# Backward passdef backward(self, dvalues):# Gradients on parametersself.dweights np.dot(self.inputs.T, dvalues)self.dbiases np.sum(dvalues, axis0, keepdimsTrue)# Gradients on regularization# L1 on weightsif self.weight_regularizer_l1 0:dL1 np.ones_like(self.weights)dL1[self.weights 0] -1self.dweights self.weight_regularizer_l1 * dL1# L2 on weightsif self.weight_regularizer_l2 0:self.dweights 2 * self.weight_regularizer_l2 * self.weights# L1 on biasesif self.bias_regularizer_l1 0:dL1 np.ones_like(self.biases)dL1[self.biases 0] -1self.dbiases self.bias_regularizer_l1 * dL1# L2 on biasesif self.bias_regularizer_l2 0:self.dbiases 2 * self.bias_regularizer_l2 * self.biases# Gradient on valuesself.dinputs np.dot(dvalues, self.weights.T)# Dropout class Layer_Dropout: # Initdef __init__(self, rate):# Store rate, we invert it as for example for dropout# of 0.1 we need success rate of 0.9self.rate 1 - rate# Forward passdef forward(self, inputs, training):# Save input valuesself.inputs inputs# If not in the training mode - return valuesif not training:self.output inputs.copy()return# Generate and save scaled maskself.binary_mask np.random.binomial(1, self.rate, sizeinputs.shape) / self.rate# Apply mask to output valuesself.output inputs * self.binary_mask# Backward passdef backward(self, dvalues):# Gradient on valuesself.dinputs dvalues * self.binary_mask# Input layer class Layer_Input:# Forward passdef forward(self, inputs, training):self.output inputs# ReLU activation class Activation_ReLU: # Forward passdef forward(self, inputs, training):# Remember input valuesself.inputs inputs# Calculate output values from inputsself.output np.maximum(0, inputs)# Backward passdef backward(self, dvalues):# Since we need to modify original variable,# lets make a copy of values firstself.dinputs dvalues.copy()# Zero gradient where input values were negativeself.dinputs[self.inputs 0] 0# Calculate predictions for outputsdef predictions(self, outputs):return outputs# Softmax activation class Activation_Softmax:# Forward passdef forward(self, inputs, training):# Remember input valuesself.inputs inputs# Get unnormalized probabilitiesexp_values np.exp(inputs - np.max(inputs, axis1, keepdimsTrue))# Normalize them for each sampleprobabilities exp_values / 
np.sum(exp_values, axis1, keepdimsTrue)self.output probabilities# Backward passdef backward(self, dvalues):# Create uninitialized arrayself.dinputs np.empty_like(dvalues)# Enumerate outputs and gradientsfor index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):# Flatten output arraysingle_output single_output.reshape(-1, 1)# Calculate Jacobian matrix of the output andjacobian_matrix np.diagflat(single_output) - np.dot(single_output, single_output.T)# Calculate sample-wise gradient# and add it to the array of sample gradientsself.dinputs[index] np.dot(jacobian_matrix, single_dvalues)# Calculate predictions for outputsdef predictions(self, outputs):return np.argmax(outputs, axis1)# Sigmoid activation class Activation_Sigmoid:# Forward passdef forward(self, inputs, training):# Save input and calculate/save output# of the sigmoid functionself.inputs inputsself.output 1 / (1 np.exp(-inputs))# Backward passdef backward(self, dvalues):# Derivative - calculates from output of the sigmoid functionself.dinputs dvalues * (1 - self.output) * self.output# Calculate predictions for outputsdef predictions(self, outputs):return (outputs 0.5) * 1# Linear activation class Activation_Linear:# Forward passdef forward(self, inputs, training):# Just remember valuesself.inputs inputsself.output inputs# Backward passdef backward(self, dvalues):# derivative is 1, 1 * dvalues dvalues - the chain ruleself.dinputs dvalues.copy()# Calculate predictions for outputsdef predictions(self, outputs):return outputs# SGD optimizer class Optimizer_SGD:# Initialize optimizer - set settings,# learning rate of 1. is default for this optimizerdef __init__(self, learning_rate1., decay0., momentum0.):self.learning_rate learning_rateself.current_learning_rate learning_rateself.decay decayself.iterations 0self.momentum momentum# Call once before any parameter updatesdef pre_update_params(self):if self.decay:self.current_learning_rate self.learning_rate * (1. / (1. 
self.decay * self.iterations))# Update parametersdef update_params(self, layer):# If we use momentumif self.momentum:# If layer does not contain momentum arrays, create them# filled with zerosif not hasattr(layer, weight_momentums):layer.weight_momentums np.zeros_like(layer.weights)# If there is no momentum array for weights# The array doesnt exist for biases yet either.layer.bias_momentums np.zeros_like(layer.biases)# Build weight updates with momentum - take previous# updates multiplied by retain factor and update with# current gradientsweight_updates self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweightslayer.weight_momentums weight_updates# Build bias updatesbias_updates self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dbiaseslayer.bias_momentums bias_updates# Vanilla SGD updates (as before momentum update)else:weight_updates -self.current_learning_rate * layer.dweightsbias_updates -self.current_learning_rate * layer.dbiases# Update weights and biases using either# vanilla or momentum updateslayer.weights weight_updateslayer.biases bias_updates# Call once after any parameter updatesdef post_update_params(self):self.iterations 1 # Adagrad optimizer class Optimizer_Adagrad:# Initialize optimizer - set settingsdef __init__(self, learning_rate1., decay0., epsilon1e-7):self.learning_rate learning_rateself.current_learning_rate learning_rateself.decay decayself.iterations 0self.epsilon epsilon# Call once before any parameter updatesdef pre_update_params(self):if self.decay:self.current_learning_rate self.learning_rate * (1. / (1. self.decay * self.iterations))# Update parametersdef update_params(self, layer):# If layer does not contain cache arrays,# create them filled with zerosif not hasattr(layer, weight_cache):layer.weight_cache np.zeros_like(layer.weights)layer.bias_cache np.zeros_like(layer.biases)# Update cache with squared current gradientslayer.weight_cache layer.dweights**2layer.bias_cache layer.dbiases**2# Vanilla SGD parameter update normalization# with square rooted cachelayer.weights -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) self.epsilon)layer.biases -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) self.epsilon)# Call once after any parameter updatesdef post_update_params(self):self.iterations 1# RMSprop optimizer class Optimizer_RMSprop: # Initialize optimizer - set settingsdef __init__(self, learning_rate0.001, decay0., epsilon1e-7, rho0.9):self.learning_rate learning_rateself.current_learning_rate learning_rateself.decay decayself.iterations 0self.epsilon epsilonself.rho rho# Call once before any parameter updatesdef pre_update_params(self):if self.decay:self.current_learning_rate self.learning_rate * (1. / (1. 
self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update cache with squared current gradients
        layer.weight_cache = self.rho * layer.weight_cache + \
                             (1 - self.rho) * layer.dweights**2
        layer.bias_cache = self.rho * layer.bias_cache + \
                           (1 - self.rho) * layer.dbiases**2

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / \
                         (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / \
                        (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adam optimizer
class Optimizer_Adam:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7,
                 beta_1=0.9, beta_2=0.999):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.beta_1 = beta_1
        self.beta_2 = beta_2

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * \
                (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update momentum with current gradients
        layer.weight_momentums = self.beta_1 * layer.weight_momentums + \
                                 (1 - self.beta_1) * layer.dweights
        layer.bias_momentums = self.beta_1 * layer.bias_momentums + \
                               (1 - self.beta_1) * layer.dbiases

        # Get corrected momentum
        # self.iterations is 0 at first pass
        # and we need to start with 1 here
        weight_momentums_corrected = layer.weight_momentums / \
            (1 - self.beta_1 ** (self.iterations + 1))
        bias_momentums_corrected = layer.bias_momentums / \
            (1 - self.beta_1 ** (self.iterations + 1))

        # Update cache with squared current gradients
        layer.weight_cache = self.beta_2 * layer.weight_cache + \
            (1 - self.beta_2) * layer.dweights**2
        layer.bias_cache = self.beta_2 * layer.bias_cache + \
            (1 - self.beta_2) * layer.dbiases**2

        # Get corrected cache
        weight_cache_corrected = layer.weight_cache / \
            (1 - self.beta_2 ** (self.iterations + 1))
        bias_cache_corrected = layer.bias_cache / \
            (1 - self.beta_2 ** (self.iterations + 1))

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * \
                         weight_momentums_corrected / \
                         (np.sqrt(weight_cache_corrected) + self.epsilon)
        layer.biases += -self.current_learning_rate * \
                        bias_momentums_corrected / \
                        (np.sqrt(bias_cache_corrected) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):

        # 0 by default
        regularization_loss = 0

        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:

            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * \
                    np.sum(np.abs(layer.weights))

            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * \
                    np.sum(layer.weights * layer.weights)

            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * \
                    np.sum(np.abs(layer.biases))

            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * \
                    np.sum(layer.biases * layer.biases)

        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Add accumulated sum of losses and sample count
        self.accumulated_sum += np.sum(sample_losses)
        self.accumulated_count += len(sample_losses)

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Calculates accumulated loss
    def calculate_accumulated(self, *, include_regularization=False):

        # Calculate mean loss
        data_loss = self.accumulated_sum / self.accumulated_count

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Reset variables for accumulated loss
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of labels in every sample
        # We'll use the first sample to count them
        labels = len(dvalues[0])

        # If labels are sparse, turn them into one-hot vector
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]

        # Calculate gradient
        self.dinputs = -y_true / dvalues
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():

    # # Creates activation and loss function objects
    # def __init__(self):
    #     self.activation = Activation_Softmax()
    #     self.loss = Loss_CategoricalCrossentropy()

    # # Forward pass
    # def forward(self, inputs, y_true):
    #     # Output layer's activation function
    #     self.activation.forward(inputs)
    #     # Set the output
    #     self.output = self.activation.output
    #     # Calculate and return loss value
    #     return self.loss.calculate(self.output, y_true)

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)

        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)

        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Binary cross-entropy loss
class Loss_BinaryCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Calculate sample-wise loss
        sample_losses = -(y_true * np.log(y_pred_clipped) +
                          (1 - y_true) * np.log(1 - y_pred_clipped))
        sample_losses = np.mean(sample_losses, axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        clipped_dvalues = np.clip(dvalues, 1e-7, 1 - 1e-7)

        # Calculate gradient
        self.dinputs = -(y_true / clipped_dvalues -
                         (1 - y_true) / (1 - clipped_dvalues)) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Squared Error loss
class Loss_MeanSquaredError(Loss):  # L2 loss

    # Forward pass
    def forward(self, y_pred, y_true):

        # Calculate loss
        sample_losses = np.mean((y_true - y_pred)**2, axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Gradient on values
        self.dinputs = -2 * (y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Absolute Error loss
class Loss_MeanAbsoluteError(Loss):  # L1 loss

    def forward(self, y_pred, y_true):

        # Calculate loss
        sample_losses = np.mean(np.abs(y_true - y_pred), axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Calculate gradient
        self.dinputs = np.sign(y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):

        # Get comparison results
        comparisons = self.compare(predictions, y)

        # Calculate an accuracy
        accuracy = np.mean(comparisons)

        # Add accumulated sum of matching values and sample count
        self.accumulated_sum += np.sum(comparisons)
        self.accumulated_count += len(comparisons)

        # Return accuracy
        return accuracy

    # Calculates accumulated accuracy
    def calculate_accumulated(self):

        # Calculate an accuracy
        accuracy = self.accumulated_sum / self.accumulated_count

        # Return the accuracy
        return accuracy

    # Reset variables for accumulated accuracy
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y


# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):

    def __init__(self):
        # Create precision property
        self.precision = None

    # Calculates precision value
    # based on passed in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision


# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):

        # Create and set the input layer
        self.input_layer = Layer_Input()

        # Count all the objects
        layer_count = len(self.layers)

        # Initialize a list containing trainable layers:
        self.trainable_layers = []

        # Iterate the objects
        for i in range(layer_count):

            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]

            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]

            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and \
           isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = \
                Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    # def train(self, X, y, *, epochs=1, print_every=1,
    #           validation_data=None):
    def train(self, X, y, *, epochs=1, batch_size=None,
              print_every=1, validation_data=None):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Default value if batch size is not being set
        train_steps = 1

        # If there is validation data passed,
        # set default number of steps for validation as well
        if validation_data is not None:
            validation_steps = 1

            # For better readability
            X_val, y_val = validation_data

        # Calculate number of steps
        if batch_size is not None:
            train_steps = len(X) // batch_size
            # Dividing rounds down. If there are some remaining
            # data, but not a full batch, this won't include it
            # Add 1 to include this not full batch
            if train_steps * batch_size < len(X):
                train_steps += 1

            if validation_data is not None:
                validation_steps = len(X_val) // batch_size
                # Dividing rounds down. If there are some remaining
                # data, but not a full batch, this won't include it
                # Add 1 to include this not full batch
                if validation_steps * batch_size < len(X_val):
                    validation_steps += 1

        # Main training loop
        for epoch in range(1, epochs+1):

            # Print epoch number
            print(f'epoch: {epoch}')

            # Reset accumulated values in loss and accuracy objects
            self.loss.new_pass()
            self.accuracy.new_pass()

            # Iterate over steps
            for step in range(train_steps):

                # If batch size is not set -
                # train using one step and full dataset
                if batch_size is None:
                    batch_X = X
                    batch_y = y

                # Otherwise slice a batch
                else:
                    batch_X = X[step*batch_size:(step+1)*batch_size]
                    batch_y = y[step*batch_size:(step+1)*batch_size]

                # Perform the forward pass
                output = self.forward(batch_X, training=True)

                # Calculate loss
                data_loss, regularization_loss = \
                    self.loss.calculate(output, batch_y,
                                        include_regularization=True)
                loss = data_loss + regularization_loss

                # Get predictions and calculate an accuracy
                predictions = self.output_layer_activation.predictions(output)
                accuracy = self.accuracy.calculate(predictions, batch_y)

                # Perform backward pass
                self.backward(output, batch_y)

                # Optimize (update parameters)
                self.optimizer.pre_update_params()
                for layer in self.trainable_layers:
                    self.optimizer.update_params(layer)
                self.optimizer.post_update_params()

                # Print a summary
                if not step % print_every or step == train_steps - 1:
                    print(f'step: {step}, ' +
                          f'acc: {accuracy:.3f}, ' +
                          f'loss: {loss:.3f} (' +
                          f'data_loss: {data_loss:.3f}, ' +
                          f'reg_loss: {regularization_loss:.3f}), ' +
                          f'lr: {self.optimizer.current_learning_rate}')

            # Get and print epoch loss and accuracy
            epoch_data_loss, epoch_regularization_loss = \
                self.loss.calculate_accumulated(include_regularization=True)
            epoch_loss = epoch_data_loss + epoch_regularization_loss
            epoch_accuracy = self.accuracy.calculate_accumulated()

            print(f'training, ' +
                  f'acc: {epoch_accuracy:.3f}, ' +
                  f'loss: {epoch_loss:.3f} (' +
                  f'data_loss: {epoch_data_loss:.3f}, ' +
                  f'reg_loss: {epoch_regularization_loss:.3f}), ' +
                  f'lr: {self.optimizer.current_learning_rate}')

            # If there is the validation data
            if validation_data is not None:

                # Reset accumulated values in loss
                # and accuracy objects
                self.loss.new_pass()
                self.accuracy.new_pass()

                # Iterate over steps
                for step in range(validation_steps):

                    # If batch size is not set -
                    # evaluate using one step and full dataset
                    if batch_size is None:
                        batch_X = X_val
                        batch_y = y_val

                    # Otherwise slice a batch
                    else:
                        batch_X = X_val[step*batch_size:(step+1)*batch_size]
                        batch_y = y_val[step*batch_size:(step+1)*batch_size]

                    # Perform the forward pass
                    output = self.forward(batch_X, training=False)

                    # Calculate the loss
                    self.loss.calculate(output, batch_y)

                    # Get predictions and calculate an accuracy
                    predictions = \
                        self.output_layer_activation.predictions(output)
                    self.accuracy.calculate(predictions, batch_y)

                # Get and print validation loss and accuracy
                validation_loss = self.loss.calculate_accumulated()
                validation_accuracy = self.accuracy.calculate_accumulated()

                # Print a summary
                print(f'validation, ' +
                      f'acc: {validation_accuracy:.3f}, ' +
                      f'loss: {validation_loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):

        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):

        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)

            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = \
                self.softmax_classifier_output.dinputs

            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)

            return

        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)

        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)


# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')

# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]

# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) -
          127.5) / 127.5

# Instantiate the model
model = Model()

# # Add layers
# model.add(Layer_Dense(X.shape[1], 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 10))
# model.add(Activation_Softmax())

# # Set loss, optimizer and accuracy objects
# model.set(
#     loss=Loss_CategoricalCrossentropy(),
#     optimizer=Optimizer_Adam(decay=5e-5),
#     accuracy=Accuracy_Categorical()
# )

# # Finalize the model
# model.finalize()

# # Train the model
# model.train(X, y, validation_data=(X_test, y_test),
#             epochs=5, batch_size=128, print_every=100)

# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(decay=1e-3),
    accuracy=Accuracy_Categorical()
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test),
            epochs=10, batch_size=128, print_every=100)

Chapter code, additional resources, and errata for this chapter: https://nnfs.io/ch19
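As a quick follow-up (not part of the original chapter listing), the trained model can be spot-checked right after training by reusing the methods already defined above: the Model's forward method and the output layer's predictions method. The snippet below is a minimal sketch under the assumption that the trained model, X_test, and y_test from the script above are still in memory in the same session.

# Minimal sanity-check sketch (assumes `model`, `X_test` and `y_test`
# from the script above are still available)
confidences = model.forward(X_test[:5], training=False)
# Convert class probabilities to class indices (argmax per sample)
predictions = model.output_layer_activation.predictions(confidences)
print('predicted classes:', predictions)
print('true classes:     ', y_test[:5])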