Official site: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
Chinese translation: https://pytorch.apachecn.org/#/docs/1.7/07
Community examples: https://so.csdn.net/so/search?q=pytorch&spm=1001.2101.3001.7020

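The running example is fitting y = sin(x) with a third-order polynomial. The network has four parameters and is trained with gradient descent, fitting the sampled data by minimizing the squared Euclidean distance between the network output and the true output.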

Tensors

Warm-up: numpy

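We first build the network with NumPy. NumPy provides an n-dimensional array object and many functions for manipulating such arrays. It is a generic framework for scientific computing and knows nothing about computation graphs, deep learning, or gradients. Even so, it is easy to fit a third-order polynomial to the sine function with NumPy by implementing the forward and backward passes of the network by hand with NumPy operations: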

import numpy as np
import math

#create input and output data
x = np.linspace(-math.pi,math.pi,2000)
y = np.sin(x)

#randomly initialize weights: each parameter is one draw from the standard normal distribution
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6    #1e-6 is scientific notation for 1 x 10^-6
for t in range(2000):
    #forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    #compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t,loss)

    #backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred*x).sum()
    grad_c = (grad_y_pred*x**2).sum()
    grad_d = (grad_y_pred*x**3).sum()

    #update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"result: y = {a} + {b} x +{c} x^2 + {d} x^3")

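Output (loss every 100 steps; the step index is omitted). In this recorded run the loss stalls near 25528 rather than dropping toward 10 as in the PyTorch runs further down, and the fitted a and c end up far from 0. That is what happens when the quadratic term in the forward pass is written as c + x ** 2 instead of c * x ** 2: the gradients for a and c no longer match the model being evaluated.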

19478.36912480685
24419.251261778834
25574.360965011576
25705.06579423957
25671.37758666033
25627.59865465535
25594.83808699323
25572.500606645255
25557.61060307196
25547.74346998343
25541.21508840787
25536.89753185557
25534.042428319648
25532.154468842375
25530.906050928475
25530.080533239998
25529.53465909364
25529.173699489627
25528.935014746876
25528.77718432666
result: y = 14.593544901681053 + 0.8396782996625627 x +-19.907573693869118 x^2 + -0.09090339082228781 x^3
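
The hand-written gradients in the loop follow from the chain rule applied to the summed squared error of the cubic a + b x + c x^2 + d x^3. A minimal sketch of how they could be checked against central finite differences; the helper names loss_fn, analytic_grads and numeric_grads, and the test point, are illustrative assumptions and not part of the tutorial:

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

def loss_fn(params):
    # Summed squared error of the cubic a + b x + c x^2 + d x^3
    a, b, c, d = params
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    return np.square(y_pred - y).sum()

def analytic_grads(params):
    # Same formulas as grad_a ... grad_d in the training loop
    a, b, c, d = params
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    grad_y_pred = 2.0 * (y_pred - y)
    return np.array([
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x ** 2).sum(),
        (grad_y_pred * x ** 3).sum(),
    ])

def numeric_grads(params, eps=1e-5):
    # Central differences, perturbing one parameter at a time
    grads = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        grads[i] = (loss_fn(params + step) - loss_fn(params - step)) / (2 * eps)
    return grads

params = np.array([0.1, -0.2, 0.3, -0.4])  # arbitrary test point
grads_match = np.allclose(analytic_grads(params), numeric_grads(params), rtol=1e-4)
print(grads_match)
```

If the forward pass inside loss_fn and analytic_grads is changed to use c + x ** 2, the two gradient vectors stop agreeing, which makes this kind of check a cheap way to catch a mistyped forward pass.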

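Notes on numpy.random.randn() and numpy.random.randn(d0, d1, …, dn):

randn returns one sample or an array of samples drawn from the standard normal distribution (mean 0, variance 1);
by contrast, rand draws from the uniform distribution on [0, 1), which includes 0 and excludes 1;
d0, d1, …, dn give the size of each dimension;
the return value is an array with the specified dimensions;
np.random.randn() # with no arguments, a single float is returned.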
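
The difference between rand and randn is easy to see on a large sample. A small sketch, assuming a fixed seed and a sample size of 100000 chosen only for illustration:

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable

u = np.random.rand(100_000)    # uniform on [0, 1)
z = np.random.randn(100_000)   # standard normal, unbounded

print(u.min(), u.max())        # both stay inside [0, 1)
print(z.min(), z.max())        # values fall well outside [0, 1)
print(z.mean(), z.std())       # close to 0 and 1

s = np.random.randn()          # no arguments: a single Python float
m = np.random.randn(2, 3)      # d0=2, d1=3: array of shape (2, 3)
print(type(s).__name__, m.shape)
```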

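numpy.linspace() is a sequence generator: it returns evenly spaced numbers over a given interval. numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None) produces a sequence of the requested length covering the requested range. Parameters:

start: the first value of the sequence.
stop: the last value of the sequence (included only when endpoint is True).
num: the number of elements to generate; defaults to 50.
endpoint: when True, the sequence includes stop; otherwise it does not. Defaults to True.
retstep: when True, the spacing between samples is returned as well, as a (samples, step) tuple; otherwise only the samples are returned. Defaults to False.
dtype: the data type of the generated sequence; when None, it is inferred from the other inputs.
Return value: an array (plus the step when retstep is True).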
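
A short sketch of how endpoint and retstep change the result; the interval [0, 1] and num=5 are arbitrary illustrative choices:

```python
import numpy as np

closed = np.linspace(0.0, 1.0, num=5)                      # endpoint=True: 0, 0.25, 0.5, 0.75, 1
half_open = np.linspace(0.0, 1.0, num=5, endpoint=False)   # stop excluded: 0, 0.2, 0.4, 0.6, 0.8
samples, step = np.linspace(0.0, 1.0, num=5, retstep=True) # also returns the spacing

print(closed)
print(half_open)
print(step)
```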

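PyTorch: Tensors

NumPy is a great framework, but it cannot use GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or more, so unfortunately NumPy is not enough for modern deep learning. Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a NumPy array: it is an n-dimensional array, and PyTorch provides many functions for operating on Tensors. Behind the scenes, Tensors can keep track of a computation graph and gradients, but they are also useful as a generic tool for scientific computing. Unlike NumPy, PyTorch Tensors can use GPUs to accelerate their numeric computations. To run a PyTorch Tensor on a GPU, you only need to specify the correct device. Here we use PyTorch Tensors to fit a third-order polynomial to the sine function. As in the NumPy example above, the forward and backward passes through the network must be implemented by hand: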

import torch
import math

dtype = torch.float
device = torch.device("cpu")
#device = torch.device("cuda:0") #uncomment this to run on GPU

#create input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

#randomly initialize weights
a = torch.randn((),device=device,dtype=dtype)
b = torch.randn((),device=device,dtype=dtype)
c = torch.randn((),device=device,dtype=dtype)
d = torch.randn((),device=device,dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    #forward pass: compute predicted y
    y_pred = a + b * x +c * x ** 2 + d * x ** 3

    #compute and print loss
    loss = (y_pred - y).pow(2).sum().item()    #item() returns the value of a one-element tensor as a Python number
    if t % 100 == 99:
        print(t, loss)

    #backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    #update weight using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")

Output (loss every 100 steps; the step index is omitted):

1479.1622314453125
1029.018310546875
717.3475341796875
501.3380432128906
351.484130859375
247.42686462402344
175.10452270507812
124.79450225830078
89.76720428466797
65.35990142822266
48.339378356933594
36.46087646484375
28.164865493774414
22.36684799194336
18.311866760253906
15.474104881286621
13.48697566986084
12.094669342041016
11.11857795715332
10.433902740478516
result: y = -0.040275223553180695 + 0.8441053628921509 x + 0.006948145106434822 x^2 + -0.0915331020951271 x^3

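Notes: torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor returns a one-dimensional tensor of steps evenly spaced points from start to end, both endpoints included. For example, print(torch.linspace(3,10,5)) prints tensor([ 3.0000, 4.7500, 6.5000, 8.2500, 10.0000]). Parameters:

start (float): the first value of the point set;
end (float): the last value of the point set;
steps (int): the number of points; older PyTorch releases defaulted to 100, while recent releases expect it to be passed explicitly;
out (Tensor, optional): the output tensor;
dtype: the data type of the returned tensor.

torch.randn() returns a tensor filled with random numbers drawn from the normal distribution with mean 0 and variance 1 (the standard normal distribution).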
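
A minimal sketch of how the device argument and the empty shape () behave. The fallback expression using torch.cuda.is_available() is an assumption added here so the snippet runs on machines with or without a GPU:

```python
import torch

# Pick the GPU when one is visible, otherwise stay on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

pts = torch.linspace(3, 10, steps=5, device=device)   # both endpoints included
w = torch.randn((), device=device)                    # empty shape: a 0-dimensional tensor

print(pts)
print(w.shape, w.dim(), type(w.item()).__name__)
```

A 0-dimensional tensor holds exactly one value, which is why the listings can call .item() on a, b, c and d to get plain Python floats.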

autograd

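PyTorch: Tensors and autograd

In the examples above, the forward and backward passes of the network had to be implemented by hand. Writing the backward pass by hand is not a big deal for a small two-layer network, but it quickly becomes unmanageable for large, complex networks.

Thankfully, automatic differentiation can compute the backward pass of a neural network automatically. The autograd package in PyTorch provides exactly this functionality. When autograd is used, the forward pass of the network defines a computation graph: nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then makes it easy to compute gradients.

This sounds complicated, but it is simple to use in practice. Each Tensor represents a node in the computation graph. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of some scalar value with respect to x.

Here we use PyTorch Tensors and autograd to fit the sine wave with a third-order polynomial; the backward pass through the network no longer has to be implemented by hand.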
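
A minimal sketch of that autograd version, assuming the same data, learning rate and step count as the Tensor example above. The fixed seed and the losses list are additions made here for repeatability and inspection:

```python
import math
import torch

torch.manual_seed(0)  # assumption: fixed seed so repeated runs match

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# requires_grad=True asks autograd to record operations on these tensors
a = torch.randn((), requires_grad=True)
b = torch.randn((), requires_grad=True)
c = torch.randn((), requires_grad=True)
d = torch.randn((), requires_grad=True)

learning_rate = 1e-6
losses = []
for t in range(2000):
    # Forward pass: ordinary tensor operations build the graph
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Backward pass: fills a.grad, b.grad, c.grad, d.grad
    loss.backward()

    # Update in place without recording the update itself in the graph
    with torch.no_grad():
        for w in (a, b, c, d):
            w -= learning_rate * w.grad
            w.grad = None  # reset so gradients do not accumulate across steps

print(f"result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")
```

The four grad_a ... grad_d lines from the manual versions disappear; loss.backward() computes the same quantities from the recorded graph.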

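PyTorch: defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of some scalar value with respect to the output Tensors and computes the gradient of that same scalar value with respect to the input Tensors.

In PyTorch it is easy to define your own autograd operator by subclassing torch.autograd.Function and implementing the forward and backward functions. The new operator is then used by calling its apply method like a function, passing Tensors that contain the input data.

In this example the model is defined as y = a + b P3(c + d x) instead of y = a + b x + c x^2 + d x^3, where P3(x) = (5x^3 - 3x) / 2 is the Legendre polynomial of degree three. We write our own custom autograd function to compute the forward and backward passes of P3 and use it to implement the model.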
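
A sketch of what such a custom function can look like, modeled on the official tutorial. The class name LegendrePolynomial3, the initial values for a, b, c, d and the learning rate 5e-6 are taken from that tutorial as assumptions; the gradcheck call is an extra sanity check added here:

```python
import math
import torch

class LegendrePolynomial3(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Save the input so backward can evaluate the derivative at it
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        # d/dx [ (5x^3 - 3x) / 2 ] = 1.5 * (5x^2 - 1), chained with grad_output
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

# Compare the hand-written backward with numerical differentiation
probe = torch.randn(5, dtype=torch.double, requires_grad=True)
backward_ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (probe,))

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

a = torch.full((), 0.0, requires_grad=True)
b = torch.full((), -1.0, requires_grad=True)
c = torch.full((), 0.0, requires_grad=True)
d = torch.full((), 0.3, requires_grad=True)

learning_rate = 5e-6
losses = []
for t in range(2000):
    P3 = LegendrePolynomial3.apply      # alias for the custom operator
    y_pred = a + b * P3(c + d * x)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())
    loss.backward()
    with torch.no_grad():
        for w in (a, b, c, d):
            w -= learning_rate * w.grad
            w.grad = None

print(backward_ok)
print(f"result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)")
```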

nn model

PyTorch: nn

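In PyTorch, the nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, and it may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.

In this example, we use the nn package to implement the polynomial model network: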

import torch
import math

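#create tensors to hold the input and output
x = torch.linspace(-math.pi,math.pi,2000)    #linspace: evenly spaced values over the given interval
y = torch.sin(x)

#in this example the output y is a linear function of (x, x^2, x^3), so it can be treated as a linear-layer neural network; prepare the tensor (x, x^2, x^3)
p = torch.tensor([1,2,3])
xx = x.unsqueeze(-1).pow(p)

#in the code above, x.unsqueeze(-1) has shape (2000, 1) and p has shape (3,);
#broadcasting semantics apply here and produce a tensor of shape (2000, 3).

#use the nn package to define the model as a sequence of layers;
#nn.Sequential is a module that contains other modules and applies them in order to produce its output;
#the Linear module computes its output as a linear function of its input and holds internal tensors for its weight and bias;
#the Flatten layer flattens the output of the linear layer to a 1D tensor to match the shape of y.

model = torch.nn.Sequential(
    torch.nn.Linear(3,1),
    torch.nn.Flatten(0,1)    #Flatten(0,1): merge dims 0 and 1, turning shape (2000, 1) into (2000,)
)

#nn also provides common loss functions; this example uses the Mean Squared Error (MSE) loss.
loss_fn = torch.nn.MSELoss(reduction='sum')   #reduction='sum' adds up the squared errors instead of averaging them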
learning_rate = 1e-6
for t in range(2000):
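    #forward pass: compute the predicted y by passing xx to the model.
    #Module objects override the __call__ operator, so they can be called like functions.
    #when called, a module takes a tensor of input data and produces a tensor of output data.
    y_pred = model(xx)

    #compute and print the loss; the loss function takes tensors holding the predicted and true values of y and returns a tensor holding the loss.
    loss = loss_fn(y_pred,y)
    if t % 100 == 99:
        print(t,loss.item())

    #zero the gradients before running the backward pass.
    model.zero_grad()

    #backward pass: compute the gradient of the loss with respect to all learnable parameters of the model.
    #internally, the parameters of each module are stored in tensors with requires_grad=True, so this call computes gradients for all learnable parameters in the model.
    loss.backward()

    #update the weights using gradient descent; each parameter is a tensor, so its gradient can be accessed as before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

#the first layer of "model" can be accessed like the first item of a list.
linear_layer = model[0]

#for a linear layer, the parameters are stored as "weight" and "bias".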
print(f'result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + '
      f'{linear_layer.weight[:,1].item()} x^2 + {linear_layer.weight[:,2].item()} x^3')

Output (loss every 100 steps; the step index is omitted):

1043.3065185546875
692.8822021484375
461.1615295410156
307.93426513671875
206.61143493652344
139.6107635498047
95.30604553222656
66.00914001464844
46.63624572753906
33.82560348510742
25.354406356811523
19.75271987915039
16.048513412475586
13.599020957946777
11.979267120361328
10.908172607421875
10.199882507324219
9.731518745422363
9.421815872192383
9.217001914978027
result: y = 0.0005339012132026255 + 0.8373093008995056 x + -9.210744610754773e-05 x^2 + -0.09056641906499863 x^3
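
The shape bookkeeping in the nn example is worth tracing once by hand. A small sketch that follows the same tensors through unsqueeze, the broadcast pow, Linear and Flatten; the intermediate names col, out and flat are illustrative and not part of the listing:

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
p = torch.tensor([1, 2, 3])

col = x.unsqueeze(-1)      # shape (2000, 1)
xx = col.pow(p)            # broadcast against shape (3,) gives (2000, 3)

linear = torch.nn.Linear(3, 1)
flat = torch.nn.Flatten(0, 1)

out = linear(xx)           # shape (2000, 1)
y_pred = flat(out)         # shape (2000,), same as y

print(col.shape, xx.shape, out.shape, y_pred.shape)
```

Without the Flatten step, comparing a (2000, 1) prediction with a (2000,) target would broadcast the two against each other into a (2000, 2000) difference, which is not the loss the example intends.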

PyTorch: optim

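Up to this point we have updated the weights of the model by manually mutating the Tensors that hold the learnable parameters inside torch.no_grad(). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice neural networks are often trained with more sophisticated optimizers such as AdaGrad, RMSProp, and Adam. The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used ones. In this example we use the nn package to define the model as before, but optimize it with the RMSprop algorithm provided by the optim package: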

import torch
import math

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    y_pred = model(xx)

    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + '
      f'{linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

Output (loss every 100 steps; the step index is omitted):

4199.068359375
1251.85888671875
744.5487060546875
650.283447265625
568.83349609375
476.1234436035156
378.39129638671875
285.1864013671875
203.7239990234375
137.22006225585938
86.24774932861328
50.04513168334961
27.013656616210938
14.817473411560059
10.082954406738281
9.017213821411133
9.001590728759766
8.925690650939941
8.897751808166504
8.914302825927734
Result: y = 0.0004996994393877685 + 0.8562811613082886 x + 0.0004997377400286496 x^2 + -0.0938219279050827 x^3
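
To see what the optimizer object replaces, it can help to run one step both ways. A minimal sketch, assuming plain torch.optim.SGD without momentum (not RMSprop) and a toy three-element parameter invented for illustration:

```python
import torch

torch.manual_seed(0)
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)   # same starting values
target = torch.tensor([1.0, 2.0, 3.0])
lr = 0.1

# Manual update, in the style of the nn section
loss = (w_manual - target).pow(2).sum()
loss.backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same step through the optim package
optimizer = torch.optim.SGD([w_optim], lr=lr)
optimizer.zero_grad()
loss = (w_optim - target).pow(2).sum()
loss.backward()
optimizer.step()

same_step = torch.allclose(w_manual, w_optim)
print(same_step)
```

Switching to another algorithm only changes the constructor line, for example torch.optim.RMSprop([w_optim], lr=lr); the zero_grad / backward / step pattern stays the same.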

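PyTorch: custom nn Modules

Sometimes you will want to specify models that are more complex than a sequence of existing Modules. In those cases you can define your own Module by subclassing nn.Module and defining a forward method that receives input Tensors and produces output Tensors using other modules or other autograd operations on Tensors.

In this example, we implement the third-order polynomial as a custom Module subclass: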

import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        #in the constructor we instantiate four parameters and assign them as member parameters
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self,x):
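        #the forward function accepts a tensor of input data and must return a tensor of output data.
        #it can use modules defined in the constructor as well as arbitrary operators on tensors
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        #as with any Python class, custom methods can also be defined on PyTorch modules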
        return f'y = {self.a.item()} + {self.b.item()} x +{self.c.item()} x^2 + {self.d.item()} x^3'

x = torch.linspace(-math.pi,math.pi,2000)
y = torch.sin(x)

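#build the model by instantiating the class defined above
model = Polynomial3()

#build the loss function and the optimizer; the call to model.parameters() in the SGD constructor returns the four learnable parameters (a, b, c, d) registered as torch.nn.Parameter members of the model.
criterion = torch.nn.MSELoss(reduction='sum')    #summed squared error: loss(x,y) = sum((x-y)^2)
optimizer = torch.optim.SGD(model.parameters(),lr=1e-6)
for t in range(2000):
    #forward pass: pass x to the model to compute the predicted y
    y_pred = model(x)

    #compute and print the loss
    loss = criterion(y_pred,y)
    if t % 100 == 99:
        print(t,loss.item())

    #zero the gradients, run the backward pass, and update the weights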
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


print(f'result: {model.string()}')

Output (loss every 100 steps; the step index is omitted):

1355.4447021484375
900.5872802734375
599.42919921875
400.0167541503906
267.9629211425781
180.5065155029297
122.57975006103516
84.20748138427734
58.78575897216797
41.94147491455078
30.77896499633789
23.38070297241211
18.476526260375977
15.225118637084961
13.069113731384277
11.639115333557129
10.690498352050781
10.06110954284668
9.643372535705566
9.36609935760498
result: y = -0.0066633448004722595 + 0.83480304479599 x +0.0011495368089526892 x^2 + -0.09020992368459702 x^3
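
The reason optimizer = torch.optim.SGD(model.parameters(), ...) finds a, b, c and d is that assigning a torch.nn.Parameter as an attribute registers it with the Module. A short sketch of that behavior; the extra attribute scale is a hypothetical addition used only to show that a plain tensor is not registered:

```python
import torch

class Polynomial3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.scale = torch.ones(())   # hypothetical plain tensor: not registered, not trained

    def forward(self, x):
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

model = Polynomial3()
names = [name for name, _ in model.named_parameters()]
print(names)
print(all(p.requires_grad for p in model.parameters()))
```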

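PyTorch: control flow + weight sharing

As an example of dynamic graphs and weight sharing, we implement a rather strange model: a polynomial of order three to five that, on each forward pass, picks a random order between 3 and 5 and uses that many terms, reusing the same weight e for both the fourth-order and fifth-order terms.

For this model we can use ordinary Python flow control to implement the loop, and we can implement weight sharing simply by reusing the same parameter several times when defining the forward pass.

This model is easy to implement as a Module subclass: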

import random
import torch
import math

class DynamicNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self,x):
        y = self.a + self.b * x +self.c * x ** 2 +self.d * x ** 3
        for exp in range(4,random.randint(4,6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + ' \
               f'{self.e.item()} x^4 ? + {self.e.item()} x^5 ?'

x = torch.linspace(-math.pi,math.pi,2000)
y = torch.sin(x)

model = DynamicNet()
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(),lr=1e-8,momentum=0.9)

for t in range(30000):
    y_pred = model(x)
    loss = criterion(y_pred,y)
    if t % 2000 == 1999:
        print(t,loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'result:{model.string()}')

Output (loss every 2000 steps; the step index is omitted):

1868.185546875
901.3772583007812
471.8838806152344
242.06170654296875
133.9963836669922
131.37423706054688
38.36183166503906
23.234813690185547
16.192235946655273
12.5375394821167
10.831731796264648
9.643186569213867
9.118908882141113
9.150856018066406
9.009603500366211
result:y = -0.011092226952314377 + 0.8570550084114075 x + 0.0014420193620026112 x^2 + -0.09358233213424683 x^3 + 0.00010753657988971099 x^4 ? + 0.00010753657988971099 x^5 ?
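
The loop bound range(4, random.randint(4, 6)) in forward is what selects the order. A small pure-Python sketch of the three cases; the helper name highest_order is hypothetical:

```python
import random

def highest_order(upper):
    # range(4, upper) yields the extra exponents added on top of the cubic
    extra = list(range(4, upper))
    return 3 + len(extra), extra

# random.randint(4, 6) is inclusive on both ends, so it can return 4, 5 or 6
cases = {upper: highest_order(upper) for upper in (4, 5, 6)}
for upper, (order, extra) in cases.items():
    print(upper, order, extra)

random.seed(0)
draws = sorted({random.randint(4, 6) for _ in range(1000)})
print(draws)
```

An upper bound of 4 leaves the loop empty (plain cubic), 5 adds the x^4 term, and 6 adds both x^4 and x^5, each multiplied by the same parameter e.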