A Thousand-Word Introduction to Deep Learning with PyTorch
Outline:
*Deep learning
*Preface
1. Basic data: Tensor
1.1 Creation of Tensor
1.2 torch.FloatTensor
1.3 torch.IntTensor
1.4 torch.randn
1.5 torch.range
1.6 torch.zeros/ones/empty
2. Tensor operations
2.1 torch.abs
2.2 torch.add
2.3 torch.clamp
2.4 torch.div
2.5 torch.pow
2.6 torch.mm
2.7 torch.mv
3. The neural network toolbox torch.nn
3.1 The nn.Module class
3.2 Build a simple neural network
4. Implementing a complete neural network in PyTorch
4.1 torch.autograd and Variable
4.2 Custom propagation functions
4.3 PyTorch’s torch.nn
4.3.1 torch.nn.Sequential
4.3.2 torch.nn.Linear
4.3.3 torch.nn.ReLU
4.3.4 torch.nn.MSELoss
4.3.5 torch.nn.L1Loss
4.3.6 torch.nn.CrossEntropyLoss
4.3.7 Neural networks using loss functions
4.4 PyTorch's torch.optim
5. Building a neural network for handwritten digit recognition
5.1 torchvision
5.1.1 torchvision.datasets
5.1.2 torchvision.models
5.1.3 torchvision.transforms
5.1.3.1 torchvision.transforms.Resize
5.1.3.2 torchvision.transforms.Scale
5.1.3.3 torchvision.transforms.CenterCrop
5.1.3.4 torchvision.transforms.RandomCrop
5.1.3.5 torchvision.transforms.RandomHorizontalFlip
5.1.3.6 torchvision.transforms.RandomVerticalFlip
5.1.3.7 torchvision.transforms.ToTensor
5.1.3.8 torchvision.transforms.ToPILImage
5.1.4 torch.utils.data and torchvision.utils
5.2 Model building and parameter optimization
5.2.1 torch.nn.Conv2d
5.2.2 torch.nn.MaxPool2d
5.2.3 torch.nn.Dropout
5.3 Parameter Optimization
5.3.1 Model Training
5.4 Model Validation
5.5 Complete Code
6. Conclusion
Preface
Learning a good deep learning framework is very important. The mainstream frameworks today are PyTorch and TensorFlow, so let's learn PyTorch together!
1. Basic data: Tensor
Tensors are the basic operands in PyTorch and can be thought of as multidimensional matrices whose elements all share a single data type. From a usage point of view, a Tensor is very similar to NumPy's ndarray, and the two can be freely converted into each other; in addition, Tensor supports GPU acceleration.
Before we start, a reminder: in this article, the slanted (code-style) text is code.
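For example, here is a minimal sketch of the NumPy interoperability and GPU support mentioned above (the shapes are arbitrary, and the GPU line only runs if CUDA is available):
import numpy as np
import torch
n = np.ones((2, 3))
t = torch.from_numpy(n)   # ndarray -> Tensor (shares the underlying memory)
n2 = t.numpy()            # Tensor -> ndarray
if torch.cuda.is_available():
    t_gpu = t.to("cuda")  # move the Tensor to the GPU for accelerated computation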
1.1 Creation of Tensor
1.2 torch.FloatTensor
torch.FloatTensor is used to generate a Tensor of floating-point data type; the parameter passed to torch.FloatTensor can be a list or a shape (dimension sizes). Note that when only dimensions are given, the returned tensor is uninitialized, so its values are whatever happened to be in memory.
import torch
a = torch.FloatTensor(2,3)
b = torch.FloatTensor([2,3,4,5])
a,b
the output is:
(tensor([[1.0561e-38, 1.0102e-38, 9.6429e-39],
[8.4490e-39, 9.6429e-39, 9.1837e-39]]),
tensor([2., 3., 4., 5.]))
1.3 torch.IntTensor
torch.IntTensor is used to generate a Tensor of integer type; the parameters passed to torch.IntTensor can likewise be a list or a shape.
import torch
a = torch.IntTensor(2,3)
b = torch.IntTensor([2,3,4,5])
a,b
torch.rand generates a Tensor of the specified shape with values sampled uniformly from [0, 1):
import torch
a = torch.rand(2,3)
a
Get:
tensor([[0.5625, 0.5815, 0.8221],
[0.3589, 0.4180, 0.2158]])
1.4 torch.randn
torch.randn is used to generate a random Tensor with a floating-point data type and the specified shape, similar to numpy.random.randn; the randomly generated values follow a normal distribution with mean 0 and variance 1.
import torch
a = torch.randn(2,3)
a
Get:
tensor([[-0.0067, -0.0707, -0.6682],
[ 0.8141, 1.1436, 0.5963]])
1.5 torch.range
torch.range is used to generate a floating-point Tensor over a given range, so three parameters are passed to torch.range: the start value, the end value, and the step size, where the step size specifies the interval between consecutive values from start to end.
import torch
a = torch.range(1,20,2)
a
Get:
tensor([ 1., 3., 5., 7., 9., 11., 13., 15., 17., 19.])
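Note that in recent PyTorch versions torch.range is deprecated in favor of torch.arange, which excludes the end value; a quick sketch:
import torch
a = torch.arange(1, 20, 2)  # the end value 20 is exclusive
a
# tensor([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])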
1.6 torch.zeros/ones/empty
torch.zeros is used to generate a Tensor with floating-point data type and the specified dimensions, in which every element is 0.
torch.ones similarly generates a tensor of all 1s.
torch.empty creates an uninitialized tensor whose shape is determined by the size argument, which can be a list or a tuple.
import torch
a = torch.zeros(2,3)
a
Get:
tensor([[0., 0., 0.],
[0., 0., 0.]])
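For completeness, a small sketch of torch.ones and torch.empty (the values of the empty tensor are uninitialized, so your output will differ):
import torch
b = torch.ones(2,3)
b
# tensor([[1., 1., 1.],
#         [1., 1., 1.]])
c = torch.empty(2,3)  # uninitialized memory; contents are arbitrary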
2. Tensor operations
2.1 torch.abs
Passing a parameter to torch.abs returns the absolute value of the input as output; the input must be a variable of the Tensor data type, for example:
import torch
a = torch.randn(2,3)
a
The resulting a is:
tensor([[ 0.0948, 0.0530, -0.0986],
[ 1.8926, -2.0569, 1.6617]])
We apply abs to a:
b = torch.abs(a)
b
Get:
tensor([[0.0948, 0.0530, 0.0986],
[1.8926, 2.0569, 1.6617]])
2.2 torch.add
Passing parameters to torch.add returns their sum as output; the inputs can be two variables of the Tensor data type, or one Tensor variable and one scalar.
import torch
a = torch.randn(2,3)
a
#tensor([[-0.1146, -0.3282, -0.2517],
# [-0.2474, 0.8323, -0.9292]])
b = torch.randn(2,3)
b
#tensor([[ 0.9526, 1.5841, -3.2665],
# [-0.4831, 0.9259, -0.5054]])
c = torch.add(a,b)
c
output of c:
tensor([[ 0.8379, 1.2559, -3.5182],
[-0.7305, 1.7582, -1.4346]])
Another one:
d = torch.randn(2,3)
d
#we get d is…
#tensor([[ 0.1473, 0.7631, -0.1953],
# [-0.2796, -0.7265, 0.7142]])
We add the scalar 10 to d:
e = torch.add(d,10)
e
Get:
tensor([[10.1473, 10.7631, 9.8047],
[ 9.7204, 9.2735, 10.7142]])
2.3 torch.clamp
torch.clamp crops the input according to a custom range and returns the clipped result as output. There are three input parameters: the Tensor variable to be clipped, the lower boundary of the clipping range, and the upper boundary. The clipping process compares each element of the variable with the lower and upper boundaries: if an element is smaller than the lower boundary, it is rewritten to the lower boundary's value; likewise, if an element is larger than the upper boundary, it is rewritten to the upper boundary's value. Let's look directly at the example:
a = torch.randn(2,3)
a
#We get a is:
#tensor([[-1.4049, 1.0336, 1.2820],
# [ 0.7610, -1.7475, 0.2414]])
We apply the clamp operation to a:
b = torch.clamp(a,-0.1,0.1)
b
#We get b is:
#tensor([[-0.1000, 0.1000, 0.1000],
# [ 0.1000, -0.1000, 0.1000]])
2.4 torch.div
torch.div returns the element-wise quotient of the input parameters as output; likewise, the operands can both be Tensor variables, or a combination of a Tensor variable and a scalar. Let's look at an example.
a = torch.randn(2,3)
a
#We get a …:
#tensor([[ 0.6276, 0.6397, -0.0762],
# [-0.4193, -0.5528, 1.5192]])
b = torch.randn(2,3)
b
#We get b…:
#tensor([[ 0.9219, 0.2120, 0.1155],
# [ 1.1086, -1.1442, 0.2999]])
We perform the div operation on a and b:
c = torch.div(a,b)
c
#get c…:
#tensor([[ 0.6808, 3.0173, -0.6602],
# [-0.3782, 0.4831, 5.0657]])
2.5 torch.pow
torch.pow: after passing parameters to torch.pow, the result of raising the input to the given power is returned as output; the operands can both be Tensor variables, or a combination of a Tensor variable and a scalar.
a = torch.randn(2,3)
a
#Get a is…:
#tensor([[ 0.3896, -0.1475, 0.1104],
# [-0.6908, -0.0472, -1.5310]])
We square a:
b = torch.pow(a,2)
b
#We get b is the square of a:
#tensor([[1.5181e-01, 2.1767e-02, 1.2196e-02],
# [4.7722e-01, 2.2276e-03, 2.3441e+00]])
2.6 torch.mm
torch.mm: pass parameters to torch.mm and the matrix product of the inputs is returned as output. This product is not computed the same way as the element-wise torch.mul operation: torch.mm applies the rules of matrix multiplication, so the passed parameters are treated as matrices and their dimensions must satisfy the precondition of matrix multiplication, that is, the number of columns of the first matrix must equal the number of rows of the second matrix.
Let’s look at an example:
a = torch.randn(2,3)
a
#We get a…:
#tensor([[ 0.1057, 0.0104, -0.1547],
# [ 0.5010, -0.0735, 0.4067]])
and…
b = torch.randn(2,3)
b
#We get b:
#tensor([[ 1.1971, -1.4010, 1.1277],
# [-0.3076, 0.9171, 1.9135]])
Then we perform matrix multiplication with a and the transpose of b (so that the dimensions match):
c = torch.mm(a,b.T)
c
#tensor([[-0.0625, -0.3190],
# [ 1.1613, 0.5567]])
2.7 torch.mv
Passing parameters to torch.mv returns the matrix-vector product of the inputs as output. torch.mv uses the multiplication rules between a matrix and a vector: the first parameter represents the matrix and the second the vector, and the order cannot be reversed.
Let’s look at an example:
a = torch.randn(2,3)
a
#We get a…:
#tensor([[ 1.0909, -1.1679, 0.3161],
#        [-0.8952, -2.1351, -0.9667]])
b = torch.randn(3)
b
#We get b…:
#tensor([-1.4689, 1.6197, 0.7209])
Then we perform the matrix-vector multiplication with a and b:
c = torch.mv(a,b)
c
#tensor([-3.2663, -2.8402])
3. The neural network toolbox torch.nn
Although the torch.autograd library implements automatic differentiation and gradient backpropagation, to complete the training of a model we would still need to hand-write the parameter updates and the control of the training process, which is not convenient enough. To this end, PyTorch provides a more integrated modular interface, torch.nn, which is built on autograd and offers a range of functionality such as network modules, optimizers, and initialization strategies.
3.1 The nn.Module class
nn.Module is the neural network base class provided by PyTorch; it implements the definition of each network layer as well as the forward computation and backpropagation mechanism. In practice, to implement a neural network you only need to inherit from nn.Module, define the model structure and parameters in the initializer, and write the forward pass in the forward() function.
1. nn.Parameter
2. The forward() function and backpropagation
3. Nesting of multiple Modules
4. nn.Module and the nn.functional library
5. The nn.Sequential() module
# We use torch.nn to implement an MLP
from torch import nn
class MLP(nn.Module):
    # class MLP inherits from nn.Module
    def __init__(self, in_dim, hid_dim1, hid_dim2, out_dim):
        super(MLP, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(in_dim, hid_dim1),
            nn.ReLU(),
            nn.Linear(hid_dim1, hid_dim2),
            nn.ReLU(),
            nn.Linear(hid_dim2, out_dim),
            nn.ReLU()
        )
    def forward(self, x):
        x = self.layer(x)
        return x
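As a quick usage sketch (the dimensions and batch size below are arbitrary), we can instantiate the MLP above and run a forward pass on a random batch:
import torch
model = MLP(in_dim=28*28, hid_dim1=256, hid_dim2=128, out_dim=10)
x = torch.randn(16, 28*28)   # a batch of 16 flattened inputs
out = model(x)               # calls forward() under the hood
print(out.shape)             # torch.Size([16, 10])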
3.2 Build a simple neural network
Below we use torch to build a simple neural network:
1. We set the number of input features to 1000, the hidden layer size to 100, and the number of output units to 10.
2. We feed in 100 samples with 1000 features each; the hidden layer transforms them into 100 hidden features, which are mapped to 10 classification outputs, and the result is then backpropagated.
import torch
batch_n = 100 #the numbers of input data every single batch
hidden_layer = 100
input_data = 1000 #the features of every data is 1000
output_data = 10
x = torch.randn(batch_n,input_data)
y = torch.randn(batch_n,output_data)
w1 = torch.randn(input_data,hidden_layer)
w2 = torch.randn(hidden_layer,output_data)
epoch_n = 20
lr = 1e-6
for epoch in range(epoch_n):
    h1 = x.mm(w1)  # (100,1000)*(1000,100) -> (100,100)
    print(h1.shape)
    h1 = h1.clamp(min=0)
    y_pred = h1.mm(w2)
    loss = (y_pred-y).pow(2).sum()
    print("epoch:{},loss:{:.4f}".format(epoch,loss))
    grad_y_pred = 2*(y_pred-y)
    grad_w2 = h1.t().mm(grad_y_pred)
    grad_h = grad_y_pred.clone()
    grad_h = grad_h.mm(w2.t())
    grad_h.clamp_(min=0)  # assign all values less than 0 to 0, i.e. ReLU-style masking of the gradient
    grad_w1 = x.t().mm(grad_h)
    w1 = w1 - lr*grad_w1
    w2 = w2 - lr*grad_w2
Then we get:
torch.Size([100, 100])
epoch:0,loss:112145.7578
torch.Size([100, 100])
epoch:1,loss:110014.8203
torch.Size([100, 100])
epoch:2,loss:107948.0156
torch.Size([100, 100])
epoch:3,loss:105938.6719
torch.Size([100, 100])
epoch:4,loss:103985.1406
torch.Size([100, 100])
epoch:5,loss:102084.9609
torch.Size([100, 100])
epoch:6,loss:100236.9844
torch.Size([100, 100])
epoch:7,loss:98443.3359
torch.Size([100, 100])
epoch:8,loss:96699.5938
torch.Size([100, 100])
epoch:9,loss:95002.5234
torch.Size([100, 100])
epoch:10,loss:93349.7969
torch.Size([100, 100])
epoch:11,loss:91739.8438
torch.Size([100, 100])
epoch:12,loss:90171.6875
torch.Size([100, 100])
epoch:13,loss:88643.1094
torch.Size([100, 100])
epoch:14,loss:87152.6406
torch.Size([100, 100])
epoch:15,loss:85699.4297
torch.Size([100, 100])
epoch:16,loss:84282.2500
torch.Size([100, 100])
epoch:17,loss:82899.9062
torch.Size([100, 100])
epoch:18,loss:81550.3984
torch.Size([100, 100])
epoch:19,loss:80231.1484
The loss gets lower and lower.
4. Implementing a complete neural network in PyTorch
4.1 torch.autograd and Variable
The main function of the torch.autograd package is to perform the chain-rule differentiation needed for backpropagation in a neural network; writing these derivative computations by hand would amount to reinventing the wheel.
The automatic gradient functionality works roughly as follows: the forward pass over input variables of the Tensor data type first builds a computation graph; from this graph and the output, the gradient that needs to be applied to each parameter is computed exactly, and backpropagation then completes the parameters' gradient update.
The Variable class in the torch.autograd package wraps the Tensor variables we define; after wrapping, each node in the computation graph is a Variable object, so the automatic gradient machinery can be applied. (Note that in current PyTorch versions Variable is deprecated: plain Tensors with requires_grad=True support autograd directly, but the Variable wrapper used below still works.)
Below we use autograd to implement a neural network model with a two-tier structure.
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
# Wrap the Tensor variables with Variable; if requires_grad is False, the variable does not retain its gradient during the automatic gradient computation.
w1 = Variable(torch.randn(input_data,hidden_layer),requires_grad=True)
w2 = Variable(torch.randn(hidden_layer,output_data),requires_grad=True)
#learning rate&number of iterations assignment
epoch_n=50
lr=1e-6
for epoch in range(epoch_n):
    h1 = x.mm(w1)  # (100,1000)*(1000,100) -> (100,100)
    print(h1.shape)
    h1 = h1.clamp(min=0)
    y_pred = h1.mm(w2)
    # y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred-y).pow(2).sum()
    print("epoch:{},loss:{:.4f}".format(epoch,loss.data))
    # grad_y_pred = 2*(y_pred-y)
    # grad_w2 = h1.t().mm(grad_y_pred)
    loss.backward()  # backpropagation replaces the manual gradient computation
    # grad_h = grad_y_pred.clone()
    # grad_h = grad_h.mm(w2.t())
    # grad_h.clamp_(min=0)  # assign all values less than 0 to 0 (ReLU-style masking)
    # grad_w1 = x.t().mm(grad_h)
    w1.data -= lr*w1.grad.data
    w2.data -= lr*w2.grad.data
    w1.grad.data.zero_()
    w2.grad.data.zero_()
    # w1 = w1 - lr*grad_w1
    # w2 = w2 - lr*grad_w2
And we get the result:
torch.Size([100, 100])
epoch:0,loss:54572212.0000
torch.Size([100, 100])
epoch:1,loss:133787328.0000
torch.Size([100, 100])
epoch:2,loss:491439904.0000
torch.Size([100, 100])
epoch:3,loss:683004416.0000
torch.Size([100, 100])
epoch:4,loss:13681055.0000
torch.Size([100, 100])
epoch:5,loss:8058388.0000
torch.Size([100, 100])
epoch:6,loss:5327059.5000
torch.Size([100, 100])
epoch:7,loss:3777382.5000
torch.Size([100, 100])
epoch:8,loss:2818449.5000
torch.Size([100, 100])
epoch:9,loss:2190285.0000
torch.Size([100, 100])
epoch:10,loss:1760991.0000
torch.Size([100, 100])
epoch:11,loss:1457116.3750
torch.Size([100, 100])
epoch:12,loss:1235850.6250
torch.Size([100, 100])
epoch:13,loss:1069994.0000
torch.Size([100, 100])
epoch:14,loss:942082.4375
torch.Size([100, 100])
epoch:15,loss:841170.6250
torch.Size([100, 100])
epoch:16,loss:759670.1875
torch.Size([100, 100])
epoch:17,loss:692380.5625
torch.Size([100, 100])
epoch:18,loss:635755.0625
torch.Size([100, 100])
epoch:19,loss:587267.1250
torch.Size([100, 100])
epoch:20,loss:545102.0000
torch.Size([100, 100])
epoch:21,loss:508050.6250
torch.Size([100, 100])
epoch:22,loss:475169.9375
torch.Size([100, 100])
epoch:23,loss:445762.8750
torch.Size([100, 100])
epoch:24,loss:419216.2812
torch.Size([100, 100])
epoch:25,loss:395124.9375
torch.Size([100, 100])
epoch:26,loss:373154.8438
torch.Size([100, 100])
epoch:27,loss:352987.6875
torch.Size([100, 100])
epoch:28,loss:334429.0000
torch.Size([100, 100])
epoch:29,loss:317317.7500
torch.Size([100, 100])
epoch:30,loss:301475.8125
torch.Size([100, 100])
epoch:31,loss:286776.8750
torch.Size([100, 100])
epoch:32,loss:273114.4062
torch.Size([100, 100])
epoch:33,loss:260383.6406
torch.Size([100, 100])
epoch:34,loss:248532.8125
torch.Size([100, 100])
epoch:35,loss:237452.3750
torch.Size([100, 100])
epoch:36,loss:227080.5156
torch.Size([100, 100])
epoch:37,loss:217362.9375
torch.Size([100, 100])
epoch:38,loss:208250.5312
torch.Size([100, 100])
epoch:39,loss:199686.1094
torch.Size([100, 100])
epoch:40,loss:191620.0312
torch.Size([100, 100])
epoch:41,loss:184017.4375
torch.Size([100, 100])
epoch:42,loss:176841.0156
torch.Size([100, 100])
epoch:43,loss:170073.1719
torch.Size([100, 100])
epoch:44,loss:163686.5000
torch.Size([100, 100])
epoch:45,loss:157641.5000
torch.Size([100, 100])
epoch:46,loss:151907.0000
torch.Size([100, 100])
epoch:47,loss:146470.1250
torch.Size([100, 100])
epoch:48,loss:141305.3594
torch.Size([100, 100])
epoch:49,loss:136396.7031
As before, the loss becomes lower and lower.
4.2 Custom propagation functions
In fact, in addition to the automatic gradient method, we can also complete the rewriting of the forward propagation function and the backward propagation function by building a new class that inherits the torch.nn.Module. In this new class, we use forward as the keyword for the forward propagation function and backward as the keyword for the backward propagation function. Let’s do a custom propagation function:
import torch
from torch.autograd import Variable
batch_n = 64
hidden_layer = 100
input_data = 1000
output_data = 10
class Model(torch.nn.Module):  # inherit from torch.nn.Module
    def __init__(self):
        super(Model,self).__init__()  # initialize the parent class
    def forward(self,input,w1,w2):
        x = torch.mm(input,w1)
        x = torch.clamp(x,min = 0)
        x = torch.mm(x,w2)
        return x
    def backward(self):
        pass
model = Model()
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
# Wrap the Tensor variables with Variable; if requires_grad is False, the variable does not retain its gradient during the automatic gradient computation.
w1 = Variable(torch.randn(input_data,hidden_layer),requires_grad=True)
w2 = Variable(torch.randn(hidden_layer,output_data),requires_grad=True)
epoch_n = 30
lr = 1e-6  # learning rate (not defined in the original snippet; added so the code runs)
for epoch in range(epoch_n):
    y_pred = model(x,w1,w2)
    loss = (y_pred-y).pow(2).sum()
    print("epoch:{},loss:{:.4f}".format(epoch,loss.data))
    loss.backward()
    w1.data -= lr*w1.grad.data
    w2.data -= lr*w2.grad.data
    w1.grad.data.zero_()
    w2.grad.data.zero_()
Running this prints a decreasing loss for each epoch, just as before.
4.3 PyTorch’s torch.nn
4.3.1 torch.nn.Sequential
The torch.nn.Sequential class is a sequential container in torch.nn; it builds a neural network model by nesting the various layers and modules inside the container, and, most importantly, the parameters are passed forward automatically in the order we define.
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
models = torch.nn.Sequential(
torch.nn.Linear(input_data,hidden_layer),
torch.nn.ReLU(),
torch.nn.Linear(hidden_layer,output_data)
)
#torch.nn.Sequential holds the concrete structure of the model we built: the first Linear layer maps the input to the hidden layer, ReLU activates it, and the second Linear layer maps the hidden layer to the output.
#The torch.nn.Sequential class is a sequential container in torch.nn that builds the model by nesting layers inside the container.
#Most importantly, the parameters are passed forward automatically in the order we define.
4.3.2 torch.nn.Linear
The torch.nn.Linear class defines a linear layer of the model, i.e. it performs the linear transformation between layers mentioned earlier. The linear layer accepts 3 parameters: the number of input features, the number of output features, and whether to use a bias (default True). When using torch.nn.Linear, the weight and bias parameters of the corresponding dimensions are generated automatically, and by default they are initialized with a better scheme than the simple random initialization we used before.
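A minimal sketch of torch.nn.Linear (the sizes are arbitrary), showing the automatically generated weight and bias:
import torch
from torch import nn
linear = nn.Linear(1000, 100)   # 1000 input features -> 100 output features, bias=True by default
x = torch.randn(64, 1000)       # a batch of 64 samples
out = linear(x)
print(out.shape)                               # torch.Size([64, 100])
print(linear.weight.shape, linear.bias.shape)  # torch.Size([100, 1000]) torch.Size([100])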
4.3.3 torch.nn.ReLU
torch.nn.ReLU belongs to the nonlinear activation category and requires no parameters by default when defined. Of course, there are many other nonlinear activation classes to choose from in the torch.nn package, such as PReLU, LeakyReLU, Tanh, Sigmoid, Softmax, etc.
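For instance (a minimal sketch), ReLU simply replaces negative entries with zero:
import torch
from torch import nn
relu = nn.ReLU()
x = torch.tensor([[-1.0, 0.5], [2.0, -3.0]])
print(relu(x))
# tensor([[0.0000, 0.5000],
#         [2.0000, 0.0000]])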
4.3.4 torch.nn.MSELoss
The torch.nn.MSELoss class uses the mean squared error function to compute the loss value. No parameters are passed when defining an object of the class, but two inputs of the same dimensions are required when calling the instance.
import torch
from torch.autograd import Variable
loss_f = torch.nn.MSELoss()
x = Variable(torch.randn(100,100))
y = Variable(torch.randn(100,100))
loss = loss_f(x,y)
loss.data
#tensor(1.9529)
4.3.5 torch.nn.L1Loss
The torch.nn.L1Loss class uses the mean absolute error function to compute the loss value. Again, no parameters are passed when defining an object of the class, but two inputs of the same dimensions are required when calling the instance.
import torch
from torch.autograd import Variable
loss_f = torch.nn.L1Loss()
x = Variable(torch.randn(100,100))
y = Variable(torch.randn(100,100))
loss = loss_f(x,y)
loss.data
#tensor(1.1356)
4.3.6 torch.nn.CrossEntropyLoss
The torch.nn.CrossEntropyLoss class is used to compute the cross-entropy loss. No parameters are passed when defining an object of the class, but when calling the instance you need to pass two inputs that satisfy the cross-entropy calculation conditions (raw class scores and integer class labels).
import torch
from torch.autograd import Variable
loss_f = torch.nn.CrossEntropyLoss()
x = Variable(torch.randn(3,5))
y = Variable(torch.LongTensor(3).random_(5))  # 3 random integer labels in [0, 4]
loss = loss_f(x,y)
loss.data
#tensor(2.3413)
4.3.7 Neural networks using loss functions
import torch
from torch.autograd import Variable
loss_fn = torch.nn.MSELoss()
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
epoch_n = 10000  # assumed value, not defined in the original snippet (the same value is used in section 4.4)
lr = 1e-4        # assumed value, not defined in the original snippet (the same value is used in section 4.4)
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
models = torch.nn.Sequential(
torch.nn.Linear(input_data,hidden_layer),
torch.nn.ReLU(),
torch.nn.Linear(hidden_layer,output_data)
)
#torch.nn.Sequential: inside the parentheses is the concrete structure of the model we built; the first Linear layer maps the input to the hidden layer, ReLU activates it, and the second Linear layer maps the hidden layer to the output.
#The torch.nn.Sequential class is a sequential container in torch.nn that builds the model by nesting layers inside the container.
#Most importantly, the parameters are passed forward automatically in the order we define.
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred,y)
    if epoch%1000 == 0:
        print("epoch:{},loss:{:.4f}".format(epoch,loss.data))
    models.zero_grad()
    loss.backward()
    for param in models.parameters():
        param.data -= param.grad.data*lr
4.4 PyTorch's torch.optim
The torch.optim package provides a very large number of classes that enable automatic parameter optimization, such as SGD, AdaGrad, RMSProp, Adam, etc.
Implement neural networks using automatically optimized classes:
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
models = torch.nn.Sequential(
torch.nn.Linear(input_data,hidden_layer),
torch.nn.ReLU(),
torch.nn.Linear(hidden_layer,output_data)
)
# loss_fn = torch.nn.MSELoss()
# x = Variable(torch.randn(100,100))
# y = Variable(torch.randn(100,100))
# loss = loss_fn(x,y)
epoch_n=10000
lr=1e-4
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(models.parameters(),lr=lr)
# Use the torch.optim.Adam class as the optimizer for our model parameters; the inputs are the parameters to optimize and the initial learning rate.
# Because we need to optimize all of the parameters in the model, the parameters passed are models.parameters().
# With this in place, the code for model training is as follows:
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred,y)
    print("Epoch:{},Loss:{:.4f}".format(epoch,loss.data))
    optimizer.zero_grad()  # reset the gradients of the model parameters to 0
    loss.backward()
    optimizer.step()       # update each parameter with the computed gradient values
5. Building a neural network for handwritten digit recognition
5.1 torchvision
torchvision is a PyTorch package dedicated to working with images. It contains four broad submodules:
torchvision.datasets
torchvision.models
torchvision.transforms
torchvision.utils
5.1.1 torchvision.datasets
torchvision.datasets can download and load common datasets, such as MNIST (via torchvision.datasets.MNIST), COCO, ImageNet, CIFAR, etc.
Here’s the MNIST dataset loaded with torchvision.datasets:
data_train = datasets.MNIST(root="./data/",
                            transform=transform,
                            train = True,
                            download = True)
data_test = datasets.MNIST(root="./data/",
                           transform = transform,
                           train = False)
5.1.2 torchvision.models
torchvision.models provides pre-trained model architectures that we can load and use directly.
The torchvision.models module contains the following model architectures, among others:
AlexNet
VGG
ResNet
SqueezeNet
DenseNet, etc
We can directly use the following code to quickly create a model with random initialization of weights:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
It is also possible to load a pretrained model by using pretrained=True:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
5.1.3 torchvision.transforms
torchvision.transforms contains a number of data transformation classes, for example:
5.1.3.1 torchvision.transforms.Resize
Used to scale the loaded image data to the size we need. The passed parameter can be a single integer or a sequence like (h,w), where h represents height and w represents width. If a single integer is given, the smaller edge of the image is resized to that number while the aspect ratio is preserved.
5.1.3.2 torchvision.transforms.Scale
Used to scale the loaded image data to the size we need. It behaves like Resize and has been deprecated in favor of Resize in newer torchvision versions.
5.1.3.3 torchvision.transforms.CenterCrop
Used to crop the loaded picture to the size we need with the image center as the reference point. The argument passed to this class can be either an integer of data or a sequence similar to (h,w).
5.1.3.4 torchvision.transforms.RandomCrop
Used to randomly crop the loaded image to the size we need. The argument passed to this class can be either an integer of data or a sequence similar to (h,w).
5.1.3.5 torchvision.transforms.RandomHorizontalFlip
Used to flip the loaded image horizontally with a given probability. We can pass a custom probability to this class; if not specified, a default probability of 0.5 is used.
5.1.3.6 torchvision.transforms.RandomVerticalFlip
Used to flip the loaded image vertically with a given probability. We can pass a custom probability to this class; if not specified, a default probability of 0.5 is used.
5.1.3.7 torchvision.transforms.ToTensor
It converts the loaded image data (PIL image data) into a Tensor data type, so that PyTorch can compute on and process it.
5.1.3.8 torchvision.transforms.ToPILImage:
It converts a Tensor variable back into PIL image data, mainly for convenient display of images.
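As a rough sketch (the sizes and probability below are arbitrary), several of the transforms above can be chained together with transforms.Compose for data augmentation:
from torchvision import transforms
augment = transforms.Compose([
    transforms.Resize(36),                   # smaller edge scaled to 36 pixels
    transforms.RandomCrop(32),               # random 32x32 crop
    transforms.RandomHorizontalFlip(p=0.5),  # flip left-right half of the time
    transforms.ToTensor()                    # PIL image -> Tensor
])
# augment can then be passed as the transform argument of a torchvision dataset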
Here's how the MNIST dataset is manipulated using transforms:
#torchvision.transforms: Common image transformations, such as cropping, rotating, and so on;
transform = transforms.Compose(
    [transforms.ToTensor(),  # convert the PIL image type to a tensor type
     transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
     # the first (0.5, 0.5, 0.5) is the mean for the three R, G, B channels; the second (0.5, 0.5, 0.5) is the standard deviation of the three channels
     # note: MNIST images have a single channel, so the complete code in 5.5 repeats the channel to 3 before this 3-channel Normalize
    ])
# In the code above, transforms.Compose() can be seen as a container that combines multiple data transformations.
# The parameter passed in is a list whose elements are the transforms applied to the loaded data.
5.1.4 torch.utils
Regarding torchvision.utils we introduce a class for loading data: torch.utils.data.DataLoader and
In the torch.utils.data.DataLoader class, the dataset parameter specifies the name of the dataset we load, batch_size parameter sets the number of images in each package, and the shuffle setting to True means that the loading process will randomly scramble the data and package it.
data_loader_train=torch.utils.data.DataLoader(dataset=data_train,
batch_size=64,
shuffle=True,
#num_workers=2
)
data_loader_test=torch.utils.data.DataLoader(dataset=data_test,
batch_size=64,
shuffle=True)
#num_workers=2)
There is also torchvision.utils.make_grid, which arranges a batch of images into a grid.
images,labels = next(iter(data_loader_train))
# dataiter = iter(data_loader_train)
# images, labels = dataiter.next()
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
print([labels[i] for i in range(64)])
plt.imshow(img)
Here, iter and next fetch a batch of images and their corresponding labels. torchvision.utils.make_grid then arranges the batch into a grid; after make_grid, the image has dimensions (channel, height, width). Because we display the image with matplotlib, which expects an array with dimensions (height, width, channel), i.e. the color channel last, we use numpy and transpose to convert the data type and swap the dimensions.
5.2 Model building and parameter optimization
Let's implement the construction of the convolutional neural network model:
import math
import torch
import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # a convolutional block followed by a fully connected layer and a classifier
        self.conv1 = nn.Sequential(
            nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2d(64,128,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.MaxPool2d(stride=2,kernel_size=2)
        )
        self.dense = torch.nn.Sequential(
            nn.Linear(14*14*128,1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024,10)
        )
    def forward(self,x):
        x = self.conv1(x)
        x = x.view(-1,14*14*128)
        x = self.dense(x)
        return x
5.2.1 torch.nn.Conv2d
The main parameters of the convolutional layer used to build a convolutional neural network are:
The number of input channels, the number of output channels, the convolution kernel size, the stride of the kernel, and the padding value (for padding boundary pixels).
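For example (a minimal sketch matching the first convolution in the model above), with kernel_size=3, stride=1 and padding=1 a 3-channel 28x28 input keeps its spatial size:
import torch
from torch import nn
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 28, 28)  # (batch, channels, height, width)
print(conv(x).shape)           # torch.Size([1, 64, 28, 28])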
5.2.2 torch.nn.MaxPool2d
Implements the max-pooling layer of the convolutional neural network; the main parameters are:
the size of the pooling window, the stride of the pooling window, and the padding value.
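A small sketch: a 2x2 max-pooling window with stride 2 halves the spatial dimensions, which is why the fully connected layer above expects 14*14*128 features for a 28x28 input:
import torch
from torch import nn
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 128, 28, 28)
print(pool(x).shape)           # torch.Size([1, 128, 14, 14])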
5.2.3 torch.nn.Dropout
It is used to prevent the convolutional neural network from overfitting during training. The principle is to zero out some of a layer's outputs with a given random probability, which reduces the number of effective neural connections between two adjacent layers.
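A minimal sketch (the input values are arbitrary): in training mode, Dropout with p=0.5 zeroes roughly half of the entries and rescales the rest by 1/(1-p), while in evaluation mode it passes the input through unchanged:
import torch
from torch import nn
drop = nn.Dropout(p=0.5)
x = torch.ones(2, 4)
print(drop(x))   # training mode: some entries become 0, the others are scaled to 2.0
drop.eval()
print(drop(x))   # eval mode: the input is unchanged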
5.3 Parameter Optimization
After building the model, we can train and optimize the parameters of the model:
model = Model()
cost = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
print(model)
5.3.1 Model Training
n_epochs = 5
for epoch in range(n_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch,n_epochs))
    print("-"*10)
    for data in data_loader_train:
        X_train,y_train = data
        X_train,y_train = Variable(X_train),Variable(y_train)
        outputs = model(X_train)
        _,pred = torch.max(outputs.data,1)
        optimizer.zero_grad()
        loss = cost(outputs,y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.data
        running_correct += torch.sum(pred == y_train.data)
    testing_correct = 0
    for data in data_loader_test:
        X_test,y_test = data
        X_test,y_test = Variable(X_test),Variable(y_test)
        outputs = model(X_test)
        _,pred = torch.max(outputs.data,1)
        testing_correct += torch.sum(pred == y_test.data)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss/len(data_train), 100*running_correct/len(data_train),
        100*testing_correct/len(data_test)))
5.4 Model Validation
To verify that the trained model is really as accurate as the reported results suggest, the best way is to randomly select some images from the test set, predict them with the trained model, see how far the predictions are from the true labels, and visualize the results. The test code is as follows:
data_loader_test = torch.utils.data.DataLoader(dataset=data_test,
batch_size = 4,
shuffle = True)
X_test,y_test = next(iter(data_loader_test))
inputs = Variable(X_test)
pred = model(inputs)
_,pred = torch.max(pred,1)
print("Predict Label is:",[i for i in pred.data])
print("Real Label is:",[i for i in y_test])
img = torchvision.utils.make_grid(X_test)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
plt.imshow(img)
Running this prints the predicted and real labels and displays the corresponding grid of test images.
5.5 Complete Code
import torch
import torchvision
from torchvision import datasets,transforms
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Lambda(lambda x: x.repeat(3,1,1)),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
data_train = datasets.MNIST(root="./data/",
                            transform=transform,
                            train = True,
                            download = True)
data_test = datasets.MNIST(root="./data/",
                           transform = transform,
                           train = False)
data_loader_train=torch.utils.data.DataLoader(dataset=data_train,
batch_size=64,
shuffle=True,
#num_workers=2
)
data_loader_test=torch.utils.data.DataLoader(dataset=data_test,
batch_size=64,
shuffle=True)
#num_workers=2)
images,labels = next(iter(data_loader_train))
# dataiter = iter(data_loader_train)
# images, labels = dataiter.next()
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
print([labels[i] for i in range(64)])
plt.imshow(img)
import math
import torch
import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2d(64,128,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.MaxPool2d(stride=2,kernel_size=2)
        )
        self.dense = torch.nn.Sequential(
            nn.Linear(14*14*128,1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024,10)
        )
    def forward(self,x):
        x = self.conv1(x)
        x = x.view(-1,14*14*128)
        x = self.dense(x)
        return x
model = Model()
cost = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
print(model)
n_epochs = 5
for epoch in range(n_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch,n_epochs))
    print("-"*10)
    for data in data_loader_train:
        X_train,y_train = data
        X_train,y_train = Variable(X_train),Variable(y_train)
        outputs = model(X_train)
        _,pred = torch.max(outputs.data,1)
        optimizer.zero_grad()
        loss = cost(outputs,y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.data
        running_correct += torch.sum(pred == y_train.data)
    testing_correct = 0
    for data in data_loader_test:
        X_test,y_test = data
        X_test,y_test = Variable(X_test),Variable(y_test)
        outputs = model(X_test)
        _,pred = torch.max(outputs.data,1)
        testing_correct += torch.sum(pred == y_test.data)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss/len(data_train), 100*running_correct/len(data_train),
        100*testing_correct/len(data_test)))
data_loader_test = torch.utils.data.DataLoader(dataset=data_test,
batch_size = 4,
shuffle = True)
X_test,y_test = next(iter(data_loader_test))
inputs = Variable(X_test)
pred = model(inputs)
_,pred = torch.max(pred,1)
print("Predict Label is:",[i for i in pred.data])
print("Real Label is:",[i for i in y_test])
img = torchvision.utils.make_grid(X_test)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
plt.imshow(img)
6. Conclusion
This has been a summary of my study of PyTorch. I look forward to exchanging ideas with everyone: leave a comment or send me a private message on Instagram or by email, so we can learn and make progress together!