A Thousand-Word Introduction to Deep Learning with PyTorch
Outline:
*Deep learning
*Preface
1. Basic data: Tensor
1.1 Creation of Tensor
1.2 torch.FloatTensor
1.3 torch.IntTensor
1.4 torch.randn
1.5 torch.range
1.6 torch.zeros/ones/empty
2. Tensor operations
2.1 torch.abs
2.2 torch.add
2.3 torch.clamp
2.4 torch.div
2.5 torch.pow
2.6 torch.mm
2.7 torch.mv
3. The neural network toolbox torch.nn
3.1 The nn.Module class
3.2 Build a simple neural network
4. Implementing a complete neural network in PyTorch
4.1 torch.autograd and Variable
4.2 Custom propagation functions
4.3 PyTorch’s torch.nn
4.3.1 torch.nn.Sequential
4.3.2 torch.nn.Linear
4.3.3 torch.nn.ReLU
4.3.4 torch.nn.MSELoss
4.3.5 torch.nn.L1Loss
4.3.6 torch.nn.CrossEntropyLoss
4.3.7 Neural networks using loss functions
4.4 PyTorch's torch.optim
5. Building a neural network for handwritten digit recognition
5.1 torchvision
5.1.1 torchvision.datasets
5.1.2 torchvision.models
5.1.3 torchvision.transforms
5.1.3.1 torchvision.transforms.Resize
5.1.3.2 torchvision.transforms.Scale
5.1.3.3 torchvision.transforms.CenterCrop
5.1.3.4 torchvision.transforms.RandomCrop
5.1.3.5 torchvision.transforms.RandomHorizontalFlip
5.1.3.6 torchvision.transforms.RandomVerticalFlip
5.1.3.7 torchvision.transforms.ToTensor
5.1.3.8 torchvision.transforms.ToPILImage
5.1.4 torch.utils.data and torchvision.utils
5.2 Model building and parameter optimization
5.2.1 torch.nn.Conv2d
5.2.2 torch.nn.MaxPool2d
5.2.3 torch.nn.Dropout
5.3 Parameter Optimization
5.3.1 Model Training
5.4 Model Validation
5.5 Complete Code
6. Conclusion
Preface
Learning a good deep learning framework is very important. The mainstream frameworks today are PyTorch and TensorFlow, so let's learn PyTorch together!
1. Basic data: Tensor
Tensors are the basic operands in PyTorch and can be thought of as multidimensional matrices whose elements all share a single data type. From a usage point of view, a Tensor is very similar to NumPy's ndarray, and the two can be freely converted into each other; in addition, Tensor supports GPU acceleration.
Before we start, a reminder: in this article, the slanted (code-style) text is code.
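For example, here is a minimal sketch of the NumPy interoperability and GPU support mentioned above (the shapes are arbitrary, and the GPU line only runs if CUDA is available):
import numpy as np
import torch
n = np.ones((2, 3))
t = torch.from_numpy(n)   # ndarray -> Tensor (shares the underlying memory)
n2 = t.numpy()            # Tensor -> ndarray
if torch.cuda.is_available():
    t_gpu = t.to("cuda")  # move the Tensor to the GPU for accelerated computation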
1.1 Creation of Tensor
1.2 torch.FloatTensor
torch.FloatTensor is used to generate a Tensor of floating-point data type; the parameter passed to torch.FloatTensor can be a list or a shape (dimension sizes). Note that when only dimensions are given, the returned tensor is uninitialized, so its values are whatever happened to be in memory.
import torch
a = torch.FloatTensor(2,3)
b = torch.FloatTensor([2,3,4,5])
a,b
the output is:
(tensor([[1.0561e-38, 1.0102e-38, 9.6429e-39],
[8.4490e-39, 9.6429e-39, 9.1837e-39]]),
tensor([2., 3., 4., 5.]))
1.3 torch.IntTensor
torch.IntTensor is used to generate a Tensor of integer type; the parameters passed to torch.IntTensor can likewise be a list or a shape.
import torch
a = torch.IntTensor(2,3)
b = torch.IntTensor([2,3,4,5])
a,b
torch.rand generates a Tensor of the specified shape with values sampled uniformly from [0, 1):
import torch
a = torch.rand(2,3)
a
Get:
tensor([[0.5625, 0.5815, 0.8221],
[0.3589, 0.4180, 0.2158]])
1.4 torch.randn
torch.randn is used to generate a random Tensor with a floating-point data type and the specified shape, similar to numpy.random.randn; the randomly generated values follow a normal distribution with mean 0 and variance 1.
import torch
a = torch.randn(2,3)
a
Get:
tensor([[-0.0067, -0.0707, -0.6682],
[ 0.8141, 1.1436, 0.5963]])
1.5 torch.range
torch.range is used to generate a floating-point Tensor over a given range, so three parameters are passed to torch.range: the start value, the end value, and the step size, where the step size specifies the interval between consecutive values from start to end.
import torch
a = torch.range(1,20,2)
a
Get:
tensor([ 1., 3., 5., 7., 9., 11., 13., 15., 17., 19.])
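Note that in recent PyTorch versions torch.range is deprecated in favor of torch.arange, which excludes the end value; a quick sketch:
import torch
a = torch.arange(1, 20, 2)  # the end value 20 is exclusive
a
# tensor([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])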
1.6 torch.zeros/ones/empty
torch.zeros is used to generate a Tensor with floating-point data type and the specified dimensions, in which every element is 0.
torch.ones similarly generates a tensor of all 1s.
torch.empty creates an uninitialized tensor whose shape is determined by the size argument, which can be a list or a tuple.
import torch
a = torch.zeros(2,3)
a
Get:
tensor([[0., 0., 0.],
[0., 0., 0.]])
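For completeness, a small sketch of torch.ones and torch.empty (the values of the empty tensor are uninitialized, so your output will differ):
import torch
b = torch.ones(2,3)
b
# tensor([[1., 1., 1.],
#         [1., 1., 1.]])
c = torch.empty(2,3)  # uninitialized memory; contents are arbitrary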
2. Tensor operations
2.1 torch.abs
Passing a parameter to torch.abs returns the absolute value of the input as output; the input must be a variable of the Tensor data type, for example:
import torch
a = torch.randn(2,3)
a
The resulting a is:
tensor([[ 0.0948, 0.0530, -0.0986],
[ 1.8926, -2.0569, 1.6617]])
We apply abs to a:
b = torch.abs(a)
b
Get:
tensor([[0.0948, 0.0530, 0.0986],
[1.8926, 2.0569, 1.6617]])
2.2 torch.add
Passing parameters to torch.add returns their sum as output; the inputs can be two variables of the Tensor data type, or one Tensor variable and one scalar.
import torch
a = torch.randn(2,3)
a
#tensor([[-0.1146, -0.3282, -0.2517],
# [-0.2474, 0.8323, -0.9292]])
b = torch.randn(2,3)
b
#tensor([[ 0.9526, 1.5841, -3.2665],
# [-0.4831, 0.9259, -0.5054]])
c = torch.add(a,b)
c
output of c:
tensor([[ 0.8379, 1.2559, -3.5182],
[-0.7305, 1.7582, -1.4346]])
Another one:
d = torch.randn(2,3)
d
#we get d is…
#tensor([[ 0.1473, 0.7631, -0.1953],
# [-0.2796, -0.7265, 0.7142]])
We add the scalar 10 to d:
e = torch.add(d,10)
e
Get:
tensor([[10.1473, 10.7631, 9.8047],
[ 9.7204, 9.2735, 10.7142]])
2.3 torch.clamp
torch.clamp crops the input according to a custom range and returns the clipped result as output. There are three input parameters: the Tensor variable to be clipped, the lower boundary of the clipping range, and the upper boundary. The clipping process compares each element of the variable with the lower and upper boundaries: if an element is smaller than the lower boundary, it is rewritten to the lower boundary's value; likewise, if an element is larger than the upper boundary, it is rewritten to the upper boundary's value. Let's look directly at the example:
a = torch.randn(2,3)
a
#We get a is:
#tensor([[-1.4049, 1.0336, 1.2820],
# [ 0.7610, -1.7475, 0.2414]])
We apply the clamp operation to a:
b = torch.clamp(a,-0.1,0.1)
b
#We get b is:
#tensor([[-0.1000, 0.1000, 0.1000],
# [ 0.1000, -0.1000, 0.1000]])
2.4 torch.div
torch.div returns the element-wise quotient of the input parameters as output; likewise, the operands can both be Tensor variables, or a combination of a Tensor variable and a scalar. Let's look at an example.
a = torch.randn(2,3)
a
#We get a …:
#tensor([[ 0.6276, 0.6397, -0.0762],
# [-0.4193, -0.5528, 1.5192]])
b = torch.randn(2,3)
b
#We get b…:
#tensor([[ 0.9219, 0.2120, 0.1155],
# [ 1.1086, -1.1442, 0.2999]])
We perform the div operation on a and b:
c = torch.div(a,b)
c
#get c…:
#tensor([[ 0.6808, 3.0173, -0.6602],
# [-0.3782, 0.4831, 5.0657]])
2.5 torch.pow
torch.pow: after passing parameters to torch.pow, the result of raising the input to the given power is returned as output; the operands can both be Tensor variables, or a combination of a Tensor variable and a scalar.
a = torch.randn(2,3)
a
#Get a is…:
#tensor([[ 0.3896, -0.1475, 0.1104],
# [-0.6908, -0.0472, -1.5310]])
We square a:
b = torch.pow(a,2)
b
#We get b is the square of a:
#tensor([[1.5181e-01, 2.1767e-02, 1.2196e-02],
# [4.7722e-01, 2.2276e-03, 2.3441e+00]])
2.6 torch.mm
torch.mm: pass parameters to torch.mm and the matrix product of the inputs is returned as output. This product is not computed the same way as the element-wise torch.mul operation: torch.mm applies the rules of matrix multiplication, so the passed parameters are treated as matrices and their dimensions must satisfy the precondition of matrix multiplication, that is, the number of columns of the first matrix must equal the number of rows of the second matrix.
Let’s look at an example:
a = torch.randn(2,3)
a
#We get a…:
#tensor([[ 0.1057, 0.0104, -0.1547],
# [ 0.5010, -0.0735, 0.4067]])
and…
b = torch.randn(2,3)
b
#We get b:
#tensor([[ 1.1971, -1.4010, 1.1277],
# [-0.3076, 0.9171, 1.9135]])
Then we perform matrix multiplication with a and the transpose of b (so that the dimensions match):
c = torch.mm(a,b.T)
c
#tensor([[-0.0625, -0.3190],
# [ 1.1613, 0.5567]])
2.7 torch.mv
Passing parameters to torch.mv returns the matrix-vector product of the inputs as output. torch.mv uses the multiplication rules between a matrix and a vector: the first parameter represents the matrix and the second the vector, and the order cannot be reversed.
Let’s look at an example:
a = torch.randn(2,3)
a
#We get a…:
#tensor([[ 1.0909, -1.1679, 0.3161],
#        [-0.8952, -2.1351, -0.9667]])
b = torch.randn(3)
b
#We get b…:
#tensor([-1.4689, 1.6197, 0.7209])
Then we perform the matrix-vector multiplication with a and b:
c = torch.mv(a,b)
c
#tensor([-3.2663, -2.8402])
3. The neural network toolbox torch.nn
Although the torch.autograd library implements automatic differentiation and gradient backpropagation, to complete the training of a model we would still need to hand-write the parameter updates and the control of the training process, which is not convenient enough. To this end, PyTorch provides a more integrated modular interface, torch.nn, which is built on autograd and offers a range of functionality such as network modules, optimizers, and initialization strategies.
3.1 The nn.Module class
nn.Module is the neural network base class provided by PyTorch; it implements the definition of each network layer as well as the forward computation and backpropagation mechanism. In practice, to implement a neural network you only need to inherit from nn.Module, define the model structure and parameters in the initializer, and write the forward pass in the forward() function.
1. nn.Parameter
2. The forward() function and backpropagation
3. Nesting of multiple Modules
4. nn.Module and the nn.functional library
5. The nn.Sequential() module
# We use torch.nn to implement an MLP
from torch import nn
class MLP(nn.Module):
    # class MLP inherits from nn.Module
    def __init__(self, in_dim, hid_dim1, hid_dim2, out_dim):
        super(MLP, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(in_dim, hid_dim1),
            nn.ReLU(),
            nn.Linear(hid_dim1, hid_dim2),
            nn.ReLU(),
            nn.Linear(hid_dim2, out_dim),
            nn.ReLU()
        )
    def forward(self, x):
        x = self.layer(x)
        return x
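As a quick usage sketch (the dimensions and batch size below are arbitrary), we can instantiate the MLP above and run a forward pass on a random batch:
import torch
model = MLP(in_dim=28*28, hid_dim1=256, hid_dim2=128, out_dim=10)
x = torch.randn(16, 28*28)   # a batch of 16 flattened inputs
out = model(x)               # calls forward() under the hood
print(out.shape)             # torch.Size([16, 10])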
3.2 Build a simple neural network
Below we use torch to build a simple neural network:
1. We set the number of input features to 1000, the hidden layer size to 100, and the number of output units to 10.
2. We feed in 100 samples with 1000 features each; the hidden layer transforms them into 100 hidden features, which are mapped to 10 classification outputs, and the result is then backpropagated.
import torch
batch_n = 100 #the numbers of input data every single batch
hidden_layer = 100
input_data = 1000 #the features of every data is 1000
output_data = 10
x = torch.randn(batch_n,input_data)
y = torch.randn(batch_n,output_data)
w1 = torch.randn(input_data,hidden_layer)
w2 = torch.randn(hidden_layer,output_data)
epoch_n = 20
lr = 1e-6
for epoch in range(epoch_n):
    h1 = x.mm(w1)  # (100,1000)*(1000,100) -> (100,100)
    print(h1.shape)
    h1 = h1.clamp(min=0)
    y_pred = h1.mm(w2)
    loss = (y_pred-y).pow(2).sum()
    print("epoch:{},loss:{:.4f}".format(epoch,loss))
    grad_y_pred = 2*(y_pred-y)
    grad_w2 = h1.t().mm(grad_y_pred)
    grad_h = grad_y_pred.clone()
    grad_h = grad_h.mm(w2.t())
    grad_h.clamp_(min=0)  # assign all values less than 0 to 0, i.e. ReLU-style masking of the gradient
    grad_w1 = x.t().mm(grad_h)
    w1 = w1 - lr*grad_w1
    w2 = w2 - lr*grad_w2
Then we get:
torch.Size([100, 100])
epoch:0,loss:112145.7578
torch.Size([100, 100])
epoch:1,loss:110014.8203
torch.Size([100, 100])
epoch:2,loss:107948.0156
torch.Size([100, 100])
epoch:3,loss:105938.6719
torch.Size([100, 100])
epoch:4,loss:103985.1406
torch.Size([100, 100])
epoch:5,loss:102084.9609
torch.Size([100, 100])
epoch:6,loss:100236.9844
torch.Size([100, 100])
epoch:7,loss:98443.3359
torch.Size([100, 100])
epoch:8,loss:96699.5938
torch.Size([100, 100])
epoch:9,loss:95002.5234
torch.Size([100, 100])
epoch:10,loss:93349.7969
torch.Size([100, 100])
epoch:11,loss:91739.8438
torch.Size([100, 100])
epoch:12,loss:90171.6875
torch.Size([100, 100])
epoch:13,loss:88643.1094
torch.Size([100, 100])
epoch:14,loss:87152.6406
torch.Size([100, 100])
epoch:15,loss:85699.4297
torch.Size([100, 100])
epoch:16,loss:84282.2500
torch.Size([100, 100])
epoch:17,loss:82899.9062
torch.Size([100, 100])
epoch:18,loss:81550.3984
torch.Size([100, 100])
epoch:19,loss:80231.1484
The loss gets lower and lower.
4. Implementing a complete neural network in PyTorch
4.1 torch.autograd and Variable
The main function of the torch.autograd package is to perform the chain-rule differentiation needed for backpropagation in a neural network; writing these derivative computations by hand would amount to reinventing the wheel.
The automatic gradient functionality works roughly as follows: the forward pass over input variables of the Tensor data type first builds a computation graph; from this graph and the output, the gradient that needs to be applied to each parameter is computed exactly, and backpropagation then completes the parameters' gradient update.
The Variable class in the torch.autograd package wraps the Tensor variables we define; after wrapping, each node in the computation graph is a Variable object, so the automatic gradient machinery can be applied. (Note that in current PyTorch versions Variable is deprecated: plain Tensors with requires_grad=True support autograd directly, but the Variable wrapper used below still works.)
Below we use autograd to implement a neural network model with a two-tier structure.
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
# Wrap the Tensor variables with Variable; if requires_grad is False, the variable does not retain its gradient during the automatic gradient computation.
w1 = Variable(torch.randn(input_data,hidden_layer),requires_grad=True)
w2 = Variable(torch.randn(hidden_layer,output_data),requires_grad=True)
#learning rate&number of iterations assignment
epoch_n=50
lr=1e-6
for epoch in range(epoch_n):
    h1 = x.mm(w1)  # (100,1000)*(1000,100) -> (100,100)
    print(h1.shape)
    h1 = h1.clamp(min=0)
    y_pred = h1.mm(w2)
    # y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred-y).pow(2).sum()
    print("epoch:{},loss:{:.4f}".format(epoch,loss.data))
    # grad_y_pred = 2*(y_pred-y)
    # grad_w2 = h1.t().mm(grad_y_pred)
    loss.backward()  # backpropagation replaces the manual gradient computation
    # grad_h = grad_y_pred.clone()
    # grad_h = grad_h.mm(w2.t())
    # grad_h.clamp_(min=0)  # assign all values less than 0 to 0 (ReLU-style masking)
    # grad_w1 = x.t().mm(grad_h)
    w1.data -= lr*w1.grad.data
    w2.data -= lr*w2.grad.data
    w1.grad.data.zero_()
    w2.grad.data.zero_()
    # w1 = w1 - lr*grad_w1
    # w2 = w2 - lr*grad_w2
And we get the result:
torch.Size([100, 100])
epoch:0,loss:54572212.0000
torch.Size([100, 100])
epoch:1,loss:133787328.0000
torch.Size([100, 100])
epoch:2,loss:491439904.0000
torch.Size([100, 100])
epoch:3,loss:683004416.0000
torch.Size([100, 100])
epoch:4,loss:13681055.0000
torch.Size([100, 100])
epoch:5,loss:8058388.0000
torch.Size([100, 100])
epoch:6,loss:5327059.5000
torch.Size([100, 100])
epoch:7,loss:3777382.5000
torch.Size([100, 100])
epoch:8,loss:2818449.5000
torch.Size([100, 100])
epoch:9,loss:2190285.0000
torch.Size([100, 100])
epoch:10,loss:1760991.0000
torch.Size([100, 100])
epoch:11,loss:1457116.3750
torch.Size([100, 100])
epoch:12,loss:1235850.6250
torch.Size([100, 100])
epoch:13,loss:1069994.0000
torch.Size([100, 100])
epoch:14,loss:942082.4375
torch.Size([100, 100])
epoch:15,loss:841170.6250
torch.Size([100, 100])
epoch:16,loss:759670.1875
torch.Size([100, 100])
epoch:17,loss:692380.5625
torch.Size([100, 100])
epoch:18,loss:635755.0625
torch.Size([100, 100])
epoch:19,loss:587267.1250
torch.Size([100, 100])
epoch:20,loss:545102.0000
torch.Size([100, 100])
epoch:21,loss:508050.6250
torch.Size([100, 100])
epoch:22,loss:475169.9375
torch.Size([100, 100])
epoch:23,loss:445762.8750
torch.Size([100, 100])
epoch:24,loss:419216.2812
torch.Size([100, 100])
epoch:25,loss:395124.9375
torch.Size([100, 100])
epoch:26,loss:373154.8438
torch.Size([100, 100])
epoch:27,loss:352987.6875
torch.Size([100, 100])
epoch:28,loss:334429.0000
torch.Size([100, 100])
epoch:29,loss:317317.7500
torch.Size([100, 100])
epoch:30,loss:301475.8125
torch.Size([100, 100])
epoch:31,loss:286776.8750
torch.Size([100, 100])
epoch:32,loss:273114.4062
torch.Size([100, 100])
epoch:33,loss:260383.6406
torch.Size([100, 100])
epoch:34,loss:248532.8125
torch.Size([100, 100])
epoch:35,loss:237452.3750
torch.Size([100, 100])
epoch:36,loss:227080.5156
torch.Size([100, 100])
epoch:37,loss:217362.9375
torch.Size([100, 100])
epoch:38,loss:208250.5312
torch.Size([100, 100])
epoch:39,loss:199686.1094
torch.Size([100, 100])
epoch:40,loss:191620.0312
torch.Size([100, 100])
epoch:41,loss:184017.4375
torch.Size([100, 100])
epoch:42,loss:176841.0156
torch.Size([100, 100])
epoch:43,loss:170073.1719
torch.Size([100, 100])
epoch:44,loss:163686.5000
torch.Size([100, 100])
epoch:45,loss:157641.5000
torch.Size([100, 100])
epoch:46,loss:151907.0000
torch.Size([100, 100])
epoch:47,loss:146470.1250
torch.Size([100, 100])
epoch:48,loss:141305.3594
torch.Size([100, 100])
epoch:49,loss:136396.7031
As before, the loss becomes lower and lower.
4.2 Custom propagation functions
In fact, in addition to the automatic gradient method, we can also complete the rewriting of the forward propagation function and the backward propagation function by building a new class that inherits the torch.nn.Module. In this new class, we use forward as the keyword for the forward propagation function and backward as the keyword for the backward propagation function. Let’s do a custom propagation function:
import torch
from torch.autograd import Variable
batch_n = 64
hidden_layer = 100
input_data = 1000
output_data = 10
class Model(torch.nn.Module):  # inherit from torch.nn.Module
    def __init__(self):
        super(Model,self).__init__()  # initialize the parent class
    def forward(self,input,w1,w2):
        x = torch.mm(input,w1)
        x = torch.clamp(x,min = 0)
        x = torch.mm(x,w2)
        return x
    def backward(self):
        pass
model = Model()
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
# Wrap the Tensor variables with Variable; if requires_grad is False, the variable does not retain its gradient during the automatic gradient computation.
w1 = Variable(torch.randn(input_data,hidden_layer),requires_grad=True)
w2 = Variable(torch.randn(hidden_layer,output_data),requires_grad=True)
epoch_n = 30
lr = 1e-6  # learning rate (not defined in the original snippet; added so the code runs)
for epoch in range(epoch_n):
    y_pred = model(x,w1,w2)
    loss = (y_pred-y).pow(2).sum()
    print("epoch:{},loss:{:.4f}".format(epoch,loss.data))
    loss.backward()
    w1.data -= lr*w1.grad.data
    w2.data -= lr*w2.grad.data
    w1.grad.data.zero_()
    w2.grad.data.zero_()
Running this prints a decreasing loss for each epoch, just as before.
4.3 PyTorch’s torch.nn
4.3.1 torch.nn.Sequential
The torch.nn.Sequential class is a sequential container in torch.nn; it builds a neural network model by nesting the various layers and modules inside the container, and, most importantly, the parameters are passed forward automatically in the order we define.
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
models = torch.nn.Sequential(
torch.nn.Linear(input_data,hidden_layer),
torch.nn.ReLU(),
torch.nn.Linear(hidden_layer,output_data)
)
#torch.nn.Sequential holds the concrete structure of the model we built: the first Linear layer maps the input to the hidden layer, ReLU activates it, and the second Linear layer maps the hidden layer to the output.
#The torch.nn.Sequential class is a sequential container in torch.nn that builds the model by nesting layers inside the container.
#Most importantly, the parameters are passed forward automatically in the order we define.
4.3.2 torch.nn.Linear
The torch.nn.Linear class defines a linear layer of the model, i.e. it performs the linear transformation between layers mentioned earlier. The linear layer accepts 3 parameters: the number of input features, the number of output features, and whether to use a bias (default True). When using torch.nn.Linear, the weight and bias parameters of the corresponding dimensions are generated automatically, and by default they are initialized with a better scheme than the simple random initialization we used before.
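A minimal sketch of torch.nn.Linear (the sizes are arbitrary), showing the automatically generated weight and bias:
import torch
from torch import nn
linear = nn.Linear(1000, 100)   # 1000 input features -> 100 output features, bias=True by default
x = torch.randn(64, 1000)       # a batch of 64 samples
out = linear(x)
print(out.shape)                               # torch.Size([64, 100])
print(linear.weight.shape, linear.bias.shape)  # torch.Size([100, 1000]) torch.Size([100])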
4.3.3 torch.nn.ReLU
torch.nn.ReLU belongs to the nonlinear activation category and requires no parameters by default when defined. Of course, there are many other nonlinear activation classes to choose from in the torch.nn package, such as PReLU, LeakyReLU, Tanh, Sigmoid, Softmax, etc.
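For instance (a minimal sketch), ReLU simply replaces negative entries with zero:
import torch
from torch import nn
relu = nn.ReLU()
x = torch.tensor([[-1.0, 0.5], [2.0, -3.0]])
print(relu(x))
# tensor([[0.0000, 0.5000],
#         [2.0000, 0.0000]])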
4.3.4 torch.nn.MSELoss
The torch.nn.MSELoss class uses the mean squared error function to compute the loss value. No parameters are passed when defining an object of the class, but two inputs of the same dimensions are required when calling the instance.
import torch
from torch.autograd import Variable
loss_f = torch.nn.MSELoss()
x = Variable(torch.randn(100,100))
y = Variable(torch.randn(100,100))
loss = loss_f(x,y)
loss.data
#tensor(1.9529)
4.3.5 torch.nn.L1Loss
The torch.nn.L1Loss class uses the mean absolute error function to compute the loss value. Again, no parameters are passed when defining an object of the class, but two inputs of the same dimensions are required when calling the instance.
import torch
from torch.autograd import Variable
loss_f = torch.nn.L1Loss()
x = Variable(torch.randn(100,100))
y = Variable(torch.randn(100,100))
loss = loss_f(x,y)
loss.data
#tensor(1.1356)
4.3.6 torch.nn.CrossEntropyLoss
The torch.nn.CrossEntropyLoss class is used to compute the cross-entropy loss. No parameters are passed when defining an object of the class, but when calling the instance you need to pass two inputs that satisfy the cross-entropy calculation conditions (raw class scores and integer class labels).
import torch
from torch.autograd import Variable
loss_f = torch.nn.CrossEntropyLoss()
x = Variable(torch.randn(3,5))
y = Variable(torch.LongTensor(3).random_(5))  # 3 random integer labels in [0, 4]
loss = loss_f(x,y)
loss.data
#tensor(2.3413)
4.3.7 Neural networks using loss functions
import torch
from torch.autograd import Variable
loss_fn = torch.nn.MSELoss()
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
epoch_n = 10000  # assumed value, not defined in the original snippet (the same value is used in section 4.4)
lr = 1e-4        # assumed value, not defined in the original snippet (the same value is used in section 4.4)
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
models = torch.nn.Sequential(
torch.nn.Linear(input_data,hidden_layer),
torch.nn.ReLU(),
torch.nn.Linear(hidden_layer,output_data)
)
#torch.nn.Sequential: inside the parentheses is the concrete structure of the model we built; the first Linear layer maps the input to the hidden layer, ReLU activates it, and the second Linear layer maps the hidden layer to the output.
#The torch.nn.Sequential class is a sequential container in torch.nn that builds the model by nesting layers inside the container.
#Most importantly, the parameters are passed forward automatically in the order we define.
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred,y)
    if epoch%1000 == 0:
        print("epoch:{},loss:{:.4f}".format(epoch,loss.data))
    models.zero_grad()
    loss.backward()
    for param in models.parameters():
        param.data -= param.grad.data*lr
4.4 PyTorch's torch.optim
The torch.optim package provides a very large number of classes that enable automatic parameter optimization, such as SGD, AdaGrad, RMSProp, Adam, etc.
Implement neural networks using automatically optimized classes:
import torch
from torch.autograd import Variable
batch_n = 100
hidden_layer = 100
input_data = 1000
output_data = 10
x = Variable(torch.randn(batch_n,input_data),requires_grad=False)
y = Variable(torch.randn(batch_n,output_data),requires_grad=False)
models = torch.nn.Sequential(
torch.nn.Linear(input_data,hidden_layer),
torch.nn.ReLU(),
torch.nn.Linear(hidden_layer,output_data)
)
# loss_fn = torch.nn.MSELoss()
# x = Variable(torch.randn(100,100))
# y = Variable(torch.randn(100,100))
# loss = loss_fn(x,y)
epoch_n=10000
lr=1e-4
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(models.parameters(),lr=lr)
# Use the torch.optim.Adam class as the optimizer for our model parameters; the inputs are the parameters to optimize and the initial learning rate.
# Because we need to optimize all of the parameters in the model, the parameters passed are models.parameters().
# With this in place, the code for model training is as follows:
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred,y)
    print("Epoch:{},Loss:{:.4f}".format(epoch,loss.data))
    optimizer.zero_grad()  # reset the gradients of the model parameters to 0
    loss.backward()
    optimizer.step()       # update each parameter with the computed gradient values
5. Building a neural network for handwritten digit recognition
5.1 torchvision
torchvision is a PyTorch package dedicated to working with images. It contains four broad submodules:
torchvision.datasets
torchvision.models
torchvision.transforms
torchvision.utils
5.1.1 torchvision.datasets
torchvision.datasets can download and load common datasets, such as MNIST (via torchvision.datasets.MNIST), COCO, ImageNet, CIFAR, etc.
Here’s the MNIST dataset loaded with torchvision.datasets:
data_train = datasets.MNIST(root="./data/",
                            transform=transform,
                            train = True,
                            download = True)
data_test = datasets.MNIST(root="./data/",
                           transform = transform,
                           train = False)
5.1.2 torchvision.models
torchvision.models provides pre-trained model architectures that we can load and use directly.
The torchvision.models module contains the following model architectures, among others:
AlexNet
VGG
ResNet
SqueezeNet
DenseNet, etc
We can directly use the following code to quickly create a model with random initialization of weights:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
It is also possible to load a pretrained model by using pretrained=True:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
5.1.3 torchvision.transforms
torchvision.transforms contains a number of data transformation classes, for example:
5.1.3.1 torchvision.transforms.Resize
Used to scale the loaded image data to the size we need. The passed parameter can be a single integer or a sequence like (h,w), where h represents height and w represents width. If a single integer is given, the smaller edge of the image is resized to that number while the aspect ratio is preserved.
5.1.3.2 torchvision.transforms.Scale
Used to scale the loaded image data to the size we need. It behaves like Resize and has been deprecated in favor of Resize in newer torchvision versions.
5.1.3.3 torchvision.transforms.CenterCrop
Used to crop the loaded picture to the size we need with the image center as the reference point. The argument passed to this class can be either an integer of data or a sequence similar to (h,w).
5.1.3.4 torchvision.transforms.RandomCrop
Used to randomly crop the loaded image to the size we need. The argument passed to this class can be either an integer of data or a sequence similar to (h,w).
5.1.3.5 torchvision.transforms.RandomHorizontalFlip
Used to flip the loaded image horizontally with a given probability. We can pass a custom probability to this class; if not specified, a default probability of 0.5 is used.
5.1.3.6 torchvision.transforms.RandomVerticalFlip
Used to flip the loaded image vertically with a given probability. We can pass a custom probability to this class; if not specified, a default probability of 0.5 is used.
5.1.3.7 torchvision.transforms.ToTensor
It converts the loaded image data (PIL image data) into a Tensor data type, so that PyTorch can compute on and process it.
5.1.3.8 torchvision.transforms.ToPILImage:
It converts a Tensor variable back into PIL image data, mainly for convenient display of images.
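As a rough sketch (the sizes and probability below are arbitrary), several of the transforms above can be chained together with transforms.Compose for data augmentation:
from torchvision import transforms
augment = transforms.Compose([
    transforms.Resize(36),                   # smaller edge scaled to 36 pixels
    transforms.RandomCrop(32),               # random 32x32 crop
    transforms.RandomHorizontalFlip(p=0.5),  # flip left-right half of the time
    transforms.ToTensor()                    # PIL image -> Tensor
])
# augment can then be passed as the transform argument of a torchvision dataset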
Here's how the MNIST dataset is manipulated using transforms:
#torchvision.transforms: Common image transformations, such as cropping, rotating, and so on;
transform = transforms.Compose(
    [transforms.ToTensor(),  # convert the PIL image type to a tensor type
     transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
     # the first (0.5, 0.5, 0.5) is the mean for the three R, G, B channels; the second (0.5, 0.5, 0.5) is the standard deviation of the three channels
     # note: MNIST images have a single channel, so the complete code in 5.5 repeats the channel to 3 before this 3-channel Normalize
    ])
# In the code above, transforms.Compose() can be seen as a container that combines multiple data transformations.
# The parameter passed in is a list whose elements are the transforms applied to the loaded data.
5.1.4 torch.utils
Regarding torchvision.utils we introduce a class for loading data: torch.utils.data.DataLoader and
In the torch.utils.data.DataLoader class, the dataset parameter specifies the name of the dataset we load, batch_size parameter sets the number of images in each package, and the shuffle setting to True means that the loading process will randomly scramble the data and package it.
data_loader_train=torch.utils.data.DataLoader(dataset=data_train,
batch_size=64,
shuffle=True,
#num_workers=2
)
data_loader_test=torch.utils.data.DataLoader(dataset=data_test,
batch_size=64,
shuffle=True)
#num_workers=2)
There is also torchvision.utils.make_grid, which arranges a batch of images into a grid.
images,labels = next(iter(data_loader_train))
# dataiter = iter(data_loader_train)
# images, labels = dataiter.next()
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
print([labels[i] for i in range(64)])
plt.imshow(img)
Here, iter and next fetch a batch of images and their corresponding labels. torchvision.utils.make_grid then arranges the batch into a grid; after make_grid, the image has dimensions (channel, height, width). Because we display the image with matplotlib, which expects an array with dimensions (height, width, channel), i.e. the color channel last, we use numpy and transpose to convert the data type and swap the dimensions.
5.2 Model building and parameter optimization
Let's implement the construction of the convolutional neural network model:
import math
import torch
import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # a convolutional block followed by a fully connected layer and a classifier
        self.conv1 = nn.Sequential(
            nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2d(64,128,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.MaxPool2d(stride=2,kernel_size=2)
        )
        self.dense = torch.nn.Sequential(
            nn.Linear(14*14*128,1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024,10)
        )
    def forward(self,x):
        x = self.conv1(x)
        x = x.view(-1,14*14*128)
        x = self.dense(x)
        return x
5.2.1 torch.nn.Conv2d
The main parameters of the convolutional layer used to build a convolutional neural network are:
The number of input channels, the number of output channels, the convolution kernel size, the stride of the kernel, and the padding value (for padding boundary pixels).
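For example (a minimal sketch matching the first convolution in the model above), with kernel_size=3, stride=1 and padding=1 a 3-channel 28x28 input keeps its spatial size:
import torch
from torch import nn
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 28, 28)  # (batch, channels, height, width)
print(conv(x).shape)           # torch.Size([1, 64, 28, 28])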
5.2.2 torch.nn.MaxPool2d
Implements the max-pooling layer of the convolutional neural network; the main parameters are:
the size of the pooling window, the stride of the pooling window, and the padding value.
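A small sketch: a 2x2 max-pooling window with stride 2 halves the spatial dimensions, which is why the fully connected layer above expects 14*14*128 features for a 28x28 input:
import torch
from torch import nn
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 128, 28, 28)
print(pool(x).shape)           # torch.Size([1, 128, 14, 14])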
5.2.3 torch.nn.Dropout
It is used to prevent the convolutional neural network from overfitting during training. The principle is to zero out some of a layer's outputs with a given random probability, which reduces the number of effective neural connections between two adjacent layers.
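A minimal sketch (the input values are arbitrary): in training mode, Dropout with p=0.5 zeroes roughly half of the entries and rescales the rest by 1/(1-p), while in evaluation mode it passes the input through unchanged:
import torch
from torch import nn
drop = nn.Dropout(p=0.5)
x = torch.ones(2, 4)
print(drop(x))   # training mode: some entries become 0, the others are scaled to 2.0
drop.eval()
print(drop(x))   # eval mode: the input is unchanged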
5.3 Parameter Optimization
After building the model, we can train and optimize the parameters of the model:
model = Model()
cost = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
print(model)
5.3.1 Model Training
n_epochs = 5
for epoch in range(n_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch,n_epochs))
    print("-"*10)
    for data in data_loader_train:
        X_train,y_train = data
        X_train,y_train = Variable(X_train),Variable(y_train)
        outputs = model(X_train)
        _,pred = torch.max(outputs.data,1)
        optimizer.zero_grad()
        loss = cost(outputs,y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.data
        running_correct += torch.sum(pred == y_train.data)
    testing_correct = 0
    for data in data_loader_test:
        X_test,y_test = data
        X_test,y_test = Variable(X_test),Variable(y_test)
        outputs = model(X_test)
        _,pred = torch.max(outputs.data,1)
        testing_correct += torch.sum(pred == y_test.data)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss/len(data_train), 100*running_correct/len(data_train),
        100*testing_correct/len(data_test)))
5.4 Model Validation
To verify that the trained model is really as accurate as the reported results suggest, the best way is to randomly select some images from the test set, predict them with the trained model, see how far the predictions are from the true labels, and visualize the results. The test code is as follows:
data_loader_test = torch.utils.data.DataLoader(dataset=data_test,
batch_size = 4,
shuffle = True)
X_test,y_test = next(iter(data_loader_test))
inputs = Variable(X_test)
pred = model(inputs)
_,pred = torch.max(pred,1)
print("Predict Label is:",[i for i in pred.data])
print("Real Label is:",[i for i in y_test])
img = torchvision.utils.make_grid(X_test)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
plt.imshow(img)
Running this prints the predicted and real labels and displays the corresponding grid of test images.
5.5 Complete Code
import torch
import torchvision
from torchvision import datasets,transforms
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt
transform = transforms.Compose([
transforms.ToTensor(),
transforms.Lambda(lambda x: x.repeat(3,1,1)),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
data_train = datasets.MNIST(root="./data/",
                            transform=transform,
                            train = True,
                            download = True)
data_test = datasets.MNIST(root="./data/",
                           transform = transform,
                           train = False)
data_loader_train=torch.utils.data.DataLoader(dataset=data_train,
batch_size=64,
shuffle=True,
#num_workers=2
)
data_loader_test=torch.utils.data.DataLoader(dataset=data_test,
batch_size=64,
shuffle=True)
#num_workers=2)
images,labels = next(iter(data_loader_train))
# dataiter = iter(data_loader_train)
# images, labels = dataiter.next()
img = torchvision.utils.make_grid(images)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
print([labels[i] for i in range(64)])
plt.imshow(img)
import math
import torch
import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2d(64,128,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.MaxPool2d(stride=2,kernel_size=2)
        )
        self.dense = torch.nn.Sequential(
            nn.Linear(14*14*128,1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024,10)
        )
    def forward(self,x):
        x = self.conv1(x)
        x = x.view(-1,14*14*128)
        x = self.dense(x)
        return x
model = Model()
cost = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
print(model)
n_epochs = 5
for epoch in range(n_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch {}/{}".format(epoch,n_epochs))
    print("-"*10)
    for data in data_loader_train:
        X_train,y_train = data
        X_train,y_train = Variable(X_train),Variable(y_train)
        outputs = model(X_train)
        _,pred = torch.max(outputs.data,1)
        optimizer.zero_grad()
        loss = cost(outputs,y_train)
        loss.backward()
        optimizer.step()
        running_loss += loss.data
        running_correct += torch.sum(pred == y_train.data)
    testing_correct = 0
    for data in data_loader_test:
        X_test,y_test = data
        X_test,y_test = Variable(X_test),Variable(y_test)
        outputs = model(X_test)
        _,pred = torch.max(outputs.data,1)
        testing_correct += torch.sum(pred == y_test.data)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss/len(data_train), 100*running_correct/len(data_train),
        100*testing_correct/len(data_test)))
data_loader_test = torch.utils.data.DataLoader(dataset=data_test,
batch_size = 4,
shuffle = True)
X_test,y_test = next(iter(data_loader_test))
inputs = Variable(X_test)
pred = model(inputs)
_,pred = torch.max(pred,1)
print("Predict Label is:",[i for i in pred.data])
print("Real Label is:",[i for i in y_test])
img = torchvision.utils.make_grid(X_test)
img = img.numpy().transpose(1,2,0)
std = [0.5,0.5,0.5]
mean = [0.5,0.5,0.5]
img = img*std+mean
plt.imshow(img)
6. Conclusion
This has been a summary of my study of PyTorch. I look forward to exchanging ideas with everyone: leave a comment or send me a private message on Instagram or by email, so we can learn and make progress together!