PyTorch Usage Notes

A walk-through of the commonly used APIs from the official PyTorch documentation. The focus is on usage rather than underlying implementation, and the notes are mostly based on hands-on experience.

Some simple source code is read along the way; the explanations are not guaranteed to be accurate and may be revised at any time.

Configuration

  1. Fixing the random seed (a fuller reproducibility sketch follows this list)

    seed = 2
    torch.manual_seed(seed)
    
  2. cuda

    • torch.cuda.is_available() checks whether the runtime environment supports CUDA
    • torch.device builds a compute-device object from its argument
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
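A fuller reproducibility sketch (the numpy/random seeding and the torch.cuda.manual_seed_all call are additions beyond the snippet above, not required by it):

import random
import numpy as np
import torch

seed = 2
random.seed(seed)                     # Python built-in RNG
np.random.seed(seed)                  # NumPy RNG
torch.manual_seed(seed)               # CPU RNG
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)  # RNGs of all visible GPUs

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")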

Tensor

The main difference between a tensor and an array is that a tensor can live on the GPU for acceleration. This is also why the inputs and outputs of networks in PyTorch are tensors.

Creation

  1. General-purpose methods: torch.tensor / torch.as_tensor convert the data passed in (a list or a numpy.array) into a tensor

    When given a numpy array, torch.tensor copies the data into a new buffer, whereas torch.as_tensor shares the same underlying data (provided the requested dtype and device allow it); when given a list, both methods copy the data

    ar1 = np.array([1, 2, 3])
    ts1 = torch.tensor(ar1, dtype=torch.float, device=torch.device('cuda'))
    ts1[1] = 10
    ar1[2] = 20
    print(ar1, ts1)
    
    [ 1  2 20] tensor([ 1., 10.,  3.], device='cuda:0')
    
  2. Creating all-zero or all-one tensors: torch.zeros / torch.ones

  3. The torch.from_numpy() method; see this comparison of from_numpy() and tensor(): https://blog.csdn.net/github_28260175/article/details/105382060

    In short, when the ndarray's dtype is float32 either works, but when the dtype is not float32 the tensor() method may not behave as expected, so from_numpy() should be preferred (see the sketch after this list)
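
A minimal sketch of the copy-vs-share behaviour described above (variable names are made up):

import numpy as np
import torch

ar = np.array([1.0, 2.0, 3.0], dtype=np.float32)

ts_copy = torch.tensor(ar)        # torch.tensor copies the data
ts_shared = torch.from_numpy(ar)  # from_numpy shares memory with ar

ar[0] = 100.0
print(ts_copy)    # tensor([1., 2., 3.])      -> unaffected by the change
print(ts_shared)  # tensor([100., 2., 3.])    -> sees the change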

Gradient switches

  1. Set the requires_grad argument when creating the tensor

    ar2 = np.array([1, 2, 3])
    ts2 = torch.tensor(ar2, dtype=torch.float, requires_grad=True, device=torch.device('cuda'))
    out = ts2.sum()
    out.backward()
    ts2.grad
    ---
    tensor([1., 1., 1.], device='cuda:0')
    
  2. Call requires_grad_() or detach() to convert an existing tensor (see the sketch after this list)

    ar3 = np.array([1, 2, 3])
    ts3 = torch.tensor(ar3, dtype=torch.float, device=torch.device('cuda'))
    ts3 = ts3.requires_grad_()
    # ts3 = ts3.detach()
    out = ts3.sum()
    out.backward()
    ts3.grad
    ---
    tensor([1., 1., 1.], device='cuda:0')
    
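A minimal sketch of detach() as the opposite switch (variable names are made up): it returns a tensor that shares data with the original but is cut off from the autograd graph.

import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()          # shares data with x, but requires_grad is False
print(y.requires_grad)  # False

out = (x * 2).sum()
out.backward()
print(x.grad)           # tensor([2., 2., 2.])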

Conversion

  1. The item() method extracts the value when the tensor holds a single element

    zeros = torch.zeros([2, 4], dtype=torch.float32)
    zeros[1, 1].item()
    ---
    0.0
    
  2. The numpy() method converts a tensor to an ndarray without copying (they share the same data)

    zeros = torch.zeros([2, 4], dtype=torch.float32)
    npa = zeros.numpy()
    npa[1, 1] = 1
    zeros
    ---
    tensor([[0., 0., 0., 0.],
            [0., 1., 0., 0.]])
    
  3. Data type conversion, e.g. the float() method: tensor.float() is equivalent to self.to(torch.float32). When a tensor is printed its dtype is shown, except when it is float32, which is omitted; float32 can be regarded as the default floating-point dtype

    ar3 = np.array([1, 2, 3])
    ts3 = torch.tensor(ar3)
    ts3 = ts3.to(torch.float32)
    print(ts3)
    ts3 = ts3.to(torch.float16)
    print(ts3)
    ts3 = ts3.to(torch.int32)
    print(ts3)
    ---
    tensor([1., 2., 3.])
    tensor([1., 2., 3.], dtype=torch.float16)
    tensor([1, 2, 3], dtype=torch.int32)
    
  4. The cuda() method puts the tensor into CUDA memory and returns it. Usually only small batches of data are moved to CUDA at a time, to avoid exhausting GPU memory.

    ts4 = torch.tensor([1, 2, 3, 4, 5, 6], dtype=torch.int32)
    print(ts4)
    ts4 = ts4.cuda()
    print(ts4)
    ---
    tensor([1, 2, 3, 4, 5, 6], dtype=torch.int32)
    tensor([1, 2, 3, 4, 5, 6], device='cuda:0', dtype=torch.int32)
      
  5. The view() method changes the tensor's shape; the underlying data are shared. A related method is view_as(tensor), which reshapes to the same shape as another tensor (see the sketch after this list)

    print(ts4)
    ts5 = ts4.view(-1, 3)
    print(ts5)
    ts5[1, 1] = 100
    ts4
    ---
    tensor([1, 2, 3, 4, 5, 6], device='cuda:0', dtype=torch.int32)
    tensor([[1, 2, 3],
            [4, 5, 6]], device='cuda:0', dtype=torch.int32)
    tensor([  1,   2,   3,   4, 100,   6], device='cuda:0', dtype=torch.int32)
      
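A minimal sketch of view_as() mentioned above (variable names are made up); it is simply view(other.size()):

import torch

a = torch.arange(6)           # shape (6,)
template = torch.zeros(2, 3)  # shape (2, 3)

b = a.view_as(template)       # same as a.view(2, 3); shares data with a
print(b.shape)                # torch.Size([2, 3])
b[0, 0] = 100
print(a)                      # tensor([100, 1, 2, 3, 4, 5])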

Data

import torch.utils.data as torchdata

This module is mainly about turning tensors into a "dataset", i.e. organizing tensors in a way that can be accessed through an iterator, which makes training easier later on; batching and similar functionality also live here.

TensorDataset

class TensorDataset(Dataset[Tuple[Tensor, ...]]):
    r"""Dataset wrapping tensors.

    Each sample will be retrieved by indexing tensors along the first dimension.

    Args:
        *tensors (Tensor): tensors that have the same size of the first dimension.
    """
    tensors: Tuple[Tensor, ...]

    def __init__(self, *tensors: Tensor) -> None:
        assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors), "Size mismatch between tensors"
        self.tensors = tensors

    def __getitem__(self, index):
        return tuple(tensor[index] for tensor in self.tensors)

    def __len__(self):
        return self.tensors[0].size(0)

It bundles several tensors of the same length (along the first dimension), typically inputs and labels. It really only provides two operations: length via len and indexed access.

print(ts1)
print(ts2)
print(ts3)
tensorset = torchdata.TensorDataset(ts1, ts2, ts3)
print(len(tensorset))
print(tensorset[1])
---
tensor([ 1., 10.,  3.], device='cuda:0')
tensor([1., 2., 3.], device='cuda:0', requires_grad=True)
tensor([1., 2., 3.], device='cuda:0', requires_grad=True)
3
(tensor(10., device='cuda:0'), tensor(2., device='cuda:0', grad_fn=<SelectBackward0>), tensor(2., device='cuda:0', grad_fn=<SelectBackward0>))

DataLoader

It can wrap a TensorDataset into an iterable loader, or wrap a single tensor directly.

The commonly used arguments are batch_size and shuffle, which set the size of each batch and whether the data are shuffled.

tensorset = torchdata.TensorDataset(ts1, ts2, ts3)
dataloader = torchdata.DataLoader(tensorset, batch_size = 2, shuffle = True)
for data in dataloader:
    print(data)
---
[tensor([10.,  1.], device='cuda:0'), tensor([2., 1.], device='cuda:0', grad_fn=<StackBackward0>), tensor([2., 1.], grad_fn=<StackBackward0>)]
[tensor([3.], device='cuda:0'), tensor([3.], device='cuda:0', grad_fn=<StackBackward0>), tensor([3.], grad_fn=<StackBackward0>)]

When iterating, the different tensors can also be unpacked separately, like this:

tensorset = torchdata.TensorDataset(ts1, ts2, ts3)

dataloader = torchdata.DataLoader(tensorset, batch_size = 2, shuffle = True)

for d1, d2, d3 in dataloader:
    print(d1)
---
tensor([10.,  3.], device='cuda:0')
tensor([1.], device='cuda:0')

Neural network

from torch import nn

Network model

nn.Module is the parent class of all network models. To build a model you usually subclass nn.Module and override __init__() and forward(). Take the simplest three-layer MLP as an example:

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        # input layer -> hidden layer
        self.fc1 = nn.Linear(input_size, hidden_size)
        # hidden layer -> output layer
        self.fc2 = nn.Linear(hidden_size, num_classes)
    def forward(self, x):
        out = self.fc1(x)
        out = self.fc2(out)
        return out
  1. The cpu() and cuda() methods move all of the model's parameters to the CPU/GPU
  2. The eval() and train() methods set whether the model is in evaluation or training mode (evaluation disables Dropout layers, etc.); eval() is equivalent to train(False)
  3. The to() method, mainly used in two forms (see the sketch after this list)
    1. to(device), which should do the same thing as cpu()/cuda()
    2. to(dtype), which converts the model's parameters to the given type, with the same effect as float()/double()
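
A minimal usage sketch of these methods, reusing the NeuralNet class defined above (the sizes are made up):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = NeuralNet(input_size=4, hidden_size=8, num_classes=2)
model = model.to(device)  # same effect as model.cuda() / model.cpu()
model = model.float()     # same effect as model.to(torch.float32)

model.train()             # training mode (Dropout etc. active)
model.eval()              # evaluation mode, equivalent to model.train(False)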

Layer

  1. Linear (fully connected) layer: nn.Linear (a minimal sketch follows)
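
A minimal sketch of nn.Linear (the sizes are made up): it applies a learned affine map y = x @ W.T + b over the last dimension of the input.

import torch
from torch import nn

fc = nn.Linear(in_features=4, out_features=2)

x = torch.randn(3, 4)  # batch of 3 samples, 4 features each
y = fc(x)              # shape (3, 2)
print(y.shape)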

Loss Function

  1. Cross entropy: nn.CrossEntropyLoss()
  2. Binary classification loss: nn.BCELoss() (a sketch of both follows)
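
A minimal sketch of both losses with made-up inputs: nn.CrossEntropyLoss expects raw logits and integer class labels, while nn.BCELoss expects probabilities in [0, 1] (e.g. after a sigmoid) and float targets.

import torch
from torch import nn

# Multi-class: logits of shape (batch, num_classes), labels of shape (batch,)
logits = torch.randn(3, 5)
labels = torch.tensor([0, 4, 2])
ce = nn.CrossEntropyLoss()
print(ce(logits, labels))

# Binary: probabilities and float targets of the same shape
probs = torch.sigmoid(torch.randn(3))
targets = torch.tensor([1.0, 0.0, 1.0])
bce = nn.BCELoss()
print(bce(probs, targets))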

Optimizer

  1. SGD
  2. Adam
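
For the optimizers listed above, a minimal single-step training sketch, reusing the NeuralNet class and CrossEntropyLoss from earlier (the learning rates and data are made up):

import torch
from torch import nn

model = NeuralNet(input_size=4, hidden_size=8, num_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))

optimizer.zero_grad()            # clear old gradients
loss = criterion(model(x), y)    # forward pass + loss
loss.backward()                  # compute gradients
optimizer.step()                 # update parameters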