How to get mini-batches in PyTorch in a clean and efficient way?
deep-learning
machine-learning
numpy
python

I am trying to do something simple: train a linear model with stochastic gradient descent (SGD) using torch:

import numpy as np

import torch
from torch.autograd import Variable

import pdb

def get_batch2(X,Y,M,dtype):
    X,Y = X.data.numpy(), Y.data.numpy()
    N = len(Y)
    valid_indices = np.array( range(N) )
    batch_indices = np.random.choice(valid_indices,size=M,replace=False)
    batch_xs = torch.FloatTensor(X[batch_indices,:]).type(dtype)
    batch_ys = torch.FloatTensor(Y[batch_indices]).type(dtype)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

def poly_kernel_matrix( x,D ):
    N = len(x)
    Kern = np.zeros( (N,D+1) )
    for n in range(N):
        for d in range(D+1):
            Kern[n,d] = x[n]**d;
    return Kern

## data params
N=5 # data set size
Degree=4 # number dimensions/features
D_sgd = Degree+1
##
x_true = np.linspace(0,1,N) # the real data points
y = np.sin(2*np.pi*x_true)
y.shape = (N,1)
## TORCH
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
X_mdl = poly_kernel_matrix( x_true,Degree )
X_mdl = Variable(torch.FloatTensor(X_mdl).type(dtype), requires_grad=False)
y = Variable(torch.FloatTensor(y).type(dtype), requires_grad=False)
## SGD mdl
w_init = torch.zeros(D_sgd,1).type(dtype)
W = Variable(w_init, requires_grad=True)
M = 5 # mini-batch size
eta = 0.1 # step size
for i in range(500):
    batch_xs, batch_ys = get_batch2(X_mdl,y,M,dtype)
    # Forward pass: compute predicted y using operations on Variables
    y_pred = batch_xs.mm(W)
    # Compute and print loss using operations on Variables. Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape (1,); loss.data[0] is a scalar value holding the loss.
    loss = (1/N)*(y_pred - batch_ys).pow(2).sum()
    # Use autograd to compute the backward pass. Now w will have gradients
    loss.backward()
    # Update weights using gradient descent; w1.data are Tensors,
    # w.grad are Variables and w.grad.data are Tensors.
    W.data -= eta * W.grad.data
    # Manually zero the gradients after updating weights
    W.grad.data.zero_()

#
c_sgd = W.data.numpy()
X_mdl = X_mdl.data.numpy()
y = y.data.numpy()
#
Xc_pinv = np.dot(X_mdl,c_sgd)
print('J(c_sgd) = ', (1/N)*(np.linalg.norm(y-Xc_pinv)**2) )
print('loss = ',loss.data[0])

The code runs fine, although my get_batch2 method seems really dumb/naive, probably because I am new to pytorch, but I have not found a good place where they discuss how to retrieve batches of data. I went through their tutorials (http://pytorch.org/tutorials/beginner/pytorch_with_examples.html) and the data loading tutorial (http://pytorch.org/tutorials/beginner/data_loading_tutorial.html) with no luck. The tutorials all seem to assume that one already has the batches and batch size at the beginning and then proceeds to train with that data without ever changing it (see specifically http://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-variables-and-autograd).

So my question is: do I really need to turn my data back into numpy so that I can fetch some random sample of it, and then turn it back into pytorch with Variable to be able to train in memory? Is there no way to get mini-batches with torch?

I looked at a few functions torch provides but with no luck:

#pdb.set_trace()
#valid_indices = torch.arange(0,N).numpy()
#valid_indices = np.array( range(N) )
#batch_indices = np.random.choice(valid_indices,size=M,replace=False)
#indices = torch.LongTensor(batch_indices)
#batch_xs, batch_ys = torch.index_select(X_mdl, 0, indices), torch.index_select(y, 0, indices)
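For illustration, a minimal sketch of how those pieces might fit together without going through numpy (an assumption on my side: the indexing is done on the underlying tensors via .data, since index_select did not seem to accept a plain LongTensor index on a Variable here):

# sketch: draw M distinct row indices with torch.randperm, then gather the rows
# with index_select, staying in torch the whole time (no numpy round-trip)
indices = torch.randperm(N)[:M]                 # first M entries of a random permutation -> no repeats
batch_xs = X_mdl.data.index_select(0, indices)
batch_ys = y.data.index_select(0, indices)
# (wrap batch_xs/batch_ys in Variable before the forward pass, as in get_batch2)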

Even though the code I provided works, I am worried that it is not an efficient implementation, and that it would slow things down further if I used a GPU (because I am guessing it keeps pulling things out of GPU memory onto the CPU and then pushing them back onto the GPU, which seems silly).


I implemented a new method based on the answer that suggested using torch.index_select():

def get_batch2(X,Y,M):
    '''
    get batch for pytorch model
    '''
    # TODO fix and make it nicer, there is pytorch forum question
    #X,Y = X.data.numpy(), Y.data.numpy()
    X,Y = X, Y
    N = X.size()[0]
    batch_indices = torch.LongTensor( np.random.randint(0,N+1,size=M) )
    pdb.set_trace()
    batch_xs = torch.index_select(X,0,batch_indices)
    batch_ys = torch.index_select(Y,0,batch_indices)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

However, this seems to have problems, because it does not work if X, Y are not Variables... which is really odd. I posted this on the pytorch forum: https://discuss.pytorch.org/t/how-to-get-mini-batches-in-pytorch-in-a-clean-and-efficiency-way/10322
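For what it is worth, one hedged guess at a workaround (not verified across versions): index_select on a Variable may want the index argument to itself be a Variable wrapping a LongTensor, something like:

# possible workaround (assumption about 0.3-era autograd, not verified):
# wrap the index tensor in a Variable so index_select on Variables accepts it
indices = Variable(torch.randperm(N)[:M])       # M distinct indices as a Variable over a LongTensor
batch_xs = torch.index_select(X, 0, indices)
batch_ys = torch.index_select(Y, 0, indices)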

Right now I am trying to make this work for the GPU. My most recent version:

def get_batch2(X,Y,M,dtype):
    '''
    get batch for pytorch model
    '''
    # TODO fix and make it nicer, there is pytorch forum question
    #X,Y = X.data.numpy(), Y.data.numpy()
    X,Y = X, Y
    N = X.size()[0]
    if dtype ==  torch.cuda.FloatTensor:
        batch_indices = torch.cuda.LongTensor( np.random.randint(0,N,size=M) )# without replacement
    else:
        batch_indices = torch.LongTensor( np.random.randint(0,N,size=M) ).type(dtype)  # without replacement
    pdb.set_trace()
    batch_xs = torch.index_select(X,0,batch_indices)
    batch_ys = torch.index_select(Y,0,batch_indices)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

The error:

RuntimeError: tried to construct a tensor from a int sequence, but found an item of type numpy.int64 at index (0)

which I do not understand. Do I really have to do:

ints = [ random.randint(0,N) for i in range(M)]

to get the integers?

It would also be ideal if the data could be a Variable. It seems that torch.index_select does not work for data of type Variable.

This list of integers still does not work:

TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor), but expected one of:
 * (torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (float beta, torch.cuda.FloatTensor source, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (float beta, torch.cuda.FloatTensor source, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
 * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
      didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
 * (float beta, torch.cuda.FloatTensor source, float alpha, torch.cuda.sparse.FloatTensor mat1, torch.cuda.FloatTensor mat2, *, torch.cuda.FloatTensor out)
      didn't match because some of the arguments have invalid types: (int, torch.cuda.FloatTensor, int, torch.cuda.FloatTensor, torch.FloatTensor, out=torch.cuda.FloatTensor)
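Reading that trace (a hedged interpretation): addmm is receiving a mix of torch.cuda.FloatTensor and plain torch.FloatTensor arguments, so at least one of the tensors taking part in the forward pass never made it onto the GPU. A minimal sketch of keeping everything on the same device, assuming dtype = torch.cuda.FloatTensor and the setup from the code above:

# sketch (assumption: CUDA is available and dtype = torch.cuda.FloatTensor);
# every tensor involved in batch_xs.mm(W) has to be a cuda tensor
dtype = torch.cuda.FloatTensor
X_mdl = Variable(torch.FloatTensor(poly_kernel_matrix(x_true, Degree)).type(dtype), requires_grad=False)
y = Variable(torch.FloatTensor(y).type(dtype), requires_grad=False)
W = Variable(torch.zeros(D_sgd, 1).type(dtype), requires_grad=True)
# index tensors stay Long (torch.cuda.LongTensor here), never converted with .type(dtype)
batch_indices = torch.randperm(N)[:M].cuda()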
3 Answers

Use a DataLoader.

Dataset

First you define a dataset. You can use the packaged datasets in torchvision.datasets or the ImageFolder dataset class, which follows the structure of Imagenet.

trainset=torchvision.datasets.ImageFolder(root='/path/to/your/data/trn', transform=generic_transform)
testset=torchvision.datasets.ImageFolder(root='/path/to/your/data/val', transform=generic_transform)

Transforms

Transforms are very useful for processing the loaded data on the fly. If you are using images, you have to use the ToTensor() transform to convert loaded images from PIL to torch.tensor. More transforms can be packed into a composite transform as follows.

generic_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.ToPILImage(),
    #transforms.CenterCrop(size=128),
    transforms.Lambda(lambda x: myimresize(x, (128, 128))),
    transforms.ToTensor(),
    transforms.Normalize((0., 0., 0.), (6, 6, 6))
])

DataLoader

Then you define a data loader which prepares the next batch while training. You can set the number of threads used for data loading.

trainloader=torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=8)
testloader=torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=8)

For training, you just enumerate over the data loader.

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
    # continue training...
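Since the question's data is plain in-memory arrays rather than an image folder, a hedged sketch of the same pattern with torch.utils.data.TensorDataset (reusing X_mdl, y, and M from the question; this is an illustration, not part of the original answer) could look like:

import torch.utils.data

# sketch: wrap the in-memory tensors in a TensorDataset and let a DataLoader
# do the shuffling and batching each epoch
dataset = torch.utils.data.TensorDataset(X_mdl.data, y.data)
loader = torch.utils.data.DataLoader(dataset, batch_size=M, shuffle=True)

for epoch in range(500):
    for batch_xs, batch_ys in loader:
        batch_xs, batch_ys = Variable(batch_xs), Variable(batch_ys)
        # forward/backward/update exactly as in the question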

NumPy stuff

Yes. You have to convert torch.tensor to numpy using the .numpy() method in order to work on it. If you are using CUDA you have to download the data from the GPU to the CPU first with the .cpu() method before calling .numpy(). Personally, coming from a MATLAB background, I prefer to do most of the work with torch tensors and convert data to numpy only for visualization. Also bear in mind that torch stores data in channel-first mode while numpy and PIL work with channel-last. This means you need to use np.rollaxis to move the channel axis to the last position. A sample code is below.

np.rollaxis(make_grid(mynet.ftrextractor(inputs).data, nrow=8, padding=1).cpu().numpy(), 0, 3)
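As a more isolated illustration of that rollaxis call (shapes assumed purely for the example): np.rollaxis(a, 0, 3) rolls axis 0 back to just before position 3, so a C x H x W array becomes H x W x C.

import numpy as np

chw = np.zeros((3, 128, 128))        # channel-first, the way torch stores images
hwc = np.rollaxis(chw, 0, 3)         # channel-last, the way numpy/PIL expect them
print(hwc.shape)                     # (128, 128, 3)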

Logging

The best way I found to visualize feature maps is using tensorboard. A code is available at yunjey/pytorch-tutorial.


If I understand your code correctly, your get_batch2 function appears to be taking random mini-batches from your dataset without tracking which indices you have already used within an epoch. The issue with this implementation is that it will likely not make use of all of your data.

The way I usually do batching is to create a random permutation of all the possible indices using torch.randperm(N) and loop through them in batches. For example:

n_epochs = 100 # or whatever
batch_size = 128 # or whatever

for epoch in range(n_epochs):

    # X is a torch Variable
    permutation = torch.randperm(X.size()[0])

    for i in range(0,X.size()[0], batch_size):
        optimizer.zero_grad()

        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X[indices], Y[indices]

        # in case you wanted a semi-full example
        outputs = model.forward(batch_x)
        loss = lossfunction(outputs,batch_y)

        loss.backward()
        optimizer.step()

In case you want to copy and paste, make sure you define the optimizer, model, and loss function somewhere before the start of the epoch loop.
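For completeness, a hedged sketch of that setup for the question's linear model (the names model, optimizer, and lossfunction are the ones the loop above expects; choosing nn.Linear, MSELoss, and SGD here is my assumption, not part of the original answer):

import torch
import torch.nn as nn

# sketch: one possible definition of the pieces the training loop above uses
model = nn.Linear(D_sgd, 1, bias=False)            # linear model, analogous to X.mm(W) in the question
lossfunction = nn.MSELoss()                        # squared-error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)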

With regards to your error: try using torch.from_numpy(np.random.randint(0,N,size=M)).long() instead of torch.LongTensor(np.random.randint(0,N,size=M)). I am not sure if this will solve the error you are getting, but it will solve a future error.
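A small illustration of that suggestion (values of N and M assumed from the question):

import numpy as np
import torch

N, M = 5, 5                                        # assumed, as in the question
idx_np = np.random.randint(0, N, size=M)           # numpy array of dtype int64
batch_indices = torch.from_numpy(idx_np).long()    # LongTensor usable with index_select / indexing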


Not sure what you were trying to do. W.r.t. batching, you would not have to convert to numpy. You could just use index_select(), e.g.:

for epoch in range(500):
    k = 0
    loss = 0
    while k < X_mdl.size(0):

        # draw M random row indices for this mini-batch
        random_batch = [0]*M
        for i in range(M):
            random_batch[i] = np.random.choice(N-1)
        random_batch = torch.LongTensor(random_batch)
        batch_xs = X_mdl.index_select(0, random_batch)
        batch_ys = y.index_select(0, random_batch)

        # Forward pass: compute predicted y using operations on Variables
        y_pred = batch_xs.mm(W)
        # etc..
        k += M

The rest of the code would have to be changed as well, though.


My guess is that you would want to create a get_batch function that concatenates your X tensors and Y tensors. Something like:

def make_batch(list_of_tensors):
    X, y = list_of_tensors[0]
    # may need to unsqueeze X and y to get right dimensions
    for i, (sample, label) in enumerate(list_of_tensors[1:]):
        X = torch.cat((X, sample), dim=0)
        y = torch.cat((y, label), dim=0)
    return X, y

And then during training you would select, say, max_batch_size = 32 examples via slicing.

for epoch in range(n_epochs):
    X, y = make_batch(list_of_tensors)
    X = Variable(X, requires_grad=False)
    y = Variable(y, requires_grad=False)

    k = 0
    while k < X.size(0):
        inputs = X[k:k+max_batch_size,:]
        labels = y[k:k+max_batch_size,:]
        # some computation
        k += max_batch_size