
In a machine learning pipeline, understanding how to script a PyTorch model is essential. As explained in an excellent introductory post, a few advantages of scripting include:

  • Saving and transferring the model to environments outside of Python
  • Obtaining an intermediate representation that can be further optimized (both points are sketched in the example below)
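As a minimal sketch of both points (using a toy module of my own, not an example from that post):

import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1.0

# Compile the module to TorchScript.
scripted = torch.jit.script(Toy())

# (1) Serialize and restore it without the original Python class;
# the same archive can also be loaded from C++ via libtorch.
scripted.save("toy.pt")
restored = torch.jit.load("toy.pt")

# (2) Inspect the intermediate representation that the JIT optimizes.
print(scripted.graph)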

Recently, I explored AWS SageMaker, focusing specifically on deploying a custom model within the service. Through the official documentation, I discovered that TorchScript is also a standard format for saving trained models.
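As a quick illustration of that save format (a minimal sketch of my own; the file name model.pth is an assumption, since the exact name the SageMaker container expects is configuration-dependent):

import torch
import torchvision

model = torchvision.models.resnet18().eval()
scripted = torch.jit.script(model)

# Serialize the TorchScript archive. For SageMaker, this file would
# typically be packaged into the model.tar.gz uploaded to S3
# (assumption: check the PyTorch container docs for the expected file name).
torch.jit.save(scripted, "model.pth")

# On the serving side, the archive is restored without the original
# Python model definition.
restored = torch.jit.load("model.pth", map_location="cpu")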

In this post, I would like to focus instead on the performance side of scripted models, particularly in light of the performance benefits claimed here. To investigate this, I extended the original script, varying the batch size and increasing the number of repetitions.

import torchvision
import torch
from time import perf_counter
import numpy as np
from torchvision.models import ResNet18_Weights

def timer(f, *args):
    # Return the wall-clock time of a single call in milliseconds.
    start = perf_counter()
    f(*args)
    return 1000 * (perf_counter() - start)

def get_model(device='cpu', scripted=False):
    model = torchvision.models.resnet18(weights=ResNet18_Weights.DEFAULT).to(device)
    model.eval()
    if scripted:
        # torch.jit.script takes only the module; unlike torch.jit.trace,
        # it needs no example input.
        with torch.jit.optimized_execution(True):
            model = torch.jit.script(model)
    return model

def get_tensor(device='cpu', bs=1):
    return torch.rand(bs, 3, 224, 224).to(device)

for scripted_mode in [False, True]:
    for device in ['cpu', 'cuda']:
        for bs in [1, 32, 128]:
            a = get_tensor(device, bs)
            model = get_model(device, scripted_mode)
            # Average over 100 forward passes.
            res = np.mean([timer(model, a) for _ in range(100)])
            print(
                f"Scripted: {scripted_mode}, Device: {device}, BS: {bs}, Time: {res:.3f} ms"
            )
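One caveat about the timing loop above (my own addition, not part of the measured experiment): CUDA kernels launch asynchronously, so stopping the clock without synchronizing can under-report GPU latency, and the first calls to a scripted model include JIT specialization overhead. A more defensive variant of timer might look like this (timer_sync is a hypothetical helper, not from the original script):

def timer_sync(f, *args, device='cpu'):
    # Warm-up call so one-time costs (CUDA context creation,
    # JIT specialization on the first runs) are excluded.
    f(*args)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = perf_counter()
    f(*args)
    if device == 'cuda':
        # Wait for all queued kernels to finish before stopping the clock.
        torch.cuda.synchronize()
    return 1000 * (perf_counter() - start)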

And here are the collected results:

Table 1. Comparing performance of the serialized model with TorchScript

Based on my experiment results, I did not find the generalized performance benefits claimed in that post, on either CPU or GPU. There is also an ongoing discussion about this in the PyTorch forums. In summary, I found comparable performance for scripted and non-scripted models: although the scripted version is faster at batch size 1, the trend reverses as the batch size increases.