Visualizing the latent space of an autoencoder
Autoencoders are models that compress data into a smaller, simplified representation and then reconstruct the original from that compressed version. This is valuable because it captures complex, high-dimensional data in a lower-dimensional form, which makes autoencoders useful for tasks like data compression and feature extraction.
In essence, an autoencoder is a neural network that learns two tasks: encoding, which reduces the data’s size, and decoding, which reconstructs the original data from this smaller form. I would also like to point readers to another excellent resource here for further reading. Since there are plenty of open-source autoencoder code examples in PyTorch available online, I won’t dive into the model and training setup here. Instead, I’d like to look at how the data appears after training a basic autoencoder for 100 epochs, to see the results of this low-dimensional transformation.
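Since the training code is omitted, here is a minimal sketch of the kind of model the plotting function below assumes: a fully connected autoencoder whose layer sizes follow the comment in the code (784 ↔ 256 ↔ 64 ↔ 20) and whose forward pass returns both the latent code and the reconstruction. The exact activations and structure here are my own assumption, not necessarily the setup used for training.

import torch.nn as nn

class AutoEncoder(nn.Module):
    # forward returns (latent code, reconstruction), as the plotting function expects
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 20),
        )
        self.decoder = nn.Sequential(
            nn.Linear(20, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        enc = self.encoder(x)
        dec = self.decoder(enc)
        return enc, dec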
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def plot_latent_space(images, labels, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    # compare epoch 1 and 96
    for ep in [1, 96]:
        # encoder-decoder | L(784<->256) : L(256<->64) : L(64<->20)
        model.load_state_dict(torch.load(f"./weights/ep_{str(ep).zfill(2)}.pth",
                                         map_location=device, weights_only=True))
        model.eval()
        # dataset | MNIST: flatten 28x28 images into 784-dim vectors
        images = images.view(images.size(0), -1)
        images = images.to(device)
        # forward pass; we only need the encoder output (the latent codes)
        with torch.no_grad():
            enc, _ = model(images)
        latent_space = enc.detach().cpu().numpy()
        # reduce the latent codes to 2 dimensions for plotting
        ltr = TSNE(n_components=2).fit_transform(latent_space)
        # cluster-quality score on the 2-D embedding
        score = silhouette_score(ltr, labels, metric='euclidean')
        # plot
        plt.figure(figsize=(10, 10))
        scatter = plt.scatter(ltr[:, 0], ltr[:, 1], c=labels, cmap='tab10')
        plt.colorbar(scatter, ticks=range(10))
        plt.xlabel('Dimension 1')
        plt.ylabel('Dimension 2')
        plt.title(f'tSNE. Epoch {ep}, Sil score: {score:.3f}')
        plt.show()
The key parts of the function above are commented. Note that, although the autoencoder includes both an encoder and a decoder, we are primarily interested in the output of the encoder. That output is the latent space: a compressed, lower-dimensional representation of the input features, which is essentially a learned form of data compression.
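To make the workflow concrete, here is a rough usage sketch. It assumes torchvision is available, that the trained checkpoints exist under ./weights/, and that the model is the (hypothetical) AutoEncoder class sketched earlier; the batch size and paths are illustrative only.

import torch
from torchvision import datasets, transforms

# a subset of MNIST test digits (size and paths are illustrative)
test_set = datasets.MNIST(root="./data", train=False, download=True,
                          transform=transforms.ToTensor())
images = torch.stack([test_set[i][0] for i in range(2000)])           # (2000, 1, 28, 28)
labels = torch.tensor([test_set[i][1] for i in range(2000)]).numpy()  # class ids 0-9

model = AutoEncoder()                     # assumed model class from the sketch above
plot_latent_space(images, labels, model)  # one t-SNE scatter plot per chosen epoch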
The Silhouette score is a metric used to evaluate how well clusters are separated. It ranges from -1 to +1, where a higher score indicates that samples are well matched to their own cluster and poorly matched to neighboring clusters. We can use sklearn.metrics.silhouette_score to compute the mean Silhouette coefficient across all samples. As the two plots show, the latent space learned by the autoencoder separates the MNIST digit classes more clearly at epoch 96 than at epoch 1.
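For each sample i, the silhouette coefficient is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to points in the same cluster and b(i) is the mean distance to points in the nearest other cluster; silhouette_score averages s(i) over all samples. A tiny synthetic example (made-up points, not MNIST data) illustrates the behaviour:

import numpy as np
from sklearn.metrics import silhouette_score

# two tight, well-separated blobs -> mean silhouette close to +1
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
cluster_ids = np.array([0, 0, 0, 1, 1, 1])
print(silhouette_score(points, cluster_ids, metric='euclidean'))  # roughly 0.98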