One of my job responsibilities is to teach engineers and data scientists how to use the PyTorch neural network code library. There are many examples of how the max pooling in a CNN works, but they tend to be too generic (not specific to PyTorch), or too specific (a very low-level explanation of the library functions).
Here’s an example that I use. The demo sets up an input of a simple 4×4 grayscale (1 channel) image with dummy pixel values 0 through 15. The demo sets up a MaxPool2d layer with a 2×2 kernel, first with stride = 1 and then with stride = 2, and applies each to the 4×4 input.
The diagram shows how applying the max pooling layer with stride = 1 results in a 3×3 array of numbers. Using max pooling has three benefits. First, it helps prevent model overfitting by acting as a crude form of regularization. Second, it improves training speed by shrinking the representation, which reduces the number of parameters to learn in later layers. Third, it provides basic translation invariance.
The demo leaves out a ton of optional details but the point of my demo is to explain how PyTorch max pooling works, not to dive into the details.
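To make the window arithmetic concrete, here is a short plain-NumPy sketch (not part of the demo program; the loop bounds are hard-coded for the 4×4 input) that computes 2×2 max pooling with stride = 1 by hand:

```python
import numpy as np

# 4x4 input with dummy pixel values 0 through 15
x = np.arange(16, dtype=np.float32).reshape(4, 4)

# 2x2 max pooling with stride 1: slide a 2x2 window one pixel
# at a time and keep the largest value in each window
out = np.zeros((3, 3), dtype=np.float32)
for i in range(3):
    for j in range(3):
        out[i, j] = x[i:i+2, j:j+2].max()

print(out)
# [[ 5.  6.  7.]
#  [ 9. 10. 11.]
#  [13. 14. 15.]]
# e.g. out[0,0] = max(0, 1, 4, 5) = 5.0
```

Each output cell is the maximum over one 2×2 window, which is exactly the computation the PyTorch layer performs below.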

Other kinds of pooling. Left: A pickup truck in a pool. Right: A pool in a pickup truck.
Demo code:
# maxpool_demo.py
# PyTorch 1.10.0-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10/11
import numpy as np
import torch as T
device = T.device('cpu')
print("\nBegin PyTorch max pooling demo ")
x = np.arange(16, dtype=np.float32)
x = x.reshape(1, 1, 4, 4) # bs, channels, height, width
X = T.tensor(x, dtype=T.float32).to(device)
print("\nSource input: ")
print(X)
pool1 = T.nn.MaxPool2d(2, stride=1)
z1 = pool1(X)
print("\nMaxPool with kernel=2, stride=1: ")
print(z1)
pool2 = T.nn.MaxPool2d(2, stride=2)
z2 = pool2(X)
print("\nMaxPool with kernel=2, stride=2: ")
print(z2)
print("\nEnd max pooling demo ")
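As a quick sanity check, the output sizes follow the standard formula out = floor((in − kernel) / stride) + 1. This short sketch (my verification code, not part of the demo program) checks both strides against that formula:

```python
import torch as T

# same 4x4 input as the demo, shaped (batch, channels, height, width)
X = T.arange(16, dtype=T.float32).reshape(1, 1, 4, 4)

for stride in (1, 2):
    pool = T.nn.MaxPool2d(kernel_size=2, stride=stride)
    z = pool(X)
    expected = (4 - 2) // stride + 1  # floor((in - kernel) / stride) + 1
    assert z.shape == (1, 1, expected, expected)
    print(stride, tuple(z.shape))
# stride 1 gives a 3x3 output, stride 2 gives a 2x2 output
```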


Nice explanation of how max pooling works.
I am currently trying to build a more advanced version of my CNN implementation. The biggest problem I see is the extreme computational load, which increases with each additional pooling layer. The idea is that with a stride of 2 in the convolution step, the pooling step can be replaced more cheaply.
Would you give that idea a chance?
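For reference, the commenter's idea can be sketched like this; the layer sizes here (1 input channel, 8 filters, 3×3 kernel, 32×32 input) are arbitrary choices for illustration. A stride-2 convolution yields the same downsampled spatial shape as a stride-1 convolution followed by 2×2 max pooling:

```python
import torch as T

X = T.randn(1, 1, 32, 32)  # dummy 32x32 single-channel image

# conventional: stride-1 convolution followed by 2x2 max pooling
conv_pool = T.nn.Sequential(
    T.nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),
    T.nn.MaxPool2d(kernel_size=2, stride=2),
)

# alternative: fold the downsampling into the convolution stride
conv_strided = T.nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1)

print(tuple(conv_pool(X).shape))     # spatial size halved by the pool
print(tuple(conv_strided(X).shape))  # spatial size halved by the stride
```

Both produce a (1, 8, 16, 16) output, so the strided convolution does the downsampling in one pass; whether accuracy holds up is an empirical question for the specific network.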