Overview
Teaching: 30 min
Exercises: 20 minQuestions
What support does Cython provide for NumPy?
Objectives
Learn different techniques for using NumPy programs with Cython.
NumPy can be used from Cython in exactly the same manner as in regular Python, however Cython also has a number of features that support fast access to NumPy arrays that can result in significant performance gains. In this section, we will look at how some of these features can be used.
The code below does 2D discrete convolution of an image with a filter.
import numpy as np
def naive_convolve(f, g):
# f is an image and is indexed by (v, w)
# g is a filter kernel and is indexed by (s, t),
# it needs odd dimensions
# h is the output image and is indexed by (x, y),
# it is not cropped
if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
raise ValueError("Only odd dimensions on filter supported")
# smid and tmid are number of pixels between the center pixel
# and the edge, ie for a 5x5 filter they will be 2.
#
# The output size is calculated by adding smid, tmid to each
# side of the dimensions of the input image.
vmax = f.shape[0]
wmax = f.shape[1]
smax = g.shape[0]
tmax = g.shape[1]
smid = smax // 2
tmid = tmax // 2
xmax = vmax + 2*smid
ymax = wmax + 2*tmid
# Allocate result image.
h = np.zeros([xmax, ymax], dtype=f.dtype)
# Do convolution
for x in range(xmax):
for y in range(ymax):
# Calculate pixel value for h at (x,y). Sum one component
# for each pixel (s, t) of the filter g.
s_from = max(smid - x, -smid)
s_to = min((xmax - x) - smid, smid + 1)
t_from = max(tmid - y, -tmid)
t_to = min((ymax - y) - tmid, tmid + 1)
value = 0
for s in range(s_from, s_to):
for t in range(t_from, t_to):
v = x - smid + s
w = y - tmid + t
value += g[smid - s, tmid - t] * f[v, w]
h[x, y] = value
return h
Let’s get a baseline on how fast this code executes:
%timeit naive_convolve(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int))
The slowest run took 79.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 28.7 µs per loop
As we saw previously, we can simply compile this code using Cython and expect some performance improvements. Let’s copy the
naive_convolve
function from the cell above, then rename the function convolve1
and compile it with Cython.
%%cython
import numpy as np
def convolve1(f, g):
# f is an image and is indexed by (v, w)
# g is a filter kernel and is indexed by (s, t),
# it needs odd dimensions
# h is the output image and is indexed by (x, y),
# it is not cropped
if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
raise ValueError("Only odd dimensions on filter supported")
# smid and tmid are number of pixels between the center pixel
# and the edge, ie for a 5x5 filter they will be 2.
#
# The output size is calculated by adding smid, tmid to each
# side of the dimensions of the input image.
vmax = f.shape[0]
wmax = f.shape[1]
smax = g.shape[0]
tmax = g.shape[1]
smid = smax // 2
tmid = tmax // 2
xmax = vmax + 2*smid
ymax = wmax + 2*tmid
# Allocate result image.
h = np.zeros([xmax, ymax], dtype=f.dtype)
# Do convolution
for x in range(xmax):
for y in range(ymax):
# Calculate pixel value for h at (x,y). Sum one component
# for each pixel (s, t) of the filter g.
s_from = max(smid - x, -smid)
s_to = min((xmax - x) - smid, smid + 1)
t_from = max(tmid - y, -tmid)
t_to = min((ymax - y) - tmid, tmid + 1)
value = 0
for s in range(s_from, s_to):
for t in range(t_from, t_to):
v = x - smid + s
w = y - tmid + t
value += g[smid - s, tmid - t] * f[v, w]
h[x, y] = value
return h
We should run some tests to make sure that the functions produce the same results:
from nose.tools import assert_equal
res1 = naive_convolve(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int))
res2 = convolve1(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int))
assert_equal((res1==res2).all(), True)
This doesn’t produce any output, so that means the assertion succeeded. Now we can time the new function and see if it is any faster.
%timeit convolve1(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int))
10000 loops, best of 3: 19.5 µs per loop
So we got a speedup of 28.7 / 19.5 ~ 1.5 times. Not bad.
The next step is to add Cython data types to the code. This code will no longer be compatible with Python, so the consequences of doing this must be carefully considered. The most important change is to use variables that have the same data type as the elements of the NumPy arrays.
For normal variables, this simply means adding a line like cdef type variable
(where type
is the appropriate C type) for each variable
in the program. Since it is also possible to initialize the variable at the same time it is declared, we can simplify the code by declaring
it the first time it is used. In the convolve1
case above, we know that the indices of the NumPy arrays are integers, so we
can do something like:
cdef int vmax = f.shape[0]
cdef int wmax = f.shape[1]
cdef int smax = g.shape[0]
cdef int tmax = g.shape[1]
...
This is simpler than doing the following for each variable:
cdef int vmax
vmax = f.shape[0]
...
It is important to type all variables. Cython does not give any warnings if you do not, however the code will run much slower.
The NumPy arrays themselves also need to have type information added, and they correspond to the np.ndarray
type. The f
and g
arrays appear as parameters to the function, so the type can be added as part of the function declaration. The h
array is initialized
to zeros in the code so requires a cdef np.ndarray
to be added as follows:
...
def convolve1(np.ndarray f, np.ndarray g):
...
cdef np.ndarray h = np.zeros([xmax, ymax], dtype=f.dtype)
...
The one remaining variable to be changed is value
. We would like to use the same type as that stored in the NumPy arrays, however we
can’t just use np.int
as this is a run-time type. However, for every run-time type, there is a corresponding compile-time type that ends
in _t
. This allows us to declare value
as follows:
cdef np.int_t value
Challenge
The code below will not compile as parts have been commented out. Copy this code into a Jupyter cell, or into IPython and rename the function
convolve2
. Read each of the commented sections for a description of how the data types are added, then uncomment the line to enable to statement. When you have completed this for the whole function, run the code to ensure that it compiles correctly.%%cython import numpy as np # "cimport" is used to import special compile-time information # about the numpy module (this is stored in a file numpy.pxd which is # currently part of the Cython distribution). #cimport numpy as np # "def" can type its arguments but not have a return type. The type of the # arguments for a "def" function is checked at run-time when entering the # function. # # The arrays f, g and h is typed as "np.ndarray" instances. The only effect # this has is to a) insert checks that the function arguments really are # NumPy arrays, and b) make some attribute access like f.shape[0] much # more efficient. (In this example this doesn't matter though.) def convolve2(np.ndarray f, np.ndarray g): if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1: raise ValueError("Only odd dimensions on filter supported") # The "cdef" keyword is also used within functions to type variables. It # can only be used at the top indendation level (there are non-trivial # problems with allowing them in other places, though we'd love to see # good and thought out proposals for it). # # For the indices, the "int" type is used. This corresponds to a C int, # other C types (like "unsigned int") could have been used instead. #cdef int vmax = f.shape[0] #cdef int wmax = f.shape[1] #cdef int smax = g.shape[0] #cdef int tmax = g.shape[1] #cdef int smid = smax // 2 #cdef int tmid = tmax // 2 #cdef int xmax = vmax + 2*smid #cdef int ymax = wmax + 2*tmid #cdef np.ndarray h = np.zeros([xmax, ymax], dtype=np.int) #cdef int x, y, s, t, v, w # It is very important to type ALL your variables. You do not get any # warnings if not, only much slower code (they are implicitly typed as # Python objects). #cdef int s_from, s_to, t_from, t_to # For the value variable, we want to use the same data type as is # stored in the array, so we use "np.int_t". # NB! An important side-effect of this is that if "value" overflows its # datatype size, it will simply wrap around like in C, rather than raise # an error like in Python. #cdef np.int_t value for x in range(xmax): for y in range(ymax): s_from = max(smid - x, -smid) s_to = min((xmax - x) - smid, smid + 1) t_from = max(tmid - y, -tmid) t_to = min((ymax - y) - tmid, tmid + 1) value = 0 for s in range(s_from, s_to): for t in range(t_from, t_to): v = x - smid + s w = y - tmid + t value += g[smid - s, tmid - t] * f[v, w] h[x, y] = value return h
Next, run the following test to make sure the result is the same:
from nose.tools import assert_equal res1 = naive_convolve(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int)) res2 = convolve2(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int)) assert_equal((res1==res2).all(), True)
Now time this code and see if it has improved.
%timeit convolve2(np.array([[1, 1, 1]], dtype=np.int), np.array([[1],[2],[1]], dtype=np.int))
Did the time improve?
This code is still not as efficient as it could be. Array lookups and assignments, like those using the []
-operator, still uses full Python
operations. It would be much more effient if we could access the data buffer directly at C speed.
It is possible to do this by specifying the type of contents of the ndarray
objects. We do this with a special “buffer” syntax which must
be told the datatype (first argument) and number of dimensions (“ndim” keyword-only argument, if not provided then one-dimensional is assumed).
The format for this syntax is:
np.ndarray[type, ndim=N]
Here, type
is the compile-time type of the array elements, and N
is the number of dimensions in the array. This syntax is used wherever
an ndarray
is declared.
In the convolve2
example, we would need to use this in two places as follows:
...
def convolve2(np.ndarray[np.int_t, ndim=2] f, np.ndarray[np.int_t, ndim=2] g):
...
cdef np.ndarray[np.int_t, ndim=2] h = ...
The NumPy array lookups are still slowed down by two other factors:
If we are certain that code will always access within the array bounds, and that it doesn’t use negative indices, then it is possible to obtain some extra performance by avoiding these checks.
Note however that this comes at the cost of safety. If the code does not behave exactly as you expect, it could crash the program or corrupt data.
Bounds checking can be disabled by adding a decorator to the function as follows:
...
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
def convolve2(np.ndarray[np.int_t, ndim=2] f, np.ndarray[np.int_t, ndim=2] g):
...
Now bounds checking is not performed. It is possible to switch bounds-checking mode in many ways, see Compiler directives for more information.
Negative indices are dealt with by forcing the indexes to be positive using the unsigned integer type for the index variables and
casting values to this type. If negative values are cast, then this will create a very large positive value instead and it may result
in an attempt to access out-of-bounds values. Casting is done with a special <>
-syntax. The code below shows how to change the
function to use either unsigned ints or casting as appropriate:
...
cdef int s, t
cdef unsigned int x, y, v, w
...
v = <unsigned int>(x - smid + s)
w = <unsigned int>(y - tmid + t)
value += g[<unsigned int>(smid - s), <unsigned int>(tmid - t)] * f[v, w]
...
Challenge
Make the changes suggested in the previous sections to
convolve2
and see how much the performance improves.
Key Points
Normal Cython features can be used to improve the performance of NumPy programs.
Cython supports an efficient indexing scheme for NumPy.
Some correctness checking features can be disabled if maximum speed is required.