Introduction to Numba

Numba Functions

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • Are there restrictions on calling Numba functions?

  • Can Numba be used to simplify the creation of ufuncs?

Objectives
  • Learn how to call Numba functions efficiently.

  • Learn how to vectorize code for use as a ufunc.

Calling other functions

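Numba functions can call other Numba functions. Both functions should carry the @jit decorator: in nopython mode a call to an undecorated Python function cannot be compiled, and in object mode it falls back to the interpreter and runs much more slowly.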

import numpy as np
from numba import jit

@jit("void(f4[:])", nopython=True)
def bubblesort(X):
    # Sort the float32 array X in place
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            # Swap adjacent elements that are out of order
            if X[i] > X[i + 1]:
                X[i], X[i + 1] = X[i + 1], X[i]
               
@jit("void(f4[:])", nopython=True)
def do_sort(arr):
    # A Numba function calling another Numba function
    bubblesort(arr)

original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)
result = shuffled.copy()
# %timeit is an IPython/Jupyter magic; the array is reset before each sort
%timeit result[:] = shuffled[:]; do_sort(result)

NumPy universal functions

Numba’s @vectorize decorator allows Python functions taking scalar input arguments to be used as NumPy ufuncs. Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the @vectorize decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.

Universal functions (ufunc)

A universal function (or ufunc for short) is a function that operates on NumPy arrays (ndarrays) in an element-by-element fashion. Ufuncs support array broadcasting, type casting, and several other standard features.

A ufunc is a “vectorized” wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.

Many of NumPy’s built-in operators are ufuncs.
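The features listed above can be seen with an existing NumPy ufunc. The following is a small sketch using np.add; the array names are chosen only for illustration.

```python
import numpy as np

# np.add is a ufunc with two scalar inputs and one scalar output
is_ufunc = isinstance(np.add, np.ufunc)
print(is_ufunc, np.add.nin, np.add.nout)

row = np.arange(3)              # shape (3,)
col = np.array([[10], [20]])    # shape (2, 1)

# Broadcasting: the two shapes are combined into a (2, 3) result
total = np.add(col, row)
print(total)

# The + operator on arrays uses the same ufunc
same = np.array_equal(col + row, total)

# Type casting: int32 and float64 inputs give a float64 result
ints = np.arange(3, dtype=np.int32)
halves = np.full(3, 0.5)
mixed = np.add(ints, halves)
print(mixed.dtype)
```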

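The @vectorize decorator has two modes of operation:

  • Eager (decoration-time) compilation: when one or more type signatures are passed to the decorator, the function is compiled immediately for those types.

  • Lazy (call-time) compilation: when no signatures are given, a loop is compiled the first time the function is called with a new combination of input types.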

Using @vectorize, you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs. The following code defines a ufunc, vec_add, that adds two int64 values, so calling it with two int64 arrays returns an int64 array.

import numpy as np
from numba import vectorize, int64

@vectorize([int64(int64, int64)])
def vec_add(x, y):
    return x + y

a = np.arange(6, dtype=np.int64)
b = np.linspace(0, 10, 6, dtype=np.int64)
print(vec_add(a, a))
print(vec_add(b, b))

Running this code should produce the output:

[ 0  2  4  6  8 10]
[ 0  4  8 12 16 20]

This works because the elements of a and b are int64, which matches the declared signature. If the elements have a different type and cannot be safely cast to int64, the call raises an exception:

c = a.astype(float)
print(c)
print(vec_add(c, c))

Running this code prints the array and then raises a TypeError:

[ 0.  1.  2.  3.  4.  5.]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-74-9f06063afeeb> in <module>()
      1 c = a.astype(float)
      2 print(c)
----> 3 print(vec_add(c, c))

TypeError: ufunc 'vec_add' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
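The "safe" casting rule named in the error message can be explored with NumPy alone. The sketch below uses np.can_cast to ask whether one dtype may be converted to another under that rule; the variable names are illustrative.

```python
import numpy as np

# float64 -> int64 could discard the fractional part, so it is not "safe"
float_to_int = np.can_cast(np.float64, np.int64, casting="safe")

# int32 -> int64 always preserves the value, so it is "safe"
int32_to_int64 = np.can_cast(np.int32, np.int64, casting="safe")

print(float_to_int)    # False
print(int32_to_int64)  # True
```

This suggests why an int32 array would be accepted by a ufunc compiled only for int64, while a float64 array is rejected unless it is converted explicitly, for example with astype.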

Challenge

Redefine the vec_add() function so that it takes float64 as arguments. Run it using the following to check it produces the correct results.

c = np.linspace(0, 1, 6)
if (c * 2 == vec_add(c, c)).all():
    print("Correct!")

Key Points