Writing Self-Tests for Notebook Development
An important part of notebook development is designing notebooks so that they give users formative feedback. Formative feedback helps students check whether they understand a concept or skill.
We prefer to give immediate formative feedback by integrating tests into the notebooks. These self-tests are run by the students and provide instant feedback about whether or not they have something correct.
This can be accomplished through the following process:
- We create a notebook_test script which contains a series of functions that take in an object from the notebook and return feedback (e.g., correct/incorrect).
- The object and correct answer are obfuscated using a cryptographic hash function, which checks the object against the correct answer without revealing the correct answer.
- This prevents students from hard-coding their answers by simply peeking at the “correct answer”.
- The notebook instructs students to build or evaluate something, which is the object of the test.
- The notebook reads in the script, making the testing functions available. These tests are then evaluated when certain cells are run, performing the check and giving feedback.
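As a minimal sketch of the idea (in R, using the digest package that the tests below rely on; the check() helper and the multiple-choice scenario are hypothetical):

```r
library(digest)

# Precompute the hash of the correct answer during development and paste
# only the literal hash string here; the answer never appears in the script.
correct_hash <- "<paste the precomputed hash here>"

check <- function(student_answer) {
  if (digest(student_answer) == correct_hash) {
    print("Correct!")
  } else {
    print("Not quite, try again.")
  }
}
```

A student who opens the script sees only the hash, which cannot feasibly be inverted to recover the answer.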
It is also very important to follow best practices when developing these notebooks and tests, since even small mistakes can create a great deal of confusion for users.
1 General Framework
R Kernels
Early in the notebook, usually in the first executed cell, include a source function call to the test scripts file:
```r
source("tests.r")
```
This file should include the tests, as outlined below in Section 3. In this example, they are of the form test().
Python Kernels
Early in the notebook, usually in the first executed cell, import the Tests class from the test scripts file:
```python
from tests import Tests
```
This file should include the tests, as outlined below in Section 4. In this example, they are of the form test(), and are called like Tests.test().
1.1 Use in Jupyter Notebooks (.ipynb)
R Kernels
In the notebook, ask the students to fill in the object requested, then call the test function. Add a comment to explain what needs to be changed, if it’s not clear.
```r
answer_1 <- # fill in the correct value here
test_1()
```
- Try to make the test function a void call; avoid passing parameters.
- Clearly indicate what to change using #comments.
- Be as specific as possible when giving directions.
Python Kernels
In the notebook, ask the students to fill in the object requested, then call the test function. Add a comment to explain what needs to be changed, if it’s not clear.
```python
answer_1 = ...  # fill in the correct value here
Tests.test_1()
```
- Try to make the test function a void call; avoid passing parameters.
- Clearly indicate what to change using #comments.
- Be as specific as possible when giving directions.
2 Answers in .qmd notebooks
R Kernels
Early in the notebook, usually in the first executed cell, include a source() call to the test scripts file:
```r
source("tests.r")
```
This file should include the tests, as outlined below in Section 3. In this example, they are of the form test().
In .qmd notebooks, when you write a test, include two versions: one with the answers, and one without. Include meta class tags to help tell them apart, and avoid evaluation. The cell should look like:
```r
#| eval: false
#| classes: "question"
answer_1 <- # fill in the correct value here
test_1()
```
for the question, and like:
```r
#| eval: false
#| classes: "answer"
answer_1 <- the_right_answer(stuff)
test_1()
```
for the answer. This will help debug questions easily.
It’s usually easiest to write the answer first, then debug and test.
Python Kernels
Early in the notebook, usually in the first executed cell, import the Tests class from the test scripts file:
```python
from tests import Tests
```
This file should include the tests, as outlined below in Section 4. In this example, they are of the form test(), and are called like Tests.test().
In .qmd notebooks, when you write a test, include two versions: one with the answers, and one without. Include meta class tags to help tell them apart, and avoid evaluation. The cell should look like:
```python
#| eval: false
#| classes: "question"
answer_1 = ...  # fill in the correct value here
Tests.test_1()
```
for the question, and like:
```python
#| eval: false
#| classes: "answer"
answer_1 = the_right_answer(stuff)
Tests.test_1()
```
for the answer. This will help debug questions easily.
It’s usually easiest to write the answer first, then debug and test.
3 Writing R Self-Tests
Self-test scripts are R files (.r) which supply the testing functions. They use two libraries:
- library(testthat): a test assertion library, which provides functions to check whether something is correct and give feedback.
- library(digest): a hash library, which computes and checks hash functions.
Here is an example of the first function of a file and the library headers:
```r
library(testthat)
library(digest)

test_1 <- function() {
  test_that("Solution is incorrect", {
    expect_equal(digest(answer_1), "dbc09cba9fe2583fb01d63c70e1555a8")
  })
  print("Success!")
}
```
This creates a function (test_1()) that, when called in the Jupyter notebook:
- Finds the object answer_1.
- Computes its hash (digest(answer_1)) and compares it to the string dbc09cba9fe2583fb01d63c70e1555a8 (the correct answer’s hash).
- If they match, it prints “Success!”; otherwise it throws an error.
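From the student’s side, the interaction is just an assignment followed by the test call; the value here is a hypothetical attempt:

```r
answer_1 <- 42 # a hypothetical attempt at the exercise
test_1()       # prints "Success!" only if digest(answer_1) matches the stored hash
```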
In order to develop the test, you can use this template:
- Create a new cell to contain the test. If this is a .qmd test, make it the answer version of the test.
- Create a new function in the script file with a unique name (test_n()) and the answer (answer_n) to test in the testing script.
- Compute digest(answer_n) to get the correct hash value, as in the sketch below this list.
- Add it to the expect_equal element in the script.
- If a .qmd, copy the answer cell and change it to a question. Then, replace the correct answer with a comment.
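For example, the development loop might look like this in an R console (the object and its value are hypothetical):

```r
library(digest)

# 1. Solve the exercise yourself to build the reference object.
answer_1 <- c(1.234, 5.678)

# 2. Print its hash, then paste the printed string into the
#    expect_equal() call inside test_1() in the test script.
digest(answer_1)
```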
Note that you may not want to test the entire object, but rather some particular part of it, such as answer_n$coefs; see Section 3.2 for details.
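For instance, a test for a regression exercise might hash only the (rounded) coefficients of the student’s fitted model; the object names are hypothetical and the hash string below is a placeholder:

```r
test_2 <- function() {
  test_that("Solution is incorrect", {
    # hash only the rounded coefficient vector, not the whole model object
    expect_equal(digest(round(answer_2$coefficients, 3)),
                 "<hash of the rounded reference coefficients>")
  })
  print("Success!")
}
```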
3.1 Richer Feedback
The previous method only tests whether an answer exactly matches the correct answer. If there are common errors, you may want to give a hint about what is wrong. For example, in a multiple-choice question, answers A and B might reflect common misconceptions.
You can use tests to give this kind of feedback with a more complex test function. Use the case_when function (from dplyr) to give varied responses depending on the answer given by the student. For example:
```r
test_1 <- function(answer_1) {
  ans <- digest(answer_1)
  feedback <- case_when(
    ans == "dbc09cba9fe2583fb01d63c70e1555a8" ~ "Success!",
    ans == "dd531643bffc240879f11278d7a360c1" ~
      "This is a common misconception, remember that...",
    TRUE ~ "Solution is incorrect."
  )
  print(feedback)
}
```
You can adapt this framework for more complex tests, as necessary.
It is important to provide feedback that will guide the student towards the right answer and a greater understanding of the topic at hand. Try not to give feedback along the lines of “That is correct, congratulations!” or “I’m sorry, that is incorrect!” Feedback should point out the error that students are making and guide them to the correct answer.
3.2 Important Notes
Here are some common pitfalls and notes about creating tests. The main idea is that hash functions are exact: the objects must be exactly the same. This means you should:
- Always round numbers to 3 or 4 decimal places using the round() function. Do this in the testing function, rather than making students do it (a sketch appears at the end of this section).
- Never test objects that include arbitrary elements, such as names or sequences.
- Only test the simplest object necessary, not the easiest one to test.
For example, the following objects will return different hashes:
```r
d1 <- data.frame(age = "12")
d2 <- data.frame(Age = "12")

digest(d1) # == d2da0d698613f4cafa7d6fe5af762294
digest(d2) # == cfe4cbf9291d5705b2c61422098db883
```
Here are some examples of arbitrary elements that are easy to miss:
- Object or variable names (Age != age)
- Regression models (y ~ x1 + x2 != y ~ x2 + x1)
- Unrounded floating-point numbers (values that differ only in distant decimal places hash differently)
- Methods that use randomization (e.g., Monte Carlo methods)
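To see why rounding matters: digest() treats any difference in the trailing digits as a completely different object, while rounding inside the test makes near-identical answers hash identically (the values here are illustrative):

```r
library(digest)

digest(1.2222222222221)           # one hash...
digest(1.2222222222222)           # ...and a completely different one

digest(round(1.2222222222221, 3)) # both round to exactly 1.222, so
digest(round(1.2222222222222, 3)) # these two hashes are identical
```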
Bottom line: only test mathematical or textual objects, not programming objects unless you are very, very explicit about them.
4 Writing Python Self-Tests
Python self-test scripts are Python files (.py) which supply the testing functions in a test class. They use two libraries:
- unittest: a test assertion library, which provides functions to check whether something is correct and give feedback.
- hashlib: a hash library, which computes and checks hash functions, and reports the hexdigest of one.
Here is an example of the first function of a file and the library headers:
```python
from hashlib import blake2b
from unittest import TestCase

t = TestCase()  # gives access to the assertEqual() family of assertions

# Don't change this one
def hash(data):
    # data must be bytes; encode strings before hashing
    h = blake2b(digest_size=20)
    h.update(data)
    return h.hexdigest()

class Tests:
    @staticmethod
    def test_1():
        t.assertEqual(hash(answer_1), "dbc09cba9fe2583fb01d63c70e1555a8")
        print("Success!")
```
See Section 3.1 and Section 3.2 for guidelines about writing richer tests and some common mistakes; the issues and advice apply to Python as well.
5 Other Uses for Tests
You can also write “hidden” tests for developers; this is recommended when you have a complex example with interdependent parts. Try to make these as hidden as possible from the main notebook; hide them in a supplemental file which is included at runtime.
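For instance, a supplemental file (the file and object names here are hypothetical) can hold sanity checks on interdependent parts, sourced at runtime but never shown in the student-facing notebook:

```r
# dev_checks.r -- source()d by a setup cell, not displayed to students
stopifnot(
  nrow(simulated_data) == 1000,                       # the data-generation cell ran as expected
  all(c("age", "income") %in% names(simulated_data))  # later questions rely on these columns
)
```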