Writing Self-Tests for Notebook Development
An important part of notebook development is designing notebooks so that they give users formative feedback. Formative feedback helps students check whether they understand a concept or skill.
We prefer to give immediate formative feedback by integrating tests into the notebooks. These self-tests are run by the students and provide instant feedback about whether or not they have something correct.
This can be accomplished through the following process:
- We create a notebook_test script which contains a series of functions that take in an object from the notebook and return feedback (e.g., correct/incorrect).
- The object and correct answer are obfuscated using a cryptographic hash function, which checks the object against the correct answer without revealing the correct answer.
- This prevents students from hard-coding their answers by simply peeking at the “correct answer”.
- The notebook instructs students to build or evaluate something, which is the object of the test.
- The notebook reads in the script, making the testing functions available. These tests are then evaluated when certain cells are run, performing the check and giving feedback.
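As a minimal sketch of the idea (in R, using the digest package that the tests below rely on; the check() helper and the multiple-choice scenario are hypothetical):

```r
library(digest)

# Precompute the hash of the correct answer during development and paste
# only the literal hash string here; the answer never appears in the script.
correct_hash <- "<paste the precomputed hash here>"

check <- function(student_answer) {
  if (digest(student_answer) == correct_hash) {
    print("Correct!")
  } else {
    print("Not quite, try again.")
  }
}
```

A student who opens the script sees only the hash, which cannot feasibly be inverted to recover the answer.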
It is also very important to follow best practices when developing these notebooks and tests, since even small mistakes can create a great deal of confusion for users.
1 General Framework
R Kernels
Early in the notebook, usually in the first executed cell, include a source function call to the test scripts file:
```r
source("tests.r")
```
This file should include the tests, as outlined below in Section 3. In this example, they are of the form test().
Python Kernels
Early in the notebook, usually in the first executed cell, import the Tests class from the test scripts file:
```python
from tests import Tests
```
This file should include the tests, as outlined below in Section 4. In this example, they are of the form test(), and are called like Tests.test().
1.1 Use in Jupyter Notebooks (.ipynb)
R Kernels
In the notebook, ask the students to fill in the object requested, then call the test function. Add a comment to explain what needs to be changed, if it’s not clear.
```r
answer_1 <- # fill in the correct value here
test_1()
```
- Try to make the test function a void call; avoid passing parameters.
- Clearly indicate what to change using #comments.
- Be as specific as possible when giving directions.
Python Kernels
In the notebook, ask the students to fill in the object requested, then call the test function. Add a comment to explain what needs to be changed, if it’s not clear.
```python
answer_1 = ...  # fill in the correct value here
Tests.test_1()
```
- Try to make the test function a void call; avoid passing parameters.
- Clearly indicate what to change using #comments.
- Be as specific as possible when giving directions.
2 Answers in .qmd notebooks
R Kernels
Early in the notebook, usually in the first executed cell, include a source() call to the test scripts file:
```r
source("tests.r")
```
This file should include the tests, as outlined below in Section 3. In this example, they are of the form test().
In .qmd notebooks, when you write a test, include two versions: one with the answers, and one without. Include meta class tags to help tell them apart, and avoid evaluation. The cell should look like:
```r
#| eval: false
#| classes: "question"
answer_1 <- # fill in the correct value here
test_1()
```
for the question, and like:
```r
#| eval: false
#| classes: "answer"
answer_1 <- the_right_answer(stuff)
test_1()
```
for the answer. This will help debug questions easily.
It’s usually easiest to write the answer first, then debug and test.
Python Kernels
Early in the notebook, usually in the first executed cell, import the Tests class from the test scripts file:
```python
from tests import Tests
```
This file should include the tests, as outlined below in Section 4. In this example, they are of the form test(), and are called like Tests.test().
In .qmd notebooks, when you write a test, include two versions: one with the answers, and one without. Include meta class tags to help tell them apart, and avoid evaluation. The cell should look like:
```python
#| eval: false
#| classes: "question"
answer_1 = ...  # fill in the correct value here
Tests.test_1()
```
for the question, and like:
```python
#| eval: false
#| classes: "answer"
answer_1 = the_right_answer(stuff)
Tests.test_1()
```
for the answer. This will help debug questions easily.
It’s usually easiest to write the answer first, then debug and test.
3 Writing R Self-Tests
Self-test scripts are R files (.r) which supply the testing functions. They use two libraries:
- library(testthat): a test assertion library, which provides functions to check whether something is correct and give feedback.
- library(digest): a hash library, which computes and checks hash functions.
Here is an example of the first function of a file and the library headers:
```r
library(testthat)
library(digest)

test_1 <- function() {
  test_that("Solution is incorrect", {
    expect_equal(digest(answer_1), "dbc09cba9fe2583fb01d63c70e1555a8")
  })
  print("Success!")
}
```
This creates a function (test_1()) that, when called in the Jupyter notebook:
- Finds the object answer_1.
- Computes its hash (digest(answer_1)) and compares it to the string dbc09cba9fe2583fb01d63c70e1555a8 (the correct answer’s hash).
- If they match, it prints “Success!”; otherwise it throws an error.
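From the student’s side, the interaction is just an assignment followed by the test call; the value here is a hypothetical attempt:

```r
answer_1 <- 42 # a hypothetical attempt at the exercise
test_1()       # prints "Success!" only if digest(answer_1) matches the stored hash
```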
In order to develop the test, you can use this template:
- Create a new cell to contain the test. If this is a .qmd test, make it the answer version of the test.
- Create a new function in the script file with a unique name (test_n()) and the answer (answer_n) to test in the testing script.
- Compute digest(answer_n) to get the correct hash value, as in the sketch below this list.
- Add it to the expect_equal element in the script.
- If a .qmd, copy the answer cell and change it to a question. Then, replace the correct answer with a comment.
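For example, the development loop might look like this in an R console (the object and its value are hypothetical):

```r
library(digest)

# 1. Solve the exercise yourself to build the reference object.
answer_1 <- c(1.234, 5.678)

# 2. Print its hash, then paste the printed string into the
#    expect_equal() call inside test_1() in the test script.
digest(answer_1)
```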
Note that you may not want to test the entire object, but rather some particular part of it, such as answer_n$coefs; see Section 3.2 for details.
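For instance, a test for a regression exercise might hash only the (rounded) coefficients of the student’s fitted model; the object names are hypothetical and the hash string below is a placeholder:

```r
test_2 <- function() {
  test_that("Solution is incorrect", {
    # hash only the rounded coefficient vector, not the whole model object
    expect_equal(digest(round(answer_2$coefficients, 3)),
                 "<hash of the rounded reference coefficients>")
  })
  print("Success!")
}
```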
3.1 Richer Feedback
The previous method only tests whether an answer exactly matches the correct answer. If there are common errors, you may want to give a hint about what is wrong. For example, in a multiple-choice question, answers A and B might reflect common misconceptions.
You can use tests to give this kind of feedback with a more complex test function. Use the case_when function (from dplyr) to give varied responses depending on the answer given by the student. For example:
```r
test_1 <- function(answer_1) {
  ans <- digest(answer_1)
  feedback <- case_when(
    ans == "dbc09cba9fe2583fb01d63c70e1555a8" ~ "Success!",
    ans == "dd531643bffc240879f11278d7a360c1" ~
      "This is a common misconception, remember that...",
    TRUE ~ "Solution is incorrect."
  )
  print(feedback)
}
```
You can adapt this framework for more complex tests, as necessary.
It is important to provide feedback that will guide the student towards the right answer and a greater understanding of the topic at hand. Try not to give feedback along the lines of “That is correct, congratulations!” or “I’m sorry, that is incorrect!” Feedback should point out the error that students are making and guide them to the correct answer.
3.2 Important Notes
Here are some common pitfalls and notes about creating tests. The main idea is that hash functions are exact: the objects must be exactly the same. This means you should:
- Always round numbers to 3 or 4 decimal places using the round() function. Do this in the testing function, rather than making students do it (a sketch appears at the end of this section).
- Never test objects that include arbitrary elements, such as names or sequences.
- Only test the simplest object necessary, not the easiest one to test.
For example, the following objects will return different hashes:
```r
d1 <- data.frame(age = "12")
d2 <- data.frame(Age = "12")

digest(d1) # == d2da0d698613f4cafa7d6fe5af762294
digest(d2) # == cfe4cbf9291d5705b2c61422098db883
```
Here are some examples of arbitrary elements that are easy to miss:
- Object or variable names (Age != age)
- Regression models (y ~ x1 + x2 != y ~ x2 + x1)
- Unrounded floating-point numbers (values that differ only in distant decimal places hash differently)
- Methods that use randomization (e.g., Monte Carlo methods)
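To see why rounding matters: digest() treats any difference in the trailing digits as a completely different object, while rounding inside the test makes near-identical answers hash identically (the values here are illustrative):

```r
library(digest)

digest(1.2222222222221)           # one hash...
digest(1.2222222222222)           # ...and a completely different one

digest(round(1.2222222222221, 3)) # both round to exactly 1.222, so
digest(round(1.2222222222222, 3)) # these two hashes are identical
```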
Bottom line: only test mathematical or textual objects, not programming objects unless you are very, very explicit about them.
4 Writing Python Self-Tests
Python self-test scripts are Python files (.py) which supply the testing functions in a test class. They use two libraries:
- unittest: a test assertion library, which provides functions to check whether something is correct and give feedback.
- hashlib: a hash library, which computes and checks hash functions, and reports the hexdigest of one.
Here is an example of the first function of a file and the library headers:
```python
from hashlib import blake2b
from unittest import TestCase

t = TestCase()  # gives access to the assertEqual() family of assertions

# Don't change this one
def hash(data):
    # data must be bytes; encode strings before hashing
    h = blake2b(digest_size=20)
    h.update(data)
    return h.hexdigest()

class Tests:
    @staticmethod
    def test_1():
        t.assertEqual(hash(answer_1), "dbc09cba9fe2583fb01d63c70e1555a8")
        print("Success!")
```
See Section 3.1 and Section 3.2 for guidelines about writing richer tests and some common mistakes; the issues and advice apply to Python as well.
5 Other Uses for Tests
You can also write “hidden” tests for developers; this is recommended when you have a complex example with interdependent parts. Try to make these as hidden as possible from the main notebook; hide them in a supplemental file which is included at runtime.
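For instance, a supplemental file (the file and object names here are hypothetical) can hold sanity checks on interdependent parts, sourced at runtime but never shown in the student-facing notebook:

```r
# dev_checks.r -- source()d by a setup cell, not displayed to students
stopifnot(
  nrow(simulated_data) == 1000,                       # the data-generation cell ran as expected
  all(c("age", "income") %in% names(simulated_data))  # later questions rely on these columns
)
```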