Unittest-Style

Basic Example

import datatest

class TestMyData(datatest.DataTestCase):
    @classmethod
    def setUpClass(cls):
        data = [
            {'is_active': 'Y', 'member_id': 105},
            {'is_active': 'Y', 'member_id': 104},
            {'is_active': 'Y', 'member_id': 103},
            {'is_active': 'N', 'member_id': 102},
            {'is_active': 'N', 'member_id': 101},
            {'is_active': 'N', 'member_id': 100},
        ]
        cls.source = datatest.DataSource(data)

    def test_is_active(self):
        allowed_values = {'Y', 'N'}
        self.assertValid(self.source(['is_active']), allowed_values)

    def test_member_id(self):
        def positive_integer(x):  # <- Helper function.
            return isinstance(x, int) and x > 0
        self.assertValid(self.source(['member_id']), positive_integer)

if __name__ == '__main__':
    datatest.main()

Command-Line Interface

The datatest module can be used from the command line just like unittest. To run the program with test discovery, use the following command:

python -m datatest

Run tests from specific modules, classes, or individual methods with:

python -m datatest test_module1 test_module2
python -m datatest test_module.TestClass
python -m datatest test_module.TestClass.test_method

The syntax and command-line options (-f, -v, etc.) are the same as those of unittest; see the unittest documentation for full details.

Note

By default, tests are ordered by module name and line number (within each module).

Unlike strict unit testing, data preparation tests are often dependent on one another. This strict order-by-line-number behavior lets users design their test suites appropriately. For example, asserting the population of a city will always fail when the ‘city’ column is missing, so it is appropriate to validate column names before validating the contents of each column (see the sketch below).
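
A minimal sketch of this pattern, using hypothetical city data (tests run in line-number order, so the column check runs first):

class TestCityData(datatest.DataTestCase):
    @classmethod
    def setUpClass(cls):
        cls.record = {'city': 'Springfield', 'population': 30720}  # <- hypothetical

    def test_columns(self):
        # Runs first (lowest line number): check that the expected columns exist.
        self.assertValid(set(self.record.keys()), {'city', 'population'})

    def test_population(self):
        # Runs second: only meaningful once the columns are known to exist.
        def positive_integer(x):
            return isinstance(x, int) and x > 0
        self.assertValid([self.record['population']], positive_integer)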

DataTestCase

class datatest.DataTestCase(methodName='runTest')

This class extends unittest.TestCase with methods for asserting data validity. In addition to the new functionality, familiar methods (like setUp, addCleanup, etc.) are still available.

assertValid(data, requirement, msg=None)

Fail if the data under test does not satisfy the requirement.

The given data can be a set, sequence, iterable, mapping, or other object. The requirement type determines how the data is validated (see below).

Set membership: When requirement is a set, elements in data are checked for membership in this set. On failure, a ValidationError is raised which contains Missing or Extra differences:

def test_mydata(self):
    data = ...
    requirement = {'A', 'B', 'C', ...}  # <- set
    self.assertValid(data, requirement)

Regular expression match: When requirement is a regular expression object, elements in data are checked to see if they match the given pattern. On failure, a ValidationError is raised with Invalid differences:

def test_mydata(self):
    data = ...
    requirement = re.compile(r'^[0-9A-F]*$')  # <- regex
    self.assertValid(data, requirement)

Sequence order: When requirement is a list or other sequence, elements in data are checked for matching order and value. On failure, an AssertionError is raised:

def test_mydata(self):
    data = ...
    requirement = ['A', 'B', 'C', ...]  # <- sequence
    self.assertValid(data, requirement)

Mapping comparison: When requirement is a dict (or other mapping), elements of matching keys are checked for equality. This comparison also requires data to be a mapping. On failure, a ValidationError is raised with Invalid or Deviation differences:

def test_mydata(self):
    data = ...  # <- Should also be a mapping.
    requirement = {'A': 1, 'B': 2, 'C': ...}  # <- mapping
    self.assertValid(data, requirement)

Function comparison: When requirement is a function or other callable, the function is applied to each element in data to see if it evaluates to True. When the function returns False, a ValidationError is raised with Invalid differences:

def test_mydata(self):
    data = ...
    def requirement(x):  # <- callable (helper function)
        return x.isupper()
    self.assertValid(data, requirement)

Other comparison: When requirement does not match any previously specified type (e.g., str, float, etc.), elements in data are checked to see if they are equal to the given object. On failure, a ValidationError is raised which contains Invalid or Deviation differences:

def test_mydata(self):
    data = ...
    requirement = 'FOO'
    self.assertValid(data, requirement)

allowedMissing(msg=None)

Allows Missing elements without triggering a test failure:

with self.allowedMissing():
    data = {'A', 'B'}  # <- 'C' is missing
    requirement = {'A', 'B', 'C'}
    self.assertValid(data, requirement)

allowedExtra(msg=None)

Allows Extra elements without triggering a test failure:

with self.allowedExtra():
    data = {'A', 'B', 'C', 'D'}  # <- 'D' is extra
    requirement = {'A', 'B', 'C'}
    self.assertValid(data, requirement)

allowedInvalid(msg=None)

Allows Invalid elements without triggering a test failure:

with self.allowedInvalid():
    data = {'xxx': 'A', 'yyy': 'E'}  # <- 'E' is invalid
    requirement = {'xxx': 'A', 'yyy': 'B'}
    self.assertValid(data, requirement)

allowedDeviation(tolerance, /, msg=None)
allowedDeviation(lower, upper, msg=None)

Allows numeric Deviations within a given tolerance without triggering a test failure:

with self.allowedDeviation(5):  # tolerance of +/- 5
    data = ...
    requirement = ...
    self.assertValid(data, requirement)

Specifying different lower and upper bounds:

with self.allowedDeviation(-2, 3):  # tolerance from -2 to +3
    data = ...
    requirement = ...
    self.assertValid(data, requirement)

Deviations within the given range are suppressed while those outside the range will trigger a test failure.

Empty values (None, empty string, etc.) are treated as zeros when performing comparisons.

allowedPercentDeviation(tolerance, /, msg=None)
allowedPercentDeviation(lower, upper, msg=None)

Allows Deviations with percentages of error within a given tolerance without triggering a test failure:

with self.allowedPercentDeviation(0.03):  # tolerance of +/- 3%
    data = ...
    requirement = ...
    self.assertValid(data, requirement)

Specifying different lower and upper bounds:

with self.allowedPercentDeviation(-0.02, 0.01):  # tolerance from -2% to +1%
    data = ...
    requirement = ...
    self.assertValid(data, requirement)

Deviations within the given range are suppressed while those outside the range will trigger a test failure.

Empty values (None, empty string, etc.) are treated as zeros when performing comparisons.

allowedSpecific(differences, msg=None)

Allows specified differences without triggering a test failure:

diffs = [
    datatest.Missing('C'),
    datatest.Extra('D'),
]
with self.allowedSpecific(diffs):
    data = {'A', 'B', 'D'}  # <- 'D' extra, 'C' missing
    requirement = {'A', 'B', 'C'}
    self.assertValid(data, requirement)

The differences argument can be a list or dict of differences or a single difference.
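
A single difference can also be passed directly (a short sketch reusing the sets from above):

with self.allowedSpecific(datatest.Extra('D')):  # <- single difference
    data = {'A', 'B', 'C', 'D'}  # <- only 'D' is extra
    requirement = {'A', 'B', 'C'}
    self.assertValid(data, requirement)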

allowedKey(function, msg=None)

Allows differences in a mapping where function returns True. For each difference, function will receive the associated mapping key unpacked into one or more arguments.
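
A short sketch, assuming mapping data keyed by single values (so the function receives one argument per difference):

def function(key):  # <- Receives the mapping key of each difference.
    return key in ('B', 'C')

with self.allowedKey(function):
    data = {'A': 'x', 'B': 'y', 'C': 'z'}  # <- 'y' and 'z' are invalid
    requirement = 'x'
    self.assertValid(data, requirement)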

allowedArgs(function, msg=None)

Allows differences where function returns True. Each difference’s ‘args’ attribute (a tuple) is unpacked and passed to function, so function must accept the corresponding number of arguments.
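
A short sketch: using *args lets the function accept however many values each difference’s ‘args’ tuple unpacks into (the data here is hypothetical):

def function(*args):  # <- Receives each difference's 'args' unpacked.
    return args[0] == 'y'

with self.allowedArgs(function):
    data = {'A': 'x', 'B': 'y'}  # <- 'y' is invalid
    requirement = 'x'
    self.assertValid(data, requirement)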

allowedLimit(number, msg=None)

Allows a limited number of differences without triggering a test failure:

with self.allowedLimit(2):  # Allows up to two differences.
    data = ['47306', '1370', 'TX']  # <- '1370' and 'TX' invalid
    requirement = re.compile(r'^\d{5}$')
    self.assertValid(data, requirement)

If the count of differences exceeds the given number, the test will fail with a ValidationError containing all observed differences.

Note

In the deviation methods above, tolerance is a positional-only parameter; it cannot be specified using keyword syntax.
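
For example, the first call below is valid while the second raises a TypeError:

self.allowedDeviation(5)            # <- OK: tolerance given positionally
self.allowedDeviation(tolerance=5)  # <- TypeError: keyword syntax not accepted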

Test Runner Program

@datatest.mandatory

A decorator to mark whole test cases or individual methods as mandatory. If a mandatory test fails, DataTestRunner will stop immediately (similar to the behavior of the --failfast command-line option):

@datatest.mandatory
class TestFileFormat(datatest.DataTestCase):
    def test_columns(self):
        ...

@datatest.skip(reason)

A decorator to unconditionally skip a test:

@datatest.skip('Not finished collecting raw data.')
class TestSumTotals(datatest.DataTestCase):
    def test_totals(self):
        ...

@datatest.skipIf(condition, reason)

A decorator to skip a test if the condition is true.

@datatest.skipUnless(condition, reason)

A decorator to skip a test unless the condition is true.
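
A short sketch of both decorators (the conditions and reasons here are hypothetical):

import sys

@datatest.skipIf(sys.platform.startswith('win'), 'Requires POSIX tools.')
class TestLineEndings(datatest.DataTestCase):
    def test_newlines(self):
        ...

@datatest.skipUnless(sys.version_info >= (3, 0), 'Requires Python 3.')
class TestUnicodeNames(datatest.DataTestCase):
    def test_names(self):
        ...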

class datatest.DataTestRunner(stream=None, descriptions=True, verbosity=1, failfast=False, buffer=False, resultclass=None, ignore=False)

A data test runner (wraps unittest.TextTestRunner) that displays results in textual form.

resultclass

alias of DataTestResult

run(test)

Run the given tests in order of line number from the source file.
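
A minimal sketch of running a suite directly with DataTestRunner (TestMyData is the class from the Basic Example above):

import unittest
import datatest

loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestMyData)
runner = datatest.DataTestRunner(verbosity=2)
runner.run(suite)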

class datatest.DataTestProgram(module='__main__', defaultTest=None, argv=None, testRunner=datatest.DataTestRunner, testLoader=unittest.TestLoader, exit=True, verbosity=1, failfast=None, catchbreak=None, buffer=None, warnings=None)

datatest.main

alias of DataTestProgram
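
Because datatest.main is an alias of DataTestProgram, it accepts the same keyword arguments; for example, a sketch using the verbosity and failfast parameters from the signature above:

if __name__ == '__main__':
    datatest.main(verbosity=2, failfast=True)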