How to Get Started With Testing

Once you have reviewed the tutorials and have a basic understanding of datatest, you should be ready to start testing your own data.

1. Create a File and Add Some Sample Code

A simple way to get started is to create a .py file in the same folder as the data you want to test. It’s a good idea to follow established testing conventions and make sure your filename starts with “test_”.

Then, copy one of the following pytest or unittest code samples to use as a template for writing your own tests:

Pytest Samples
#!/usr/bin/env python3
import pytest
import pandas as pd
import datatest as dt
from datatest import (
    Missing,
    Extra,
    Invalid,
    Deviation,
)


@pytest.fixture(scope='session')
@dt.working_directory(__file__)
def df():
    return pd.read_csv('example.csv')  # Returns DataFrame.


@pytest.mark.mandatory
def test_column_names(df):
    required_names = {'A', 'B', 'C'}
    dt.validate(df.columns, required_names)


def test_a(df):
    requirement = {'x', 'y', 'z'}
    dt.validate(df['A'], requirement)


# ...add more tests here...


if __name__ == '__main__':
    import sys
    sys.exit(pytest.main(sys.argv))

Unittest Samples
#!/usr/bin/env python3
import pandas as pd
import datatest as dt
from datatest import (
    Missing,
    Extra,
    Invalid,
    Deviation,
)


@dt.working_directory(__file__)
def setUpModule():
    global df
    df = pd.read_csv('example.csv')


class TestMyData(dt.DataTestCase):
    @dt.mandatory
    def test_column_names(self):
        required_names = {'A', 'B', 'C'}
        self.assertValid(df.columns, required_names)

    def test_a(self):
        requirement = {'x', 'y', 'z'}
        self.assertValid(df['A'], requirement)

    # ...add more tests here...


if __name__ == '__main__':
    from datatest import main
    main()

2. Adapt the Sample Code to Suit Your Data

After copying the sample code into your own file, begin adapting it to suit your data:

  1. Change the fixture (or setUpModule(), if you are using unittest) to load your data instead of “example.csv”.

  2. Update the set in test_column_names() to require the names your data should contain (instead of “A”, “B”, and “C”).

  3. Rename test_a() and change it to check values in one of the columns in your data.

  4. Add more tests appropriate for your own data requirements.

3. Refactor Your Tests as They Grow

As your tests grow, organize them into related groups. Start by creating separate classes to hold groups of related test cases. As the number of classes grows, create separate modules to hold groups of related classes. If you are using pytest, move your fixtures into a conftest.py file so they can be shared across modules.
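
The grouping described above could be sketched roughly as follows. The class names, column names, and rows are hypothetical. To keep the sketch runnable without data files or third-party packages, it uses plain unittest.TestCase with in-memory rows. Since dt.DataTestCase is a unittest.TestCase subclass, the same class layout should carry over to a datatest suite, with self.assertValid() calls in place of the plain assertions.

```python
import unittest

# Hypothetical in-memory rows standing in for your real data.
ROWS = [
    {'region': 'north', 'status': 'open', 'amount': 10},
    {'region': 'south', 'status': 'closed', 'amount': 25},
]


class TestStructure(unittest.TestCase):
    """Group 1: checks about the shape of the data."""

    def test_column_names(self):
        for row in ROWS:
            self.assertEqual(set(row), {'region', 'status', 'amount'})


class TestValues(unittest.TestCase):
    """Group 2: checks about the values in individual columns."""

    def test_status(self):
        statuses = {row['status'] for row in ROWS}
        self.assertEqual(statuses, {'open', 'closed'})

    def test_amount(self):
        # Every amount must be zero or greater.
        self.assertTrue(all(row['amount'] >= 0 for row in ROWS))
```

Once a file holds many such classes, each group can move into its own module (for example, one module for structural checks and one for value checks) without changing the tests themselves.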