Datatest: Test-driven data-wrangling and data validation

Version 0.9.6

Datatest helps speed up and formalize data-wrangling and data validation tasks. It repurposes software testing practices for data preparation and quality assurance projects. Datatest can help you:

  • Clean and wrangle data faster and more accurately.
  • Maintain a record of checks and decisions regarding important data sets.
  • Distinguish between ideal criteria and acceptable deviation.
  • Measure progress of data preparation tasks.
  • On-board new team members with an explicit and structured process.
  • Test data pipeline components and end-to-end behavior.

Datatest supports both pytest and unittest style testing. It implements a system of validation methods, difference classes, and acceptance context managers.
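To make the three ideas concrete, here is a minimal, stdlib-only sketch of how a validation function, difference classes, and an acceptance context manager can fit together. It does not import Datatest and is not Datatest's implementation: every name in it (toy_validate, toy_accepted, ToyValidationError, and the Missing and Extra classes as defined here) is a hypothetical stand-in, so consult the API reference for the library's real names and signatures.

```python
from contextlib import contextmanager


class Difference:
    """Toy base class: one way in which data departs from a requirement."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'{type(self).__name__}({self.value!r})'

    def __eq__(self, other):
        return type(self) is type(other) and self.value == other.value


class Missing(Difference):
    """A required value that is absent from the data."""


class Extra(Difference):
    """A value in the data that the requirement does not mention."""


class ToyValidationError(AssertionError):
    """Raised when validation finds one or more differences."""

    def __init__(self, differences):
        super().__init__(f'{len(differences)} difference(s): {differences}')
        self.differences = differences


def toy_validate(data, required):
    """Compare data against a required set; raise if they differ."""
    data, required = set(data), set(required)
    differences = [Missing(v) for v in sorted(required - data)]
    differences += [Extra(v) for v in sorted(data - required)]
    if differences:
        raise ToyValidationError(differences)


@contextmanager
def toy_accepted(difference_class):
    """Suppress differences of one class; re-raise any that remain."""
    try:
        yield
    except ToyValidationError as err:
        remaining = [d for d in err.differences
                     if not isinstance(d, difference_class)]
        if remaining:
            raise ToyValidationError(remaining) from None


columns = ['id', 'name', 'notes']

# Ideal criteria: the check fails and reports each difference.
try:
    toy_validate(columns, {'id', 'name', 'email'})
except ToyValidationError as err:
    found = err.differences  # [Missing('email'), Extra('notes')]

# Accepted deviation: extra columns are tolerated, so this check passes.
with toy_accepted(Extra):
    toy_validate(columns, {'id', 'name'})
passed = True
```

The split matters because the requirement keeps stating the ideal, while each acceptance records a deliberate, reviewable decision to tolerate a specific kind of deviation.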

Datatest has no hard dependencies; supports Python 2.6, 2.7, 3.1 through 3.8, PyPy, and PyPy3; and is freely available under the Apache License, version 2.

See the project’s README file for full details regarding supported versions, backward compatibility, and more.

Note

This documentation is aimed at newer versions of Python and uses some features that older versions do not support.

If you are using an older version of Python, you may need to convert some examples into older syntax before running them.
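As an illustration of such a conversion, the sketch below rewrites a formatted string literal (an f-string, available from Python 3.6) using str.format(). That the examples use f-strings is an assumption made here for illustration, and the variable names are hypothetical.

```python
name = 'population'
count = 3

# Newer syntax: an f-string (Python 3.6 and later).
newer = f'{count} differences found in {name}'

# Older equivalent: str.format() with explicit indexes. Writing {0} and {1}
# rather than bare {} keeps the line valid on Python 2.6 as well.
older = '{0} differences found in {1}'.format(count, name)
```

Both expressions build the same string, so the rewritten example behaves identically on newer interpreters.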