How to Customize Differences
When using a helper function for validation, datatest’s default
behavior is to produce Invalid differences when the function
returns False. But you can customize this behavior by returning
a difference object instead of False. The returned difference is
used in place of an automatically generated one.
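
The rule described above can be modeled in a few lines of plain Python. The sketch below is not datatest's implementation: the Invalid class here is a minimal stand-in, and collect_differences, default_helper, custom_helper, and NotUpperCase are hypothetical names used only to illustrate how the three kinds of return value (True, False, or a difference object) are treated.

```python
class Invalid:
    """Minimal stand-in for datatest's Invalid (illustration only)."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"{type(self).__name__}({self.value!r})"


class NotUpperCase(Invalid):
    """Hypothetical custom difference."""


def collect_differences(data, helper):
    """Hypothetical model of how helper results map to differences."""
    differences = []
    for value in data:
        result = helper(value)
        if result is True:
            continue  # The value is valid, nothing to record.
        if result is False:
            differences.append(Invalid(value))  # Default difference.
        else:
            differences.append(result)  # Returned difference is used as-is.
    return differences


def default_helper(x):
    return x.isupper()


def custom_helper(x):
    return True if x.isupper() else NotUpperCase(x)


default_diffs = collect_differences(['JACKSON', 'St. Louis'], default_helper)
custom_diffs = collect_differences(['JACKSON', 'St. Louis'], custom_helper)
print(default_diffs)  # [Invalid('St. Louis')]
print(custom_diffs)   # [NotUpperCase('St. Louis')]
```

The only difference between the two helpers is what they return for a bad value, which is the whole customization point.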
Default Behavior
In the following example, the helper function checks that text
values are upper case and have no extra whitespace. If the values
are good, the function returns True; if the values are bad, it
returns False:

    from datatest import validate


    def wellformed(x):  # <- Helper function.
        """Must be uppercase and no extra whitespace."""
        return x == ' '.join(x.split()) and x.isupper()


    data = [
        'CAPE GIRARDEAU',
        'GREENE ',
        'JACKSON',
        'St. Louis',
    ]
    validate(data, wellformed)

Each time the helper function returns False, an Invalid
difference is created:

    Traceback (most recent call last):
      File "example.py", line 15, in <module>
        validate(data, wellformed)
    ValidationError: Must be uppercase and no extra whitespace. (2 differences): [
        Invalid('GREENE '),
        Invalid('St. Louis'),
    ]

Custom Differences
In this example, the helper function returns a custom BadWhitespace
or NotUpperCase difference for each bad value:

    from datatest import validate, Invalid


    class BadWhitespace(Invalid):
        """For strings with leading, trailing, or irregular whitespace."""


    class NotUpperCase(Invalid):
        """For strings that aren't upper case."""


    def wellformed(x):  # <- Helper function.
        """Must be uppercase and no extra whitespace."""
        if x != ' '.join(x.split()):
            return BadWhitespace(x)
        if not x.isupper():
            return NotUpperCase(x)
        return True


    data = [
        'CAPE GIRARDEAU',
        'GREENE ',
        'JACKSON',
        'St. Louis',
    ]

    validate(data, wellformed)

These differences are used in the ValidationError:

    Traceback (most recent call last):
      File "example.py", line 28, in <module>
        validate(data, wellformed)
    ValidationError: Must be uppercase and no extra whitespace. (2 differences): [
        BadWhitespace('GREENE '),
        NotUpperCase('St. Louis'),
    ]

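
One way custom subclasses might help a data preparation workflow is by letting you tally problems by kind before deciding which cleanup step to apply. The following is a rough sketch, not a datatest feature: it calls the helper directly, without validate(), and counts the results by class name. The function tally_by_type is a hypothetical name, and the fallback Invalid class is a stand-in that is only used when datatest is not installed.

```python
from collections import Counter

try:
    from datatest import Invalid
except ImportError:
    class Invalid:  # Stand-in so the sketch runs without datatest.
        def __init__(self, invalid):
            self.invalid = invalid


class BadWhitespace(Invalid):
    """For strings with leading, trailing, or irregular whitespace."""


class NotUpperCase(Invalid):
    """For strings that aren't upper case."""


def wellformed(x):
    """Must be uppercase and no extra whitespace."""
    if x != ' '.join(x.split()):
        return BadWhitespace(x)
    if not x.isupper():
        return NotUpperCase(x)
    return True


def tally_by_type(data, helper):
    """Hypothetical utility: count helper results by difference class."""
    counts = Counter()
    for value in data:
        result = helper(value)
        if isinstance(result, Invalid):
            counts[type(result).__name__] += 1
        elif result is False:
            counts['Invalid'] += 1  # What the default behavior would report.
    return counts


data = ['CAPE GIRARDEAU', 'GREENE ', 'JACKSON', 'St. Louis']
counts = tally_by_type(data, wellformed)
print(dict(counts))  # {'BadWhitespace': 1, 'NotUpperCase': 1}
```

If a tally like this would not change what you do next, that is a hint the custom subclasses may not be worth adding.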
Caution
Typically, you should try to stick with existing differences in your data tests. Only create a custom subclass when its meaning is evident and doing so helps your data preparation workflow.
Don’t add a custom class when it doesn’t benefit your testing process. At best, you’re doing extra work for no added benefit. And at worst, an ambiguous or needlessly complex subclass can cause more problems than it solves.
If you need to resolve ambiguity in a validation, you can split
the check into multiple calls. Below, we perform the same check
demonstrated earlier using two validate() calls:

    from datatest import validate

    data = [
        'CAPE GIRARDEAU',
        'GREENE ',
        'JACKSON',
        'St. Louis',
    ]


    def no_irregular_whitespace(x):  # <- Helper function.
        """Must have no irregular whitespace."""
        return x == ' '.join(x.split())

    validate(data, no_irregular_whitespace)


    def is_upper_case(x):  # <- Helper function.
        """Must be upper case."""
        return x.isupper()

    validate(data, is_upper_case)

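
Because validate() raises a ValidationError when a check fails, a plain script with two module-level calls would stop at the first failing one. While developing the helpers, it can be handy to call them directly and see which values each check would flag. A minimal sketch in plain Python (no datatest required; the names whitespace_failures and case_failures are illustrative):

```python
data = [
    'CAPE GIRARDEAU',
    'GREENE ',
    'JACKSON',
    'St. Louis',
]


def no_irregular_whitespace(x):
    """Must have no irregular whitespace."""
    return x == ' '.join(x.split())


def is_upper_case(x):
    """Must be upper case."""
    return x.isupper()


# Collect the values that each check would reject.
whitespace_failures = [x for x in data if not no_irregular_whitespace(x)]
case_failures = [x for x in data if not is_upper_case(x)]

print(whitespace_failures)  # ['GREENE ']
print(case_failures)        # ['St. Louis']
```

Each value fails exactly one of the two checks, so splitting the validation removes the ambiguity without introducing any custom difference classes.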