How to Validate Sequences
Index Position
To check for a specific sequence, you can pass a list [1] as the requirement argument:
from datatest import validate
data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']  # <- a list
validate(data, requirement)
Elements in the data and requirement lists are compared by sequence position. The items at index position 0 are compared to each other, then the items at index position 1 are compared to each other, and so on.
In this example, there are three differences:
ValidationError: does not match required sequence (3 differences): [
Invalid('X', expected='C'),
Invalid('C', expected='D'),
Extra('D'),
]
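The position-by-position comparison described above can be sketched in plain Python. This is only an illustration of the idea, not datatest's implementation: the helper name compare_by_position and the tuple-based difference records are assumptions made for this example.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel meaning "no element at this position"

def compare_by_position(data, requirement):
    """Hypothetical helper: compare two sequences index by index."""
    differences = []
    pairs = zip_longest(data, requirement, fillvalue=_MISSING)
    for actual, expected in pairs:
        if expected is _MISSING:
            # data is longer than the requirement
            differences.append(('Extra', actual))
        elif actual is _MISSING:
            # data is shorter than the requirement
            differences.append(('Missing', expected))
        elif actual != expected:
            differences.append(('Invalid', actual, expected))
    return differences

data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
result = compare_by_position(data, requirement)
```

Because 'X' shifts every later element by one position, the sketch reports the same three differences as the error above: two invalid values and one extra.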
Using enumerate()
While the previous example works well for short lists, the error does not describe where in your sequence the differences occur. To get the index positions associated with any differences, you can enumerate() your data and requirement objects:
from datatest import validate
data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
validate(enumerate(data), enumerate(requirement))
A required enumerate object is treated as a mapping. The keys for any differences will correspond to their index positions:
ValidationError: does not satisfy mapping requirements (3 differences): {
2: Invalid('X', expected='C'),
3: Invalid('C', expected='D'),
4: Extra('D'),
}
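To see why enumerating produces index-keyed differences, it may help to treat each enumerated sequence as a mapping from index to value. The following is a rough sketch under that assumption; compare_enumerated is a hypothetical helper and does not reflect how datatest handles enumerate objects internally.

```python
def compare_enumerated(data, requirement):
    """Hypothetical helper: compare sequences as index-keyed mappings."""
    actual = dict(enumerate(data))           # {0: 'A', 1: 'B', ...}
    expected = dict(enumerate(requirement))
    differences = {}
    for index in sorted(actual.keys() | expected.keys()):
        if index not in expected:
            differences[index] = ('Extra', actual[index])
        elif index not in actual:
            differences[index] = ('Missing', expected[index])
        elif actual[index] != expected[index]:
            differences[index] = ('Invalid', actual[index], expected[index])
    return differences

data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
result = compare_enumerated(data, requirement)
```

The keys 2, 3, and 4 of the resulting dictionary match the keys shown in the error above.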
Relative Order
When comparing elements by sequence position, one misalignment can create differences for all following elements. If this behavior is not desirable, you can check the relative order of elements instead of their index positions by using validate.order():
from datatest import validate
data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
validate.order(data, requirement)
When checking for relative order, this method tries to align elements into contiguous matching subsequences. This reduces the number of non-matches.
Differences are reported as two-tuples containing the index (in data) where the difference occurs and the non-matching value. In the earlier examples, we saw that validating by index position produced three differences. But in this example, validating the same sequences by relative order produces only one difference:
ValidationError: does not match required order (1 difference): [
Extra((2, 'X')),
]
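The alignment idea can be approximated with the standard library's difflib.SequenceMatcher, which also finds contiguous matching subsequences. This is a sketch under stated assumptions rather than the algorithm validate.order() actually uses: compare_by_order is a hypothetical helper, and reporting missing values at the data index where they would be inserted is a choice made for this example.

```python
from difflib import SequenceMatcher

def compare_by_order(data, requirement):
    """Hypothetical helper: report differences after aligning sequences."""
    matcher = SequenceMatcher(a=data, b=requirement, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('delete', 'replace'):
            # elements present in data but not in the aligned requirement
            for i in range(i1, i2):
                differences.append(('Extra', (i, data[i])))
        if tag in ('insert', 'replace'):
            # required elements absent from data, keyed by the data index
            # at which they would need to appear
            for j in range(j1, j2):
                differences.append(('Missing', (i1, requirement[j])))
    return differences

data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
result = compare_by_order(data, requirement)
```

Here the matcher aligns 'A', 'B' and 'C', 'D' as two matching runs, leaving only 'X' at index 2 unmatched, which corresponds to the single difference reported above.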
Footnotes
[1] The validate() function checks data by index position when the requirement is any iterable object other than a set, mapping, tuple, or string. See the Sequence Validation section of the validate() documentation for full details.