How to Validate Sequences

Index Position

To check for a specific sequence, you can pass a list as the requirement argument (see footnote 1):

from datatest import validate

data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']  # <- a list
validate(data, requirement)

Elements in the data and requirement lists are compared by sequence position. The items at index position 0 are compared to each other, then items at index position 1 are compared to each other, and so on:

\[\begin{split}\begin{array}{cccc}
\hline
\textbf{index} & \textbf{data} & \textbf{requirement} & \textbf{result} \\
\hline
0 & \textbf{A} & \textbf{A} & \textrm{matches} \\
1 & \textbf{B} & \textbf{B} & \textrm{matches} \\
2 & \textbf{X} & \textbf{C} & \textrm{doesn't match} \\
3 & \textbf{C} & \textbf{D} & \textrm{doesn't match} \\
4 & \textbf{D} & \textit{no value} & \textrm{doesn't match} \\
\hline
\end{array}\end{split}\]

In this example, there are three differences:

ValidationError: does not match required sequence (3 differences): [
    Invalid('X', expected='C'),
    Invalid('C', expected='D'),
    Extra('D'),
]
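
The position-by-position comparison described above can be sketched in plain Python. This is only an illustration of the idea, not datatest's actual implementation: the function name compare_by_position and the _NOVALUE sentinel are hypothetical, and the differences are returned as plain strings rather than datatest difference objects.

```python
from itertools import zip_longest

# Hypothetical sentinel marking "no value" at a given position.
_NOVALUE = object()


def compare_by_position(data, requirement):
    """Return (index, description) pairs for positions that differ."""
    differences = []
    pairs = zip_longest(data, requirement, fillvalue=_NOVALUE)
    for index, (actual, expected) in enumerate(pairs):
        if actual == expected:
            continue  # Items at this position match.
        if expected is _NOVALUE:
            # Data has an item where the requirement has none.
            differences.append((index, f"Extra({actual!r})"))
        elif actual is _NOVALUE:
            # Requirement has an item where the data has none.
            differences.append((index, f"Missing({expected!r})"))
        else:
            differences.append(
                (index, f"Invalid({actual!r}, expected={expected!r})")
            )
    return differences


data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
for index, description in compare_by_position(data, requirement):
    print(index, description)
```

Because the single unexpected 'X' shifts every later item by one position, the sketch reports a difference at indexes 2, 3, and 4, which mirrors the three differences in the error above.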

Using enumerate()

While the previous example works well for short lists, the error does not describe where in your sequence the differences occur. To get the index positions associated with any differences, you can enumerate() your data and requirement objects:

from datatest import validate

data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
validate(enumerate(data), enumerate(requirement))

When the requirement is an enumerate object, it is treated as a mapping. The keys for any differences correspond to their index positions:

ValidationError: does not satisfy mapping requirements (3 differences): {
    2: Invalid('X', expected='C'),
    3: Invalid('C', expected='D'),
    4: Extra('D'),
}
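
To see why the index positions appear as keys, it can help to look at what enumerate() produces. The sketch below uses only built-in Python and assumes that treating an enumerate object as a mapping is roughly equivalent to building a dict from its (index, value) pairs. The variable names are illustrative and this is not datatest's internal code.

```python
data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']

# enumerate() yields (index, value) pairs, so dict() turns each
# sequence into a mapping of index position -> value.
data_map = dict(enumerate(data))
required_map = dict(enumerate(requirement))

# Keys present on both sides whose values disagree.
mismatched = {
    index: (data_map[index], required_map[index])
    for index in data_map.keys() & required_map.keys()
    if data_map[index] != required_map[index]
}

# Keys that exist only in the data.
extra_keys = sorted(data_map.keys() - required_map.keys())

print(mismatched)
print(extra_keys)
```

The mismatched keys (2 and 3) and the extra key (4) line up with the keys shown in the error above.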

Relative Order

When comparing elements by sequence position, a single misalignment can create differences for every element that follows it. If this behavior is not desirable, you can check the relative order of elements instead of their index positions by using validate.order():

from datatest import validate

data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
validate.order(data, requirement)

When checking for relative order, this method tries to align elements into contiguous matching subsequences. This reduces the number of non-matches:

\[\begin{split}\begin{array}{cccc}
\hline
\textbf{index} & \textbf{data} & \textbf{requirement} & \textbf{result} \\
\hline
0 & \textbf{A} & \textbf{A} & \textrm{matches} \\
1 & \textbf{B} & \textbf{B} & \textrm{matches} \\
2 & \textbf{X} & \textit{no value} & \textrm{doesn't match} \\
3 & \textbf{C} & \textbf{C} & \textrm{matches} \\
4 & \textbf{D} & \textbf{D} & \textrm{matches} \\
\hline
\end{array}\end{split}\]

Differences are reported as two-tuples containing the index (in data) where the difference occurs and the non-matching value. In the earlier examples, we saw that validating by index position produced three differences. But in this example, validating the same sequences by relative order produces only one difference:

ValidationError: does not match required order (1 difference): [
    Extra((2, 'X')),
]
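
The alignment into contiguous matching subsequences can be approximated with the standard library's difflib.SequenceMatcher. The sketch below is an illustration under that assumption, not a statement of how validate.order() is implemented. The function name order_differences is hypothetical, and differences are returned as (label, (index, value)) tuples rather than datatest difference objects.

```python
from difflib import SequenceMatcher


def order_differences(data, requirement):
    """Return (label, (index_in_data, value)) pairs for out-of-order items."""
    matcher = SequenceMatcher(a=data, b=requirement, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ('delete', 'replace'):
            # Items in data with no counterpart in the requirement.
            for i in range(i1, i2):
                differences.append(('Extra', (i, data[i])))
        if tag in ('insert', 'replace'):
            # Required items absent from data, reported at the
            # position in data where they would be expected.
            for j in range(j1, j2):
                differences.append(('Missing', (i1, requirement[j])))
    return differences


data = ['A', 'B', 'X', 'C', 'D']
requirement = ['A', 'B', 'C', 'D']
print(order_differences(data, requirement))
```

Here the matcher aligns 'A', 'B' and 'C', 'D' as two matching runs, leaving only 'X' at index 2 unmatched.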

Footnotes

1

The validate() function will check data by index position when the requirement is any iterable object other than a set, mapping, tuple or string. See the Sequence Validation section of the validate() documentation for full details.