[ADR-004] Supporting Single Path Sequences
Accepted
Context and Problem Statement
This proposal is to make the ranges in a sequence optional, such that a sequence with no ranges (i.e. a single file) is still considered a valid sequence.
Use cases for this include:
1. Retrieving a sequence string from an unknown source, and not knowing whether it will be a single file or a sequence. For example, a user may have a sequence string stored in a database, and want to use pathseq to loop over the files in that sequence. If the sequence string is for a single file, they would still want to be able to use pathseq to loop over it.
   Where this type of pattern has been seen before, users have typically run the equivalent of .with_existing_paths immediately after creating the sequence.
2. Retrieving a sequence string from an unknown source, not knowing whether it will be a single file or a sequence, and wanting to update the unknown source with updated information about which files exist on disk.
Currently, a sequence with only empty ranges is considered to be empty. A single path sequence would have no ranges, and would be considered as having one file. These two definitions are somewhat in conflict, and so introducing single path sequences would erode the concept of a sequence.
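One way to see the two cases side by side is to model the paths of a sequence as the Cartesian product of its ranges. This is only an illustrative model built on itertools, not PathSeq's implementation, and count_paths is a hypothetical helper.

```python
from itertools import product


def count_paths(ranges: list[range]) -> int:
    """Count the paths in a toy sequence modelled as a product of ranges."""
    return sum(1 for _ in product(*ranges))


# A sequence whose only range is empty contains no paths.
empty_range_count = count_paths([range(0)])

# A sequence with no ranges at all still names exactly one path.
no_ranges_count = count_paths([])

print(empty_range_count, no_ranges_count)
```

Under this model, "only empty ranges" and "no ranges" give different counts, which is the tension described above.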
PathSeq defines a “stem” slightly differently to pathlib. In pathlib, the stem of a path is the final path component without its last suffix. In PathSeq, the stem of a path is the final path component without the ranges or any of the suffixes. This is possible because the ranges are an additional component that clearly separates the stem from the suffixes. A single path sequence has no ranges, so nothing marks where the stem ends and the suffixes begin, which is why pathlib only strips the last suffix. pathlib leaves it to users to parse the stem and suffixes themselves, and PathSeq would ideally do the same; otherwise users are likely to report unintuitive or inconsistent parsing of suffixes (e.g. “file.tar.gz” having a stem of “file.tar” and a suffix of “.gz” instead of a stem of “file” and suffixes of “.tar.gz”).
Note
This is already an issue for loose path sequences
where the ranges exist at the start or end of the sequence string,
and therefore there is no separation between the stem and suffixes.
The loose format already warns users that ambiguity exists throughout
the API of LoosePathSequence,
so the effect on loose path sequences is not considered significant.
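The pathlib behaviour described above can be checked directly. The sketch below uses only the standard library; split_at_first_dot is a hypothetical user-side parse, shown to illustrate the alternative reading and not something PathSeq provides.

```python
from pathlib import PurePosixPath

path = PurePosixPath("file.tar.gz")

# pathlib only strips the last suffix from the stem.
print(path.stem)      # file.tar
print(path.suffixes)  # ['.tar', '.gz']


def split_at_first_dot(name: str) -> tuple[str, str]:
    """Hypothetical user-side parse: everything after the first dot is suffix."""
    stem, dot, rest = name.partition(".")
    return stem, dot + rest


print(split_at_first_dot(path.name))  # ('file', '.tar.gz')
```

Neither reading is right for every file name, which is why the choice is best left to the caller.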
Supporting single path sequences does not significantly complicate the implementation. Wherever we support sequences of an unknown number of ranges we already support sequences with no ranges.
A PathSequence cannot accurately reflect every sequence of paths.
For example, files file.1_1.exr, file.1_2.exr, and file.2_2.exr
would be represented as file.1-2#_1-2.exr in a PathSequence, which also covers file.2_1.exr.
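The mismatch in this example can be reproduced with a toy expansion. The sketch assumes that each range in the sequence string varies independently of the others, and it builds the file names by hand; it does not use PathSeq.

```python
from itertools import product

wanted = {"file.1_1.exr", "file.1_2.exr", "file.2_2.exr"}

# file.1-2#_1-2.exr: two independent ranges, each covering 1 to 2.
expanded = {f"file.{a}_{b}.exr" for a, b in product(range(1, 3), range(1, 3))}

extra = sorted(expanded - wanted)
print(extra)  # ['file.2_1.exr']
```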
Considered Options
Change the return type of .with_existing_paths from PathSequence to PathSequence | None.
- We cannot guarantee that every path in the returned PathSequence will exist. This may create a source of confusion for users, and require them to recheck whether the paths in the sequence exist on disk.
- Users would have to check the type of the return value before using it or iterating over it. This applies even for those users that are always using a sequence with ranges.
- Proper use of .with_existing_paths can be type checked.
- For users that aren’t using type checking, improper use of .with_existing_paths could go unnoticed until it is called on a single path sequence for which the file does not exist.
The first use case would be written as:
    def do_something_with_sequence(seq: str):
        paths: Iterable[Path] = PathSequence(seq).with_existing_paths() or ()
        for path in paths:
            if not path.exists():
                continue
            # do something with the file
            ...
The second use case would not be well supported.
This method would be demonstrated to users as:
    >>> seq = PathSequence("tests/fixtures/simple/images.1-5####.exr")
    >>> for path in seq:
    ...     path.touch(exist_ok=True)
    ...
    >>> seq2 = PathSequence("tests/fixtures/simple/images.####.exr")
    >>> paths = seq2.with_existing_paths() or ()
    >>> for path in paths:
    ...     path
    ...
    PosixPath('tests/fixtures/simple/images.0001.exr')
    PosixPath('tests/fixtures/simple/images.0002.exr')
    PosixPath('tests/fixtures/simple/images.0003.exr')
    PosixPath('tests/fixtures/simple/images.0004.exr')
    PosixPath('tests/fixtures/simple/images.0005.exr')
.with_existing_paths will raise an error if it is called on a single file sequence for which the file does not exist.
- We cannot guarantee that every path in the returned PathSequence will exist. This may create a source of confusion for users, and require them to recheck whether the paths in the sequence exist on disk.
- Proper use of .with_existing_paths cannot be type checked.
- For users that aren’t using type checking, improper use of .with_existing_paths could go unnoticed until it is called on a single path sequence for which the file does not exist.
The first use case would be written as:
    def do_something_with_sequence(seq: str):
        paths: Iterable[Path]
        try:
            paths = PathSequence(seq).with_existing_paths()
        except FileNotFoundError:
            paths = ()
        for path in paths:
            if not path.exists():
                continue
            # do something with the path
            ...
The second use case would not be well supported.
This method would be demonstrated to users as:
    >>> seq = PathSequence("tests/fixtures/simple/images.1-5####.exr")
    >>> for path in seq:
    ...     path.touch(exist_ok=True)
    ...
    >>> seq2 = PathSequence("tests/fixtures/simple/images.####.exr")
    >>> try:
    ...     paths = seq2.with_existing_paths()
    ... except FileNotFoundError:
    ...     paths = ()
    ...
    >>> for path in paths:
    ...     path
    ...
    PosixPath('tests/fixtures/simple/images.0001.exr')
    PosixPath('tests/fixtures/simple/images.0002.exr')
    PosixPath('tests/fixtures/simple/images.0003.exr')
    PosixPath('tests/fixtures/simple/images.0004.exr')
    PosixPath('tests/fixtures/simple/images.0005.exr')
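The type checking difference between returning None and raising an error can be sketched with stand-in functions. Both helpers below are hypothetical: they work on plain lists of paths, not on a PathSequence, and they treat "nothing exists" as the failure case to keep the example short.

```python
from pathlib import Path
from typing import Optional


def existing_or_none(candidates: list[Path]) -> Optional[list[Path]]:
    """Stand-in for the None-returning option: None when nothing exists."""
    found = [p for p in candidates if p.exists()]
    return found or None


def existing_or_raise(candidates: list[Path]) -> list[Path]:
    """Stand-in for the raising option: FileNotFoundError when nothing exists."""
    found = [p for p in candidates if p.exists()]
    if not found:
        raise FileNotFoundError("none of the candidate paths exist")
    return found


missing = [Path("no_such_directory_for_this_sketch") / "images.0001.exr"]

# The Optional return must be narrowed before iterating; a type checker
# flags `for p in existing_or_none(missing)` because None is not iterable.
for p in existing_or_none(missing) or ():
    print(p)

# Nothing in the signature mentions the exception, so a type checker
# accepts this call with or without the try/except around it.
try:
    paths = existing_or_raise(missing)
except FileNotFoundError:
    paths = []
```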
Change the return type of .with_existing_paths from PathSequence to tuple[Self, Completeness], where the Completeness enum indicates whether all paths in the sequence exist or not.
- Users do not need to check the type of the return value before using it or iterating over it. Unless a user is assuming that they are always using a sequence with ranges, they would need to check the completeness value.
- Proper use of .with_existing_paths can be type checked.
- For users that aren’t using type checking, improper use of .with_existing_paths could go unnoticed until it is called on a single path sequence for which the file does not exist.
- The return value of this .with_ method would be different from the other .with_ methods, which has a high risk of creating confusion. The method could be renamed to .resize_from_existing_paths to make this distinction clearer.
- A PathSequence will never have to represent paths that must exist on disk, only sequences of paths that could exist on disk.
The first use case would be written as:
    def do_something_with_sequence(seq: str):
        paths: Iterable[Path]
        paths, complete = PathSequence(seq).with_existing_paths()
        # Skips a path that doesn't exist
        if complete != Completeness.Empty:
            for path in paths:
                # do something with the path
                ...
The second use case would be well supported because the returned sequence reflects the ranges that exist on disk and information about whether the sequence is complete or not.
    def do_something_with_sequence(seq: str):
        paths: Iterable[Path]
        paths, complete = PathSequence(seq).with_existing_paths()
        update_db(paths, complete)
This method would be demonstrated to users as:
    >>> seq = PathSequence("tests/fixtures/simple/images.1-5####.exr")
    >>> for path in seq:
    ...     path.touch(exist_ok=True)
    ...
    >>> seq2 = PathSequence("tests/fixtures/simple/images.####.exr")
    >>> paths, complete = seq2.with_existing_paths()
    >>> for path in paths:
    ...     path
    ...
    PosixPath('tests/fixtures/simple/images.0001.exr')
    PosixPath('tests/fixtures/simple/images.0002.exr')
    PosixPath('tests/fixtures/simple/images.0003.exr')
    PosixPath('tests/fixtures/simple/images.0004.exr')
    PosixPath('tests/fixtures/simple/images.0005.exr')
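A rough sketch of how a completeness value might accompany the result is given below. Only the Empty member appears in the text above; the Partial and Complete members, the member values, and the resize_from_existing helper are all assumptions made for illustration, and the helper works on a plain list of candidate paths.

```python
import tempfile
from enum import Enum
from pathlib import Path


class Completeness(Enum):
    # Only `Empty` appears in the text above; the other members are assumed.
    Empty = "empty"
    Partial = "partial"
    Complete = "complete"


def resize_from_existing(candidates: list[Path]) -> tuple[list[Path], Completeness]:
    """Toy stand-in: keep the candidates that exist and report how many did."""
    found = [p for p in candidates if p.exists()]
    if not found:
        return found, Completeness.Empty
    if len(found) == len(candidates):
        return found, Completeness.Complete
    return found, Completeness.Partial


with tempfile.TemporaryDirectory() as tmp:
    frames = [Path(tmp) / f"images.{n:04d}.exr" for n in range(1, 6)]
    for frame in frames[:3]:
        frame.touch()

    found, status = resize_from_existing(frames)
    print(len(found), status.name)  # 3 Partial
```

Returning both values lets a caller iterate without a type check and still pass the completeness on, for example to the database update in the second use case.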
Add an .iter_existing_paths method that always returns an iterable of paths.
- This method will ignore the file number sequences in the path sequence, and iterate over all paths that could be represented by the sequence string and exist on disk. Users who wish the file number sequences to be taken into account can use the existing .__iter__ method and filter the results by whether they exist on disk.
- Proper use of .iter_existing_paths can be type checked.
- For users that aren’t using type checking, the returned iterable will be empty if the file does not exist.
- A PathSequence will never have to represent paths that must exist on disk, only sequences of paths that could exist on disk.
The first use case would be written as:
    def do_something_with_sequence(seq: str):
        files: Iterable[Path] = PathSequence(seq).iter_existing_paths()
        for file in files:
            # do something with the file
            ...
The second use case would not be well supported.
This would be demonstrated to users as:
    >>> seq = PathSequence("tests/fixtures/simple/images.1-5####.exr")
    >>> for path in seq:
    ...     path.touch(exist_ok=True)
    ...
    >>> seq2 = PathSequence("tests/fixtures/simple/images.####.exr")
    >>> for path in seq2.iter_existing_paths():
    ...     path
    ...
    PosixPath('tests/fixtures/simple/images.0001.exr')
    PosixPath('tests/fixtures/simple/images.0002.exr')
    PosixPath('tests/fixtures/simple/images.0003.exr')
    PosixPath('tests/fixtures/simple/images.0004.exr')
    PosixPath('tests/fixtures/simple/images.0005.exr')
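The idea of iterating over whatever exists on disk, regardless of the ranges, can be sketched as a free function over a padded pattern. This is a toy stand-in, not the proposed method: it assumes that a run of n '#' characters matches exactly n digits, and it only looks in the pattern's own directory.

```python
import re
import tempfile
from collections.abc import Iterator
from pathlib import Path


def iter_existing_paths(pattern: Path) -> Iterator[Path]:
    """Yield files in pattern's directory whose names fit the padded pattern.

    Simplification: a run of n '#' characters matches exactly n digits.
    """
    parts = re.split(r"(#+)", pattern.name)
    regex = "".join(
        rf"\d{{{len(part)}}}" if part.startswith("#") else re.escape(part)
        for part in parts
    )
    matcher = re.compile(regex)
    if not pattern.parent.is_dir():
        return
    for candidate in sorted(pattern.parent.iterdir()):
        if matcher.fullmatch(candidate.name):
            yield candidate


with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 3):
        (Path(tmp) / f"images.{n:04d}.exr").touch()
    (Path(tmp) / "notes.txt").touch()

    frames = [p.name for p in iter_existing_paths(Path(tmp) / "images.####.exr")]
    single = [p.name for p in iter_existing_paths(Path(tmp) / "notes.txt")]
    missing = list(iter_existing_paths(Path(tmp) / "absent.txt"))

print(frames)   # ['images.0001.exr', 'images.0002.exr', 'images.0003.exr']
print(single)   # ['notes.txt']
print(missing)  # []
```

A pattern without padding is treated as a literal file name, so a single file yields one path when it exists and an empty iterable when it does not.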
We will not support single path sequences, and will instead raise an error if a PathSequence is constructed from a sequence string for a single file.
- Users will not have to worry about whether .with_existing_paths can be used safely. Checking is done upon creation of the sequence.
- The methods that construct an instance of BasePurePathSequence will need to raise an error if the sequence string is for a single file. Users already need to be aware of a ParseError being raised in these methods, so this is not a significant change to the API.
The first use case would be written as:
    def do_something_with_sequence(seq: str):
        paths: Iterable[Path]
        try:
            paths = PathSequence(seq).with_existing_paths()
        except NotASequenceError:
            paths = [path] if (path := Path(seq)).exists() else []
        for path in paths:
            # do something with the path
            ...
The second use case would not be well supported.
This would be demonstrated to users as:
    >>> seq = PathSequence("tests/fixtures/simple/images.1-5####.exr")
    >>> for path in seq:
    ...     path.touch(exist_ok=True)
    ...
    >>> seq2 = PathSequence("tests/fixtures/simple/images.####.exr")
    >>> try:
    ...     paths = seq2.with_existing_paths()
    ... except NotASequenceError:
    ...     paths = [path] if (path := Path(str(seq2))).exists() else []
    ...
    >>> for path in paths:
    ...     path
    ...
    PosixPath('tests/fixtures/simple/images.0001.exr')
    PosixPath('tests/fixtures/simple/images.0002.exr')
    PosixPath('tests/fixtures/simple/images.0003.exr')
    PosixPath('tests/fixtures/simple/images.0004.exr')
    PosixPath('tests/fixtures/simple/images.0005.exr')
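Rejecting single files at construction time could look roughly like the check below. The base class chosen for NotASequenceError, the require_ranges helper, and the rule that a sequence string must contain '#' padding are all assumptions for this sketch; the real parser may decide differently.

```python
import re


class NotASequenceError(ValueError):
    """Hypothetical error for a sequence string that has no padding."""


def require_ranges(sequence: str) -> str:
    """Toy check: accept the string only if it contains '#' padding."""
    if not re.search(r"#+", sequence):
        raise NotASequenceError(f"{sequence!r} names a single file, not a sequence")
    return sequence


for text in ("images.1-5####.exr", "images.0001.exr"):
    try:
        require_ranges(text)
    except NotASequenceError as error:
        print("rejected:", error)
    else:
        print("accepted:", text)
```

Because the check runs once when the string is parsed, later method calls need no guard of their own, at the cost of callers handling single files themselves.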
Decision Outcome
We will support single path sequences in LoosePathSequence
by adding an .iter_existing_paths method,
and replacing .with_existing_paths with a .resize_from_existing_paths method.
In LoosePathSequence, ambiguity already exists in the API and users are less concerned about
consistency and simplicity.
PathSequence will not support single path sequences.