punchbowl.util

Attributes

T

Classes

DataLoader

Interface for passing callable objects instead of file paths to be loaded.

Functions

`validate_image_is_square`(→ None)	Check that the input array is square.
`load_mask_file`(→ numpy.ndarray)	Load a PUNCH instrument mask.
`output_image_task`(→ None)	Prefect task to write an image to disk.
`load_image_task`(→ ndcube.NDCube)	Prefect task to load data for processing.
`average_datetime`(→ datetime.datetime)	Compute average datetime from a list of datetimes.
`nan_percentile`(→ float | numpy.ndarray)	Calculate the nan percentile of a 3D cube. Isn't as fast as possible on a single core, but parallelizes very well.
`parallel_sort_first_axis`(→ numpy.ndarray)	Sorts a 3D cube along the first axis.
`nan_percentile_2d`(→ float | numpy.ndarray)	Percentile-filter a 2D cube with NaN awareness. Parallelizes well.
`nan_gaussian`(→ numpy.ndarray)	Gaussian filter, where NaN pixels are ignored in the convolution, and NaN inputs become NaN outputs.
`interpolate_data`(...)	Interpolates between two data objects.
`find_first_existing_file`(→ ndcube.NDCube | None)	Find the first cube that's not None in a list of NDCubes.
`bundle_matched_mzp`(...)	Search and bundle MZP triplets closest in time.
`masked_mean`(→ numpy.ndarray)	Masked nanmean along the first axis of entries where both mask is True and data is finite.
`inpaint_nans`(→ numpy.ndarray)	Fill nans in an image with a neighborhood value.
`compute_tb`(→ numpy.ndarray)	Compute total brightness from input NDCube or 3D data array of shape (MZP, ...).

Module Contents

punchbowl.util.validate_image_is_square(image: numpy.ndarray) → None: Check that the input array is square.
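A check of this kind can be sketched as follows. This is a minimal illustration, not punchbowl's code: the name `validate_square_sketch`, the requirement that the array be 2D, and the choice of `ValueError` are all assumptions, since the page only states that the function returns None.

```python
import numpy as np


def validate_square_sketch(image: np.ndarray) -> None:
    """Raise if `image` is not a 2D array with equal side lengths."""
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        # ValueError is an assumption; the real function may raise another type.
        raise ValueError(f"Expected a square 2D array, got shape {image.shape}")


validate_square_sketch(np.zeros((4, 4)))  # passes silently and returns None
```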

punchbowl.util.load_mask_file(path: str) → numpy.ndarray

Load a PUNCH instrument mask.

To write a .bin file that this function can read, use:

    with open('PUNCH_L2_MS1_20250101000000_v0j.bin', 'wb') as f:
        np.packbits(np.isfinite(mask).T).tofile(f)
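To make the packed format concrete, the sketch below writes a small mask with the recipe above and reads it back. The reader, `read_mask_sketch`, is hypothetical: it only shows how the packing can be inverted with `np.unpackbits` when the image shape is known, and it is not necessarily how `load_mask_file` decodes the file. The file name and the 4×6 shape are placeholders.

```python
import os
import tempfile

import numpy as np


def write_mask(mask: np.ndarray, path: str) -> None:
    # Same recipe as in the docstring: finite pixels become 1-bits.
    with open(path, "wb") as f:
        np.packbits(np.isfinite(mask).T).tofile(f)


def read_mask_sketch(path: str, shape: tuple[int, int]) -> np.ndarray:
    # Hypothetical reader: unpack the bits, drop any pad bits, undo the transpose.
    bits = np.unpackbits(np.fromfile(path, dtype=np.uint8))
    n_rows, n_cols = shape
    return bits[: n_rows * n_cols].reshape(n_cols, n_rows).T.astype(bool)


mask = np.ones((4, 6))
mask[1, 2] = np.nan  # one masked-out pixel

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_mask.bin")
    write_mask(mask, path)
    restored = read_mask_sketch(path, mask.shape)
```

Because `np.packbits` pads the final byte with zero bits, a reader has to know the image shape in advance; the file itself stores no dimensions.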

punchbowl.util.output_image_task(data: ndcube.NDCube, output_filename: str) → None

Prefect task to write an image to disk.

Parameters:

data (NDCube) – data that is to be written
output_filename (str) – where to write the file out

Return type:

None

punchbowl.util.load_image_task(input_filename: str, include_provenance: bool = True, include_uncertainty: bool = True, dtype: type = float) → ndcube.NDCube

Prefect task to load data for processing.

Parameters:

input_filename (str) – path to file to load
include_provenance (bool) – whether to load the provenance layer
include_uncertainty (bool) – whether to load the uncertainty layer
dtype (type) – dtype to cast the data to

Returns:

loaded version of the image

Return type:

NDCube

punchbowl.util.average_datetime(datetimes: list[datetime.datetime]) → datetime.datetime: Compute average datetime from a list of datetimes.
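One way to average datetimes is to average their offsets from a reference time. The sketch below assumes a non-empty list and uses the hypothetical name `average_datetime_sketch`; it illustrates the idea and is not taken from the punchbowl source.

```python
from datetime import datetime, timedelta


def average_datetime_sketch(datetimes: list[datetime]) -> datetime:
    """Average a non-empty list of datetimes via their offsets from the first one."""
    reference = datetimes[0]
    offsets = [(dt - reference).total_seconds() for dt in datetimes]
    return reference + timedelta(seconds=sum(offsets) / len(offsets))


midpoint = average_datetime_sketch(
    [datetime(2025, 1, 1, 0, 0, 0), datetime(2025, 1, 1, 2, 0, 0)]
)
```

Working with offsets rather than absolute timestamps avoids time zone conversions and keeps the floating-point values small.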

punchbowl.util.nan_percentile(array: numpy.ndarray, percentile: float | list[float]) → float | numpy.ndarray

Calculate the nan percentile of a 3D cube. Isn’t as fast as possible on a single core, but parallelizes very well.

It’s documented that numba’s sort is slower than numpy’s, and this runs single-threaded ~half as fast as the old implementation using numpy. But this parallelizes extremely well, even up to 128 cores for a 1kx2kx2k cube! Thread count can be configured by setting numba.config.NUMBA_NUM_THREADS.

The .copy() for each sequence means that, even though percentiling along the zeroth dimension seems wrong from a CPU cache standpoint, transposing the input cube makes very little difference (much less than the time cost of copying the cube into a transposed orientation!). Disabling the copy for a well-dimensioned array doesn’t make a clear difference to execution time.

The nan handling appears to add only negligible computation time.
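For reference, the single-threaded NumPy equivalent of a NaN-aware percentile along the first axis looks like the sketch below. It assumes that `nan_percentile` collapses the zeroth dimension, as the performance notes suggest, and follows NumPy's conventions for list-valued percentiles; neither point is confirmed on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.normal(size=(5, 4, 3))  # (frames, rows, columns)
cube[0, 1, 1] = np.nan             # one missing sample

# A single percentile collapses the first axis to a 2D image.
median_image = np.nanpercentile(cube, 50, axis=0)     # shape (4, 3)

# A list of percentiles adds a leading axis, one slice per percentile.
quartiles = np.nanpercentile(cube, [25, 75], axis=0)  # shape (2, 4, 3)
```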

punchbowl.util.parallel_sort_first_axis(array: numpy.ndarray, handle_nans: bool = False, inplace: bool = False) → numpy.ndarray

Sorts a 3D cube along the first axis.

Parallelizes very well on punch190 and phoenix.

It’s documented that numba’s sort is slower than numpy’s, but this parallelizes extremely well, even up to 64 cores for a 1kx2kx2k cube! Thread count can be configured by setting numba.config.NUMBA_NUM_THREADS.

The .copy() for each sequence means that, even though sorting along the zeroth dimension seems wrong from a CPU cache standpoint, transposing the input cube makes very little difference (much less than the time cost of copying the cube into a transposed orientation!).

If handle_nans is True, NaNs are explicitly sorted to the high end of the array. Numba’s sort appears to do this anyway and still sorts the rest of the array correctly, but the flag ensures this behavior with a speed penalty.

Sorting in-place offers a ~50% speed boost in a 1kx2kx2k test case.
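The NaN ordering described above matches what NumPy's own sort does, which gives a convenient single-threaded reference for checking results. The sketch below uses only NumPy and does not call `parallel_sort_first_axis`; the tiny cube is illustrative.

```python
import numpy as np

# Shape (3, 1, 2): three frames of a 1x2 image.
cube = np.array([[[3.0, np.nan]], [[1.0, 2.0]], [[2.0, 0.5]]])

# Out-of-place sort along the first axis; NumPy places NaNs at the high end.
sorted_copy = np.sort(cube, axis=0)

# In-place variant: no second array is allocated.
in_place = cube.copy()
in_place.sort(axis=0)
```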

punchbowl.util.nan_percentile_2d(array: numpy.ndarray, percentile: float | list[float], window_size: int, preserve_nans: bool = True) → float | numpy.ndarray

Percentile-filter a 2D cube with NaN awareness. Parallelizes well.

Each pixel is replaced with a percentile of the non-NaN pixels in a local window. At the image edges, the local window is clamped at the image boundary.

See nan_percentile for performance notes.

When preserve_nans is True, NaN pixels will remain NaN. Otherwise they will be replaced with the percentile value.
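The windowing rules above can be illustrated with a slow, pure-NumPy loop. This is a sketch under stated assumptions: `window_size` is taken as the full window width centered on the pixel, "clamped" is read as truncating the window at the image edge, and only a scalar percentile is handled. The name `nan_percentile_2d_sketch` is hypothetical.

```python
import numpy as np


def nan_percentile_2d_sketch(image, percentile, window_size, preserve_nans=True):
    """Replace each pixel with a percentile of the non-NaN pixels around it."""
    half = window_size // 2  # assumption: window_size is the full window width
    n_rows, n_cols = image.shape
    out = np.full(image.shape, np.nan)
    for i in range(n_rows):
        r0, r1 = max(0, i - half), min(n_rows, i + half + 1)  # truncate at edges
        for j in range(n_cols):
            c0, c1 = max(0, j - half), min(n_cols, j + half + 1)
            window = image[r0:r1, c0:c1]
            values = window[~np.isnan(window)]
            if values.size:  # leave NaN if the whole window is NaN
                out[i, j] = np.percentile(values, percentile)
    if preserve_nans:
        out[np.isnan(image)] = np.nan
    return out


image = np.arange(25.0).reshape(5, 5)
image[2, 2] = np.nan
kept = nan_percentile_2d_sketch(image, 50, 3)
filled = nan_percentile_2d_sketch(image, 50, 3, preserve_nans=False)
```

With `preserve_nans=False`, the NaN at the center is replaced by the median of its eight finite neighbors.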

punchbowl.util.nan_gaussian(image: numpy.ndarray, sigma: float) → numpy.ndarray: Gaussian filter, where NaN pixels are ignored in the convolution, and NaN inputs become NaN outputs.
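A common way to get this behavior is normalized convolution: blur the image with NaNs set to zero, blur a validity map with the same kernel, and divide the two. The sketch below uses that technique with a separable NumPy kernel; the truncation at 4 sigma, the zero padding at the borders, and all function names are assumptions, and punchbowl's implementation may differ.

```python
import numpy as np


def _gaussian_kernel(sigma: float) -> np.ndarray:
    radius = int(4 * sigma + 0.5)  # assumption: truncate the kernel at 4 sigma
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def _blur(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Separable blur with zero padding; each axis must be at least kernel-sized.
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def nan_gaussian_sketch(image: np.ndarray, sigma: float) -> np.ndarray:
    kernel = _gaussian_kernel(sigma)
    valid = ~np.isnan(image)
    blurred = _blur(np.where(valid, image, 0.0), kernel)  # NaNs contribute nothing
    weights = _blur(valid.astype(float), kernel)          # total weight of valid pixels
    with np.errstate(invalid="ignore", divide="ignore"):
        out = blurred / weights
    out[~valid] = np.nan  # NaN inputs stay NaN
    return out


image = np.full((12, 12), 5.0)
image[6, 6] = np.nan
smoothed = nan_gaussian_sketch(image, sigma=1.0)
```

Dividing by the blurred validity map renormalizes the kernel wherever pixels are missing, including at the zero-padded borders, so a constant image stays constant.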

punchbowl.util.interpolate_data(data_before: ndcube.NDCube, data_after: ndcube.NDCube, reference_time: datetime.datetime, time_key: str = 'DATE-OBS', allow_extrapolation: bool = False, and_uncertainty: bool = False) → numpy.ndarray | tuple[numpy.ndarray, numpy.ndarray]: Interpolates between two data objects.
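The page does not say which interpolation scheme is used. Assuming it is linear in time, the idea can be sketched on plain arrays and datetimes as below. `interpolate_sketch` is a hypothetical helper that takes the observation times directly instead of reading them from a header key such as 'DATE-OBS', and the `ValueError` is an assumed failure mode.

```python
from datetime import datetime

import numpy as np


def interpolate_sketch(before, t_before, after, t_after, reference_time,
                       allow_extrapolation=False):
    """Linearly interpolate two arrays to `reference_time` (assumed scheme)."""
    span = (t_after - t_before).total_seconds()  # must be non-zero
    fraction = (reference_time - t_before).total_seconds() / span
    if not allow_extrapolation and not 0.0 <= fraction <= 1.0:
        raise ValueError("reference_time lies outside the two observations")
    return before + fraction * (after - before)


before = np.zeros((2, 2))
after = np.full((2, 2), 10.0)
halfway = interpolate_sketch(
    before, datetime(2025, 1, 1, 0), after, datetime(2025, 1, 1, 2),
    reference_time=datetime(2025, 1, 1, 1),
)
```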

punchbowl.util.find_first_existing_file(inputs: list[ndcube.NDCube]) → ndcube.NDCube | None: Find the first cube that’s not None in a list of NDCubes.
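The described behavior, returning the first entry that is not None and None when there is no such entry, can be written in one line. The sketch below uses strings in place of NDCubes and a hypothetical name, `first_not_none`.

```python
def first_not_none(inputs):
    """Return the first item that is not None, or None if every item is None."""
    return next((item for item in inputs if item is not None), None)


found = first_not_none([None, "cube_a", "cube_b"])
missing = first_not_none([None, None])
```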

punchbowl.util.bundle_matched_mzp(m_cubes: list[ndcube.NDCube], z_cubes: list[ndcube.NDCube], p_cubes: list[ndcube.NDCube], threshold: float = 75.0) → numpy.ndarray | tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]: Search and bundle MZP triplets closest in time.

punchbowl.util.masked_mean(array: numpy.typing.ArrayLike, mask: numpy.typing.ArrayLike) → numpy.ndarray: Masked nanmean along the first axis of entries where both mask is True and data is finite.
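The selection rule, averaging only entries where the mask is True and the data is finite, can be sketched with a sum and a count. Returning NaN where no entry qualifies is an assumption modeled on `np.nanmean`, and `masked_mean_sketch` is a hypothetical name.

```python
import numpy as np


def masked_mean_sketch(array, mask):
    """Mean along axis 0 over entries that are both unmasked and finite."""
    array = np.asarray(array, dtype=float)
    keep = np.asarray(mask, dtype=bool) & np.isfinite(array)
    total = np.where(keep, array, 0.0).sum(axis=0)
    count = keep.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count  # NaN wherever nothing was kept


stack = np.array([[1.0, np.nan, 7.0],
                  [3.0, 4.0, 9.0]])
mask = np.array([[True, True, False],
                 [False, True, False]])
mean = masked_mean_sketch(stack, mask)
```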

punchbowl.util.T

class punchbowl.util.DataLoader

Bases: abc.ABC, Generic[T]

Interface for passing callable objects instead of file paths to be loaded.

abstractmethod load() → T: Load the data.

abstractmethod src_repr() → str: Return a string representation of the data source.
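A concrete loader only needs to implement the two abstract methods. To stay self-contained, the sketch below defines a local stand-in that mirrors the documented interface rather than importing punchbowl; `ConstantArrayLoader` is a hypothetical subclass used purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

import numpy as np

T = TypeVar("T")


class DataLoader(ABC, Generic[T]):
    """Local stand-in mirroring the documented punchbowl.util.DataLoader."""

    @abstractmethod
    def load(self) -> T:
        """Load the data."""

    @abstractmethod
    def src_repr(self) -> str:
        """Return a string representation of the data source."""


class ConstantArrayLoader(DataLoader[np.ndarray]):
    """Hypothetical loader that builds its data in memory instead of reading a file."""

    def __init__(self, shape: tuple[int, ...], value: float) -> None:
        self.shape = shape
        self.value = value

    def load(self) -> np.ndarray:
        return np.full(self.shape, self.value)

    def src_repr(self) -> str:
        return f"constant array of shape {self.shape} filled with {self.value}"


loader = ConstantArrayLoader((2, 3), 1.5)
data = loader.load()
description = loader.src_repr()
```

Code that would otherwise accept a file path can call `load()` on such an object and use `src_repr()` wherever it would have recorded the path.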

punchbowl.util.inpaint_nans(image: numpy.ndarray, kernel_size: int = 5) → numpy.ndarray

Fill nans in an image with a neighborhood value.

Parameters:

image (np.ndarray) – image with nans
kernel_size (int) – odd integer size for the smoothing kernel

Returns:

image with nans filled

Return type:

np.ndarray
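The page does not specify which "neighborhood value" is used. The sketch below assumes the mean of the non-NaN pixels in a `kernel_size` × `kernel_size` window around each NaN, which is one plausible reading; the actual function may use a different statistic or a convolution-based method. `inpaint_nans_sketch` is a hypothetical name.

```python
import numpy as np


def inpaint_nans_sketch(image: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Fill each NaN with the mean of the finite pixels in its local window."""
    half = kernel_size // 2
    filled = image.copy()
    for i, j in zip(*np.nonzero(np.isnan(image))):
        # Slices that run past the image edge are truncated by NumPy.
        window = image[max(0, i - half): i + half + 1,
                       max(0, j - half): j + half + 1]
        if np.any(~np.isnan(window)):  # leave NaN if the window has no data
            filled[i, j] = np.nanmean(window)
    return filled


image = np.ones((7, 7))
image[3, 3] = np.nan
repaired = inpaint_nans_sketch(image, kernel_size=5)
```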

punchbowl.util.compute_tb(data: ndcube.NDCube | numpy.ndarray) → numpy.ndarray: Compute total brightness from input NDCube or 3D data array of shape (MZP, …).
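For three polarizer states spaced 60° apart (M, Z, P), total brightness is commonly computed as two thirds of their sum. The sketch below assumes that relation and a plain NumPy input whose first axis holds the three states; it is not taken from the punchbowl source, and `compute_tb_sketch` is a hypothetical name.

```python
import numpy as np


def compute_tb_sketch(mzp: np.ndarray) -> np.ndarray:
    """Total brightness from an array whose first axis holds the M, Z, P images."""
    mzp = np.asarray(mzp, dtype=float)
    if mzp.shape[0] != 3:
        raise ValueError("expected the first axis to have length 3 (M, Z, P)")
    # Assumed relation: tB = (2/3) * (M + Z + P)
    return (2.0 / 3.0) * mzp.sum(axis=0)


triplet = np.ones((3, 2, 2))  # M, Z and P all equal to 1 everywhere
tb = compute_tb_sketch(triplet)
```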