punchbowl.util#

Attributes#

T

Classes#

DataLoader

Interface for passing callable objects instead of file paths to be loaded.

ShmPickleableNDArray

A numpy array backed by shared memory that pickles without copying data.

Functions#

validate_image_is_square(→ None)

Check that the input array is square.

load_mask_file(→ numpy.ndarray)

Load a PUNCH instrument mask.

output_image_task(→ None)

Prefect task to write an image to disk.

load_image_task(→ ndcube.NDCube)

Prefect task to load data for processing.

average_datetime(→ datetime.datetime)

Compute average datetime from a list of datetimes.

nan_percentile(→ float | numpy.ndarray)

Calculate the nan percentile of a 3D cube. Isn't as fast as possible on a single core, but parallelizes very well.

parallel_sort_first_axis(→ numpy.ndarray)

Sorts a 3D cube along the first axis.

nan_percentile_2d(→ float | numpy.ndarray)

Percentile-filter a 2D array with NaN awareness. Parallelizes well.

nan_gaussian(→ numpy.ndarray)

Gaussian filter, where NaN pixels are ignored in the convolution, and NaN inputs become NaN outputs.

interpolate_data(...)

Interpolates between two data objects.

find_first_existing_file(→ ndcube.NDCube | None)

Find the first cube that's not None in a list of NDCubes.

get_dateobs(→ datetime.datetime)

Convert file path or NDCube to date_obs.

get_polstate(→ str)

Convert file path or NDCube to polarization state.

bundle_matched_mzp(→ list[tuple[ndcube.NDCube | str, ...)

Search and bundle MZP triplets closest in time.

masked_mean(→ numpy.ndarray)

Masked nanmean along the first axis of entries where both mask is True and data is finite.

inpaint_nans(→ numpy.ndarray)

Fill nans in an image with a neighborhood value.

compute_tb(→ numpy.ndarray)

Compute total brightness from input NDCube or 3D data array of shape (MZP, ...).

censor_wcs(→ astropy.wcs.WCS)

Remove observer details from a WCS.

Module Contents#

punchbowl.util.validate_image_is_square(image: numpy.ndarray) None[source]#

Check that the input array is square.

punchbowl.util.load_mask_file(path: str) numpy.ndarray[source]#

Load a PUNCH instrument mask.

To write a .bin file that this function can read, use:

    with open('PUNCH_L2_MS1_20250101000000_v0j.bin', 'wb') as f:
        np.packbits(np.isfinite(mask).T).tofile(f)
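The write recipe above can be checked with a round trip. This is a minimal sketch, assuming a reader that reverses the steps with np.unpackbits and that already knows the mask shape. Both helper names are hypothetical, and this is not punchbowl's loader.

```python
import os
import tempfile

import numpy as np


def write_mask_sketch(mask: np.ndarray, path: str) -> None:
    """Pack the finite-pixel flags of `mask` into bits, following the recipe above."""
    with open(path, "wb") as f:
        np.packbits(np.isfinite(mask).T).tofile(f)


def read_mask_sketch(path: str, shape: tuple) -> np.ndarray:
    """Hypothetical inverse: unpack the bits, then undo the transpose."""
    bits = np.unpackbits(np.fromfile(path, dtype=np.uint8))
    n_pixels = shape[0] * shape[1]
    # packbits pads the last byte with zeros; drop those padding bits
    return bits[:n_pixels].reshape(shape[::-1]).T.astype(bool)


mask = np.ones((5, 7))
mask[1, 2] = np.nan
mask[4, 6] = np.nan

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mask_sketch.bin")
    write_mask_sketch(mask, path)
    restored = read_mask_sketch(path, mask.shape)
```

The file stores no shape information, so a reader has to get the image dimensions from elsewhere.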

punchbowl.util.output_image_task(data: ndcube.NDCube, output_filename: str) None[source]#

Prefect task to write an image to disk.

Parameters:
  • data (NDCube) – data that is to be written

  • output_filename (str) – where to write the file out

Return type:

None

punchbowl.util.load_image_task(input_filename: str, include_provenance: bool = True, include_uncertainty: bool = True, dtype: type = float) ndcube.NDCube[source]#

Prefect task to load data for processing.

Parameters:
  • input_filename (str) – path to file to load

  • include_provenance (bool) – whether to load the provenance layer

  • include_uncertainty (bool) – whether to load the uncertainty layer

  • dtype (type) – dtype to cast the data to

Returns:

loaded version of the image

Return type:

NDCube

punchbowl.util.average_datetime(datetimes: list[datetime.datetime]) datetime.datetime[source]#

Compute average datetime from a list of datetimes.
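One way to average datetimes is to average their offsets from a reference time. The sketch below illustrates that idea with the standard library only. The name average_datetime_sketch is hypothetical, and punchbowl's implementation may differ.

```python
from datetime import datetime, timedelta


def average_datetime_sketch(datetimes: list) -> datetime:
    """Average datetimes by averaging their offsets from the first entry."""
    reference = datetimes[0]
    total = sum((dt - reference for dt in datetimes), timedelta(0))
    return reference + total / len(datetimes)


midpoint = average_datetime_sketch([
    datetime(2025, 1, 1, 0, 0, 0),
    datetime(2025, 1, 1, 0, 10, 0),
    datetime(2025, 1, 1, 0, 20, 0),
])
```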

punchbowl.util.nan_percentile(array: numpy.ndarray, percentile: float | list[float]) float | numpy.ndarray[source]#

Calculate the nan percentile of a 3D cube. Isn’t as fast as possible on a single core, but parallelizes very well.

It’s documented that numba’s sort is slower than numpy’s, and this runs single-threaded ~half as fast as the old implementation using numpy. But this parallelizes extremely well, even up to 128 cores for a 1kx2kx2k cube! Thread count can be configured by setting numba.config.NUMBA_NUM_THREADS.

The .copy() for each sequence means that, even though percentiling along the zeroth dimension seems wrong from a CPU cache standpoint, transposing the input cube makes very little difference (much less than the time cost of copying the cube into a transposed orientation!). Disabling the copy for a well-dimensioned array doesn’t make a clear difference to execution time.

The nan handling appears to add only negligible computation time.
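For orientation, the single-threaded numpy call that the notes compare against looks roughly like this. It is only a reference for the semantics (percentile along the zeroth axis, ignoring NaNs). Whether punchbowl's return shape matches numpy's for a list of percentiles is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.normal(size=(5, 4, 3))
cube[0, 1, 2] = np.nan  # one missing sample in a single pixel's sequence

# Percentile along the first axis, ignoring NaNs
median_image = np.nanpercentile(cube, 50, axis=0)

# With a list of percentiles, numpy adds a leading axis: one slice per percentile
quartiles = np.nanpercentile(cube, [25, 75], axis=0)
```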

punchbowl.util.parallel_sort_first_axis(array: numpy.ndarray, handle_nans: bool = False, inplace: bool = False) numpy.ndarray[source]#

Sorts a 3D cube along the first axis.

Parallelizes very well on punch190 and phoenix.

It’s documented that numba’s sort is slower than numpy’s, but this parallelizes extremely well, even up to 64 cores for a 1kx2kx2k cube! Thread count can be configured by setting numba.config.NUMBA_NUM_THREADS.

The .copy() for each sequence means that, even though sorting along the zeroth dimension seems wrong from a CPU cache standpoint, transposing the input cube makes very little difference (much less than the time cost of copying the cube into a transposed orientation!).

If handle_nans is True, NaNs are explicitly sorted to the high end of the array. Numba’s sort appears to do this anyway and still sorts the rest of the array correctly, but the flag ensures this behavior with a speed penalty.

Sorting in-place offers a ~50% speed boost in a 1kx2kx2k test case.
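The NaN ordering and the copy versus in-place distinction can be seen with plain numpy, which also sorts NaNs to the end. This sketch shows the semantics only and does not use punchbowl or numba.

```python
import numpy as np

cube = np.array([
    [[3.0, np.nan]],
    [[1.0, 2.0]],
    [[np.nan, 0.5]],
])  # shape (3, 1, 2): three samples for each of two pixels

sorted_copy = np.sort(cube, axis=0)  # returns a new, sorted array
cube.sort(axis=0)                    # sorts in place, no second array allocated
```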

punchbowl.util.nan_percentile_2d(array: numpy.ndarray, percentile: float | list[float], window_size: int, preserve_nans: bool = True) float | numpy.ndarray[source]#

Percentile-filter a 2D array with NaN awareness. Parallelizes well.

Each pixel is replaced with a percentile of the non-NaN pixels in a local window. At the image edges, the local window is clamped at the image boundary.

See nan_percentile for performance notes.

When preserve_nans is True, NaN pixels will remain NaN. Otherwise they will be replaced with the percentile value.
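The behavior described above can be written as a slow reference loop. This sketch assumes that window_size is the full, odd width of the window and that "clamped" means the window is truncated at the image edge. Neither detail is confirmed by the text above, and nan_percentile_2d_sketch is a hypothetical name.

```python
import numpy as np


def nan_percentile_2d_sketch(image, percentile, window_size, preserve_nans=True):
    """Slow reference loop: percentile of the non-NaN pixels in a local window."""
    half = window_size // 2
    n_rows, n_cols = image.shape
    out = np.full(image.shape, np.nan)
    for i in range(n_rows):
        for j in range(n_cols):
            if preserve_nans and np.isnan(image[i, j]):
                continue  # NaN input stays NaN in the output
            # Truncate the window at the image boundary (assumed clamping rule)
            window = image[max(i - half, 0):i + half + 1,
                           max(j - half, 0):j + half + 1]
            finite = window[~np.isnan(window)]
            if finite.size:
                out[i, j] = np.percentile(finite, percentile)
    return out


image = np.arange(25.0).reshape(5, 5)
image[2, 2] = np.nan

kept = nan_percentile_2d_sketch(image, 50, 3)
filled = nan_percentile_2d_sketch(image, 50, 3, preserve_nans=False)
```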

punchbowl.util.nan_gaussian(image: numpy.ndarray, sigma: float) numpy.ndarray[source]#

Gaussian filter, where NaN pixels are ignored in the convolution, and NaN inputs become NaN outputs.
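A common way to get this behavior is normalized convolution: smooth the data with NaNs zeroed, smooth a validity mask the same way, and divide. The sketch below uses only numpy and is an illustration of that technique, not necessarily how punchbowl implements it. The function names and the 4-sigma kernel radius are assumptions.

```python
import numpy as np


def _smooth(values, kernel):
    """Separable convolution along both axes with implicit zero padding."""
    rows = np.apply_along_axis(np.convolve, 1, values, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def nan_gaussian_sketch(image, sigma):
    """Gaussian smoothing that ignores NaN pixels; NaN inputs stay NaN."""
    radius = int(4 * sigma + 0.5)  # assumed truncation; image must exceed the kernel size
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()

    valid = np.isfinite(image)
    weighted = _smooth(np.where(valid, image, 0.0), kernel)
    weights = _smooth(valid.astype(float), kernel)

    out = np.full(image.shape, np.nan)
    keep = valid & (weights > 0)
    out[keep] = weighted[keep] / weights[keep]
    return out


image = np.full((12, 12), 7.0)
image[5, 5] = np.nan
smoothed = nan_gaussian_sketch(image, sigma=1.0)
```

Dividing by the smoothed mask also corrects for the zero padding at the image edges, so a constant image stays constant.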

punchbowl.util.interpolate_data(data_before: ndcube.NDCube, data_after: ndcube.NDCube, reference_time: datetime.datetime, time_key: str = 'DATE-OBS', allow_extrapolation: bool = False, and_uncertainty: bool = False) numpy.ndarray | tuple[numpy.ndarray, numpy.ndarray][source]#

Interpolates between two data objects.
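The text above does not name the interpolation scheme. As an illustration only, the sketch below assumes a linear blend weighted by where reference_time falls between the two observation times. It works on plain arrays rather than NDCubes, and interpolate_sketch is a hypothetical name.

```python
from datetime import datetime

import numpy as np


def interpolate_sketch(before, t_before, after, t_after, reference_time):
    """Linear blend of two arrays, weighted by position in time (assumed scheme)."""
    span = (t_after - t_before).total_seconds()
    fraction = (reference_time - t_before).total_seconds() / span
    return (1.0 - fraction) * before + fraction * after


before = np.zeros((2, 2))
after = np.full((2, 2), 8.0)
result = interpolate_sketch(
    before, datetime(2025, 1, 1, 0, 0),
    after, datetime(2025, 1, 1, 0, 8),
    reference_time=datetime(2025, 1, 1, 0, 2),
)
```

In this sketch a reference_time outside the two observation times gives a fraction below 0 or above 1, which is extrapolation.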

punchbowl.util.find_first_existing_file(inputs: list[ndcube.NDCube]) ndcube.NDCube | None[source]#

Find the first cube that’s not None in a list of NDCubes.

punchbowl.util.get_dateobs(file: str | ndcube.NDCube) datetime.datetime[source]#

Convert file path or NDCube to date_obs.

punchbowl.util.get_polstate(file: str | ndcube.NDCube) str[source]#

Convert file path or NDCube to polarization state.

punchbowl.util.bundle_matched_mzp(m_cubes: list[ndcube.NDCube | str], z_cubes: list[ndcube.NDCube | str] | None = None, p_cubes: list[ndcube.NDCube | str] | None = None, threshold: float = 75.0) list[tuple[ndcube.NDCube | str, ndcube.NDCube | str, ndcube.NDCube | str]][source]#

Search and bundle MZP triplets closest in time.

punchbowl.util.masked_mean(array: numpy.typing.ArrayLike, mask: numpy.typing.ArrayLike) numpy.ndarray[source]#

Masked nanmean along the first axis of entries where both mask is True and data is finite.
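The description translates fairly directly into numpy. This is a minimal sketch of the stated behavior, assuming that a pixel with no usable entries should come out as NaN. The name masked_mean_sketch is hypothetical.

```python
import warnings

import numpy as np


def masked_mean_sketch(array, mask):
    """Mean along axis 0 over entries where mask is True and data is finite."""
    array = np.asarray(array, dtype=float)
    use = np.asarray(mask, dtype=bool) & np.isfinite(array)
    selected = np.where(use, array, np.nan)
    with warnings.catch_warnings():
        # Columns with no usable entries produce NaN and a RuntimeWarning
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(selected, axis=0)


stack = np.array([[1.0, 2.0, np.nan],
                  [3.0, np.inf, 5.0],
                  [5.0, 6.0, 7.0]])
mask = np.array([[True, True, True],
                 [True, True, True],
                 [False, True, False]])
mean = masked_mean_sketch(stack, mask)
```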

punchbowl.util.T#

class punchbowl.util.DataLoader[source]#

Bases: abc.ABC, Generic[T]

Interface for passing callable objects instead of file paths to be loaded.

abstractmethod load() T[source]#

Load the data.

abstractmethod src_repr() str[source]#

Return a string representation of the data source.
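To show the shape of this interface, the sketch below rebuilds an equivalent abstract base locally and subclasses it. DataLoaderSketch and InMemoryLoader are hypothetical stand-ins so that the example runs without punchbowl. A real subclass would derive from punchbowl.util.DataLoader instead.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")


class DataLoaderSketch(ABC, Generic[T]):
    """Local stand-in mirroring the two abstract methods listed above."""

    @abstractmethod
    def load(self) -> T:
        """Load the data."""

    @abstractmethod
    def src_repr(self) -> str:
        """Return a string representation of the data source."""


class InMemoryLoader(DataLoaderSketch[list]):
    """Hypothetical loader that hands back values it already holds."""

    def __init__(self, values, label):
        self._values = values
        self._label = label

    def load(self) -> list:
        return list(self._values)

    def src_repr(self) -> str:
        return f"<in-memory: {self._label}>"


loader = InMemoryLoader([1, 2, 3], "demo")
```

Code that normally takes a file path could take such an object, call load() when it needs the data, and use src_repr() in logs where the path would otherwise appear.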

punchbowl.util.inpaint_nans(image: numpy.ndarray, kernel_size: int = 5) numpy.ndarray[source]#

Fill nans in an image with a neighborhood value.

Parameters:
  • image (np.ndarray) – image with nans

  • kernel_size (int) – odd integer size for the smoothing kernel

Returns:

image with nans filled

Return type:

np.ndarray

punchbowl.util.compute_tb(data: ndcube.NDCube | numpy.ndarray) numpy.ndarray[source]#

Compute total brightness from input NDCube or 3D data array of shape (MZP, …).
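The text above does not give the formula. A hedged sketch, assuming the usual relation for three ideal linear polarizers spaced 60 degrees apart, tB = (2/3)(M + Z + P), looks like this. Treat it as an illustration rather than punchbowl's implementation. The name compute_tb_sketch is hypothetical.

```python
import numpy as np


def compute_tb_sketch(mzp: np.ndarray) -> np.ndarray:
    """Total brightness from an array of shape (3, ...) ordered as M, Z, P.

    Assumes ideal polarizers at -60, 0 and +60 degrees, for which
    tB = (2/3) * (M + Z + P).
    """
    return (2.0 / 3.0) * np.sum(mzp, axis=0)


# Unpolarized light of brightness 1: each ideal polarizer passes half of it
mzp = np.full((3, 4, 4), 0.5)
total_brightness = compute_tb_sketch(mzp)
```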

punchbowl.util.censor_wcs(wcs: astropy.wcs.WCS, obstime: bool = True, observer: bool = True) astropy.wcs.WCS[source]#

Remove observer details from a WCS.

When input images have slightly different viewpoints, Sunpy will say this is an invalid coordinate transformation. Here we censor information from the WCS to pacify Sunpy.

class punchbowl.util.ShmPickleableNDArray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)[source]#

Bases: numpy.ndarray

A numpy array backed by shared memory that pickles without copying data.

Pickling happens by only transmitting the shared memory name (and array shape, etc.) and re-connecting to the shared memory on the receiving side, without ever pickling or copying the array contents. This is extremely useful when multi-processing with large data arrays, as data can be sent back and forth between workers with zero copying, and in a very seamless way.

Python spawns a tracker process that ensures the shared memory is freed after the main process terminates. Memory is also freed when an array is deleted (when Python determines the array’s reference count has dropped to zero). This implies that any views have also been deleted, since views keep a reference to their base array.

ShmPickleableNDArray supports indexing and slicing, creating views into the same shared-memory array the same way that normal NDArrays do. Note that operations that produce a copy of the data, such as “advanced indexing” (indexing with an array of booleans or integers), produce a new array not backed by shared memory, which will not enjoy any advantages when pickling. In such a case, the resulting array will raise a RuntimeError if it is pickled.
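The underlying mechanism can be sketched with the standard library alone: only a small handle (name, shape, dtype) is passed around, and the receiver re-attaches to the same memory. This illustrates the idea and is not the class's actual code. Both sides run in one process here for brevity, and the variable names are hypothetical.

```python
from multiprocessing import shared_memory

import numpy as np

source = np.arange(12, dtype=float).reshape(3, 4)

shm = shared_memory.SharedMemory(create=True, size=source.nbytes)
try:
    # Owner side: a numpy array whose buffer is the shared memory block
    owner = np.ndarray(source.shape, dtype=source.dtype, buffer=shm.buf)
    owner[:] = source

    # Everything a receiver needs; the array contents are never serialized
    handle = (shm.name, source.shape, source.dtype.str)

    # Receiver side: re-attach by name and rebuild the array around the buffer
    name, shape, dtype = handle
    attached_shm = shared_memory.SharedMemory(name=name)
    attached = np.ndarray(shape, dtype=np.dtype(dtype), buffer=attached_shm.buf)

    attached[0, 0] = -1.0              # a write through one handle...
    seen_by_owner = float(owner[0, 0])  # ...is visible through the other
    snapshot = attached.copy()          # plain copy, safe to keep after freeing

    # Arrays must be released before the shared memory can be closed
    del attached
    attached_shm.close()
    del owner
finally:
    shm.close()
    shm.unlink()
```

The explicit del, close() and unlink() calls are the bookkeeping that the class description above says is handled automatically.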

__array_finalize__(obj: Any) None[source]#

Finalize array setup.

classmethod from_array(array: numpy.ndarray) ShmPickleableNDArray[source]#

Convert an array into a ShmPickleableNDArray.

classmethod empty_like(array: numpy.ndarray) ShmPickleableNDArray[source]#

Create an empty array like the given array.

property orig_array: ShmPickleableNDArray#

Get the whole underlying array.

property shm: multiprocessing.shared_memory.SharedMemory#

Access the base shared memory.

property numpy: numpy.ndarray#

Convert to a plain numpy array.

free() None[source]#

Free shared memory immediately.

Each shared memory object has a tracker process that ensures it is freed when the process that created it terminates. This function is only needed to free the memory early, before the process ends. Note that accessing the array after freeing the backing memory may result in a segfault.

__del__() None[source]#

Delete the array.

__getitem__(*args: tuple, **kwargs: dict) ShmPickleableNDArray[source]#

Index the array.

__setitem__(*args: tuple, **kwargs: dict) None[source]#

Assign to indexed elements of the array.

__repr__(*args: tuple, **kwargs: dict) str[source]#

Repr the array.

property data: memoryview#

Access array data directly.

__reduce__() tuple[source]#

Pickle the object.