ozzy.utils
This submodule provides utility functions for other parts of ozzy, from simple formatting operations to more complicated file-finding tasks.
axis_from_extent
axis_from_extent(nx, lims)
Create a numerical axis from the number of cells and extent limits. The axis values are centered with respect to each cell.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`nx` | `int` | The number of cells in the axis. | required |
`lims` | `tuple[float, float]` | The extent limits (min, max). | required |
Returns:
Name | Type | Description |
---|---|---|
`ax` | `ndarray` | The numerical axis. |
Raises:
Type | Description |
---|---|
`ZeroDivisionError` | If the number of cells is zero. |
`TypeError` | If the second element of |
Examples:
Simple axis
import ozzy as oz
axis = oz.utils.axis_from_extent(10, (0,1))
axis
# array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95])
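The cell-centered construction can be reproduced in a few lines of NumPy. The sketch below only illustrates the idea and is not necessarily how ozzy implements it; the helper name `cell_centered_axis` is hypothetical.

```python
import numpy as np


def cell_centered_axis(nx, lims):
    """Hypothetical stand-in: nx cell centers spanning the extent lims."""
    lo, hi = lims
    dx = (hi - lo) / nx  # cell width; raises ZeroDivisionError when nx == 0
    # The first center sits half a cell above the lower limit,
    # the last center half a cell below the upper limit.
    return np.linspace(lo + dx / 2, hi - dx / 2, nx)


axis = cell_centered_axis(10, (0, 1))
```

With `nx = 10` and limits `(0, 1)`, the cell width is 0.1, so the centers run from 0.05 to 0.95 as in the example above.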
bins_from_axis
bins_from_axis(axis)
Create bin edges from a numerical axis. This is useful for binning operations that require the bin edges.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`axis` | `ndarray` | The numerical axis. | required |
Returns:
Name | Type | Description |
---|---|---|
`binaxis` | `ndarray` | The bin edges. |
Examples:
Bin edges from simple axis
First we create a simple axis with the `axis_from_extent` function:
import ozzy as oz
axis = oz.utils.axis_from_extent(10, (0,1))
print(axis)
# [0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95]
bedges = oz.utils.bins_from_axis(axis)
bedges
# array([-6.9388939e-18, 1.0000000e-01, 2.0000000e-01, 3.0000000e-01, 4.0000000e-01, 5.0000000e-01, 6.0000000e-01, 7.0000000e-01, 8.0000000e-01, 9.0000000e-01, 1.0000000e+00])
(In this example there is some rounding error for the zero edge.)
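As a rough illustration of the relationship between cell centers and bin edges, the following sketch derives the edges for a uniformly spaced axis and passes them to `np.histogram`. The helper `edges_from_centers` is hypothetical and assumes uniform spacing with at least two points; ozzy's own implementation may differ.

```python
import numpy as np


def edges_from_centers(axis):
    """Hypothetical stand-in: bin edges for a uniformly spaced axis of cell centers."""
    axis = np.asarray(axis, dtype=float)
    dx = axis[1] - axis[0]  # assumes uniform spacing and at least two points
    # Each center contributes its left edge; the last right edge closes the final bin.
    return np.append(axis - dx / 2, axis[-1] + dx / 2)


centers = np.linspace(0.05, 0.95, 10)
edges = edges_from_centers(centers)

# Bin edges are what np.histogram expects for its `bins` argument.
samples = np.array([0.01, 0.12, 0.18, 0.97])
counts, _ = np.histogram(samples, bins=edges)
```

An axis with N points always yields N + 1 edges, which is why binning routines ask for edges rather than centers.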
check_h5_availability
check_h5_availability(path)
Check if an HDF5 file can be opened for writing.
Note
This function is useful for longer analysis operations that save a file at the end. Without checking at the start that the output file is writable, you risk running the lengthy processing only to fail when writing the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path` | `str` | The path to the HDF5 file. | required |
Raises:
Type | Description |
---|---|
`FileNotFoundError` | If the file is not found. |
`BlockingIOError` | If the file is in use and cannot be overwritten. |
`OSError` | If there is another issue with the file. |
Examples:
Writing a custom analysis function
import ozzy as oz
def my_analysis(ds, output_file='output.h5'):
    # Check whether output file is writeable
    oz.utils.check_h5_availability(output_file)
    # Perform lengthy analysis
    # ...
    new_ds = 10 * ds
    # Save result
    new_ds.ozzy.save(output_file)
    return
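The fail-fast pattern itself does not depend on HDF5. The sketch below shows it with the standard library only, using a hypothetical `check_writable` helper and a plain text file; it does not detect HDF5-specific file locks, and it is not ozzy's implementation.

```python
import os
import tempfile


def check_writable(path):
    """Hypothetical stand-in: fail early if an existing file cannot be opened for writing."""
    # "r+b" opens for reading and writing without truncating or creating the file:
    # a missing file raises FileNotFoundError, other problems raise an OSError subclass.
    with open(path, "r+b"):
        pass


def lengthy_analysis(values, output_file):
    check_writable(output_file)  # fail fast, before the expensive part
    result = [10 * v for v in values]  # placeholder for the real processing
    with open(output_file, "w") as f:
        f.write(",".join(str(r) for r in result))
    return result


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output.txt")
    open(target, "w").close()  # the output file must already exist in this sketch
    result = lengthy_analysis([1, 2, 3], target)

    try:
        lengthy_analysis([1, 2, 3], os.path.join(tmp, "missing.txt"))
        failed_early = False
    except FileNotFoundError:
        failed_early = True
```

The second call fails before any processing happens, which is the point of checking the output file first.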
find_runs
find_runs(path, runs_pattern)
Find run directories matching a glob pattern.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`path` | `str` | The base path. | required |
`runs_pattern` | `str \| list[str]` | The run directory name or glob pattern(s). | required |
Returns:
Name | Type | Description |
---|---|---|
`dirs_dict` | `dict` | A dictionary mapping run names to their relative directory paths. |
Examples:
Finding set of run folders
Let's say we have a set of simulations that pick up from different checkpoints of a baseline simulation, with the following folder tree:
.
└── all_simulations/
    ├── baseline/
    │   ├── data.h5
    │   ├── checkpoint_t_00200.h5
    │   ├── checkpoint_t_00400.h5
    │   ├── checkpoint_t_00600.h5
    │   └── ...
    ├── from_t_00200/
    │   └── data.h5
    ├── from_t_00400/
    │   └── data.h5
    ├── from_t_00600/
    │   └── data.h5
    ├── ...
    └── other_simulation
To get the directories of each subfolder, we could use either of the following two calls:
import ozzy as oz
run_dirs = oz.utils.find_runs(path = "all_simulations", runs_pattern = "from_t_*")
print(run_dirs)
# {'from_t_00200': 'from_t_00200', 'from_t_00400': 'from_t_00400', ...}
import ozzy as oz
run_dirs = oz.utils.find_runs(path = ".", runs_pattern = "all_simulations/from_t_*")
print(run_dirs)
# {'from_t_00200': 'all_simulations/from_t_00200', 'from_t_00400': 'all_simulations/from_t_00400', ...}
Note that this function does not search recursively. If no run folders are found, it still returns the current directory:
import ozzy as oz
run_dirs = oz.utils.find_runs(path = ".", runs_pattern = "from_t_*")
# Could not find any run folder:
# - Checking whether already inside folder...
# ...no
# - Proceeding without a run name.
print(run_dirs)
# {'undefined': '.'}
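The name-to-relative-path mapping shown in these examples can be approximated with `glob` from the standard library. This is a minimal sketch under stated assumptions: the helper `find_run_dirs` is hypothetical, and unlike the behavior documented above it returns an empty dictionary rather than falling back to `{'undefined': '.'}` when nothing matches.

```python
import glob
import os
import tempfile


def find_run_dirs(path, runs_pattern):
    """Hypothetical stand-in: map run folder names to paths relative to `path`."""
    patterns = [runs_pattern] if isinstance(runs_pattern, str) else list(runs_pattern)
    dirs = {}
    for pattern in patterns:
        # glob is not recursive here: "*" only matches within one directory level
        for match in sorted(glob.glob(os.path.join(path, pattern))):
            if os.path.isdir(match):
                dirs[os.path.basename(match)] = os.path.relpath(match, path)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    for name in ("baseline", "from_t_00200", "from_t_00400"):
        os.makedirs(os.path.join(tmp, "all_simulations", name))

    direct = find_run_dirs(os.path.join(tmp, "all_simulations"), "from_t_*")
    nested = find_run_dirs(tmp, "all_simulations/from_t_*")
    nothing = find_run_dirs(tmp, "from_t_*")
```

Here `direct` and `nested` have the same keys but different values, because the values are always relative to the base path that was passed in.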
force_str_to_list
force_str_to_list(var)
Convert a string to a list containing the string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`var` | `str \| object` | The input variable. | required |
Returns:
Name | Type | Description |
---|---|---|
`var` | `list` | A list containing the input variable if it was a string, or the original object. |
Examples:
Example
import ozzy as oz
oz.utils.force_str_to_list('hello')
# ['hello']
oz.utils.force_str_to_list([1, 2, 3])
# [1, 2, 3]
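This kind of normalization matters because a bare string is itself iterable, so code that loops over "a list of names" would otherwise loop over characters. A minimal sketch of the idea, with hypothetical helper names (`as_list`, `count_items`):

```python
def as_list(var):
    """Hypothetical stand-in: wrap a lone string in a list, pass anything else through."""
    return [var] if isinstance(var, str) else var


def count_items(names):
    # Without normalization, len('hello') would report 5 characters instead of 1 item.
    return len(as_list(names))


one = count_items("hello")
three = count_items(["a", "b", "c"])
```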
get_attr_if_exists
get_attr_if_exists(
da, attr, str_exists=None, str_doesnt=None
)
Retrieve an attribute from an xarray DataArray if it exists, or return a specified value otherwise.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`da` | `DataArray` | The xarray DataArray object to check for the attribute. | required |
`attr` | `str` | The name of the attribute to retrieve. | required |
`str_exists` | `str \| Iterable[str] \| Callable \| None` | | `None` |
`str_doesnt` | `str \| None` | The value to return if the attribute doesn't exist. If | `None` |
Returns:
Type | Description |
---|---|
`str \| None` | The processed attribute value if it exists, |
Notes
If `str_exists` is an `Iterable` with more than two elements, only the first two are used, and a warning is printed.
Examples:
Basic usage with string
import ozzy as oz
import numpy as np
# Create a sample DataArray with an attribute
da = oz.DataArray(np.random.rand(3, 3), attrs={'units': 'meters'})
result = oz.utils.get_attr_if_exists(da, 'missing_attr', 'Exists', 'Does not exist')
print(result)
# Output: Does not exist
Using an Iterable and a Callable
import ozzy as oz
import numpy as np
da = oz.DataArray(np.random.rand(3, 3), attrs={'units': 'meters'})
# Using an Iterable (prefix and suffix)
result = oz.utils.get_attr_if_exists(da, 'units', ['Unit: ', ' (SI)'], 'No unit')
print(result)
# Output: Unit: meters (SI)
# Using a Callable
result = oz.utils.get_attr_if_exists(da, 'units', lambda x: f'The unit is: {x}', 'No unit found')
print(result)
# Output: The unit is: meters
result = oz.utils.get_attr_if_exists(da, 'units', lambda x: x.upper(), 'No unit')
print(result)
# Output: METERS
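One plausible reading of how the different forms of `str_exists` are dispatched is sketched below on a plain dictionary of attributes. Everything here is an assumption made for illustration: the helper `format_attr` is hypothetical, the handling of `None` and of a plain string is a guess, and the sketch uses `warnings.warn` where the documentation only says a warning is printed.

```python
import warnings


def format_attr(attrs, name, str_exists=None, str_doesnt=None):
    """Hypothetical stand-in operating on a plain dict of attributes."""
    if name not in attrs:
        return str_doesnt
    value = attrs[name]
    if str_exists is None:
        return value  # assumption: no formatting requested
    if callable(str_exists):
        return str_exists(value)
    if isinstance(str_exists, str):
        return str_exists  # assumption: a plain string replaces the value
    parts = list(str_exists)  # assumption: at least two elements (prefix, suffix)
    if len(parts) > 2:
        warnings.warn("only the first two elements of str_exists are used")
        parts = parts[:2]
    prefix, suffix = parts
    return f"{prefix}{value}{suffix}"


attrs = {"units": "meters"}
wrapped = format_attr(attrs, "units", ["Unit: ", " (SI)"], "No unit")
shouted = format_attr(attrs, "units", lambda x: x.upper(), "No unit")
missing = format_attr(attrs, "missing_attr", "Exists", "Does not exist")
```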
get_regex_snippet
get_regex_snippet(pattern, string)
Extract the substring that matches a regular expression from a string using `re.search`.
Tip
Use regex101.com to experiment with and debug regular expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`pattern` | `str` | The regular expression pattern. | required |
`string` | `str` | The input string. | required |
Returns:
Name | Type | Description |
---|---|---|
`match` | `str` | The matched substring. |
Examples:
Get number from file name
import ozzy as oz
oz.utils.get_regex_snippet(r'\d+', 'field-001234.h5')
# '001234'
get_user_methods
get_user_methods(clss)
Get a list of user-defined methods in a class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`clss` | `class` | The input class. | required |
Returns:
Name | Type | Description |
---|---|---|
`methods` | `list[str]` | A list of user-defined method names in the class. |
Examples:
Minimal class
class MyClass:
    def __init__(self):
        pass
    def my_method(self):
        pass
import ozzy as oz
oz.utils.get_user_methods(MyClass)
# ['my_method']
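One way to obtain such a list is to filter the class namespace for callables whose names are not double-underscore names. The sketch below is an illustration, not ozzy's implementation; the helper `user_methods` is hypothetical, and excluding inherited methods is a design choice of the sketch rather than documented behavior.

```python
def user_methods(cls):
    """Hypothetical stand-in: names of callables defined directly on `cls`, minus dunders."""
    return [
        name
        for name, member in vars(cls).items()
        if callable(member) and not name.startswith("__")
    ]


class Base:
    def inherited(self):
        pass


class MyClass(Base):
    def __init__(self):
        pass

    def my_method(self):
        pass


methods = user_methods(MyClass)
```

Because `vars(cls)` only contains names defined on the class itself, `inherited` does not appear in the result for `MyClass`.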
path_list_to_pars
path_list_to_pars(pathlist)
Split a list of file paths into common directory, run directories, and quantities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`pathlist` | `list[str]` | A list of file paths. | required |
Returns:
Name | Type | Description |
---|---|---|
`common_dir` | `str` | The common directory shared by all file paths. |
`dirs_runs` | `dict[str, str]` | A dictionary mapping run folder names to their absolute paths. |
`quants` | `list[str]` | A list of unique quantities (file names) present in the input paths. |
Examples:
Simple example
import os
from ozzy.utils import path_list_to_pars
pathlist = ['/path/to/run1/quantity1.txt',
            '/path/to/run1/quantity2.txt',
            '/path/to/run2/quantity1.txt']
common_dir, dirs_runs, quants = path_list_to_pars(pathlist)
print(f"Common directory: {common_dir}")
# Common directory: /path/to
print(f"Run directories: {dirs_runs}")
# Run directories: {'run1': '/path/to/run1', 'run2': '/path/to/run2'}
print(f"Quantities: {quants}")
# Quantities: ['quantity2.txt', 'quantity1.txt']
Single file path
import os
from ozzy.utils import path_list_to_pars
pathlist = ['/path/to/run1/quantity.txt']
common_dir, dirs_runs, quants = path_list_to_pars(pathlist)
print(f"Common directory: {common_dir}")
# Common directory: /path/to/run1
print(f"Run directories: {dirs_runs}")
# Run directories: {'.': '/path/to/run1'}
print(f"Quantities: {quants}")
# Quantities: ['quantity.txt']
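The decomposition in both examples is consistent with taking the common path of the parent directories and expressing each run directory relative to it. The sketch below reproduces that logic with `posixpath`, so it assumes POSIX-style paths; the helper `split_paths` is hypothetical, and it sorts the quantities for a reproducible order, which the outputs above do not guarantee.

```python
import posixpath


def split_paths(pathlist):
    """Hypothetical stand-in: (common_dir, dirs_runs, quants) for POSIX-style paths."""
    run_dirs = [posixpath.dirname(p) for p in pathlist]
    common_dir = posixpath.commonpath(run_dirs)
    # Run names are the run directories expressed relative to the common directory
    dirs_runs = {posixpath.relpath(d, common_dir): d for d in run_dirs}
    # Sorted for a reproducible order
    quants = sorted({posixpath.basename(p) for p in pathlist})
    return common_dir, dirs_runs, quants


many = split_paths([
    "/path/to/run1/quantity1.txt",
    "/path/to/run1/quantity2.txt",
    "/path/to/run2/quantity1.txt",
])
single = split_paths(["/path/to/run1/quantity.txt"])
```

With a single file, the run directory and the common directory coincide, so the relative run name collapses to `'.'`, matching the second example.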
prep_file_input
prep_file_input(files)
Prepare path input argument by expanding user paths and converting to absolute paths.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`files` | `str \| list of str` | The input file(s). | required |
Returns:
Name | Type | Description |
---|---|---|
`filelist` | `list of str` | A list of absolute file paths. |
Examples:
Expand user folder
import ozzy as oz
oz.utils.prep_file_input('~/example.txt')
# ['/home/user/example.txt']
oz.utils.prep_file_input(['~/file1.txt', '~/file2.txt'])
# ['/home/user/file1.txt', '/home/user/file2.txt']
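The two steps named in the description, user expansion and conversion to absolute paths, correspond to `os.path.expanduser` and `os.path.abspath` in the standard library. A minimal sketch with a hypothetical helper `to_abs_list` (ozzy's implementation may do more than this):

```python
import os


def to_abs_list(files):
    """Hypothetical stand-in: always return a list of absolute, user-expanded paths."""
    if isinstance(files, str):
        files = [files]
    # expanduser resolves a leading "~"; abspath anchors relative paths at the cwd
    return [os.path.abspath(os.path.expanduser(f)) for f in files]


single = to_abs_list("~/example.txt")
several = to_abs_list(["~/file1.txt", "data/file2.txt"])
```

The exact output depends on the home directory and the current working directory of the machine where it runs.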
print_file_item
print_file_item(file)
Print a file name with a leading ' - '.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`file` | `str` | The file name to be printed. | required |
Examples:
Example
import ozzy as oz
oz.utils.print_file_item('example.txt')
# - example.txt
recursive_search_for_file
recursive_search_for_file(fname, path=os.getcwd())
Recursively search for files with a given name or pattern in a specified directory and its subdirectories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`fname` | `str` | The name or name pattern of the file to search for. | required |
`path` | `str` | The path to the directory where the search should start. If not specified, uses the current directory via `os.getcwd()`. | `getcwd()` |
Returns:
Type | Description |
---|---|
`list[str]` | A list of paths to the files found, relative to |
Examples:
Search for a file in the current directory
from ozzy.utils import recursive_search_for_file
files = recursive_search_for_file('example.txt')
# files = ['/path/to/current/dir/example.txt']
Search for many files in a subdirectory
from ozzy.utils import recursive_search_for_file
files = recursive_search_for_file('data-*.h5', '/path/to/project')
# files = ['data/data-000.h5', 'data/data-001.h5', 'analysis/data-modified.h5']
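A recursive pattern search of this kind can be sketched with `os.walk` and `fnmatch` from the standard library. The helper `search_tree` below is hypothetical and returns sorted paths relative to the starting directory; it uses `"."` as its default, which is resolved at call time, and is not meant to mirror ozzy's implementation.

```python
import fnmatch
import os
import tempfile


def search_tree(fname, path="."):
    """Hypothetical stand-in: paths (relative to `path`) of files matching `fname`."""
    found = []
    for root, _dirs, files in os.walk(path):
        # fnmatch.filter keeps only the file names that match the pattern
        for name in fnmatch.filter(files, fname):
            found.append(os.path.relpath(os.path.join(root, name), path))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("data/data-000.h5", "data/data-001.h5", "analysis/data-modified.h5", "notes.txt"):
        full = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    files = search_tree("data-*.h5", tmp)
```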
set_attr_if_exists
set_attr_if_exists(da, attr, str_exists, str_doesnt=None)
Set or modify an attribute of a DataArray if it exists, or set a fallback value if it doesn't exist or is `None`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`da` | `DataArray` | The input DataArray. | required |
`attr` | `str` | The name of the attribute to set or modify. | required |
`str_exists` | `str \| Iterable[str] \| Callable \| None` | | required |
`str_doesnt` | `str \| None` | The value to set if the attribute doesn't exist. If | `None` |
Returns:
Type | Description |
---|---|
`DataArray` | The modified DataArray with updated attributes. |
Notes
If `str_exists` is an `Iterable` with more than two elements, only the first two are used, and a warning is printed.
Examples:
Set an existing attribute
import ozzy as oz
import numpy as np
# Create a sample DataArray
da = oz.DataArray(np.random.rand(3, 3), attrs={'units': 'meters'})
# Set an existing attribute
da = oz.utils.set_attr_if_exists(da, 'units', 'kilometers')
print(da.attrs['units'])
# Output: kilometers
Modify an existing attribute with a function
import ozzy as oz
import numpy as np
# Create a sample DataArray
da = oz.DataArray(np.random.rand(3, 3), attrs={'description': 'Random data'})
# Modify an existing attribute with a function
da = oz.utils.set_attr_if_exists(da, 'description', lambda x: x.upper())
print(da.attrs['description'])
# Output: RANDOM DATA
Set a non-existing attribute
import ozzy as oz
import numpy as np
# Create a sample DataArray
da = oz.DataArray(np.random.rand(3, 3))
# Set a non-existing attribute
da = oz.utils.set_attr_if_exists(da, 'units', 'meters', str_doesnt='unknown')
print(da.attrs['units'])
# Output: unknown
stopwatch
stopwatch(method)
Decorator function to measure the execution time of a method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`method` | `callable` | The method to be timed. | required |
Returns:
Name | Type | Description |
---|---|---|
`timed` | `callable` | A wrapped version of the input method that prints the execution time. |
Examples:
Get execution time whenever a function is called
from ozzy.utils import stopwatch
@stopwatch
def my_function(a, b):
    return a + b
my_function(2, 3)
# -> 'my_function' took: 0:00:00.000001
# 5
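For readers unfamiliar with timing decorators, the general pattern looks roughly like the sketch below. The name `stopwatch_sketch`, the use of `time.perf_counter`, and the exact message format are assumptions made for illustration and may not match ozzy's implementation.

```python
import functools
import time
from datetime import timedelta


def stopwatch_sketch(method):
    """Hypothetical stand-in: print how long each call to `method` takes."""

    @functools.wraps(method)  # keep the wrapped function's name and docstring
    def timed(*args, **kwargs):
        start = time.perf_counter()
        result = method(*args, **kwargs)
        elapsed = timedelta(seconds=time.perf_counter() - start)
        print(f"    -> '{method.__name__}' took: {elapsed}")
        return result  # the wrapper is transparent to the caller

    return timed


@stopwatch_sketch
def my_function(a, b):
    return a + b


total = my_function(2, 3)
```

The wrapper passes arguments and the return value through unchanged, so decorating a function only adds the printed timing line.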
tex_format
tex_format(str)
Format a string for TeX by enclosing it with '$' symbols.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`str` | `str` | The input string. | required |
Returns:
Name | Type | Description |
---|---|---|
`newstr` | `str` | The TeX-formatted string. |
Examples:
Example
import ozzy as oz
oz.utils.tex_format('k_p^2')
# '$k_p^2$'
oz.utils.tex_format('')
# ''
unpack_attr
unpack_attr(attr)
Unpack a NumPy array attribute, typically from HDF5 files.
This function handles different shapes and data types of NumPy arrays, particularly focusing on string (byte string) attributes. It's useful for unpacking attributes read from HDF5 files using h5py.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`attr` | `ndarray` | The input NumPy array to unpack. | required |
Returns:
Type | Description |
---|---|
`object` | The unpacked attribute. For string attributes, it returns a UTF-8 decoded string. For other types, it returns either a single element (if the array has only one element) or the entire array. |
Raises:
Type | Description |
---|---|
`AssertionError` | If the input is not a NumPy array. |
Notes
- For string attributes (`dtype.kind == 'S'`):
    - 0D arrays: returns the decoded string
    - 1D arrays: returns the first element decoded
    - 2D arrays: returns the first element if size is 1, otherwise the entire array
- For non-string attributes:
    - If the array has only one element, returns that element
    - Otherwise, returns the entire array
Examples:
Unpacking a string attribute
import numpy as np
import ozzy.utils as utils
# Create a NumPy array with a byte string
attr = np.array(b'Hello, World!')
result = utils.unpack_attr(attr)
print(result)
# Output: Hello, World!
Unpacking a numeric attribute
import numpy as np
import ozzy.utils as utils
# Create a NumPy array with a single number
attr = np.array([42])
result = utils.unpack_attr(attr)
print(result)
# Output: 42
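The rules listed in the Notes can be written out as a short NumPy function. This is a sketch under stated assumptions: the helper `unpack_sketch` is hypothetical, and leaving the 2D string case undecoded is a guess, since the Notes do not say whether that element is decoded.

```python
import numpy as np


def unpack_sketch(attr):
    """Hypothetical stand-in following the rules listed in the Notes."""
    assert isinstance(attr, np.ndarray)
    if attr.dtype.kind == "S":  # byte strings
        if attr.ndim == 0:
            return attr.item().decode("utf-8")
        if attr.ndim == 1:
            return attr[0].decode("utf-8")
        # 2D: first element if there is only one, otherwise the whole array
        # (assumption: left undecoded, since the Notes do not say)
        return attr[0, 0] if attr.size == 1 else attr
    # Non-string data: unwrap single-element arrays, pass larger ones through
    return attr.item() if attr.size == 1 else attr


text = unpack_sketch(np.array(b"Hello, World!"))
first = unpack_sketch(np.array([b"abc", b"def"]))
number = unpack_sketch(np.array([42]))
vector = unpack_sketch(np.array([1, 2, 3]))
```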