data0.dataset

Classes

uarray

A wrapper class for an xarray Dataset.

Functions

get_dataset(dataset[, is_input_set, is_predict])

Get the given dataset.

load_dataset(file_path, **kwargs)

Load the data from the given filepath into an xarray dataset.

csv_to_pd(csv_filepath[, is_US_EPA])

Load a CSV file into a pandas DataFrame.

csv_to_xr(csv_filepath[, is_US_EPA])

Load a CSV file into an xarray Dataset.

get_US_EPA_species_name(ID)

Get the US EPA species name from the ID.

get_years(dataset)

Get the years present in the dataset.

get_metadata(this_uarr)

Find and load the relevant metadata dictionary for the given uarray.

is_ensemble(dataset, **kwargs)

Check whether the given dataset has ensemble members.

get_epochs_logs(dataset, **kwargs)

Find and load the relevant epochs CSV logs for the given uarray.

Module Contents

class data0.dataset.uarray(dataset, is_input_set=False, is_predict=False, **kwargs)

A wrapper class for an xarray Dataset.

A class that wraps an xarray Dataset of a format specified by verify_dataset(). All method names start with an underscore (_) to avoid conflicts.

name

The name of the dataset, matching the string given to load the dataset, if applicable.

Type:

str

xr

The xarray dataset. Expected to have lat and lon coordinates, and optionally a time coordinate.

Type:

xarray.Dataset or xarray.DataArray

years

A list of unique years present in the time coordinate of the dataset.

Type:

list of int

metadata_file

The file path to the metadata file for the dataset, if it is an input or prediction set.

Type:

str

metadata

A dictionary of metadata for the dataset, coming from metadata_file.

Type:

dict

epochs_logs

An xarray Dataset of the epochs logs for the dataset, if it is a prediction set.

Type:

xarray.Dataset

is_input_set

Whether the dataset is an input set. If this is True, then is_predict must be False.

Type:

bool

is_predict

Whether the dataset is a prediction set. If this is True, then is_input_set must be False.

Type:

bool

is_ensemble

Whether the dataset has ensemble members. Only applicable for prediction sets.

Type:

bool

_verify(**kwargs)

Verify specified aspects of the dataset using verify_dataset().

_get_years()

Get a list of unique years present in the time coordinate of the dataset using get_years().

_select_year(year)

Select data for the specified year from the dataset.

_get_metadata()

Get the metadata dictionary using get_metadata(), if the dataset is an input or prediction set.

_get_epochs_logs()

Get the epochs CSV logs using get_epochs_logs(), if the dataset is a prediction set.

_shift_lons(**kwargs)

Shift the longitude coordinates of the dataset using shift_lon_arr().

_is_ensemble()

Check whether the dataset has ensemble members using is_ensemble().

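
The is_input_set and is_predict attributes are documented as mutually exclusive. A minimal sketch of how such a check could look is given below. The helper name check_set_flags, the returned labels, and the choice of ValueError are assumptions made for illustration and are not taken from data0.dataset.

```python
def check_set_flags(is_input_set: bool = False, is_predict: bool = False) -> str:
    """Return a label for the kind of dataset described by the two flags.

    Hypothetical helper: the name, the labels, and the exception type are
    assumptions, not part of data0.dataset.
    """
    if is_input_set and is_predict:
        # The attribute docs state that only one of the flags may be True.
        raise ValueError("is_input_set and is_predict cannot both be True")
    if is_input_set:
        return "input"
    if is_predict:
        return "prediction"
    # Neither flag set: an ordinary dataset without metadata or epochs logs.
    return "plain"
```
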
data0.dataset.get_dataset(dataset, is_input_set=False, is_predict=False, **kwargs)

Get the given dataset.

Parameters:
  • dataset (str, uarray, xarray.Dataset, xarray.DataArray) – The dataset to get, given either by name or as an existing object.

  • is_input_set (bool, optional) – If True, treat the dataset as an input set.

  • is_predict (bool, optional) – If True, treat the dataset as a model output prediction set.

  • **kwargs (keyword arguments) – Additional keyword arguments to pass to load_dataset() and verify_dataset().

Returns:

xr_dataset – The loaded and verified xarray dataset.

Return type:

xarray.Dataset or xarray.DataArray
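
Because dataset may be a string or an already-loaded object, a function of this kind usually dispatches on the argument type. The sketch below illustrates that idea with stand-ins only: resolve_dataset, FakeWrapper, and the loader callable are hypothetical names, and the verification step performed by verify_dataset() is left out.

```python
from typing import Any, Callable


class FakeWrapper:
    """Hypothetical stand-in for uarray: holds the wrapped object in .xr."""

    def __init__(self, xr: Any) -> None:
        self.xr = xr


def resolve_dataset(dataset: Any, loader: Callable[[str], Any]) -> Any:
    """Return the underlying data object for any supported input kind."""
    if isinstance(dataset, str):
        # A string is treated as something to load (a name or a path).
        return loader(dataset)
    if isinstance(dataset, FakeWrapper):
        # A wrapper already holds the loaded data.
        return dataset.xr
    # Anything else is assumed to be an already-loaded object.
    return dataset
```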

data0.dataset.load_dataset(file_path, **kwargs)

Load the data from the given filepath into an xarray dataset.

Verifies the given filepath, checks that the file is in a supported format, and loads the data into an xarray dataset.

Parameters:
  • file_path (str) – The filepath to the data file to load.

  • **kwargs (keyword arguments) – Additional keyword arguments to pass to csv_to_xr() and verify_dataset().

Returns:

xr_dataset – The loaded xarray dataset.

Return type:

xarray.Dataset or xarray.DataArray
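
The three steps in the description (check the path, check the format, pick a loader) can be sketched as follows. The helper name pick_loader and the extension table are assumptions: only the CSV route is implied by the documentation, through csv_to_xr(), and the NetCDF entry is a guess.

```python
from pathlib import Path

# Assumed mapping from file extension to a loader label. Only the CSV route
# is implied by the docs (via csv_to_xr); the NetCDF entry is a guess.
LOADERS = {".csv": "csv_to_xr", ".nc": "xarray.open_dataset"}


def pick_loader(file_path: str) -> str:
    """Validate file_path and return the name of the loader to use."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    suffix = path.suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported file format: {suffix!r}")
    return LOADERS[suffix]
```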

data0.dataset.csv_to_pd(csv_filepath, is_US_EPA=True, **kwargs)

Load a CSV file into a pandas DataFrame.

Loads a CSV file into a pandas DataFrame, ensuring that the required columns are present if the file is from the US EPA.

Parameters:
  • csv_filepath (str) – The path to the CSV file to load.

  • is_US_EPA (bool, optional) – If True, verify that the CSV file has the required columns for US EPA data. Defaults to True.

  • **kwargs (keyword arguments) – Additional keyword arguments, accepted for compatibility with wrapper functions.

Returns:

df – The loaded DataFrame.

Return type:

pandas.DataFrame

Examples

>>> df = csv_to_pd('datafiles/US_EPA/daily_42602_2019.csv')
>>> df.head()
             Latitude  Longitude  Arithmetic Mean
Date
2019-01-01  33.553056    -86.815         4.314286
2019-01-08  33.553056    -86.815         6.263636
2019-01-09  33.553056    -86.815         4.957143
2019-01-10  33.553056    -86.815         5.891667
2019-01-11  33.553056    -86.815        14.500000
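
The column check described for this function can be approximated with plain pandas. In this sketch the function name read_checked_csv and the REQUIRED_COLUMNS tuple are assumptions, with the column names taken from the example output above. The actual US EPA check may require different columns.

```python
import io

import pandas as pd

# Assumed required columns, taken from the example output above.
REQUIRED_COLUMNS = ("Date", "Latitude", "Longitude", "Arithmetic Mean")


def read_checked_csv(source, check_columns: bool = True) -> pd.DataFrame:
    """Read a CSV and optionally verify that the expected columns exist."""
    df = pd.read_csv(source)
    if check_columns:
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            raise ValueError(f"CSV is missing required columns: {missing}")
        # Index by date, mirroring the layout of the example output.
        df["Date"] = pd.to_datetime(df["Date"])
        df = df.set_index("Date")
    return df


# Usage with an in-memory CSV instead of a file on disk.
sample = io.StringIO(
    "Date,Latitude,Longitude,Arithmetic Mean\n"
    "2019-01-01,33.553056,-86.815,4.314286\n"
)
frame = read_checked_csv(sample)
```
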
data0.dataset.csv_to_xr(csv_filepath, is_US_EPA=True, **kwargs)

Load a CSV file into an xarray Dataset.

Load a CSV file into an xarray Dataset, ensuring that the required columns are present if the file is from the US EPA.

Parameters:
  • csv_filepath (str) – The path to the CSV file to load.

  • is_US_EPA (bool, optional) – If True, verify that the CSV file has the required columns for US EPA data. Defaults to True.

  • **kwargs (keyword arguments) – Additional keyword arguments, accepted for compatibility with wrapper functions.

Returns:

xr_dataset – The loaded Dataset.

Return type:

xarray.Dataset

Examples

>>> xr_dataset = csv_to_xr('datafiles/US_EPA/daily_42602_2019.csv')
>>> xr_dataset
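
One common way to turn a date-indexed DataFrame into an xarray Dataset is xarray.Dataset.from_dataframe. The sketch below shows that conversion on a small hand-built frame. Whether csv_to_xr() uses this route internally is an assumption, and the sample values are copied from the csv_to_pd() example rather than read from a file.

```python
import pandas as pd
import xarray as xr

# Small frame shaped like the csv_to_pd() example output.
df = pd.DataFrame(
    {
        "Latitude": [33.553056, 33.553056],
        "Longitude": [-86.815, -86.815],
        "Arithmetic Mean": [4.314286, 6.263636],
    },
    index=pd.to_datetime(["2019-01-01", "2019-01-08"]).rename("Date"),
)

# Each column becomes a data variable along the index dimension ("Date").
# from_dataframe expects a unique index, so this works for one site only.
ds = xr.Dataset.from_dataframe(df)
```
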
data0.dataset.get_US_EPA_species_name(ID)

Get the US EPA species name from the ID.

Map the US EPA species ID to the corresponding species name.

Parameters:

ID (str) – The US EPA species ID to map.

Returns:

species_name – The corresponding US EPA species name.

Return type:

str

Examples

>>> get_US_EPA_species_name('42602')
'no2'
>>> get_US_EPA_species_name('42101')
'co'
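
An ID-to-name mapping like this is typically a dictionary lookup. The sketch below contains only the two pairs shown in the examples above. The names SPECIES_BY_ID and lookup_species_name, and the choice of ValueError for unknown IDs, are assumptions.

```python
# Only the two mappings shown in the examples above are known from the docs;
# the real table presumably covers more US EPA species IDs.
SPECIES_BY_ID = {"42602": "no2", "42101": "co"}


def lookup_species_name(species_id: str) -> str:
    """Hypothetical stand-in for get_US_EPA_species_name()."""
    try:
        return SPECIES_BY_ID[str(species_id)]
    except KeyError:
        raise ValueError(f"Unknown US EPA species ID: {species_id!r}") from None
```
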
data0.dataset.get_years(dataset)

Get the years present in the dataset.

Get a list of unique years from the time coordinate of the given dataset.

Parameters:

dataset (str, uarray, xarray.Dataset, xarray.DataArray) – The dataset from which to extract the years.

Returns:

years – A list of unique years in the dataset.

Return type:

list of int
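
For an xarray object with a datetime time coordinate, the unique years can be read through the .dt accessor. The following is a sketch under the assumption that the coordinate is named time; the function name unique_years and the no2 variable are illustrative and not part of the module.

```python
import numpy as np
import pandas as pd
import xarray as xr


def unique_years(ds: xr.Dataset) -> list:
    """Return the sorted unique years of the 'time' coordinate as ints."""
    years = ds["time"].dt.year.values
    # np.unique both deduplicates and sorts the values.
    return [int(y) for y in np.unique(years)]


times = pd.to_datetime(["2018-12-31", "2019-01-01", "2019-06-15"])
example = xr.Dataset({"no2": ("time", [1.0, 2.0, 3.0])}, coords={"time": times})
print(unique_years(example))  # → [2018, 2019]
```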

data0.dataset.get_metadata(this_uarr)

Find and load the relevant metadata dictionary for the given uarray.

Parameters:

this_uarr (uarray) – The uarray object for which to load the metadata.

Returns:

metadata – The metadata dictionary for this uarray.

Return type:

dict

data0.dataset.is_ensemble(dataset, **kwargs)

Check whether the given dataset has ensemble members.

Parameters:
  • dataset (str, uarray, xarray.Dataset, xarray.DataArray) – The dataset to check, given either by name or as an existing object.

  • **kwargs (keyword arguments) – Additional keyword arguments to pass to load_dataset() and verify_dataset().

Returns:

is_ensemble – Whether the given dataset has ensemble members.

Return type:

bool

data0.dataset.get_epochs_logs(dataset, **kwargs)

Find and load the relevant epochs CSV logs for the given uarray.

Parameters:
  • dataset (uarray) – The uarray object for which to load the epochs logs.

  • **kwargs (keyword arguments) – Additional keyword arguments to pass to uarray().

Returns:

epochs_logs – The dataset of epochs logs for this uarray.

Return type:

xarray.Dataset