data0.dataset
Classes
uarray | A wrapper class for an xarray Dataset.
Functions
get_dataset | Get the given dataset.
load_dataset | Load the data from the given filepath into an xarray dataset.
csv_to_pd | Load a CSV file into a pandas DataFrame.
csv_to_xr | Load a CSV file into an xarray Dataset.
get_US_EPA_species_name | Get the US EPA species name from the ID.
get_years | Get the years present in the dataset.
get_metadata | Find and load the relevant metadata dictionary for the given uarray.
is_ensemble | Check whether the given dataset has ensemble members.
get_epochs_logs | Find and load the relevant epochs csv logs for the given uarray.
Module Contents
- class data0.dataset.uarray(dataset, is_input_set=False, is_predict=False, **kwargs)
A wrapper class for an xarray Dataset.
A class that wraps an xarray Dataset of a format specified by verify_dataset(). All method names start with an underscore (_) to avoid conflicts.
- name
The name of the dataset, matching the string given to load the dataset, if applicable.
- Type:
str
- xr
The xarray dataset. Expected to have lat and lon coordinates, and optionally a time coordinate.
- Type:
xr.Dataset or xr.DataArray
- years
A list of unique years present in the time coordinate of the dataset.
- Type:
list of int
- metadata_file
The file path to the metadata file for the dataset, if it is an input or prediction set.
- Type:
str
- metadata
A dictionary of metadata for the dataset, coming from metadata_file.
- Type:
dict
- epochs_logs
An xarray Dataset of the epochs logs for the dataset, if it is a prediction set.
- Type:
xr.Dataset
- is_input_set
Whether the dataset is an input set. If this is True, then is_predict must be False.
- Type:
bool
- is_predict
Whether the dataset is a prediction set. If this is True, then is_input_set must be False.
- Type:
bool
- is_ensemble
Whether the dataset has ensemble members. Only applicable for prediction sets.
- Type:
bool
- _verify(**kwargs)
Verify specified aspects of the dataset using verify_dataset().
- _get_years()
Get a list of unique years present in the time coordinate of the dataset using get_years().
- _select_year(year)
Select data for the specified year from the dataset.
- _get_metadata()
Get the metadata dictionary if the dataset is an input or prediction set using get_metadata().
- _get_epochs_logs()
Get the epochs csv logs if the dataset is a prediction set using get_epochs_logs().
- _shift_lons(**kwargs)
Shift the longitude coordinates of the dataset using shift_lon_arr().
- _verify(**kwargs)
- _is_ensemble()
- _get_years()
- _select_year(year)
- _get_metadata()
- _get_epochs_logs()
- _shift_lons(**kwargs)
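The reference does not state which longitude convention `_shift_lons()` converts between. One common case is remapping longitudes from the 0 to 360 range onto the -180 to 180 range. The following is a minimal sketch of that remapping with NumPy; `shift_lons_to_180` is a hypothetical helper for illustration and is not part of `data0`.

```python
import numpy as np


def shift_lons_to_180(lons):
    """Map longitudes in degrees onto the half-open interval [-180, 180).

    Hypothetical helper; the actual behavior of shift_lon_arr() is not
    documented here and may differ.
    """
    lons = np.asarray(lons, dtype=float)
    return (lons + 180.0) % 360.0 - 180.0


shifted = shift_lons_to_180([0.0, 90.0, 180.0, 270.0, 359.0])
```

After a shift like this the longitude values are usually no longer monotonic, so a dataset would typically also be sorted along its longitude coordinate.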
- data0.dataset.get_dataset(dataset, is_input_set=False, is_predict=False, **kwargs)
Get the given dataset.
- Parameters:
dataset (str, uarray, xarray.Dataset, xarray.DataArray) – The dataset to get, given either by name or as an already-loaded object.
is_input_set (bool, optional) – If True, treat the dataset as an input set.
is_predict (bool, optional) – If True, treat the dataset as a model output prediction set.
**kwargs (keyword arguments) – Additional keyword arguments to pass to load_dataset() and verify_dataset().
- Returns:
xr_dataset – The loaded and verified xarray dataset.
- Return type:
xarray.Dataset or xarray.DataArray
- data0.dataset.load_dataset(file_path, **kwargs)
Load the data from the given filepath into an xarray dataset.
Verifies the given filepath, ensures the file contains an applicable format, and loads the data into an xarray dataset.
- Parameters:
file_path (str) – The filepath to the data file to load.
**kwargs (keyword arguments) – Additional keyword arguments to pass to csv_to_xr() and verify_dataset().
- Returns:
xr_dataset – The loaded xarray dataset.
- Return type:
xarray.Dataset or xarray.DataArray
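The checks described for `load_dataset()` (verify the path, then confirm the file format) might look roughly like the sketch below. The helper name `check_data_file` and the set of accepted extensions are assumptions made for illustration; the formats that `data0` actually accepts are not listed in this reference.

```python
import tempfile
from pathlib import Path

# Assumed set of supported extensions; the real list in data0 may differ.
SUPPORTED_SUFFIXES = {".csv", ".nc"}


def check_data_file(file_path):
    """Return the path as a Path if it exists and has a supported suffix."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such data file: {path}")
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file format: {path.suffix!r}")
    return path


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.touch()
    checked_name = check_data_file(csv_path).name
```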
- data0.dataset.csv_to_pd(csv_filepath, is_US_EPA=True, **kwargs)
Load a CSV file into a pandas DataFrame.
Loads a CSV file into a pandas DataFrame, ensuring that the required columns are present if the file is from the US EPA.
- Parameters:
csv_filepath (str) – The path to the CSV file to load.
is_US_EPA (bool, optional) – If True, verify that the CSV file has the required columns for US EPA data. Defaults to True.
**kwargs (keyword arguments) – Additional keyword arguments to accommodate wrapper functions.
- Returns:
df – The loaded DataFrame.
- Return type:
pandas.DataFrame
Examples
>>> df = csv_to_pd('datafiles/US_EPA/daily_42602_2019.csv')
>>> df.head()
             Latitude  Longitude  Arithmetic Mean
Date
2019-01-01  33.553056    -86.815         4.314286
2019-01-08  33.553056    -86.815         6.263636
2019-01-09  33.553056    -86.815         4.957143
2019-01-10  33.553056    -86.815         5.891667
2019-01-11  33.553056    -86.815        14.500000
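As a rough illustration of the required-column check, the sketch below reads CSV text with pandas and validates the columns seen in the example output. The function name `read_epa_csv` is hypothetical, and treating those four columns as the complete required set is an assumption; the real `csv_to_pd()` may check more.

```python
import io

import pandas as pd

# Columns taken from the example output above; assuming this is the
# full required set is a guess, not documented behavior.
REQUIRED_COLUMNS = ["Date", "Latitude", "Longitude", "Arithmetic Mean"]


def read_epa_csv(source, is_US_EPA=True):
    """Read a CSV and, for US EPA files, verify the required columns."""
    df = pd.read_csv(source)
    if is_US_EPA:
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        # Index by date and keep only the columns of interest.
        df["Date"] = pd.to_datetime(df["Date"])
        df = df.set_index("Date")[["Latitude", "Longitude", "Arithmetic Mean"]]
    return df


csv_text = """Date,Latitude,Longitude,Arithmetic Mean
2019-01-01,33.553056,-86.815,4.314286
2019-01-08,33.553056,-86.815,6.263636
"""
df = read_epa_csv(io.StringIO(csv_text))
```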
- data0.dataset.csv_to_xr(csv_filepath, is_US_EPA=True, **kwargs)
Load a CSV file into an xarray Dataset.
Load a CSV file into an xarray Dataset, ensuring that the required columns are present if the file is from the US EPA.
- Parameters:
csv_filepath (str) – The path to the CSV file to load.
is_US_EPA (bool, optional) – If True, verify that the CSV file has the required columns for US EPA data. Defaults to True.
**kwargs (keyword arguments) – Additional keyword arguments to accommodate wrapper functions.
- Returns:
xr_dataset – The loaded Dataset.
- Return type:
xarray.Dataset
Examples
>>> xr_dataset = csv_to_xr('datafiles/US_EPA/daily_42602_2019.csv')
>>> xr_dataset
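The example above does not show the resulting object, and this reference does not say how `csv_to_xr()` arranges dimensions and coordinates. As a generic sketch of the DataFrame-to-Dataset step only, xarray can convert a date-indexed DataFrame directly; the sample values below are the first rows of the `csv_to_pd()` example, and the one-dimensional layout is an assumption.

```python
import pandas as pd
import xarray as xr

# A small date-indexed frame shaped like the csv_to_pd() example output.
df = pd.DataFrame(
    {
        "Latitude": [33.553056, 33.553056, 33.553056],
        "Longitude": [-86.815, -86.815, -86.815],
        "Arithmetic Mean": [4.314286, 6.263636, 4.957143],
    },
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-01-08", "2019-01-09"], name="Date"
    ),
)

# Each column becomes a data variable along the "Date" dimension.
ds = xr.Dataset.from_dataframe(df)
```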
- data0.dataset.get_US_EPA_species_name(ID)
Get the US EPA species name from the ID.
Map the US EPA species ID to the corresponding species name.
- Parameters:
ID (str) – The US EPA species ID to map.
- Returns:
species_name – The corresponding US EPA species name.
- Return type:
str
Examples
>>> get_US_EPA_species_name('42602')
'no2'
>>> get_US_EPA_species_name('42101')
'co'
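An ID-to-name mapping of this kind can be sketched as a dictionary lookup. The table below contains only the two codes shown in the examples above; the function name `species_name_from_id` and the error handling are illustrative assumptions, not the library's implementation.

```python
# Only the two IDs that appear in the documented examples are listed.
SPECIES_BY_ID = {
    "42602": "no2",
    "42101": "co",
}


def species_name_from_id(species_id):
    """Look up a species name, raising a clear error for unknown IDs."""
    key = str(species_id)
    if key not in SPECIES_BY_ID:
        raise KeyError(f"Unknown US EPA species ID: {key!r}")
    return SPECIES_BY_ID[key]


no2_name = species_name_from_id("42602")
co_name = species_name_from_id(42101)  # integers are coerced to strings
```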
- data0.dataset.get_years(dataset)
Get the years present in the dataset.
Get a list of unique years from the time coordinate of the given dataset.
- Parameters:
dataset (str, uarray, xarray.Dataset, xarray.DataArray) – The dataset from which to extract the years.
- Returns:
years – A list of unique years in the dataset.
- Return type:
list of int
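Extracting unique years from a time coordinate, and then selecting one year as `_select_year()` is described to do, could be done along these lines with xarray. This is a sketch on a synthetic dataset; `unique_years` is a hypothetical stand-in, and the coordinate name `time` follows the attribute description of `uarray.xr`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Five daily steps that straddle a year boundary.
times = pd.date_range("2018-12-30", periods=5, freq="D")
ds = xr.Dataset({"value": ("time", np.arange(5.0))}, coords={"time": times})


def unique_years(dataset):
    """Return the sorted unique years of the 'time' coordinate as ints."""
    years = np.unique(dataset["time"].dt.year.values)
    return [int(y) for y in years]


years = unique_years(ds)

# Partial datetime-string indexing keeps every step in the given year.
subset = ds.sel(time="2019")
```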
- data0.dataset.get_metadata(this_uarr)
Find and load the relevant metadata dictionary for the given uarray.
- Parameters:
this_uarr (uarray) – The uarray object for which to load the metadata.
- Returns:
metadata – The metadata dictionary for this uarray.
- Return type:
dict
- data0.dataset.is_ensemble(dataset, **kwargs)
Check whether the given dataset has ensemble members.
- Parameters:
dataset (str, uarray, xarray.Dataset, xarray.DataArray) – The dataset to check, given either by name or as an already-loaded object.
**kwargs (keyword arguments) – Additional keyword arguments to pass to load_dataset() and verify_dataset().
- Returns:
is_ensemble – Whether the given dataset has ensemble members.
- Return type:
bool
- data0.dataset.get_epochs_logs(dataset, **kwargs)
Find and load the relevant epochs csv logs for the given uarray.
- Parameters:
dataset (uarray) – The uarray object for which to load the epochs logs.
**kwargs (keyword arguments) – Additional keyword arguments to pass to uarray().
- Returns:
epochs_logs – The dataset of epochs logs for this uarray.
- Return type:
xr.Dataset