unox.input
Attributes
era5_vars_list
input_vars_dict
Functions
x_or_y_var | Return whether the given variable is an x or y variable.
get_input_index | Get the index of the given variable in the input array.
make_y_input_file | Create a y input file for the Unet model for the given year.
write_input_netcdf | Write an xarray Dataset to a netcdf file, appending or overwriting as needed.
set_global_attrs | Add attributes to an xarray Dataset.
set_var_attrs | Add attributes to a variable in an xarray Dataset.
scale_xr_var | Scale a variable in an xarray Dataset by a given factor.
add_tm1_var | Add a (t-1) version of the given variable to the dataset.
make_x_input_file | Create an x input file for the Unet model for the given year and stage.
fill_w_insitu | Add stage 2 for the variable in an xarray Dataset using available insitu data.
make_all_y_input_files | Create y input files for multiple years.
make_all_x_input_files | Create x input files for multiple years and stages.
make_all_input_files | Create all input files for the Unet model.
make_input_metadata_file | Create a metadata file for the dataset in the given directory.
make_input_config | Create an input configuration file.
copy_input_files | Copy an input set to a new location.
Module Contents
- unox.input.era5_vars_list = ['u10', 'v10', 'blh', 'sp', 'skt', 't2m', 'ssrd', 'lsm']
- unox.input.input_vars_dict
- unox.input.x_or_y_var(var)[source]
Return whether the given variable is an x or y variable.
- Parameters:
var (str) – The variable to check.
- Returns:
x_or_y – ‘x’ if the variable is an x variable, ‘y’ if it is a y variable.
- Return type:
str
Examples
>>> x_or_y_var('no2')
'x'
>>> x_or_y_var('nox')
'y'
- unox.input.get_input_index(var)[source]
Get the index of the given variable in the input array.
- Parameters:
var (str) – The variable to check.
- Returns:
index – The index of the variable in the input array.
- Return type:
int
Examples
>>> get_input_index('no2')
0
>>> get_input_index('u10')
2
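The two lookups above can be pictured with a minimal sketch. The variable lists here are assumptions taken from the default x_vars of make_input_config and the default y variable 'nox'; the contents of unox.input.input_vars_dict are not shown on this page, and the `_sketch` function names are hypothetical, not part of the module.

```python
# Assumed variable lists (from the default x_vars of make_input_config);
# the actual contents of unox.input.input_vars_dict are not shown on this page.
X_VARS = ['no2', 'no2_tm1', 'u10', 'v10', 'blh', 'sp', 'skt', 't2m', 'ssrd']
Y_VARS = ['nox']


def x_or_y_var_sketch(var):
    """Return 'x' or 'y' depending on which list contains var."""
    if var in X_VARS:
        return 'x'
    if var in Y_VARS:
        return 'y'
    raise ValueError(f"unknown variable: {var!r}")


def get_input_index_sketch(var):
    """Return the position of var within its own (x or y) variable list."""
    names = X_VARS if x_or_y_var_sketch(var) == 'x' else Y_VARS
    return names.index(var)
```

With these assumed lists, the sketch reproduces the documented examples ('no2' is an x variable at index 0, 'u10' is at index 2).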
- unox.input.make_y_input_file(year, var='nox', emiss_dir='/data/high_res/emacdonald/unet/datafiles/t106', emiss_pre='nox_', emiss_post='_t106_US.nc', scale_factor=1000000000000.0, nan_fill=0, stage_2_cutoff=2013, output_dir='test_input', write_this_year=True, overwrite=True, output_format='nc', **kwargs)[source]
Create a y input file for the Unet model for the given year.
The array in the generated file will have these dimensions:
- time: 364 (or 365 for leap years), one day less than usual to allow for the t-1 variable
- lat: length depends on the latitude grid
- lon: length depends on the longitude grid
- var: 1 (a dummy dimension to match the x input files)
- Parameters:
year (int) – The year for which to create the y input file (between 2005 and 2021).
var (str, optional) – The variable to extract from the dataset. Default is ‘nox’.
emiss_dir (str, optional) – Directory where the emissions data are stored. Default is ‘/data/high_res/emacdonald/unet/datafiles/t106’.
emiss_pre (str, optional) – Prefix for the emissions input file name. Default is ‘nox_’.
emiss_post (str, optional) – Suffix (including the file extension) for the emissions input file name. Default is ‘_t106_US.nc’.
scale_factor (float, optional) – Factor by which to scale the data. Default is 1e12.
nan_fill (float, optional) – Value to fill NaNs in the dataset. Default is 0.
stage_2_cutoff (int, optional) – Year after which the data will also be saved in stage 2. Default is 2013.
output_dir (str, optional) – Directory inside inputfiles/ where the output y input file will be saved. Default is ‘test_input’.
write_this_year (bool, optional) – Whether to write the data for this year or just return the xarray without writing to file. Default is True.
output_format (str, optional) – Whether to save netcdf files (‘nc’), numpy arrays (‘npy’), or ‘both’. Default is ‘nc’. Irrelevant if write_this_year is False.
overwrite (bool, optional) – Whether to overwrite existing netcdf files. Default is True.
**kwargs (dict, optional) – Additional keyword arguments (not used).
- Returns:
input_netcdf_xr (xarray.Dataset) – The y input data for the specified year.
g_attr_dict (dict) – Dictionary of global attributes for the dataset.
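As a rough illustration of the time dimension described above, the following sketch computes the expected number of time steps for a year. The file-name helper is an assumption: this page does not state how emiss_pre, the year, and emiss_post are combined, so simple concatenation is used for illustration only. Both function names are hypothetical.

```python
import calendar


def expected_time_length(year):
    """Days in the year minus one, since the first day has no t-1 value."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return days_in_year - 1


def emissions_filename(year, emiss_pre='nox_', emiss_post='_t106_US.nc'):
    """Assumed naming scheme (not confirmed by the docs): prefix + year + suffix."""
    return f"{emiss_pre}{year}{emiss_post}"
```

For example, expected_time_length(2010) gives 364 and expected_time_length(2008) gives 365, matching the dimension note.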
- unox.input.write_input_netcdf(input_netcdf_xr, output_filepath, g_attr_dict=None, overwrite=True, sort=True, **kwargs)[source]
Write an xarray Dataset to a netcdf file, appending or overwriting as needed.
- Parameters:
input_netcdf_xr (xarray.Dataset) – The dataset to write to the netcdf file.
output_filepath (str) – Path to the output netcdf file.
g_attr_dict (dict, optional) – Dictionary of global attributes to add to the dataset if creating a new file.
overwrite (bool, optional) – Whether to overwrite existing data in the netcdf file if there are overlapping times. Default is True.
sort (bool, optional) – Whether to sort the xarray before writing to netcdf. Sorting takes a long time. Default is True.
- Returns:
input_netcdf_xr – The dataset that was written to the netcdf file.
- Return type:
xarray.Dataset
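The append-or-overwrite behaviour can be sketched conceptually with plain dictionaries keyed by time. This is only an analogy under stated assumptions: the real function operates on xarray Datasets and netcdf files, and merge_time_series is a hypothetical name.

```python
def merge_time_series(existing, new, overwrite=True, sort=True):
    """Combine two {time: value} mappings, appending or overwriting as needed."""
    merged = dict(existing)
    for time, value in new.items():
        if time in merged and not overwrite:
            continue  # keep the value that was already stored
        merged[time] = value
    if sort:
        # ISO-formatted date strings sort chronologically
        merged = dict(sorted(merged.items()))
    return merged
```

Skipping the sort leaves the records in insertion order, which mirrors the note that sorting is optional because it can be slow on large datasets.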
- unox.input.set_global_attrs(xr_dataset, attr_dict)[source]
Add attributes to an xarray Dataset.
- Parameters:
xr_dataset (xarray.Dataset) – The dataset to which attributes will be added.
attr_dict (dict) – Dictionary of attributes to add to the dataset.
- Returns:
The dataset with added attributes.
- Return type:
xarray.Dataset
- unox.input.set_var_attrs(xr_dataset, var, attr_dict)[source]
Add attributes to a variable in an xarray Dataset.
- Parameters:
xr_dataset (xarray.Dataset) – The dataset containing the variable to which attributes will be added.
var (str) – The variable to which attributes will be added.
attr_dict (dict) – Dictionary of attributes to add to the variable.
- Returns:
The dataset with the variable having added attributes.
- Return type:
xarray.Dataset
- unox.input.scale_xr_var(xr_dataset, var, scale_factor)[source]
Scale a variable in an xarray Dataset by a given factor.
- Parameters:
xr_dataset (xarray.Dataset) – The dataset containing the variable to be scaled.
var (str) – The variable to be scaled.
scale_factor (float) – The factor by which to scale the variable.
- Returns:
The dataset with the scaled variable.
- Return type:
xarray.Dataset
- unox.input.add_tm1_var(xr_dataset, var, year)[source]
Add a (t-1) version of the given variable to the dataset.
Add a version of the given variable which is shifted by one day (t-1) to the dataset, and drop January 1st from the time coordinate.
- Parameters:
xr_dataset (xarray.Dataset) – The dataset containing the variable to be shifted.
var (str) – The variable to be shifted.
year (int) – The year which xr_dataset covers (between 2005 and 2021).
- Returns:
The dataset with the shifted variable.
- Return type:
xarray.Dataset
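The one-day shift can be illustrated with a small pure-Python sketch. It assumes daily data covering a full calendar year and uses plain lists instead of an xarray Dataset; add_tm1_sketch is a hypothetical name.

```python
from datetime import date, timedelta


def add_tm1_sketch(days, values):
    """Pair each day's value with the previous day's value."""
    by_day = dict(zip(days, values))
    rows = []
    for day in days:
        previous = day - timedelta(days=1)
        if previous in by_day:  # January 1st has no predecessor, so it is dropped
            rows.append((day, by_day[day], by_day[previous]))
    return rows


# 2008 is a leap year, so it has 366 days before the shift.
days_2008 = [date(2008, 1, 1) + timedelta(days=i) for i in range(366)]
rows = add_tm1_sketch(days_2008, list(range(366)))
```

Each row holds (day, value at t, value at t-1), and the result has one fewer entry than the input year.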
- unox.input.make_x_input_file(year, stage_2=True, data_dir='/data/high_res', chemra_path='emacdonald/unet/datafiles/TROPESS/TROPESS_reanalysis_2hr_no2_sfc_', chemra_var='no2', insitu_path='US_EPA/NO2/daily_NO2/daily_42602_', era5_path='ERA5concatenated', scale_factors={'chemra': 0.001, 'sp': 1e-05, 'ssrd': 1e-06, 'blh': 0.001}, stage_2_cutoff=2013, output_dir='test_input', write_this_year=True, output_format='nc', overwrite=True, **kwargs)[source]
Create an x input file for the Unet model for the given year and stage.
The array in the file will have these dimensions:
- time: 364 (or 365 for leap years), one day less than usual to allow for the t-1 variable
- lat: length depends on the latitude grid
- lon: length depends on the longitude grid
- var: 9 variables (e.g., ‘no2’, ‘u10’, ‘v10’, etc.)
- Parameters:
year (int) – The year for which to create the x input file.
stage_2 (bool, optional) – Whether or not to make stage 2 in addition to stage 1 for the input. Default is True.
data_dir (str, optional) – Directory where the NOx data are stored. Default is ‘/data/high_res’.
chemra_path (str, optional) – Path to the chemical reanalysis data files. Default is ‘emacdonald/unet/datafiles/TROPESS/TROPESS_reanalysis_2hr_no2_sfc_’.
chemra_var (str, optional) – The variable to extract from the chemical reanalysis dataset. Default is ‘no2’.
insitu_path (str, optional) – Path to the insitu data files. Default is ‘US_EPA/NO2/daily_NO2/daily_42602_’.
era5_path (str, optional) – Path to the ERA5 reanalysis data files. Default is ‘ERA5concatenated’.
scale_factors (dict, optional) – Scaling factors for the variables. Default is a dictionary with scaling factors for ‘chemra’, ‘sp’, ‘ssrd’, and ‘blh’.
stage_2_cutoff (int, optional) – Year after which input files will also be generated for stage 2. Default is 2013.
output_dir (str, optional) – Directory inside inputfiles/ where the output x input file will be saved. Default is ‘test_input’.
write_this_year (bool, optional) – Whether to write the data for this year or just return the xarray without writing to file. Default is True.
output_format (str, optional) – Whether to save netcdf files (‘nc’), numpy arrays (‘npy’), or ‘both’. Default is ‘nc’. Irrelevant if write_this_year is False.
overwrite (bool, optional) – Whether to overwrite existing netcdf files. Default is True.
**kwargs (dict, optional) – Additional keyword arguments (not used).
- Returns:
x_data – The x input data for the specified year.
- Return type:
xarray.Dataset
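To make the scale_factors default concrete, here is a minimal sketch that applies it to a single sample. It assumes the 'chemra' key refers to the chemical reanalysis variable named by chemra_var; that mapping is not stated on this page. The sample values are invented for illustration, and apply_scale_factors is a hypothetical name.

```python
SCALE_FACTORS = {'chemra': 0.001, 'sp': 1e-05, 'ssrd': 1e-06, 'blh': 0.001}


def apply_scale_factors(sample, scale_factors, chemra_var='no2'):
    """Multiply each listed variable by its factor; leave the others unchanged."""
    scaled = dict(sample)
    for key, factor in scale_factors.items():
        # Assumption: the 'chemra' key refers to the chemical reanalysis variable.
        name = chemra_var if key == 'chemra' else key
        if name in scaled:
            scaled[name] = scaled[name] * factor
    return scaled


# Illustrative values only, not real data.
sample = {'no2': 2500.0, 'sp': 101325.0, 'blh': 800.0, 't2m': 288.0}
scaled = apply_scale_factors(sample, SCALE_FACTORS)
```

Variables without an entry in the dictionary (here 't2m') pass through unscaled.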
- unox.input.fill_w_insitu(xr_dataset, insitu_filepath, var='no2')
Add stage 2 for the variable in an xarray Dataset using available insitu data.
Given an xarray Dataset with reanalysis data, duplicate the specified variable and, wherever and whenever insitu data are available in the provided file, replace the duplicated variable's values with the insitu values. The result is used for stage 2 of training the Unet.
- Parameters:
xr_dataset (xarray.Dataset) – The dataset containing reanalysis data.
insitu_filepath (str) – Path to the CSV file containing insitu data.
var (str, optional) – The variable to replace in the dataset. Default is ‘no2’.
- Returns:
The updated dataset with insitu data replacing the specified variable.
- Return type:
xarray.Dataset
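The duplicate-then-replace idea can be sketched with dictionaries keyed by (day, lat index, lon index). This is a conceptual illustration only: how the real function reads the CSV and maps stations onto grid cells is not described here, and fill_with_insitu_sketch is a hypothetical name.

```python
def fill_with_insitu_sketch(reanalysis, insitu):
    """Return a copy of reanalysis with insitu values substituted where present.

    Both arguments map (day, lat_index, lon_index) keys to values.
    """
    stage_2 = dict(reanalysis)  # duplicate so the stage 1 values stay untouched
    for key, value in insitu.items():
        # Only replace cells that exist in the grid and have a valid observation.
        if key in stage_2 and value is not None:
            stage_2[key] = value
    return stage_2
```

The original mapping is left unmodified, which corresponds to keeping the stage 1 variable alongside the new stage 2 variable.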
- unox.input.make_all_y_input_files(years=range(2005, 2021), var='nox', output_dir='test_input', sort=True, **kwargs)[source]
Create y input files for multiple years.
Runs the make_y_input_file function for each year in the specified range.
- Parameters:
years (iterable, optional) – Years for which to create y input files. Default is range(2005, 2021).
var (str, optional) – Variable to extract from the dataset. Default is ‘nox’.
output_dir (str, optional) – Directory inside inputfiles/ where the output y input files will be saved. Default is ‘test_input’.
sort (bool, optional) – Whether to sort the xarray after making all y inputs. Sorting takes a long time. Default is True.
**kwargs (dict, optional) – Additional keyword arguments to pass to the make_y_input_file function.
- Returns:
y_data_array – List of y input data arrays for the specified years.
- Return type:
list of numpy.ndarray
- unox.input.make_all_x_input_files(years=range(2005, 2021), stage_2=True, stage_2_cutoff=2013, output_dir='test_input', sort=True, **kwargs)[source]
Create x input files for multiple years and stages.
Run the make_x_input_file function for each year and stage in the specified ranges.
- Parameters:
years (iterable, optional) – Years for which to create x input files. Default is range(2005, 2021).
stage_2 (bool, optional) – Whether or not to make stage 2 in addition to stage 1 for the input. Default is True.
stage_2_cutoff (int, optional) – Year after which the data will also be saved in stage 2. Default is 2013.
output_dir (str, optional) – Directory inside inputfiles/ where the output x input files will be saved. Default is ‘test_input’.
sort (bool, optional) – Whether to sort the xarray after making all x inputs. Sorting takes a long time. Default is True.
**kwargs (dict, optional) – Additional keyword arguments to pass to the make_x_input_file function.
- Returns:
x_data_array – List of x input data arrays for the specified years and stages.
- Return type:
list of xarray.Dataset
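The loop over years and stages might look like the sketch below. It assumes "after" the cutoff means strictly greater than stage_2_cutoff; the page does not say whether the cutoff year itself is included. plan_x_inputs is a hypothetical helper that only lists the (year, stage) pairs and does not create any files.

```python
def plan_x_inputs(years=range(2005, 2021), stage_2=True, stage_2_cutoff=2013):
    """List the (year, stage) pairs a multi-year run might produce."""
    plan = []
    for year in years:
        plan.append((year, 1))  # stage 1 is always produced
        # Assumption: "after" is read as strictly greater than the cutoff.
        if stage_2 and year > stage_2_cutoff:
            plan.append((year, 2))
    return plan
```

Under that assumption, the default arguments yield stage 1 for 2005 through 2020 and stage 2 for 2014 through 2020.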
- unox.input.make_all_input_files(output_dir='test_input', sort=True, **kwargs)
Create all input files for the Unet model.
This function combines the creation of y input files and x input files for both stages.
- Parameters:
output_dir (str, optional) – Directory inside inputfiles/ where the output input files will be saved. Default is ‘test_input’.
sort (bool, optional) – Whether to sort the xarray after making all inputs. Sorting takes a long time. Default is True.
**kwargs (dict, optional) – Additional keyword arguments to pass to the make_y_input_file and make_x_input_file functions.
- Returns:
input_netcdf_xr – The combined input data for both x and y.
- Return type:
xarray.Dataset
- unox.input.make_input_metadata_file(input_set, output_dir=None, g_attrs=None, overwrite=True)[source]
Create a metadata file for the dataset in the given directory.
Gather the metadata from the given dataset, format it, and write it to a human-readable JSON file.
- Parameters:
input_set (str, xr.Dataset, uarray) – Directory inside inputfiles/ where the dataset is found and in which the metadata file will be saved, or the xarray Dataset itself.
output_dir (str, None, optional) – Directory inside inputfiles/ where the metadata file will be saved. If None, the metadata file will not be saved. Default is None.
g_attrs (dict, None, optional) – Global attributes to use for the metadata file.
overwrite (bool, optional) – Whether to overwrite an existing metadata file. Default is True.
- Returns:
metadata_dict (dict) – The metadata dictionary that was saved to the json file. Has the format:
    {
        "years": {
            "x": [2005, …, 2020],
            "y": [2005, …, 2020]
        },
        "y_var": "nox",
        "emiss_dir": "/data/high_res/t106",
        "emiss_pre": "nox_",
        "emiss_post": "_t106_US.nc",
        "nan_fill": 0,
        "stage_2_cutoff": 2013,
        "x_vars": ["no2", …, "ssrd"],
        "data_dir": "/data/high_res",
        "chemra_path": "emacdonald/unet/datafiles/TROPESS/TROPESS_reanalysis_2hr_no2_sfc_",
        "insitu_path": "US_EPA/NO2/daily_NO2/daily_42602_",
        "era5_path": "ERA5concatenated",
        "stages": [1, 2]
    }
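A dictionary in the format above can be written and read back with the standard json module. The sketch below is an illustration, not the module's implementation: write_metadata_sketch is a hypothetical name, the file name input_metadata.json is taken from the copy_input_files description, and the metadata is shortened to a few keys with abbreviated year lists.

```python
import json
import os
import tempfile


def write_metadata_sketch(metadata, directory, overwrite=True):
    """Write metadata as indented JSON and return the file path."""
    path = os.path.join(directory, 'input_metadata.json')
    if os.path.exists(path) and not overwrite:
        return path  # leave the existing file untouched
    with open(path, 'w') as handle:
        json.dump(metadata, handle, indent=4)
    return path


# Shortened, illustrative metadata; a real file would carry every key.
metadata = {
    'years': {'x': [2005, 2006], 'y': [2005, 2006]},
    'y_var': 'nox',
    'stage_2_cutoff': 2013,
    'stages': [1, 2],
}
with tempfile.TemporaryDirectory() as tmp:
    saved_path = write_metadata_sketch(metadata, tmp)
    with open(saved_path) as handle:
        round_trip = json.load(handle)
```

Reading the file back returns the same dictionary, which is what makes the JSON file usable as a record of how an input set was built.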
- unox.input.make_input_config(config_name, input_set='no2_sample_input', grid_size=[56, 120], x_vars=['no2', 'no2_tm1', 'u10', 'v10', 'blh', 'sp', 'skt', 't2m', 'ssrd'], stage_2=True, stage_2_cutoff=2013, lsm_vars=[], zfi_vars=[], overwrite=False, **kwargs)[source]
Create an input configuration file.
Create the input configuration file for using input data with the Unet model.
- Parameters:
config_name (str) – Name of the configuration file to be created.
input_set (str or xr.Dataset, optional) – Directory inside inputfiles/ where the dataset is found, or the xarray Dataset. Default is ‘no2_sample_input’.
grid_size (list of int, optional) – The number of grid cells to have in [latitude, longitude] when running the Unet model. Default is [56, 120].
x_vars (list of str, optional) – List of variable names to be used as input features for the model. Default is a list of common meteorological and chemical variables.
stage_2 (bool, optional) – Whether or not stage 2 should be run with the Unet model. Default is True.
stage_2_cutoff (int, optional) – Year after which stage 2 data will be used. Default is 2013.
lsm_vars (list of str, optional) – List of variable names that should use the land-sea mask (for example, [‘no2’, ‘no2_tm1’]). Default is [].
zfi_vars (list of str, optional) – List of variable names that should use the zero-fill mask (for example, [‘t2m’]). Default is [].
**kwargs (dict, optional)
- Returns:
config_dict – The configuration dictionary that was saved to the json file.
- Return type:
dict
- unox.input.copy_input_files(source_input_set, output_dir, keep_vars='all', start_date=None, end_date=None, overwrite=True, **kwargs)[source]
Copy an input set to a new location.
Create a copy of the input netCDF and input_metadata.json file from the specified source in a new directory, optionally filtering the netCDF to only include specified variables and date range.
- Parameters:
source_input_set (str) – Name of the source input set located in inputfiles/.
output_dir (str) – Name of the output directory inside inputfiles/ where the new input set will be copied to.
keep_vars (list of str or ‘all’, optional) – List of variable names to keep in the copied netCDF. If ‘all’, all variables are kept. Default is ‘all’.
start_date (str, None, or np.datetime64, optional) – Date from which to start the copied data. If None, the start date equals that of the original file. Expected format is ‘YYYY-MM-DDTHH:MM:SS’ or ‘YYYY-MM-DD’. Default is None.
end_date (str, None, or np.datetime64, optional) – Date at which to end the copied data. If None, the end date equals that of the original file. Expected format is ‘YYYY-MM-DDTHH:MM:SS’ or ‘YYYY-MM-DD’. Default is None.
**kwargs (dict, optional)
- Returns:
new_xr_dataset – The copied and filtered xarray Dataset that is saved to the new location.
- Return type:
xarray.Dataset
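The variable and date filtering can be sketched on plain records keyed by ISO timestamps. This is a conceptual stand-in for the netCDF filtering, with two assumptions labeled in the code: both ends of the date range are treated as inclusive, and filter_records is a hypothetical name.

```python
from datetime import datetime


def filter_records(records, keep_vars='all', start_date=None, end_date=None):
    """Filter {timestamp: {var: value}} records by variable and date range."""
    # fromisoformat accepts both 'YYYY-MM-DD' and 'YYYY-MM-DDTHH:MM:SS'
    start = datetime.fromisoformat(start_date) if start_date else None
    end = datetime.fromisoformat(end_date) if end_date else None
    selected = {}
    for stamp, variables in records.items():
        when = datetime.fromisoformat(stamp)
        # Assumption: both ends of the date range are inclusive.
        if (start and when < start) or (end and when > end):
            continue
        if keep_vars == 'all':
            selected[stamp] = dict(variables)
        else:
            selected[stamp] = {k: v for k, v in variables.items() if k in keep_vars}
    return selected
```

Leaving start_date or end_date as None keeps the corresponding end of the original time range, as described for the parameters above.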