unox.input
==========

.. py:module:: unox.input


Attributes
----------

.. autoapisummary::

   unox.input.era5_vars_list
   unox.input.input_vars_dict


Functions
---------

.. autoapisummary::

   unox.input.x_or_y_var
   unox.input.get_input_index
   unox.input.make_y_input_file
   unox.input.write_input_netcdf
   unox.input.set_global_attrs
   unox.input.set_var_attrs
   unox.input.scale_xr_var
   unox.input.add_tm1_var
   unox.input.make_x_input_file
   unox.input.fill_w_insitu
   unox.input.make_all_y_input_files
   unox.input.make_all_x_input_files
   unox.input.make_all_input_files
   unox.input.make_input_metadata_file
   unox.input.make_input_config
   unox.input.copy_input_files


Module Contents
---------------

.. py:data:: era5_vars_list
   :value: ['u10', 'v10', 'blh', 'sp', 'skt', 't2m', 'ssrd', 'lsm']


.. py:data:: input_vars_dict

.. py:function:: x_or_y_var(var)

   Return whether the given variable is an x or y variable.

   :param var: The variable to check.
   :type var: `str`

   :returns: **x_or_y** -- 'x' if the variable is an x variable, 'y' if it is a y variable.
   :rtype: `str`

   .. rubric:: Examples

   >>> x_or_y_var('no2')
   'x'
   >>> x_or_y_var('nox')
   'y'


.. py:function:: get_input_index(var)

   Get the index of the given variable in the input array.

   :param var: The variable to check.
   :type var: `str`

   :returns: **index** -- The index of the variable in the input array.
   :rtype: `int`

   .. rubric:: Examples

   >>> get_input_index('no2')
   0
   >>> get_input_index('u10')
   2
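
   The values shown in the doctest are consistent with a plain position lookup in an ordered list of x variables. A minimal sketch of that idea, assuming the ordering matches the default ``x_vars`` of ``make_input_config`` (``X_VARS`` and ``sketch_input_index`` are hypothetical names and are not part of ``unox.input``):

   .. code-block:: python

      # Assumed ordering, copied from the default ``x_vars`` argument of
      # ``make_input_config``; the real module may build it differently.
      X_VARS = ['no2', 'no2_tm1', 'u10', 'v10', 'blh', 'sp', 'skt', 't2m', 'ssrd']


      def sketch_input_index(var):
          """Return the position of ``var`` in the assumed x-variable ordering."""
          return X_VARS.index(var)


      print(sketch_input_index('no2'))  # 0
      print(sketch_input_index('u10'))  # 2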


.. py:function:: make_y_input_file(year, var='nox', emiss_dir='/data/high_res/emacdonald/unet/datafiles/t106', emiss_pre='nox_', emiss_post='_t106_US.nc', scale_factor=1000000000000.0, nan_fill=0, stage_2_cutoff=2013, output_dir='test_input', write_this_year=True, overwrite=True, output_format='nc', **kwargs)

   Create a y input file for the Unet model for the given year.

   The array in the generated file will have these dimensions:

   - time: 364 (or 365 for leap years)

     - One day less than usual to allow for the t-1 variable

   - lat: length depends on the latitude grid
   - lon: length depends on the longitude grid
   - var: 1 (a dummy dimension to match the x input files)

   :param year: The year for which to create the y input file (between 2005 and 2021).
   :type year: `int`
   :param var: The variable to extract from the dataset. Default is 'nox'.
   :type var: `str`, optional
   :param emiss_dir: Directory where the emissions data are stored.
                     Default is '/data/high_res/emacdonald/unet/datafiles/t106'.
   :type emiss_dir: `str`, optional
   :param emiss_pre: Prefix for the emissions input file name. Default is 'nox_'.
   :type emiss_pre: `str`, optional
   :param emiss_post: Suffix for the emissions input file name. Default is '_t106_US.nc'.
   :type emiss_post: `str`, optional
   :param scale_factor: Factor by which to scale the data. Default is 1e12.
   :type scale_factor: `float`, optional
   :param nan_fill: Value to fill NaNs in the dataset. Default is 0.
   :type nan_fill: `float`, optional
   :param stage_2_cutoff: Year after which the data will also be saved in stage 2. Default is 2013.
   :type stage_2_cutoff: `int`, optional
   :param output_dir: Directory inside `inputfiles/` where the output y input file will be saved.
                      Default is `'test_input'`.
   :type output_dir: `str`, optional
   :param write_this_year: Whether to write the data for this year or just return the xarray without writing to file.
                           Default is True.
   :type write_this_year: `bool`, optional
   :param output_format: Whether to save netcdf files ('nc'), numpy arrays ('npy'), or 'both'.
                         Default is 'nc'. Irrelevant if `write_this_year` is False.
   :type output_format: `str`, optional
   :param overwrite: Whether to overwrite existing netcdf files. Default is True.
   :type overwrite: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments (not used).
   :type \*\*kwargs: `dict`, optional

   :returns: * **input_netcdf_xr** (`xarray.Dataset`) -- The y input data for the specified year.
             * **g_attr_dict** (`dict`) -- Dictionary of global attributes for the dataset.
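
   .. rubric:: Examples

   The expected length of the ``time`` dimension can be checked with a short helper. This is only a sketch, assuming daily data with exactly one day removed per year; ``expected_time_steps`` is a hypothetical name and is not part of ``unox.input``.

   .. code-block:: python

      import calendar


      def expected_time_steps(year):
          """Number of daily time steps left after one day is dropped from ``year``."""
          days_in_year = 366 if calendar.isleap(year) else 365
          return days_in_year - 1


      print(expected_time_steps(2005))  # 364
      print(expected_time_steps(2008))  # 365 (leap year)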


.. py:function:: write_input_netcdf(input_netcdf_xr, output_filepath, g_attr_dict=None, overwrite=True, sort=True, **kwargs)

   Write an xarray Dataset to a netcdf file, appending or overwriting as needed.

   :param input_netcdf_xr: The dataset to write to the netcdf file.
   :type input_netcdf_xr: `xarray.Dataset`
   :param output_filepath: Path to the output netcdf file.
   :type output_filepath: `str`
   :param g_attr_dict: Dictionary of global attributes to add to the dataset if creating a new file.
   :type g_attr_dict: `dict`, optional
   :param overwrite: Whether to overwrite existing data in the netcdf file if there are overlapping times.
                     Default is True.
   :type overwrite: `bool`, optional
   :param sort: Whether to sort the xarray before writing to netcdf. Sorting takes a long time.
                Default is True.
   :type sort: `bool`, optional

   :returns: **input_netcdf_xr** -- The dataset that was written to the netcdf file.
   :rtype: `xarray.Dataset`
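
   .. rubric:: Examples

   How ``overwrite`` and ``sort`` might interact when new data overlap existing times can be pictured with plain dictionaries keyed by time. This illustrates the described behaviour and is not the function's implementation; ``merge_by_time`` is a hypothetical helper, and keeping the existing value when ``overwrite`` is False is an assumption.

   .. code-block:: python

      def merge_by_time(existing, new, overwrite=True, sort=True):
          """Combine two {time: value} mappings, resolving overlapping times."""
          merged = dict(existing)
          for time, value in new.items():
              # Overlapping times are replaced only when overwrite is requested.
              if overwrite or time not in merged:
                  merged[time] = value
          if sort:
              # ISO date strings sort chronologically.
              merged = dict(sorted(merged.items()))
          return merged


      existing = {'2005-01-03': 1.0, '2005-01-02': 2.0}
      new = {'2005-01-03': 9.0, '2005-01-04': 4.0}
      print(merge_by_time(existing, new))
      # {'2005-01-02': 2.0, '2005-01-03': 9.0, '2005-01-04': 4.0}
      print(merge_by_time(existing, new, overwrite=False))
      # {'2005-01-02': 2.0, '2005-01-03': 1.0, '2005-01-04': 4.0}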


.. py:function:: set_global_attrs(xr_dataset, attr_dict)

   Add attributes to an xarray Dataset.

   :param xr_dataset: The dataset to which attributes will be added.
   :type xr_dataset: `xarray.Dataset`
   :param attr_dict: Dictionary of attributes to add to the dataset.
   :type attr_dict: `dict`

   :returns: The dataset with added attributes.
   :rtype: `xarray.Dataset`


.. py:function:: set_var_attrs(xr_dataset, var, attr_dict)

   Add attributes to a variable in an xarray Dataset.

   :param xr_dataset: The dataset containing the variable to which attributes will be added.
   :type xr_dataset: `xarray.Dataset`
   :param var: The variable to which attributes will be added.
   :type var: `str`
   :param attr_dict: Dictionary of attributes to add to the variable.
   :type attr_dict: `dict`

   :returns: The dataset with the variable having added attributes.
   :rtype: `xarray.Dataset`


.. py:function:: scale_xr_var(xr_dataset, var, scale_factor)

   Scale a variable in an xarray Dataset by a given factor.

   :param xr_dataset: The dataset containing the variable to be scaled.
   :type xr_dataset: `xarray.Dataset`
   :param var: The variable to be scaled.
   :type var: `str`
   :param scale_factor: The factor by which to scale the variable.
   :type scale_factor: `float`

   :returns: The dataset with the scaled variable.
   :rtype: `xarray.Dataset`


.. py:function:: add_tm1_var(xr_dataset, var, year)

   Add a (t-1) version of the given variable to the dataset.

   Add a version of the given variable which is shifted by one day (t-1) to the dataset, and drop January 1st from the time coordinate.

   :param xr_dataset: The dataset containing the variable to be shifted.
   :type xr_dataset: `xarray.Dataset`
   :param var: The variable to be shifted.
   :type var: `str`
   :param year: The year which xr_dataset covers (between 2005 and 2021).
   :type year: `int`

   :returns: The dataset with the shifted variable.
   :rtype: `xarray.Dataset`
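
   .. rubric:: Examples

   The shift can be illustrated without xarray. The sketch below pairs each day's value with the previous day's value and drops January 1st, which has no predecessor within the year. It assumes one value per day starting on January 1st; ``add_tm1`` is a hypothetical helper and the input values are made up.

   .. code-block:: python

      from datetime import date, timedelta


      def add_tm1(daily_values, year):
          """Pair each day's value with the previous day's, dropping January 1st.

          ``daily_values`` is assumed to hold one value per day starting on
          January 1st of ``year``.
          """
          first_day = date(year, 1, 1)
          rows = []
          for offset in range(1, len(daily_values)):
              day = first_day + timedelta(days=offset)
              rows.append((day, daily_values[offset], daily_values[offset - 1]))
          return rows


      rows = add_tm1([10.0, 11.0, 12.0], 2005)
      print(rows[0])    # (datetime.date(2005, 1, 2), 11.0, 10.0)
      print(len(rows))  # 2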


.. py:function:: make_x_input_file(year, stage_2=True, data_dir='/data/high_res', chemra_path='emacdonald/unet/datafiles/TROPESS/TROPESS_reanalysis_2hr_no2_sfc_', chemra_var='no2', insitu_path='US_EPA/NO2/daily_NO2/daily_42602_', era5_path='ERA5concatenated', scale_factors={'chemra': 0.001, 'sp': 1e-05, 'ssrd': 1e-06, 'blh': 0.001}, stage_2_cutoff=2013, output_dir='test_input', write_this_year=True, output_format='nc', overwrite=True, **kwargs)

   Create an x input file for the Unet model for the given year and stage.

   The array in the file will have these dimensions:

   - time: 364 (or 365 for leap years)

     - One day less than usual to allow for the t-1 variable

   - lat: length depends on the latitude grid
   - lon: length depends on the longitude grid
   - var: 9 variables (e.g., 'no2', 'u10', 'v10', etc.)

   :param year: The year for which to create the x input file.
   :type year: `int`
   :param stage_2: Whether or not to make stage 2 in addition to stage 1 for the input.
                   Default is True.
   :type stage_2: `bool`, optional
   :param data_dir: Root directory where the input data are stored.
                    Default is '/data/high_res'.
   :type data_dir: `str`, optional
   :param chemra_path: Path to the chemical reanalysis data files.
                       Default is 'emacdonald/unet/datafiles/TROPESS/TROPESS_reanalysis_2hr_no2_sfc_'.
   :type chemra_path: `str`, optional
   :param chemra_var: The variable to extract from the dataset. Default is 'no2'.
   :type chemra_var: `str`, optional
   :param insitu_path: Path to the insitu data files. Default is 'US_EPA/NO2/daily_NO2/daily_42602_'.
   :type insitu_path: `str`, optional
   :param era5_path: Path to the ERA5 reanalysis data files. Default is 'ERA5concatenated'.
   :type era5_path: `str`, optional
   :param scale_factors: Scaling factors for the variables. Default is a dictionary with
                         scaling factors for 'chemra', 'sp', 'ssrd', and 'blh'.
   :type scale_factors: `dict`, optional
   :param stage_2_cutoff: Year after which input files will also be generated for stage 2. Default is 2013.
   :type stage_2_cutoff: `int`, optional
   :param output_dir: Directory inside `inputfiles/` where the output x input file will be saved.
                      Default is `'test_input'`.
   :type output_dir: `str`, optional
   :param write_this_year: Whether to write the data for this year or just return the xarray without writing to file.
                           Default is True.
   :type write_this_year: `bool`, optional
   :param output_format: Whether to save netcdf files ('nc'), numpy arrays ('npy'), or 'both'.
                         Default is 'nc'. Irrelevant if `write_this_year` is False.
   :type output_format: `str`, optional
   :param overwrite: Whether to overwrite existing netcdf files. Default is True.
   :type overwrite: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments (not used).
   :type \*\*kwargs: `dict`, optional

   :returns: **x_data** -- The x input data for the specified year.
   :rtype: `xarray.Dataset`
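
   .. rubric:: Examples

   The role of ``scale_factors`` can be sketched as a per-variable multiplication. The dictionary below is copied from the signature; ``scale_value`` is a hypothetical helper, and leaving variables without an entry unscaled is an assumption rather than documented behaviour.

   .. code-block:: python

      import math

      # Default ``scale_factors`` copied from the signature of ``make_x_input_file``.
      SCALE_FACTORS = {'chemra': 0.001, 'sp': 1e-05, 'ssrd': 1e-06, 'blh': 0.001}


      def scale_value(name, value, scale_factors=SCALE_FACTORS):
          """Multiply ``value`` by the factor for ``name`` (assumed 1.0 if absent)."""
          return value * scale_factors.get(name, 1.0)


      # A surface pressure of 101325 Pa becomes a value close to 1.
      assert math.isclose(scale_value('sp', 101325.0), 1.01325)
      # 'u10' has no entry, so in this sketch it is left unchanged.
      assert scale_value('u10', 3.5) == 3.5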


.. py:function:: fill_w_insitu(xr_dataset, insitu_filepath, var='no2')

   Add stage 2 for the variable in an xarray Dataset using available insitu data.

   Given an xarray Dataset with reanalysis data, duplicate the specified variable, then replace values in the duplicate wherever and whenever insitu data are available in the file at `insitu_filepath`. The duplicated variable is used for stage 2 of training the Unet.

   :param xr_dataset: The dataset containing reanalysis data.
   :type xr_dataset: `xarray.Dataset`
   :param insitu_filepath: Path to the CSV file containing insitu data.
   :type insitu_filepath: `str`
   :param var: The variable to replace in the dataset. Default is 'no2'.
   :type var: `str`, optional

   :returns: The updated dataset with insitu data replacing the specified variable.
   :rtype: `xarray.Dataset`
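
   .. rubric:: Examples

   The duplicate-then-replace step can be pictured with dictionaries keyed by date and grid cell. This is a sketch of the idea only: ``fill_with_insitu`` is a hypothetical helper, the values are made up, and how the real function maps station measurements onto grid cells is not covered here.

   .. code-block:: python

      def fill_with_insitu(reanalysis, insitu):
          """Return a copy of ``reanalysis`` with insitu values substituted.

          Both arguments map (date, grid_cell) keys to values. Only keys that
          already exist in ``reanalysis`` are replaced.
          """
          stage_2 = dict(reanalysis)  # duplicate, leaving the original untouched
          for key, value in insitu.items():
              if key in stage_2:
                  stage_2[key] = value
          return stage_2


      reanalysis = {('2014-01-02', (3, 7)): 4.1, ('2014-01-02', (3, 8)): 5.0}
      insitu = {('2014-01-02', (3, 7)): 6.3}
      stage_2 = fill_with_insitu(reanalysis, insitu)
      print(stage_2[('2014-01-02', (3, 7))])     # 6.3
      print(reanalysis[('2014-01-02', (3, 7))])  # 4.1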


.. py:function:: make_all_y_input_files(years=range(2005, 2021), var='nox', output_dir='test_input', sort=True, **kwargs)

   Create y input files for multiple years.

   Run the `make_y_input_file` function for each year in the specified range.

   :param years: Years for which to create y input files. Default is range(2005, 2021).
   :type years: `iterable`, optional
   :param var: Variable to extract from the dataset. Default is 'nox'.
   :type var: `str`, optional
   :param output_dir: Directory inside `inputfiles/` where the output y input files will be saved.
                      Default is `'test_input'`.
   :type output_dir: `str`, optional
   :param sort: Whether to sort the xarray after making all y inputs. Sorting takes a long time.
                Default is True.
   :type sort: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to the `make_y_input_file` function.
   :type \*\*kwargs: `dict`, optional

   :returns: **y_data_array** -- List of y input data arrays for the specified years.
   :rtype: `list` of `numpy.ndarray`


.. py:function:: make_all_x_input_files(years=range(2005, 2021), stage_2=True, stage_2_cutoff=2013, output_dir='test_input', sort=True, **kwargs)

   Create x input files for multiple years and stages.

   Run the `make_x_input_file` function for each year and stage in the specified ranges.

   :param years: Years for which to create x input files. Default is range(2005, 2021).
   :type years: `iterable`, optional
   :param stage_2: Whether or not to make stage 2 in addition to stage 1 for the input.
                   Default is True.
   :type stage_2: `bool`, optional
   :param stage_2_cutoff: Year after which the data will also be saved in stage 2. Default is 2013.
   :type stage_2_cutoff: `int`, optional
   :param output_dir: Directory inside `inputfiles/` where the output x input files will be saved.
                      Default is `'test_input'`.
   :type output_dir: `str`, optional
   :param sort: Whether to sort the xarray after making all x inputs. Sorting takes a long time.
                Default is True.
   :type sort: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to the `make_x_input_file` function.
   :type \*\*kwargs: `dict`, optional

   :returns: **x_data_array** -- List of x input data arrays for the specified years and stages.
   :rtype: `list` of `xarray.Dataset`
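
   .. rubric:: Examples

   The default ``years`` argument is a half-open range, so it ends at 2020. The sketch below also lists which of those years would get stage 2 data, assuming that "after" means strictly later than ``stage_2_cutoff``; whether the cutoff year itself is included is not stated here.

   .. code-block:: python

      years = range(2005, 2021)  # default from the signature
      stage_2_cutoff = 2013      # default from the signature

      print(years[0], years[-1])  # 2005 2020

      # Assumption: "after" is read as strictly greater than the cutoff year.
      stage_2_years = [year for year in years if year > stage_2_cutoff]
      print(stage_2_years)  # [2014, 2015, 2016, 2017, 2018, 2019, 2020]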


.. py:function:: make_all_input_files(output_dir='test_input', sort=True, **kwargs)

   Create all input files for the Unet model.

   This function combines the creation of y input files and x input files for both stages.

   :param output_dir: Directory inside `inputfiles/` where the output input files will be saved.
                      Default is `'test_input'`.
   :type output_dir: `str`, optional
   :param sort: Whether to sort the xarray after making all inputs. Sorting takes a long time.
                Default is True.
   :type sort: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to the `make_y_input_file` and
                      `make_x_input_file` functions.
   :type \*\*kwargs: `dict`, optional

   :returns: **input_netcdf_xr** -- The combined input data for both x and y.
   :rtype: `xarray.Dataset`


.. py:function:: make_input_metadata_file(input_set, output_dir=None, g_attrs=None, overwrite=True)

   Create a metadata file for the dataset in the given directory.

   Gather the metadata from the given dataset, format it, and write it to a plain-text JSON file that can be easily read.

   :param input_set: Directory inside `inputfiles/` where the dataset is found and
                     in which the metadata file will be saved, or the xarray Dataset itself.
   :type input_set: `str`, `xr.Dataset`, `uarray`
   :param output_dir: Directory inside `inputfiles/` where the metadata file will be saved.
                      If None, the metadata file will not be saved. Default is None.
   :type output_dir: `str`, `None`, optional
   :param g_attrs: Global attributes to use for the metadata file.
   :type g_attrs: `dict`, `None`, optional
   :param overwrite: Whether to overwrite an existing metadata file. Default is True.
   :type overwrite: `bool`, optional

   :returns: **metadata_dict** -- The metadata dictionary that was saved to the json file.
   :rtype: `dict`

   The dictionary has the format::

       {
           "years": {
               "x": [
                   2005,
                   ...
                   2020
               ],
               "y": [
                   2005,
                   ...
                   2020
               ]
           },
           "y_var": "nox",
           "emiss_dir": "/data/high_res/t106",
           "emiss_pre": "nox_",
           "emiss_post": "_t106_US.nc",
           "nan_fill": 0,
           "stage_2_cutoff": 2013,
           "x_vars": [
               "no2",
               ...
               "ssrd"
           ],
           "data_dir": "/data/high_res",
           "chemra_path": "emacdonald/unet/datafiles/TROPESS/TROPESS_reanalysis_2hr_no2_sfc_",
           "insitu_path": "US_EPA/NO2/daily_NO2/daily_42602_",
           "era5_path": "ERA5concatenated",
           "stages": [
               1,
               2
           ]
       }


.. py:function:: make_input_config(config_name, input_set='no2_sample_input', grid_size=[56, 120], x_vars=['no2', 'no2_tm1', 'u10', 'v10', 'blh', 'sp', 'skt', 't2m', 'ssrd'], stage_2=True, stage_2_cutoff=2013, lsm_vars=[], zfi_vars=[], overwrite=False, **kwargs)

   Create an input configuration file.

   Create the input configuration file for using input data with the Unet model.

   :param config_name: Name of the configuration file to be created.
   :type config_name: `str`
   :param input_set: Directory inside `inputfiles/` where the dataset is found, or the xarray Dataset.
                     Default is 'no2_sample_input'.
   :type input_set: `str` or `xr.Dataset`, optional
   :param grid_size: The number of grid cells to have in [latitude, longitude] when running the Unet model.
                     Default is [56, 120].
   :type grid_size: `list` of `int`, optional
   :param x_vars: List of variable names to be used as input features for the model.
                  Default is a list of common meteorological and chemical variables.
   :type x_vars: `list` of `str`, optional
   :param stage_2: Whether or not stage 2 should be run with the Unet model.
                   Default is True.
   :type stage_2: `bool`, optional
   :param stage_2_cutoff: Year after which stage 2 data will be used.
                          Default is 2013.
   :type stage_2_cutoff: `int`, optional
   :param lsm_vars: List of variable names that should use the land-sea mask.
                    Default is an empty list.
   :type lsm_vars: `list` of `str`, optional
   :param zfi_vars: List of variable names that should use the zero-fill mask.
                    Default is an empty list.
   :type zfi_vars: `list` of `str`, optional
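   :param overwrite: Whether to overwrite an existing configuration file. Default is False.
   :type overwrite: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments.
   :type \*\*kwargs: `dict`, optional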

   :returns: **config_dict** -- The configuration dictionary that was saved to the json file.
   :rtype: `dict`


.. py:function:: copy_input_files(source_input_set, output_dir, keep_vars='all', start_date=None, end_date=None, overwrite=True, **kwargs)

   Copy an input set to a new location.

   Create a copy of the input netCDF and `input_metadata.json` file from the specified source in a new directory, optionally filtering the netCDF to only include specified variables and date range.

   :param source_input_set: Name of the source input set located in `inputfiles/`.
   :type source_input_set: `str`
   :param output_dir: Name of the output directory inside `inputfiles/` where the new input set will be copied to.
   :type output_dir: `str`
   :param keep_vars: List of variable names to keep in the copied netCDF. If `'all'`, all variables are kept.
                     Default is `'all'`.
   :type keep_vars: `list` of `str` or `'all'`, optional
   :param start_date: Date from which to start the copied data. If None, the start date equals that of the original file.
                      Expected format is 'YYYY-MM-DDTHH:MM:SS' or 'YYYY-MM-DD'.
                      Default is None.
   :type start_date: `str`, `None`, or `np.datetime64`, optional
   :param end_date: Date at which to end the copied data. If None, the end date equals that of the original file.
                    Expected format is 'YYYY-MM-DDTHH:MM:SS' or 'YYYY-MM-DD'.
                    Default is None.
   :type end_date: `str`, `None`, or `np.datetime64`, optional
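   :param overwrite: Whether to overwrite existing files in the output directory. Default is True.
   :type overwrite: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments.
   :type \*\*kwargs: `dict`, optional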

   :returns: **new_xr_dataset** -- The copied and filtered xarray Dataset that is saved to the new location.
   :rtype: `xarray.Dataset`
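
   .. rubric:: Examples

   Both documented date formats can be parsed with the standard library, which makes the date filtering easy to picture. The sketch below is not the function's implementation: ``filter_by_date`` is a hypothetical helper, and treating both bounds as inclusive is an assumption.

   .. code-block:: python

      from datetime import datetime


      def filter_by_date(times, start_date=None, end_date=None):
          """Keep the entries of ``times`` between the two bounds (inclusive)."""
          # ``None`` means the bound is left open on that side.
          start = datetime.fromisoformat(start_date) if start_date else None
          end = datetime.fromisoformat(end_date) if end_date else None
          kept = []
          for time in times:
              if start is not None and time < start:
                  continue
              if end is not None and time > end:
                  continue
              kept.append(time)
          return kept


      times = [datetime(2010, 1, day) for day in range(1, 6)]
      kept = filter_by_date(times, start_date='2010-01-02',
                            end_date='2010-01-04T00:00:00')
      print([t.day for t in kept])  # [2, 3, 4]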