data0.dataset
=============

.. py:module:: data0.dataset

Classes
-------

.. autoapisummary::

   data0.dataset.uarray

Functions
---------

.. autoapisummary::

   data0.dataset.get_dataset
   data0.dataset.load_dataset
   data0.dataset.csv_to_pd
   data0.dataset.csv_to_xr
   data0.dataset.get_US_EPA_species_name
   data0.dataset.get_years
   data0.dataset.get_metadata
   data0.dataset.is_ensemble
   data0.dataset.get_epochs_logs

Module Contents
---------------

.. py:class:: uarray(dataset, is_input_set=False, is_predict=False, **kwargs)

   A wrapper class for an xarray Dataset.

   Wraps an xarray Dataset whose format is specified by `verify_dataset()`. All method names start with an underscore (_) to avoid conflicts.

   .. attribute:: name

      The name of the dataset, matching the string given to load the dataset, if applicable.

      :type: `str`

   .. attribute:: xr

      The xarray dataset. Expected to have lat and lon coordinates, and optionally a time coordinate.

      :type: `xarray.Dataset` or `xarray.DataArray`

   .. attribute:: years

      A list of unique years present in the time coordinate of the dataset.

      :type: `list` of `int`

   .. attribute:: metadata_file

      The path to the metadata file for the dataset, if it is an input or prediction set.

      :type: `str`

   .. attribute:: metadata

      A dictionary of metadata for the dataset, loaded from `metadata_file`.

      :type: `dict`

   .. attribute:: epochs_logs

      An xarray Dataset of the epochs logs for the dataset, if it is a prediction set.

      :type: `xarray.Dataset`

   .. attribute:: is_input_set

      Whether the dataset is an input set. If this is `True`, then `is_predict` must be `False`.

      :type: `bool`

   .. attribute:: is_predict

      Whether the dataset is a prediction set. If this is `True`, then `is_input_set` must be `False`.

      :type: `bool`

   .. attribute:: is_ensemble

      Whether the dataset has ensemble members. Only applicable to prediction sets.

      :type: `bool`

   .. py:method:: _verify(**kwargs)

      Verify specified aspects of the dataset using `verify_dataset()`.

   .. py:method:: _is_ensemble()

   .. py:method:: _get_years()

      Get a list of unique years present in the time coordinate of the dataset using `get_years()`.

   .. py:method:: _select_year(year)

      Select data for the specified year from the dataset.

   .. py:method:: _get_metadata()

      Get the metadata dictionary using `get_metadata()`, if the dataset is an input or prediction set.

   .. py:method:: _get_epochs_logs()

      Get the epochs CSV logs using `get_epochs_logs()`, if the dataset is a prediction set.

   .. py:method:: _shift_lons(**kwargs)

      Shift the longitude coordinates of the dataset using `shift_lon_arr()`.

.. py:function:: get_dataset(dataset, is_input_set=False, is_predict=False, **kwargs)

   Get the given dataset.

   :param dataset: The dataset to get, given either by name or as an already-loaded object.
   :type dataset: `str`, `uarray`, `xarray.Dataset`, or `xarray.DataArray`
   :param is_input_set: If True, treat the dataset as an input set.
   :type is_input_set: `bool`, optional
   :param is_predict: If True, treat the dataset as a model output prediction set.
   :type is_predict: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to `load_dataset()` and `verify_dataset()`.
   :type \*\*kwargs: keyword arguments

   :returns: **xr_dataset** -- The loaded and verified xarray dataset.
   :rtype: `xarray.Dataset` or `xarray.DataArray`

.. py:function:: load_dataset(file_path, **kwargs)

   Load the data from the given filepath into an xarray dataset.

   Verifies the given filepath, checks that the file is in an applicable format, and loads the data into an xarray dataset.

   :param file_path: The path to the data file to load.
   :type file_path: `str`
   :param \*\*kwargs: Additional keyword arguments to pass to `csv_to_xr()` and `verify_dataset()`.
   :type \*\*kwargs: keyword arguments

   :returns: **xr_dataset** -- The loaded xarray dataset.
   :rtype: `xarray.Dataset` or `xarray.DataArray`

.. py:function:: csv_to_pd(csv_filepath, is_US_EPA=True, **kwargs)

   Load a CSV file into a pandas DataFrame.

   Loads a CSV file into a pandas DataFrame, ensuring that the required columns are present if the file is from the US EPA.

   :param csv_filepath: The path to the CSV file to load.
   :type csv_filepath: `str`
   :param is_US_EPA: If True, verify that the CSV file has the required columns for US EPA data. Defaults to True.
   :type is_US_EPA: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to accommodate wrapper functions.
   :type \*\*kwargs: keyword arguments

   :returns: **df** -- The loaded DataFrame.
   :rtype: `pandas.DataFrame`

   .. rubric:: Examples

   >>> df = csv_to_pd('datafiles/US_EPA/daily_42602_2019.csv')
   >>> df.head()
                Latitude  Longitude  Arithmetic Mean
   Date
   2019-01-01  33.553056    -86.815         4.314286
   2019-01-08  33.553056    -86.815         6.263636
   2019-01-09  33.553056    -86.815         4.957143
   2019-01-10  33.553056    -86.815         5.891667
   2019-01-11  33.553056    -86.815        14.500000

.. py:function:: csv_to_xr(csv_filepath, is_US_EPA=True, **kwargs)

   Load a CSV file into an xarray Dataset.

   Loads a CSV file into an xarray Dataset, ensuring that the required columns are present if the file is from the US EPA.

   :param csv_filepath: The path to the CSV file to load.
   :type csv_filepath: `str`
   :param is_US_EPA: If True, verify that the CSV file has the required columns for US EPA data. Defaults to True.
   :type is_US_EPA: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to accommodate wrapper functions.
   :type \*\*kwargs: keyword arguments

   :returns: **xr_dataset** -- The loaded Dataset.
   :rtype: `xarray.Dataset`

   .. rubric:: Examples

   >>> xr_dataset = csv_to_xr('datafiles/US_EPA/daily_42602_2019.csv')
   >>> xr_dataset

.. py:function:: get_US_EPA_species_name(ID)

   Get the US EPA species name from the ID.

   Maps the US EPA species ID to the corresponding species name.

   :param ID: The US EPA species ID to map.
   :type ID: `str`

   :returns: **species_name** -- The corresponding US EPA species name.
   :rtype: `str`

   .. rubric:: Examples

   >>> get_US_EPA_species_name('42602')
   'no2'
   >>> get_US_EPA_species_name('42101')
   'co'

.. py:function:: get_years(dataset)

   Get the years present in the dataset.

   Gets a list of unique years from the time coordinate of the given dataset.

   :param dataset: The dataset from which to extract the years.
   :type dataset: `str`, `uarray`, `xarray.Dataset`, or `xarray.DataArray`

   :returns: **years** -- A list of unique years in the dataset.
   :rtype: `list` of `int`

.. py:function:: get_metadata(this_uarr)

   Find and load the relevant metadata dictionary for the given `uarray`.

   :param this_uarr: The `uarray` object for which to load the metadata.
   :type this_uarr: `uarray`

   :returns: **metadata** -- The metadata dictionary for this `uarray`.
   :rtype: `dict`

.. py:function:: is_ensemble(dataset, **kwargs)

   Check whether the given dataset has ensemble members.

   :param dataset: The dataset to check.
   :type dataset: `str`, `uarray`, `xarray.Dataset`, or `xarray.DataArray`
   :param \*\*kwargs: Additional keyword arguments to pass to `load_dataset()` and `verify_dataset()`.
   :type \*\*kwargs: keyword arguments

   :returns: **is_ensemble** -- Whether the given dataset has ensemble members.
   :rtype: `bool`

.. py:function:: get_epochs_logs(dataset, **kwargs)

   Find and load the relevant epochs CSV logs for the given `uarray`.

   :param dataset: The `uarray` object for which to load the epochs logs.
   :type dataset: `uarray`
   :param \*\*kwargs: Additional keyword arguments to pass to `uarray()`.
   :type \*\*kwargs: keyword arguments

   :returns: **epochs_logs** -- The dataset of epochs logs for this `uarray`.
   :rtype: `xarray.Dataset`
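
The ID-to-name mapping documented for `get_US_EPA_species_name()` can be pictured as a plain dictionary lookup. The following is a minimal sketch, not the module's implementation: the helper name `species_name_sketch`, the table `_SPECIES_BY_ID`, and the choice to raise `ValueError` for unknown IDs are all assumptions made for illustration. The table holds only the two IDs that appear in the examples for that function.

```python
# Hypothetical lookup table containing only the two IDs shown in the
# documented examples; the table actually used by data0.dataset is not
# described in this reference.
_SPECIES_BY_ID = {
    "42602": "no2",
    "42101": "co",
}


def species_name_sketch(species_id):
    """Return the species name for a US EPA species ID (illustrative only)."""
    # Normalise to a string so that integer IDs are accepted as well.
    key = str(species_id)
    try:
        return _SPECIES_BY_ID[key]
    except KeyError:
        # Assumption: unknown IDs raise an error. The library may instead
        # return None or a default value.
        raise ValueError(f"Unknown US EPA species ID: {key!r}") from None


print(species_name_sketch("42602"))
print(species_name_sketch("42101"))
```

Normalising the ID with `str()` is a design choice of this sketch only; the documented parameter type for `get_US_EPA_species_name()` is `str`.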