unox.data
=========

.. py:module:: unox.data


Attributes
----------

.. autoapisummary::

   unox.data.DEFAULT_LAT_MIN
   unox.data.DEFAULT_LAT_MAX
   unox.data.DEFAULT_LON_MIN
   unox.data.DEFAULT_LON_MAX
   unox.data.DEFAULT_EXTENT


Functions
---------

.. autoapisummary::

   unox.data.generate_lats_lons
   unox.data.get_extent
   unox.data.get_lats_lons
   unox.data.get_latlon_resolution
   unox.data.print_latlon_info
   unox.data.clean_num_list
   unox.data.verify_lat
   unox.data.verify_lon
   unox.data.get_vminmax
   unox.data.get_max_abs_val
   unox.data.restrict_domain
   unox.data.match_domains
   unox.data.verify_npy
   unox.data.get_num_from_string
   unox.data.get_DOY
   unox.data.increment_month
   unox.data.get_YMD_from_date
   unox.data.get_increment_info
   unox.data.add_amount_to_date


Module Contents
---------------

.. py:data:: DEFAULT_LAT_MIN
   :value: 11


.. py:data:: DEFAULT_LAT_MAX
   :value: 75


.. py:data:: DEFAULT_LON_MIN
   :value: -175


.. py:data:: DEFAULT_LON_MAX
   :value: -39


.. py:data:: DEFAULT_EXTENT

.. py:function:: generate_lats_lons(dataset='datafiles/sample_data/2019u10.nc', output_dir='datafiles/')

   Generate latitude and longitude arrays from the given dataset.

   Create the `lats.npy` and `lons.npy` files from the latitude and longitude
   values in the given dataset. The files were originally generated from the
   ERA5 concatenated data files created by the `download_era5` and
   `concatenate` scripts in the `datafiles` directory.

   :param dataset: The filepath to the dataset, or an xarray Dataset object, from which to
                   extract latitude and longitude values.
   :type dataset: `str` or `xarray.Dataset`, optional
   :param output_dir: The directory in which to save the generated `lats.npy` and
                      `lons.npy` files.
   :type output_dir: `str`, optional

   :returns: * **lats** (`numpy.ndarray`) -- The latitude values extracted from the dataset.
             * **lons** (`numpy.ndarray`) -- The longitude values extracted from the dataset.


.. py:function:: get_extent(xr_dataset=None, lats=None, lons=None, shift_lons=False, **kwargs)

   Get the latitude and longitude extent of the given xarray dataset.

   Find the maximum and minimum latitude and longitude values in the given
   dataset.

   :param xr_dataset: The xarray data of which to find the extent.
   :type xr_dataset: `xarray.Dataset` or `xarray.DataArray`, optional
   :param lats: The latitude values to use instead of those in the dataset.
   :type lats: `numpy.ndarray`, optional
   :param lons: The longitude values to use instead of those in the dataset.
   :type lons: `numpy.ndarray`, optional
   :param shift_lons: If True, shift the longitude values based on the PM_centered kwarg.
   :type shift_lons: `bool`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to `verify_dataset()` and
                      `shift_lon_arr()`.
   :type \*\*kwargs: keyword arguments

   :returns: **extent** -- A tuple of np.float64 in the form (lat_min, lat_max, lon_min, lon_max).
   :rtype: `tuple`

   .. rubric:: Examples

   >>> nox = xr.open_dataset('datafiles/nox_2019_t106_US.nc')
   >>> extent = get_extent(nox)
   (24.112, 58.878, -126.0, -59.625)

   >>> lats, lons = get_lats_lons(nox)
   >>> extent = get_extent(lats=lats, lons=lons)
   (24.112, 58.878, -126.0, -59.625)


.. py:function:: get_lats_lons(xr_dataset, **kwargs)

   Get the latitude and longitude values from the given dataset.

   Load the latitude and longitude values from the given dataset and return
   them as numpy arrays.

   :param xr_dataset: The xarray data from which to load the latitude and longitude values.
   :type xr_dataset: `xarray.Dataset` or `xarray.DataArray`
   :param \*\*kwargs: Additional keyword arguments to pass to `verify_dataset()`.
   :type \*\*kwargs: keyword arguments

   :returns: * **lats** (`numpy.ndarray`) -- Array of latitude values.
             * **lons** (`numpy.ndarray`) -- Array of longitude values.

   .. rubric:: Examples

   >>> nox = xr.open_dataset('datafiles/nox_2019_t106_US.nc')
   >>> lats, lons = get_lats_lons(nox)


.. py:function:: get_latlon_resolution(xr_dataset=None, lats=None, lons=None, **kwargs)

   Get the latitude and longitude resolution of the given dataset.

   Calculate the resolution of the coordinate values in the dataset,
   separately for latitude and longitude.

   :param xr_dataset: The xarray data of which to find the resolution.
   :type xr_dataset: `xarray.Dataset` or `xarray.DataArray`, optional
   :param lats: The latitude values to use instead of those in the dataset.
   :type lats: `numpy.ndarray`, optional
   :param lons: The longitude values to use instead of those in the dataset.
   :type lons: `numpy.ndarray`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to `verify_dataset()` and
                      `get_lats_lons()`.
   :type \*\*kwargs: keyword arguments

   :returns: * **lat_res** (`str`) -- The resolution in latitude.
             * **lon_res** (`str`) -- The resolution in longitude.

   .. rubric:: Examples

   >>> nox = xr.open_dataset('datafiles/nox_2019_t106_US.nc')
   >>> lat_res, lon_res = get_latlon_resolution(nox)
   (0.25, 0.25)


.. py:function:: print_latlon_info(xr_dataset=None, lats=None, lons=None, **kwargs)

   Print information about the latitude and longitude values.

   Print the extent and resolution of the latitude and longitude values in
   the given dataset or arrays.

   :param xr_dataset: The xarray data, or the filepath to it, for which to print the
                      latitude and longitude information.
   :type xr_dataset: `str` or `xarray.Dataset` or `xarray.DataArray`, optional
   :param lats: The latitude values to use instead of those in the dataset.
   :type lats: `numpy.ndarray`, optional
   :param lons: The longitude values to use instead of those in the dataset.
   :type lons: `numpy.ndarray`, optional
   :param \*\*kwargs: Additional keyword arguments to pass to `verify_dataset()`,
                      `get_extent()` and `get_latlon_resolution()`.
   :type \*\*kwargs: keyword arguments


.. py:function:: clean_num_list(val_list)

   Remove values that cannot be converted to a number from the list.

   For each value in the list, if it cannot be converted to a number, all
   instances of that value are removed from the list.

   :param val_list: The list of values to clean.
   :type val_list: `list`

   :returns: **return_list** -- The cleaned list of values.
   :rtype: `list`

   .. rubric:: Examples

   >>> val_list = clean_num_list([1, 2, 3, "4", 5])
   [1, 2, 3, 5]

   >>> val_list = clean_num_list([1, 2, 3, np.nan, None, np.inf, -np.inf])
   [1, 2, 3]


.. py:function:: verify_lat(lat_val)

   Verify that the given latitude value is valid.

   If the given latitude value is within the range [-90, 90], return that
   value. Otherwise, raise a ValueError.

   :param lat_val: The latitude value to verify.
   :type lat_val: `float`

   :returns: **lat_val** -- The verified latitude value.
   :rtype: `float`

   .. rubric:: Examples

   >>> lat_val = verify_lat(45.0)
   45.0

   >>> lat_val = verify_lat(-100.0)
   ValueError: Latitude value must be in the range [-90, 90].


.. py:function:: verify_lon(lon_val, PM_centered=None)

   Verify that the given longitude value is valid.

   If the given longitude value is within the valid range selected by
   `PM_centered`, return that value. Otherwise, raise a ValueError.

   :param lon_val: The longitude value to verify.
   :type lon_val: `float`
   :param PM_centered: If None, verify that the longitude value is in the range [-180, 360].
                       If True, verify that the longitude value is in the range [-180, 180].
                       If False, verify that the longitude value is in the range [0, 360].
   :type PM_centered: `bool`, optional

   :returns: **lon_val** -- The verified longitude value.
   :rtype: `float`

   .. rubric:: Examples

   >>> lon_val = verify_lon(45.0)
   45.0

   >>> lon_val = verify_lon(-200.0)
   ValueError: Longitude value must be in the range [-180, 180].


.. py:function:: get_vminmax(arrays)

   Get the minimum and maximum values across the given arrays.

   Flatten and concatenate the given arrays and return the minimum and
   maximum values, ignoring NaN values.

   :param arrays: The arrays to get the minimum and maximum values from.
   :type arrays: `list` of `numpy.ndarray`

   :returns: * **vmin** (`float`) -- The minimum value across the arrays.
             * **vmax** (`float`) -- The maximum value across the arrays.

   .. rubric:: Examples

   >>> arrays = [np.array([1, 2, 3]), np.array([4, 5, 6])]
   >>> vmin, vmax = get_vminmax(arrays)
   (1, 6)


.. py:function:: get_max_abs_val(val_list)

   Get the maximum absolute value from the given list.

   Remove invalid numbers from the given list of values, take the absolute
   value of the remaining values, and return the largest.

   :param val_list: The list of values to get the maximum absolute value from.
   :type val_list: `list` of numbers or `numpy.ndarray`

   :returns: **max_abs** -- The maximum absolute value of the given values.
   :rtype: `float`

   .. rubric:: Examples

   >>> max_abs = get_max_abs_val([-11, 6])
   11

   >>> vmin, vmax = get_vminmax([np.array([1, 2, -3]), np.array([4, 5, -6])])
   >>> max_abs = get_max_abs_val([vmin, vmax])
   6


.. py:function:: restrict_domain(arrs_to_restrict, lats, lons, restricting_data)

   Restrict the domain of the given arrays.

   Restrict the domain of the given arrays to the same extent as that of the
   restricting data. `lats` and `lons` are the latitude and longitude values
   of the arrays to restrict.

   :param arrs_to_restrict: The arrays to restrict in latitude and longitude.
   :type arrs_to_restrict: `list` of `numpy.ndarray`
   :param lats: The latitude values of the arrays to restrict.
   :type lats: `numpy.ndarray`
   :param lons: The longitude values of the arrays to restrict.
   :type lons: `numpy.ndarray`
   :param restricting_data: The dataset to restrict the arrays to.
   :type restricting_data: `xarray.Dataset` or `xarray.DataArray`

   :returns: * **arrs_to_return** (`list` of `numpy.ndarray`) -- The restricted arrays.
             * **lat_r** (`numpy.ndarray`) -- The latitude values of the restricting data.
             * **lon_r** (`numpy.ndarray`) -- The longitude values of the restricting data.

   .. rubric:: Examples

   >>> stage1 = np.load(get_pred_data(stage=1, HPC_run='no2_example_run', year=2019))
   >>> lats, lons = load_lats_lons()
   >>> nox = xr.open_dataset('datafiles/nox_2019_t106_US.nc')
   >>> stage1_restricted, lat_r, lon_r = restrict_domain([stage1], lats, lons, nox)


.. py:function:: match_domains(xr_a, xr_b, require_equal=True, require_len_gt_1=True)

   Restrict the domain of the given xarray Datasets to match each other.

   Find the maximum extent covered by both given datasets and restrict both
   to match. Requires that at least some of the actual latitude and longitude
   values are present in both datasets.

   :param xr_a: The first dataset.
   :type xr_a: `xarray.Dataset` or `xarray.DataArray`
   :param xr_b: The second dataset.
   :type xr_b: `xarray.Dataset` or `xarray.DataArray`
   :param require_equal: Whether to check that the latitude and longitude values in the two
                         datasets are exactly the same after trimming. Default is `True`.
   :type require_equal: `bool`, optional
   :param require_len_gt_1: Whether to check that the trimmed datasets have more than 1 value
                            in each of the lat and lon dimensions. This catches cases where
                            the datasets only overlap at a single point, which results in
                            either the lat or lon dimension being dropped. Default is `True`.
   :type require_len_gt_1: `bool`, optional

   :returns: * **xr_a** (`xarray.Dataset` or `xarray.DataArray`) -- The first dataset, with the latitude and longitude extents trimmed to
               match `xr_b`.
             * **xr_b** (`xarray.Dataset` or `xarray.DataArray`) -- The second dataset, with the latitude and longitude extents trimmed to
               match `xr_a`.


.. py:function:: verify_npy(array)

   Determine if a variable or file holds a valid numpy array.

   If a numpy array or a path to a file containing a numpy array was passed,
   return it as a numpy array. Otherwise, raise a TypeError, ValueError or
   FileNotFoundError.

   :param array: A numpy array or a path to a file containing a numpy array.
   :type array: `numpy.ndarray` or `str`

   :returns: **nparray** -- The array being passed or pointed to as a np.ndarray.
   :rtype: `np.ndarray`

   .. rubric:: Examples

   >>> import numpy as np
   >>> from tempfile import NamedTemporaryFile
   >>> arr = np.array([1, 2, 3])
   >>> verify_npy(arr)
   array([1, 2, 3])

   >>> with NamedTemporaryFile(suffix=".npy", delete=False) as f:
   ...     np.save(f.name, arr)
   ...     verify_npy(f.name)
   array([1, 2, 3])

   >>> with NamedTemporaryFile(suffix=".txt", mode="w", delete=False) as f:
   ...     _ = f.write("1,2,3\n4,5,6")
   >>> loaded = verify_npy(f.name)
   >>> isinstance(loaded, np.ndarray)
   True


.. py:function:: get_num_from_string(str)

   Extract numbers from a string.

   If the string contains numbers, return those numbers in a list.
   Otherwise, raise a ValueError.

   :param str: The string to extract the numbers from.
   :type str: `str`

   :returns: **nums** -- A list of numbers extracted from the string.
   :rtype: `list` of `int` or `float`

   .. rubric:: Examples

   >>> num = get_num_from_string("There are 42.0 apples and 3 oranges.")
   [42, 3]

   >>> num = get_num_from_string("No number here")
   ValueError: No number found in the string.


.. py:function:: get_DOY(date)

   Get the day of the year from a date.

   Extract the day of the year from a given date and return it as an integer.

   :param date: The date to extract the day of the year from.
   :type date: `np.datetime64` or `str`

   :returns: **doy** -- The day of the year of the date.
   :rtype: `int`

   .. rubric:: Examples

   >>> get_DOY('2019-12-20')
   354

   >>> get_DOY(np.datetime64('2020-01-01'))
   1


.. py:function:: increment_month(month, increment)

   Increment the month by a given number of months.

   Increment the month by the given number of months, wrapping around if the
   increment goes beyond December (12).

   :param month: The month to increment (1 for January, 2 for February, ..., 12 for December).
   :type month: `int` or `str`
   :param increment: The number of months to increment by.
   :type increment: `int` or `str`

   :returns: * **new_month** (`int` or `str`) -- The new month after incrementing. The type will match the type of `month`.
             * **increment_year** (`bool`) -- Whether the increment caused a year change. True if the increment
               moves the month past December.

   .. rubric:: Examples

   >>> increment_month(1, 2)
   (3, False)

   >>> increment_month(11, 3)
   (2, True)

   >>> increment_month('5', '7')
   ('12', False)


.. py:function:: get_YMD_from_date(this_date)

   Get the year, month, and day from a date.

   Extract the year, month, and day from a given date and return them as
   integers.

   :param this_date: The date to extract the year, month, and day from.
   :type this_date: `np.datetime64` or `str`

   :returns: * **year** (`int`) -- The year of the date.
             * **month** (`int`) -- The month of the date.
             * **day** (`int`) -- The day of the date.

   .. rubric:: Examples

   >>> get_YMD_from_date('2019-12-20')
   (2019, 12, 20)

   >>> get_YMD_from_date(np.datetime64('2020-01-01'))
   (2020, 1, 1)


.. py:function:: get_increment_info(increment)

   Get the increment value and unit from a string or `np.timedelta64`.

   Parse an increment given either as a `np.timedelta64` or as a string in
   the format 'XD', 'XM', or 'XY', where X is an integer and D, M, or Y are
   the units for days, months, or years respectively.

   :param increment: The amount of time to add to the date. If a string, it should be in the
                     format 'XD', 'XM', or 'XY' where X is an integer and D, M, or Y are the
                     units for days, months, or years respectively.
   :type increment: `np.timedelta64` or `str`

   :returns: * **value** (`int`) -- The numeric value of the increment.
             * **unit** (`str`) -- The unit of the increment ('D', 'M', or 'Y').

   :raises ValueError: If the increment string is not in the expected format.
   :raises TypeError: If the increment is not a np.timedelta64 or str.

   .. rubric:: Examples

   >>> value, unit = get_increment_info('20D')
   (20, 'D')

   >>> value, unit = get_increment_info(np.timedelta64(20, 'D'))
   (20, 'D')

   >>> value, unit = get_increment_info('3M')
   (3, 'M')

   >>> value, unit = get_increment_info(np.timedelta64(2, 'Y'))
   (2, 'Y')


.. py:function:: add_amount_to_date(this_date, increment, keep_within_year=False)

   Add an amount of time to a date.

   Add the given amount of time to the given date and return the new date.

   :param this_date: The date to add the time to.
   :type this_date: `np.datetime64` or `str`
   :param increment: The amount of time to add to the date. If a string, it should be in the
                     format 'XD', 'XM', or 'XY' where X is an integer and D, M, or Y are the
                     units for days, months, or years respectively.
   :type increment: `np.timedelta64` or `str`
   :param keep_within_year: If True, the new date will be kept within the same year as `this_date`.
   :type keep_within_year: `bool`, optional

   :returns: **new_date** -- The new date after adding the time.
   :rtype: `np.datetime64` or `str`

   .. rubric:: Examples

   >>> add_amount_to_date('2019-12-20', '20D')
   '2020-01-09'

   >>> add_amount_to_date(np.datetime64('2019-12-25'), np.timedelta64(20, 'D'))
   np.datetime64('2020-01-14')
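
The increment format described for `get_increment_info` and `add_amount_to_date` ('XD', 'XM', 'XY', or a `np.timedelta64`) can be illustrated with plain NumPy. The sketch below is not the `unox.data` implementation: `parse_increment` and `add_to_date` are hypothetical stand-ins written for this example. It always returns a `np.datetime64`, does not model `keep_within_year`, and lets a day-of-month overflow spill into the following month.

```python
import re

import numpy as np


def parse_increment(increment):
    """Hypothetical stand-in for get_increment_info: return (value, unit)."""
    if isinstance(increment, np.timedelta64):
        # np.datetime_data reports the unit stored in the dtype, e.g. ('D', 1).
        unit = np.datetime_data(increment.dtype)[0]
        if unit not in ("D", "M", "Y"):
            raise ValueError(f"Unsupported unit: {unit!r}")
        return int(increment.astype(np.int64)), unit
    if not isinstance(increment, str):
        raise TypeError("increment must be a np.timedelta64 or str")
    match = re.fullmatch(r"(-?\d+)([DMY])", increment.strip())
    if match is None:
        raise ValueError(f"Expected 'XD', 'XM' or 'XY', got {increment!r}")
    return int(match.group(1)), match.group(2)


def add_to_date(date, increment):
    """Hypothetical sketch: add a parsed increment to a date (day precision)."""
    value, unit = parse_increment(increment)
    day = np.datetime64(date, "D")
    if unit == "D":
        return day + np.timedelta64(value, "D")
    # Months and years: shift the month component, then restore the
    # offset of the original day within its month.
    month_start = day.astype("datetime64[M]")
    offset = day - month_start.astype("datetime64[D]")
    months = value if unit == "M" else 12 * value
    shifted = month_start + np.timedelta64(months, "M")
    return shifted.astype("datetime64[D]") + offset


print(parse_increment("20D"))
print(add_to_date("2019-12-20", "20D"))
print(add_to_date("2019-11-15", "3M"))
```

Splitting the parsing from the arithmetic mirrors the split between `get_increment_info` and `add_amount_to_date` in the listing above, and keeps the unit validation in one place.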