mettoolbox.disaggregate.humidity

mettoolbox.disaggregate.humidity(method, source_units, input_ts='-', columns=None, start_date=None, end_date=None, dropna='no', clean=False, round_index=None, skiprows=None, index_type='datetime', names=None, target_units=None, print_input=False, hum_min_col=None, hum_max_col=None, hum_mean_col=None, temp_min_col=None, temp_max_col=None, precip_col=None, a0=None, a1=None, kr=None, hourly_temp=None, hourly_precip_hum=None, preserve_daily_mean=None)

Disaggregate daily relative humidity to hourly humidity.

Relative humidity disaggregation requires the following input data.

Input data    Description
------------  ---------------------------------------------------------------
hum_min_col   Required column name or number representing the minimum daily
              relative humidity.
hum_max_col   Required column name or number representing the maximum daily
              relative humidity.
hum_mean_col  Optional column name or number representing the average daily
              relative humidity. Default is None, in which case it is
              calculated as the average of hum_min_col and hum_max_col.
temp_min_col  Required column name or number representing the minimum daily
              temperature for the minimal, dewpoint_regression,
              linear_dewpoint_variation, and min_max methods.
temp_max_col  Required column name or number representing the maximum daily
              temperature for the min_max method.
precip_col    Required column name or number representing the total
              precipitation for the month_hour_precip_mean method.
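As a rough illustration of the default for hum_mean_col, the sketch below reads a small daily table and averages the minimum and maximum humidity. The column names and values are hypothetical, and the helper `default_hum_mean` is not part of mettoolbox; it only mirrors the rule stated above.

```python
import csv
import io

# Hypothetical daily input; column names and values are illustrative only.
DAILY_CSV = """Datetime,hum_min,hum_max,temp_min,temp_max,precip
2020-01-01,40,90,2.0,11.0,0.0
2020-01-02,55,95,4.5,9.0,3.2
"""


def default_hum_mean(hum_min, hum_max):
    """Average of the daily minimum and maximum relative humidity."""
    return (hum_min + hum_max) / 2.0


rows = list(csv.DictReader(io.StringIO(DAILY_CSV)))
hum_mean = [
    default_hum_mean(float(row["hum_min"]), float(row["hum_max"]))
    for row in rows
]
print(hum_mean)  # [65.0, 75.0]
```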

Parameters:
  • method (str) –

    Available disaggregation methods for humidity.

    method                     Description
    -------------------------  ------------------------------------------------
    equal                      Duplicate mean daily humidity for the 24 hours
                               of the day.
    minimal                    The dew point temperature is set to the minimum
                               temperature on that day.
    dewpoint_regression        Using hourly observations, a regression approach
                               is applied to calculate daily dew point
                               temperature. Regression parameters must be
                               specified.
    linear_dewpoint_variation  This method extends the dew point regression
                               approach by varying the dew point temperature
                               linearly between consecutive days. The parameter
                               kr must be specified (kr=6 if monthly radiation
                               exceeds 100 W/m2, else kr=12).
    min_max                    This method requires minimum and maximum
                               relative humidity for each day.
    month_hour_precip_mean     Calculate hourly humidity from categorical
                               [month, hour, precip(y/n)] mean values derived
                               from observations.
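The two simplest methods can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function names are hypothetical, and the Magnus approximation used for saturation vapor pressure is an assumption, since the formula used internally is not stated here.

```python
import math
from datetime import datetime, timedelta


def disaggregate_equal(day, hum_mean):
    """'equal': repeat the daily mean humidity for each of the 24 hours."""
    start = datetime(day.year, day.month, day.day)
    return [(start + timedelta(hours=h), hum_mean) for h in range(24)]


def saturation_vapor_pressure(temp_c):
    """Magnus approximation over water in hPa (assumed formula)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_minimal(hourly_temps_c, temp_min_c):
    """'minimal': dew point fixed at the daily minimum temperature."""
    e_dew = saturation_vapor_pressure(temp_min_c)
    return [
        min(100.0, 100.0 * e_dew / saturation_vapor_pressure(t))
        for t in hourly_temps_c
    ]


hourly = disaggregate_equal(datetime(2020, 1, 1), 65.0)
rh = humidity_minimal([2.0, 6.0, 11.0], temp_min_c=2.0)
```

In the "minimal" sketch, humidity is 100% at the hour where the temperature equals the daily minimum and falls as the air warms.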

    Required keywords for each method. The “Column Name/Index Keywords” represent the column name or index (data columns starting numbering at 1) in the input dataset.

    method                     Column Name/Index Keywords  Other Keywords
    -------------------------  --------------------------  --------------------
    equal                      hum_mean_col
    minimal                    temp_min_col                hourly_temp
    dewpoint_regression        temp_min_col                a0 a1 hourly_temp
    linear_dewpoint_variation  temp_min_col                a0 a1 kr hourly_temp
    min_max                    hum_min_col hum_max_col     hourly_temp
                               temp_min_col temp_max_col
    month_hour_precip_mean     precip_col
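The categorical means behind month_hour_precip_mean can be pictured as a lookup table keyed by (month, hour, wet). The sketch below builds such a table from a few made-up hourly observations. Treating "precip(y/n)" as "precipitation greater than zero for that observation" is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly observations: (timestamp, precipitation, humidity in %).
observations = [
    (datetime(2020, 1, 1, 6), 0.0, 80.0),
    (datetime(2020, 1, 2, 6), 0.0, 90.0),
    (datetime(2020, 1, 3, 6), 1.2, 98.0),
    (datetime(2020, 1, 1, 14), 0.0, 55.0),
]


def month_hour_precip_means(obs):
    """Mean humidity for each (month, hour, wet) category."""
    groups = defaultdict(list)
    for timestamp, precip, hum in obs:
        # Assumption: a record counts as "wet" when precipitation > 0.
        groups[(timestamp.month, timestamp.hour, precip > 0)].append(hum)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


means = month_hour_precip_means(observations)
print(means[(1, 6, False)])  # 85.0
```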

  • source_units (str) –

    If units are specified for a column as the second field of a ‘:’ delimited column name, then those units and ‘source_units’ must match exactly.

    Any unit string compatible with the ‘pint’ library can be used.
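The ‘:’ delimited naming convention can be illustrated with a small parser. Both helper names are hypothetical and the check is only a sketch of the "must match exactly" rule described above, not the library's own validation.

```python
def split_name_units(column_name):
    """Return (name, units); units is None if no second field is present."""
    fields = column_name.split(":")
    units = fields[1] if len(fields) > 1 and fields[1] else None
    return fields[0], units


def check_source_units(column_name, source_units):
    """Raise ValueError if units in the column name differ from source_units."""
    name, units = split_name_units(column_name)
    if units is not None and units != source_units:
        raise ValueError(
            f"column {name!r} has units {units!r}, expected {source_units!r}"
        )
    return source_units


print(split_name_units("hum_min:percent"))  # ('hum_min', 'percent')
print(split_name_units("hum_min"))          # ('hum_min', None)
```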

  • input_ts (str) –

    [optional though required if using within Python, default is ‘-’ (stdin)]

    Whether from a file or standard input, data requires a single line header of column names. The default header is the first line of the input, but this can be changed for CSV files using the ‘skiprows’ option.

    Most common date formats can be used, but the closer to ISO 8601 date/time standard the better.

    Comma-separated values (CSV) files or tab-separated values (TSV):

    File separators will be automatically detected.
    
    Columns can be selected by name or index, where the index for
    data columns starts at 1.
    

    Command line examples:

    Keyword Example                 Description
    ------------------------------  --------------------------------------------
    --input_ts=fn.csv               read all columns from ‘fn.csv’
    --input_ts=fn.csv,2,1           read data columns 2 and 1 from ‘fn.csv’
    --input_ts=fn.csv,2,skiprows=2  read data column 2 from ‘fn.csv’, skipping
                                    the first 2 rows so the header is read from
                                    the third row
    --input_ts=fn.xlsx,2,Sheet21    read all data from the 2nd sheet, then all
                                    data from “Sheet21” of ‘fn.xlsx’
    --input_ts=fn.hdf5,Table12,T2   read all data from table “Table12”, then all
                                    data from table “T2” of ‘fn.hdf5’
    --input_ts=fn.wdm,210,110       read DSN 210, then DSN 110 from ‘fn.wdm’
    --input_ts='-'                  read all columns from standard input (stdin)
    --input_ts='-' --columns=4,1    read columns 4 and 1 from standard input
                                    (stdin)

    If working with CSV or TSV files you can use redirection rather than --input_ts=fname.csv. The following are identical:

    From a file:

    command subcmd --input_ts=fname.csv

    From standard input (since ‘--input_ts=-’ is the default):

    command subcmd < fname.csv

    Commands can also be combined by piping:

    command subcmd < filein.csv | command subcmd1 > fileout.csv

    Python library examples:

    You must use the `input_ts=...` option where `input_ts` can be
    one of a [pandas DataFrame, pandas Series, dict, tuple, list,
    StringIO, or file name].
    

  • columns

    [optional, defaults to all columns, input filter]

    Columns to select out of the input. Can use column names from the first line header or column numbers. If using numbers, column number 1 is the first data column. To pick multiple columns, separate them by commas with no spaces, as in the toolbox_utils pick command.

    Because columns can be rearranged as the data is read in, you do not have to create a data set with a particular column order.
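The 1-based numbering of data columns can be sketched as follows. The function `pick_columns` is hypothetical and only illustrates how a comma-separated selection of names and numbers might resolve to column names; the header list excludes the datetime index.

```python
def pick_columns(header, columns):
    """Resolve a selection such as '3,hum_min' to a list of column names.

    header  -- data column names in file order, excluding the index column
    columns -- comma-separated names and/or 1-based column numbers
    """
    picked = []
    for token in columns.split(","):
        if token.isdigit():
            position = int(token)
            if not 1 <= position <= len(header):
                raise IndexError(f"column number {position} is out of range")
            picked.append(header[position - 1])  # data column 1 is header[0]
        elif token in header:
            picked.append(token)
        else:
            raise KeyError(f"unknown column {token!r}")
    return picked


header = ["hum_min", "hum_max", "temp_min"]
print(pick_columns(header, "3,1"))        # ['temp_min', 'hum_min']
print(pick_columns(header, "hum_max,1"))  # ['hum_max', 'hum_min']
```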

  • start_date (str) –

    [optional, defaults to first date in time-series, input filter]

    The start_date of the series in ISOdatetime format, or ‘None’ for beginning.

  • end_date (str) –

    [optional, defaults to last date in time-series, input filter]

    The end_date of the series in ISOdatetime format, or ‘None’ for end.

  • dropna (str) –

    [optional, default is ‘no’, input filter]

    Set dropna to ‘any’ to have records dropped that have NA value in any column, or ‘all’ to have records dropped that have NA in all columns. Set to ‘no’ to not drop any records. The default is ‘no’.
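The difference between ‘any’ and ‘all’ is easiest to see on a tiny example. This is a plain-Python sketch with a hypothetical `drop_records` helper and `None` standing in for NA; it is not how the library stores or filters data.

```python
def drop_records(records, dropna="no"):
    """Filter (index, values) records according to the dropna setting."""
    if dropna == "no":
        return list(records)
    if dropna == "any":
        should_drop = any   # drop when at least one value is missing
    elif dropna == "all":
        should_drop = all   # drop only when every value is missing
    else:
        raise ValueError("dropna must be 'no', 'any', or 'all'")
    return [
        (index, values)
        for index, values in records
        if not should_drop(v is None for v in values)
    ]


records = [
    ("2020-01-01", (40.0, 90.0)),
    ("2020-01-02", (None, 95.0)),
    ("2020-01-03", (None, None)),
]
print(len(drop_records(records, "any")))  # 1
print(len(drop_records(records, "all")))  # 2
```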

  • clean

    [optional, default is False, input filter]

    The ‘clean’ command will repair an input index, removing duplicate index values and sorting.

  • round_index

    [optional, default is None which will do nothing to the index, output format]

    Round the index to the nearest time point. This can significantly improve performance because it cuts down on memory and processing requirements. However, be cautious about rounding from a small interval to a very coarse one, since this could lead to duplicate values in the index.
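The duplicate-index risk can be demonstrated with a few timestamps. The sketch below rounds to the nearest hour with a hypothetical helper; half-hour ties are rounded up here for simplicity, which may differ from the rounding rule the library applies.

```python
from datetime import datetime, timedelta


def round_to_hour(ts):
    """Round a timestamp to the nearest whole hour (ties rounded up)."""
    floor = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floor >= timedelta(minutes=30):
        return floor + timedelta(hours=1)
    return floor


index = [
    datetime(2020, 1, 1, 0, 50),
    datetime(2020, 1, 1, 1, 10),
    datetime(2020, 1, 1, 2, 5),
]
rounded = [round_to_hour(ts) for ts in index]

# 00:50 and 01:10 both become 01:00, so the rounded index has a duplicate.
has_duplicates = len(set(rounded)) < len(rounded)
print(has_duplicates)  # True
```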

  • skiprows (list-like or integer or callable) –

    [optional, default is None which will infer header from first line, input filter]

    If a list, the line numbers to skip (0-indexed). If an integer, the number of lines to skip at the start of the file.

    When used from Python, skiprows can also be a callable. The callable is evaluated against the row indices and returns True if the row should be skipped and False otherwise. An example of a valid callable argument would be

    lambda x: x in [0, 2].
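To show what such a callable does, the sketch below applies it to the 0-indexed lines of a small in-memory file before parsing the remainder with the standard csv module. The `read_rows` helper and the sample text are hypothetical; the library's own reader is not shown here.

```python
import csv
import io

RAW = (
    "exported by a logger\n"      # row 0: skipped
    "Datetime,hum_min,hum_max\n"  # row 1: becomes the header
    "-,percent,percent\n"         # row 2: skipped
    "2020-01-01,40,90\n"          # row 3: data
)


def read_rows(text, skiprows):
    """Drop every line whose 0-based index makes skiprows return True."""
    kept = [
        line for i, line in enumerate(text.splitlines()) if not skiprows(i)
    ]
    reader = csv.DictReader(io.StringIO("\n".join(kept)))
    return reader.fieldnames, list(reader)


fieldnames, rows = read_rows(RAW, skiprows=lambda x: x in [0, 2])
print(fieldnames)  # ['Datetime', 'hum_min', 'hum_max']
```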

  • index_type (str) –

    [optional, default is ‘datetime’, output format]

    Can be either ‘number’ or ‘datetime’. Use ‘number’ with index values that are Julian dates, or other epoch reference.

  • names (str) –

    [optional, default is None, transformation]

    If None, the column names are taken from the first row after ‘skiprows’ from the input dataset.

    MUST include a name for all columns in the input dataset, including the index column.

  • target_units (str) –

    [optional, default is None, transformation]

    The purpose of this option is to specify target units for unit conversion. The source units are specified in the header line of the input or using the ‘source_units’ keyword.

    The units of the input time-series or values are specified as the second field of a ‘:’ delimited name in the header line of the input or in the ‘source_units’ keyword.

    Any unit string compatible with the ‘pint’ library can be used.

    This option will also add the ‘target_units’ string to the column names.

  • print_input

    [optional, default is False, output format]

    If set to ‘True’ will include the input columns in the output table.

  • tablefmt (str) –

    [optional, default is ‘csv’, output format]

    The table format. Can be one of ‘csv’, ‘tsv’, ‘plain’, ‘simple’, ‘grid’, ‘pipe’, ‘orgtbl’, ‘rst’, ‘mediawiki’, ‘latex’, ‘latex_raw’ and ‘latex_booktabs’.

  • precip_col (Union[int, str, Series, None]) – Column index (data columns start numbering at 1) or column name from the input data that contains the daily precipitation.

  • temp_min_col (Union[int, str, Series, None]) – Column index (data columns start numbering at 1) or column name from the input data that contains the daily minimum temperature.

  • temp_max_col (Union[int, str, Series, None]) – Column index (data columns start numbering at 1) or column name from the input data that contains the daily maximum temperature.

  • hum_min_col (Union[int, str, Series, None]) – Column index (data columns start numbering at 1) or column name from the input data that contains the daily minimum humidity.

  • hum_max_col (Union[int, str, Series, None]) – Column index (data columns start numbering at 1) or column name from the input data that contains the daily maximum humidity.

  • hum_mean_col (Union[int, str, Series, None]) – Column index (data columns start numbering at 1) or column name from the input data that contains the daily mean humidity.

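  • a0 (float) – Regression parameter “a0”, required by the “dewpoint_regression” and “linear_dewpoint_variation” methods.

  • a1 (float) – Regression parameter “a1”, required by the “dewpoint_regression” and “linear_dewpoint_variation” methods.

  • kr (int) – Parameter for the “linear_dewpoint_variation” method (kr=6 if monthly radiation exceeds 100 W/m2, else kr=12).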

  • hourly_temp (str) – Filename of a CSV file that contains an hourly time series of temperatures.

  • hourly_precip_hum (str) – Filename of a CSV file that contains an hourly time series of precipitation and humidity.

  • preserve_daily_mean (str) – Column name or index (data columns start at 1) that identifies the observed daily mean humidity. If not None, the daily mean values of the disaggregated data are corrected to match the observed daily mean humidity.
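One simple way to picture the preserve_daily_mean correction is an additive shift that moves every hourly value by the difference between the disaggregated and observed daily means. This is an assumption made for illustration; the exact correction applied by the library is not described here, and the sketch does not clip results to the 0 to 100% range.

```python
def shift_to_daily_mean(hourly_values, observed_daily_mean):
    """Shift hourly values so that their mean equals the observed mean."""
    bias = sum(hourly_values) / len(hourly_values) - observed_daily_mean
    return [value - bias for value in hourly_values]


# Three values instead of 24 to keep the example short.
disaggregated = [60.0, 70.0, 80.0]
corrected = shift_to_daily_mean(disaggregated, observed_daily_mean=65.0)
print(corrected)  # [55.0, 65.0, 75.0]
```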