tstoolbox.tstoolbox.gof

tstoolbox.tstoolbox.gof(obs_col=1, sim_col=2, stats='default', replace_nan=None, replace_inf=None, remove_neg=False, remove_zero=False, start_date=None, end_date=None, input_ts=None, round_index=None, clean=False, index_type='datetime', source_units=None, target_units=None, kge_sr=1.0, kge09_salpha=1.0, kge12_sgamma=1.0, kge_sbeta=1.0)

Calculate goodness-of-fit statistics between two time-series.

The first time-series must be the observed series and the second the simulated series. Exactly two time-series can be given.

Parameters:
  • obs_col – If integer represents the column number of standard input. Can be a csv, wdm, hdf or xlsx file following format specified in ‘tstoolbox read …’.

  • sim_col – If integer represents the column number of standard input. Can be a csv, wdm, hdf or xlsx file following format specified in ‘tstoolbox read …’.

  • stats (Union[None, Literal['default', 'all', 'bias', 'pc_bias', 'apc_bias', 'rmsd', 'crmsd', 'corrcoef', 'coefdet', 'murphyss', 'nse', 'kge09', 'kge12', 'index_agreement', 'brierss', 'mae', 'mean', 'stdev', 'acc', 'd1', 'd1_p', 'd', 'dmod', 'drel', 'dr', 'ed', 'g_mean_diff', 'h10_mahe', 'h10_mhe', 'h10_rmshe', 'h1_mahe', 'h1_mhe', 'h1_rmshe', 'h2_mahe', 'h2_mhe', 'h2_rmshe', 'h3_mahe', 'h3_mhe', 'h3_rmshe', 'h4_mahe', 'h4_mhe', 'h4_rmshe', 'h5_mahe', 'h5_mhe', 'h5_rmshe', 'h6_mahe', 'h6_mhe', 'h6_rmshe', 'h7_mahe', 'h7_mhe', 'h7_rmshe', 'h8_mahe', 'h8_mhe', 'h8_rmshe', 'irmse', 'lm_index', 'maape', 'male', 'mapd', 'mape', 'mase', 'mb_r', 'mdae', 'mde', 'mdse', 'mean_var', 'me', 'mle', 'mse', 'msle', 'ned', 'nrmse_iqr', 'nrmse_mean', 'nrmse_range', 'nse_mod', 'nse_rel', 'rmse', 'rmsle', 'sa', 'sc', 'sga', 'sid', 'smape1', 'smape2', 'spearman_r', 've', 'watt_m'], List[Literal['default', 'all', 'bias', 'pc_bias', 'apc_bias', 'rmsd', 'crmsd', 'corrcoef', 'coefdet', 'murphyss', 'nse', 'kge09', 'kge12', 'index_agreement', 'brierss', 'mae', 'mean', 'stdev', 'acc', 'd1', 'd1_p', 'd', 'dmod', 'drel', 'dr', 'ed', 'g_mean_diff', 'h10_mahe', 'h10_mhe', 'h10_rmshe', 'h1_mahe', 'h1_mhe', 'h1_rmshe', 'h2_mahe', 'h2_mhe', 'h2_rmshe', 'h3_mahe', 'h3_mhe', 'h3_rmshe', 'h4_mahe', 'h4_mhe', 'h4_rmshe', 'h5_mahe', 'h5_mhe', 'h5_rmshe', 'h6_mahe', 'h6_mhe', 'h6_rmshe', 'h7_mahe', 'h7_mhe', 'h7_rmshe', 'h8_mahe', 'h8_mhe', 'h8_rmshe', 'irmse', 'lm_index', 'maape', 'male', 'mapd', 'mape', 'mase', 'mb_r', 'mdae', 'mde', 'mdse', 'mean_var', 'me', 'mle', 'mse', 'msle', 'ned', 'nrmse_iqr', 'nrmse_mean', 'nrmse_range', 'nse_mod', 'nse_rel', 'rmse', 'rmsle', 'sa', 'sc', 'sga', 'sid', 'smape1', 'smape2', 'spearman_r', 've', 'watt_m']]]) –

    [optional, Python: list, Command line: comma separated string, default is ‘default’]

    Comma separated list of statistical measures.

    You can select two groups of statistical measures.

    stats

    Description

    default

    A subset of common statistic measures

    all

    All available statistic measures

    The ‘default’ set of statistics is:

    stats

    Description

    me

    Mean error or bias -inf < ME < inf, close to 0 is better

    pc_bias

    Percent Bias -inf < PC_BIAS < inf, close to 0 is better

    apc_bias

    Absolute Percent Bias 0 <= APC_BIAS < inf, close to 0 is better

    rmsd

    Root Mean Square Deviation/Error 0 <= RMSD < inf, smaller is better

    crmsd

    Centered Root Mean Square Deviation/Error

    corrcoef

    Pearson Correlation coefficient (r) -1 <= r <= 1 1 perfect positive correlation 0 complete randomness -1 perfect negative correlation

    coefdet

    Coefficient of determination (r^2) 0 <= r^2 <= 1 1 perfect correlation 0 complete randomness

    murphyss

    Murphy Skill Score

    nse

    Nash-Sutcliffe Efficiency -inf < NSE < 1, larger is better

    kge09

    Kling-Gupta Efficiency, 2009 -inf < KGE09 < 1, larger is better

    kge12

    Kling-Gupta Efficiency, 2012 -inf < KGE12 < 1, larger is better

    index_agreement

    Index of agreement (d) 0 <= d < 1, larger is better

    brierss

    Brier Skill Score

    mae

    Mean Absolute Error 0 <= MAE < inf, smaller is better

    mean

    observed mean, simulated mean

    stdev

    observed stdev, simulated stdev

    Additional statistics:

    stats

    Description

    acc

    Anomaly correlation coefficient (ACC) -1 <= r <= 1 1 positive correlation of variation in anomalies 0 complete randomness of variation in anomalies -1 negative correlation of variation in anomalies

    d1

    Index of agreement (d1) 0 <= d1 < 1, larger is better

    d1_p

    Legate-McCabe Index of Agreement 0 <= d1_p < 1, larger is better

    d

    Index of agreement (d) 0 <= d < 1, larger is better

    dmod

    Modified index of agreement (dmod) 0 <= dmod < 1, larger is better

    drel

    Relative index of agreement (drel) 0 <= drel < 1, larger is better

    dr

    Refined index of agreement (dr) -1 <= dr < 1, larger is better

    ed

    Euclidean distance in vector space 0 <= ed < inf, smaller is better

    g_mean_diff

    Geometric mean difference

    h1_mahe

    H1 mean absolute error

    h1_mhe

    H1 mean error

    h1_rmshe

    H1 root mean square error

    h2_mahe

    H2 mean absolute error

    h2_mhe

    H2 mean error

    h2_rmshe

    H2 root mean square error

    h3_mahe

    H3 mean absolute error

    h3_mhe

    H3 mean error

    h3_rmshe

    H3 root mean square error

    h4_mahe

    H4 mean absolute error

    h4_mhe

    H4 mean error

    h4_rmshe

    H4 root mean square error

    h5_mahe

    H5 mean absolute error

    h5_mhe

    H5 mean error

    h5_rmshe

    H5 root mean square error

    h6_mahe

    H6 mean absolute error

    h6_mhe

    H6 mean error

    h6_rmshe

    H6 root mean square error

    h7_mahe

    H7 mean absolute error

    h7_mhe

    H7 mean error

    h7_rmshe

    H7 root mean square error

    h8_mahe

    H8 mean absolute error

    h8_mhe

    H8 mean error

    h8_rmshe

    H8 root mean square error

    h10_mahe

    H10 mean absolute error

    h10_mhe

    H10 mean error

    h10_rmshe

    H10 root mean square error

    irmse

    Inertial root mean square error (IRMSE) 0 <= irmse < inf, smaller is better

    lm_index

    Legate-McCabe Efficiency Index 0 <= lm_index < 1, larger is better

    maape

    Mean Arctangent Absolute Percentage Error (MAAPE) 0 <= maape < pi/2, smaller is better

    male

    Mean absolute log error 0 <= male < inf, smaller is better

    mapd

    Mean absolute percentage deviation (MAPD)

    mape

    Mean absolute percentage error (MAPE) 0 <= mape < inf, smaller is better, 0 indicates a perfect match

    mase

    Mean absolute scaled error

    mb_r

    Mielke-Berry R value (MB R) 0 <= mb_r < 1, larger is better

    mdae

    Median absolute error (MdAE) 0 <= mdae < inf, smaller is better

    mde

    Median error (MdE) -inf < mde < inf, closer to zero is better

    mdse

    Median squared error (MdSE) 0 <= mdse < inf, closer to zero is better

    mean_var

    Mean variance

    me

    Mean error -inf < me < inf, closer to zero is better

    mle

    Mean log error -inf < mle < inf, closer to zero is better

    mse

    Mean squared error 0 <= mse < inf, smaller is better

    msle

    Mean squared log error 0 <= msle < inf, smaller is better

    ned

    Normalized Euclidean distance in vector space 0 <= ned < inf, smaller is better

    nrmse_iqr

    IQR normalized root mean square error 0 <= nrmse_iqr < inf, smaller is better

    nrmse_mean

    Mean normalized root mean square error 0 <= nrmse_mean < inf, smaller is better

    nrmse_range

    Range normalized root mean square error 0 <= nrmse_range < inf, smaller is better

    nse_mod

    Modified Nash-Sutcliffe efficiency (NSE mod) -inf < nse_mod < 1, larger is better

    nse_rel

    Relative Nash-Sutcliffe efficiency (NSE rel) -inf < nse_rel < 1, larger is better

    rmse

    Root mean square error 0 <= rmse < inf, smaller is better

    rmsle

    Root mean square log error 0 <= rmsle < inf, smaller is better

    sa

    Spectral Angle (SA) -pi/2 <= sa < pi/2, closer to 0 is better

    sc

    Spectral Correlation (SC) -pi/2 <= sc < pi/2, closer to 0 is better

    sga

    Spectral Gradient Angle (SGA) -pi/2 <= sga < pi/2, closer to 0 is better

    sid

    Spectral Information Divergence (SID) -pi/2 <= sid < pi/2, closer to 0 is better

    smape1

    Symmetric Mean Absolute Percentage Error (1) (SMAPE1) 0 <= smape1 < 100, smaller is better

    smape2

    Symmetric Mean Absolute Percentage Error (2) (SMAPE2) 0 <= smape2 < 100, smaller is better

    spearman_r

    Spearman rank correlation coefficient -1 <= spearman_r <= 1 1 perfect positive correlation 0 complete randomness -1 perfect negative correlation

    ve

    Volumetric Efficiency (VE) 0 <= ve < 1, larger is better

    watt_m

    Watterson’s M (M) -1 <= watt_m < 1, larger is better
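
To make the ranges in the tables above concrete, here is a minimal sketch of a few of the listed statistics in their common textbook forms, using only the standard library. The helper name gof_sketch is hypothetical, and the sign convention (simulated minus observed) is an assumption; the formulas tstoolbox uses internally may differ in detail.

```python
import math


def gof_sketch(obs, sim):
    """Textbook forms of a few statistics (assumed; not tstoolbox's own code)."""
    n = len(obs)
    # Assumption: error is simulated minus observed.
    errors = [s - o for o, s in zip(obs, sim)]
    obs_mean = sum(obs) / n
    sq_err = sum(e * e for e in errors)
    return {
        "me": sum(errors) / n,                    # mean error, 0 is best
        "mae": sum(abs(e) for e in errors) / n,   # mean absolute error, smaller is better
        "rmse": math.sqrt(sq_err / n),            # root mean square error, smaller is better
        "pc_bias": 100.0 * sum(errors) / sum(obs),  # percent bias, 0 is best
        # Nash-Sutcliffe efficiency, larger is better
        "nse": 1.0 - sq_err / sum((o - obs_mean) ** 2 for o in obs),
    }


observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.0, 2.5, 4.0]
result = gof_sketch(observed, simulated)
```

In this example the positive and negative errors cancel, so the mean error is zero while the mean absolute error and RMSE are not. This is why the tables list several measures rather than one.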

  • replace_nan (float) – If given, indicates which value to replace NaN values with in the two arrays. If None, when a NaN value is found at the i-th position in the observed OR simulated array, the i-th value of the observed and simulated array are removed before the computation.

  • replace_inf (float) – If given, indicates which value to replace Inf values with in the two arrays. If None, when an inf value is found at the i-th position in the observed OR simulated array, the i-th value of the observed and simulated array are removed before the computation.

  • remove_neg (boolean) – If True, when a negative value is found at the i-th position in the observed OR simulated array, the i-th value of the observed AND simulated array are removed before the computation.

  • remove_zero (boolean) – If True, when a zero value is found at the i-th position in the observed OR simulated array, the i-th value of the observed AND simulated array are removed before the computation.
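
The pairwise filtering described by replace_nan, replace_inf, remove_neg and remove_zero can be sketched as below. The function filter_pairs is a hypothetical illustration, and the order in which the four options are applied is an assumption, not something the text above specifies.

```python
import math


def filter_pairs(obs, sim, replace_nan=None, replace_inf=None,
                 remove_neg=False, remove_zero=False):
    """Drop or repair (obs, sim) pairs position by position (assumed order)."""
    kept_obs, kept_sim = [], []
    for o, s in zip(obs, sim):
        pair = [o, s]
        # Replace NaN/Inf when a replacement value is given...
        if replace_nan is not None:
            pair = [replace_nan if math.isnan(v) else v for v in pair]
        if replace_inf is not None:
            pair = [replace_inf if math.isinf(v) else v for v in pair]
        # ...otherwise a NaN/Inf in EITHER series removes the whole pair.
        if any(math.isnan(v) or math.isinf(v) for v in pair):
            continue
        if remove_neg and any(v < 0 for v in pair):
            continue
        if remove_zero and any(v == 0 for v in pair):
            continue
        kept_obs.append(pair[0])
        kept_sim.append(pair[1])
    return kept_obs, kept_sim


obs = [1.0, float("nan"), 3.0, -1.0, 0.0]
sim = [1.1, 2.0, float("inf"), 4.0, 5.0]
default_pairs = filter_pairs(obs, sim)
strict_pairs = filter_pairs(obs, sim, remove_neg=True, remove_zero=True)
filled_pairs = filter_pairs(obs, sim, replace_nan=0.0)
```

Removing the value from both series keeps the two arrays aligned, which every pairwise statistic requires.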

  • start_date (str) –

    [optional, defaults to first date in time-series, input filter]

    The start_date of the series in ISOdatetime format, or ‘None’ for beginning.

  • end_date (str) –

    [optional, defaults to last date in time-series, input filter]

    The end_date of the series in ISOdatetime format, or ‘None’ for end.

  • round_index

    [optional, default is None which will do nothing to the index, output format]

    Round the index to the nearest time point. This can significantly improve performance because it cuts down on memory and processing requirements. However, be cautious about rounding from a small interval to a very coarse one, since this can lead to duplicate values in the index.
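
The duplicate-index risk mentioned above can be illustrated with a small standard-library sketch. The helper round_to_hour is hypothetical and rounds half hours upward, which is an assumption; the rounding rule used by tstoolbox itself may differ.

```python
from collections import Counter
from datetime import datetime, timedelta


def round_to_hour(ts):
    """Round a datetime to the nearest hour (assumption: half hours round up)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        return floored + timedelta(hours=1)
    return floored


# Three distinct 10-minute-scale timestamps...
index = [
    datetime(2020, 1, 1, 0, 10),
    datetime(2020, 1, 1, 0, 20),
    datetime(2020, 1, 1, 0, 50),
]
# ...collapse onto two hourly values after rounding.
rounded = [round_to_hour(ts) for ts in index]
duplicates = [ts for ts, count in Counter(rounded).items() if count > 1]
```

Here the first two timestamps both round to midnight, so the rounded index is no longer unique.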

  • clean

    [optional, default is False, input filter]

    The ‘clean’ command will repair an input index, removing duplicate index values and sorting.

  • index_type (str) –

    [optional, default is ‘datetime’, output format]

    Can be either ‘number’ or ‘datetime’. Use ‘number’ with index values that are Julian dates, or other epoch reference.

  • source_units (str) –

    [optional, default is None, transformation]

    If a unit is specified for the column as the second field of a ‘:’ delimited column name, then the specified units and the ‘source_units’ must match exactly.

    Any unit string compatible with the ‘pint’ library can be used.

  • target_units (str) –

    [optional, default is None, transformation]

    The purpose of this option is to specify target units for unit conversion. The source units are specified in the header line of the input or using the ‘source_units’ keyword.

    The units of the input time-series or values are specified as the second field of a ‘:’ delimited name in the header line of the input or in the ‘source_units’ keyword.

    Any unit string compatible with the ‘pint’ library can be used.

    This option will also add the ‘target_units’ string to the column names.

  • tablefmt (str) –

    [optional, default is ‘csv’, output format]

    The table format. Can be one of ‘csv’, ‘tsv’, ‘plain’, ‘simple’, ‘grid’, ‘pipe’, ‘orgtbl’, ‘rst’, ‘mediawiki’, ‘latex’, ‘latex_raw’ and ‘latex_booktabs’.

  • float_format

    [optional, output format]

    Format for float numbers.

  • kge_sr (float) –

    [optional, defaults to 1.0]

    Scaling factor for kge09 and kge12 correlation.

  • kge09_salpha (float) –

    [optional, defaults to 1.0]

    Scaling factor for kge09 alpha.

  • kge12_sgamma (float) –

    [optional, defaults to 1.0]

    Scaling factor for kge12 gamma.

  • kge_sbeta (float) –

    [optional, defaults to 1.0]

    Scaling factor for kge09 and kge12 beta.
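
As a rough guide to what the four scaling factors do, the sketch below uses the commonly published scaled forms of the 2009 and 2012 Kling-Gupta efficiencies. The function kge_sketch is hypothetical, and it is an assumption that tstoolbox applies the factors in exactly this way.

```python
import math
from statistics import mean, pstdev


def kge_sketch(obs, sim, kge_sr=1.0, kge09_salpha=1.0,
               kge12_sgamma=1.0, kge_sbeta=1.0):
    """Scaled KGE (2009) and KGE (2012) in their textbook forms (assumed)."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)            # Pearson correlation
    alpha = ss / so                # variability ratio (2009)
    beta = ms / mo                 # bias ratio
    gamma = (ss / ms) / (so / mo)  # ratio of coefficients of variation (2012)
    kge09 = 1.0 - math.sqrt((kge_sr * (r - 1)) ** 2
                            + (kge09_salpha * (alpha - 1)) ** 2
                            + (kge_sbeta * (beta - 1)) ** 2)
    kge12 = 1.0 - math.sqrt((kge_sr * (r - 1)) ** 2
                            + (kge12_sgamma * (gamma - 1)) ** 2
                            + (kge_sbeta * (beta - 1)) ** 2)
    return kge09, kge12


obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated, but doubled
kge09, kge12 = kge_sketch(obs, sim)
kge09_nobias, kge12_nobias = kge_sketch(obs, sim, kge_sbeta=0.0)
```

With every factor at 1.0 each component counts equally. Setting a factor to 0.0, as in the second call, removes that component from the score, so the doubled series is no longer penalized for its bias.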

  • input_ts (str) –

    [DEPRECATED] [optional, defaults to None]

    The older approach was implicit by using the input_ts and columns to create a two-column dataset where the first column was the observed values and the second column was the simulated values.

    The new approach is to specify the input time series file using obs_col and sim_col parameters.

    OLD WAY:

    ... --input_ts=blah.csv ...
    # Where blah.csv contains two columns, the first
    # is the observed values and the second
    # is the simulated values.
    

    NEW WAY:

    ... --obs_col=blah.csv,2 --sim_col=blah.csv,5 ...
    # Where blah.csv contains at least 5 columns,
    # the second column is the observed values
    # and the fifth column is the simulated values.