tstoolbox.tstoolbox.gof
- tstoolbox.tstoolbox.gof(obs_col=1, sim_col=2, stats='default', replace_nan=None, replace_inf=None, remove_neg=False, remove_zero=False, start_date=None, end_date=None, input_ts=None, round_index=None, clean=False, index_type='datetime', source_units=None, target_units=None, kge_sr=1.0, kge09_salpha=1.0, kge12_sgamma=1.0, kge_sbeta=1.0)
Calculate goodness-of-fit statistics between two time series.
The first time series must be the observed, the second the simulated series. You can only give two time-series.
- Parameters:
obs_col – If an integer, represents the column number of standard input. Can be a csv, wdm, hdf or xlsx file following the format specified in ‘tstoolbox read …’.
sim_col – If an integer, represents the column number of standard input. Can be a csv, wdm, hdf or xlsx file following the format specified in ‘tstoolbox read …’.
stats (
Union[None,Literal['default','all','bias','pc_bias','apc_bias','rmsd','crmsd','corrcoef','coefdet','murphyss','nse','kge09','kge12','index_agreement','brierss','mae','mean','stdev','acc','d1','d1_p','d','dmod','drel','dr','ed','g_mean_diff','h10_mahe','h10_mhe','h10_rmshe','h1_mahe','h1_mhe','h1_rmshe','h2_mahe','h2_mhe','h2_rmshe','h3_mahe','h3_mhe','h3_rmshe','h4_mahe','h4_mhe','h4_rmshe','h5_mahe','h5_mhe','h5_rmshe','h6_mahe','h6_mhe','h6_rmshe','h7_mahe','h7_mhe','h7_rmshe','h8_mahe','h8_mhe','h8_rmshe','irmse','lm_index','maape','male','mapd','mape','mase','mb_r','mdae','mde','mdse','mean_var','me','mle','mse','msle','ned','nrmse_iqr','nrmse_mean','nrmse_range','nse_mod','nse_rel','rmse','rmsle','sa','sc','sga','sid','smape1','smape2','spearman_r','ve','watt_m'],List[Literal['default','all','bias','pc_bias','apc_bias','rmsd','crmsd','corrcoef','coefdet','murphyss','nse','kge09','kge12','index_agreement','brierss','mae','mean','stdev','acc','d1','d1_p','d','dmod','drel','dr','ed','g_mean_diff','h10_mahe','h10_mhe','h10_rmshe','h1_mahe','h1_mhe','h1_rmshe','h2_mahe','h2_mhe','h2_rmshe','h3_mahe','h3_mhe','h3_rmshe','h4_mahe','h4_mhe','h4_rmshe','h5_mahe','h5_mhe','h5_rmshe','h6_mahe','h6_mhe','h6_rmshe','h7_mahe','h7_mhe','h7_rmshe','h8_mahe','h8_mhe','h8_rmshe','irmse','lm_index','maape','male','mapd','mape','mase','mb_r','mdae','mde','mdse','mean_var','me','mle','mse','msle','ned','nrmse_iqr','nrmse_mean','nrmse_range','nse_mod','nse_rel','rmse','rmsle','sa','sc','sga','sid','smape1','smape2','spearman_r','ve','watt_m']]]) –[optional, Python: list, Command line: comma separated string, default is ‘default’]
Comma separated list of statistical measures.
You can select two groups of statistical measures.
stats
Description
default
A subset of common statistic measures
all
All available statistic measures
The ‘default’ set of statistics is:
stats
Description
bias
Mean error or bias -inf < ME < inf, close to 0 is better
pc_bias
Percent Bias -inf < PC_BIAS < inf, close to 0 is better
apc_bias
Absolute Percent Bias 0 <= APC_BIAS < inf, close to 0 is better
rmsd
Root Mean Square Deviation/Error 0 <= RMSD < inf, smaller is better
crmsd
Centered Root Mean Square Deviation/Error
corrcoef
Pearson correlation coefficient (r), -1 <= r <= 1; 1 is perfect positive correlation, 0 is complete randomness, -1 is perfect negative correlation
coefdet
Coefficient of determination (r^2), 0 <= r^2 <= 1; 1 is perfect correlation, 0 is complete randomness
murphyss
Murphy Skill Score
nse
Nash-Sutcliffe Efficiency -inf < NSE < 1, larger is better
kge09
Kling-Gupta Efficiency, 2009 -inf < KGE09 < 1, larger is better
kge12
Kling-Gupta Efficiency, 2012 -inf < KGE12 < 1, larger is better
index_agreement
Index of agreement (d) 0 <= d < 1, larger is better
brierss
Brier Skill Score
mae
Mean Absolute Error 0 <= MAE < inf, smaller is better
mean
observed mean, simulated mean
stdev
observed stdev, simulated stdev
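As a rough illustration of what a few of the ‘default’ statistics measure, the following standard-library sketch computes the mean error, RMSD and NSE from their textbook definitions. It is not the tstoolbox implementation; the helper name `basic_gof` is hypothetical, and the sign convention for the bias (simulated minus observed) is an assumption.

```python
import math


def basic_gof(obs, sim):
    """Return bias, RMSD and NSE for paired observed/simulated values.

    Hypothetical helper for illustration; not part of tstoolbox.
    """
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and the same length")
    n = len(obs)
    # Assumed sign convention: error = simulated - observed.
    errors = [s - o for o, s in zip(obs, sim)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    bias = sum(errors) / n
    rmsd = math.sqrt(sse / n)
    obs_mean = sum(obs) / n
    # NSE = 1 - SSE / (squared deviations of obs about its own mean).
    # Undefined (division by zero) when the observed series is constant.
    sst = sum((o - obs_mean) ** 2 for o in obs)
    nse = 1.0 - sse / sst
    return {"bias": bias, "rmsd": rmsd, "nse": nse}


obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.5, 3.5, 4.5]
stats = basic_gof(obs, sim)
print(stats)
```

A constant offset of 0.5 gives a bias and RMSD of 0.5 while the NSE stays well below 1, which shows why several measures are usually reported together.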
Additional statistics:
stats
Description
acc
Anomaly correlation coefficient (ACC), -1 <= r <= 1; 1 is positive correlation of variation in anomalies, 0 is complete randomness of variation in anomalies, -1 is negative correlation of variation in anomalies
d1
Index of agreement (d1) 0 <= d1 < 1, larger is better
d1_p
Legate-McCabe Index of Agreement 0 <= d1_p < 1, larger is better
d
Index of agreement (d) 0 <= d < 1, larger is better
dmod
Modified index of agreement (dmod) 0 <= dmod < 1, larger is better
drel
Relative index of agreement (drel) 0 <= drel < 1, larger is better
dr
Refined index of agreement (dr) -1 <= dr < 1, larger is better
ed
Euclidean distance in vector space 0 <= ed < inf, smaller is better
g_mean_diff
Geometric mean difference
h1_mahe
H1 mean absolute error
h1_mhe
H1 mean error
h1_rmshe
H1 root mean square error
h2_mahe
H2 mean absolute error
h2_mhe
H2 mean error
h2_rmshe
H2 root mean square error
h3_mahe
H3 mean absolute error
h3_mhe
H3 mean error
h3_rmshe
H3 root mean square error
h4_mahe
H4 mean absolute error
h4_mhe
H4 mean error
h4_rmshe
H4 root mean square error
h5_mahe
H5 mean absolute error
h5_mhe
H5 mean error
h5_rmshe
H5 root mean square error
h6_mahe
H6 mean absolute error
h6_mhe
H6 mean error
h6_rmshe
H6 root mean square error
h7_mahe
H7 mean absolute error
h7_mhe
H7 mean error
h7_rmshe
H7 root mean square error
h8_mahe
H8 mean absolute error
h8_mhe
H8 mean error
h8_rmshe
H8 root mean square error
h10_mahe
H10 mean absolute error
h10_mhe
H10 mean error
h10_rmshe
H10 root mean square error
irmse
Inertial root mean square error (IRMSE) 0 <= irmse < inf, smaller is better
lm_index
Legate-McCabe Efficiency Index 0 <= lm_index < 1, larger is better
maape
Mean Arctangent Absolute Percentage Error (MAAPE) 0 <= maape < pi/2, smaller is better
male
Mean absolute log error 0 <= male < inf, smaller is better
mapd
Mean absolute percentage deviation (MAPD)
mape
Mean absolute percentage error (MAPE) 0 <= mape < inf, 0 indicates perfect agreement
mase
Mean absolute scaled error
mb_r
Mielke-Berry R value (MB R) 0 <= mb_r < 1, larger is better
mdae
Median absolute error (MdAE) 0 <= mdae < inf, smaller is better
mde
Median error (MdE) -inf < mde < inf, closer to zero is better
mdse
Median squared error (MdSE) 0 <= mdse < inf, closer to zero is better
mean_var
Mean variance
me
Mean error -inf < me < inf, closer to zero is better
mle
Mean log error -inf < mle < inf, closer to zero is better
mse
Mean squared error 0 <= mse < inf, smaller is better
msle
Mean squared log error 0 <= msle < inf, smaller is better
ned
Normalized Euclidean distance in vector space 0 <= ned < inf, smaller is better
nrmse_iqr
IQR normalized root mean square error 0 <= nrmse_iqr < inf, smaller is better
nrmse_mean
Mean normalized root mean square error 0 <= nrmse_mean < inf, smaller is better
nrmse_range
Range normalized root mean square error 0 <= nrmse_range < inf, smaller is better
nse_mod
Modified Nash-Sutcliffe efficiency (NSE mod) -inf < nse_mod < 1, larger is better
nse_rel
Relative Nash-Sutcliffe efficiency (NSE rel) -inf < nse_rel < 1, larger is better
rmse
Root mean square error 0 <= rmse < inf, smaller is better
rmsle
Root mean square log error 0 <= rmsle < inf, smaller is better
sa
Spectral Angle (SA) -pi/2 <= sa < pi/2, closer to 0 is better
sc
Spectral Correlation (SC) -pi/2 <= sc < pi/2, closer to 0 is better
sga
Spectral Gradient Angle (SGA) -pi/2 <= sga < pi/2, closer to 0 is better
sid
Spectral Information Divergence (SID) -pi/2 <= sid < pi/2, closer to 0 is better
smape1
Symmetric Mean Absolute Percentage Error (1) (SMAPE1) 0 <= smape1 < 100, smaller is better
smape2
Symmetric Mean Absolute Percentage Error (2) (SMAPE2) 0 <= smape2 < 100, smaller is better
spearman_r
Spearman rank correlation coefficient, -1 <= spearman_r <= 1; 1 is perfect positive correlation, 0 is complete randomness, -1 is perfect negative correlation
ve
Volumetric Efficiency (VE) 0 <= ve < 1, larger is better
watt_m
Watterson’s M (M) -1 <= watt_m < 1, larger is better
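To make a few of the additional measures concrete, here is a minimal sketch of `mape`, `mdae` and the index of agreement `d` using their commonly cited definitions. The function names are illustrative only, and details such as reporting MAPE as a percentage rather than a fraction are assumptions that may differ from what tstoolbox returns.

```python
import statistics


def mape(obs, sim):
    """Mean absolute percentage error, as a percentage (assumed scaling)."""
    return 100.0 * sum(abs((s - o) / o) for o, s in zip(obs, sim)) / len(obs)


def mdae(obs, sim):
    """Median of the absolute errors."""
    return statistics.median([abs(s - o) for o, s in zip(obs, sim)])


def index_of_agreement(obs, sim):
    """Willmott's index of agreement d."""
    obs_mean = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - obs_mean) + abs(o - obs_mean)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den


obs = [1.0, 2.0, 4.0, 5.0]
sim = [1.5, 2.0, 3.0, 5.0]
print(mape(obs, sim), mdae(obs, sim), index_of_agreement(obs, sim))
```

Note that `mape` divides by the observed values, so it is undefined when any observation is zero; this is one situation where the `remove_zero` option is useful.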
replace_nan (float) – If given, indicates which value to replace NaN values with in the two arrays. If None, when a NaN value is found at the i-th position in the observed OR simulated array, the i-th value of the observed and simulated array are removed before the computation.
replace_inf (float) – If given, indicates which value to replace Inf values with in the two arrays. If None, when an inf value is found at the i-th position in the observed OR simulated array, the i-th value of the observed and simulated array are removed before the computation.
remove_neg (boolean) – If True, when a negative value is found at the i-th position in the observed OR simulated array, the i-th value of the observed AND simulated array are removed before the computation.
remove_zero (boolean) – If True, when a zero value is found at the i-th position in the observed OR simulated array, the i-th value of the observed AND simulated array are removed before the computation.
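The pairwise behaviour described for `replace_nan`, `replace_inf`, `remove_neg` and `remove_zero` can be sketched as follows. This is an illustrative standard-library version, not the tstoolbox code; the helper name `clean_pairs` and the order in which the options are applied are assumptions.

```python
import math


def clean_pairs(obs, sim, replace_nan=None, replace_inf=None,
                remove_neg=False, remove_zero=False):
    """Filter paired values; a pair is dropped if EITHER member fails a check.

    Hypothetical helper that mirrors the documented option names.
    """
    kept_obs, kept_sim = [], []
    for o, s in zip(obs, sim):
        pair = [o, s]
        # Replacement happens first, so replaced values can still be
        # removed by remove_neg / remove_zero (assumed ordering).
        if replace_nan is not None:
            pair = [replace_nan if math.isnan(v) else v for v in pair]
        if replace_inf is not None:
            pair = [replace_inf if math.isinf(v) else v for v in pair]
        # Any NaN/Inf still present means no replacement value was given.
        if any(math.isnan(v) or math.isinf(v) for v in pair):
            continue
        if remove_neg and any(v < 0 for v in pair):
            continue
        if remove_zero and any(v == 0 for v in pair):
            continue
        kept_obs.append(pair[0])
        kept_sim.append(pair[1])
    return kept_obs, kept_sim


nan, inf = float("nan"), float("inf")
obs = [1.0, nan, 3.0, -1.0, 0.0, inf]
sim = [1.1, 2.0, nan, 4.0, 5.0, 6.0]
print(clean_pairs(obs, sim))
print(clean_pairs(obs, sim, remove_neg=True, remove_zero=True))
```

Dropping both members of a pair keeps the two series aligned, which every paired statistic relies on.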
start_date (str) –
[optional, defaults to first date in time-series, input filter]
The start_date of the series in ISOdatetime format, or ‘None’ for beginning.
end_date (str) –
[optional, defaults to last date in time-series, input filter]
The end_date of the series in ISOdatetime format, or ‘None’ for end.
round_index –
[optional, default is None which will do nothing to the index, output format]
Round the index to the nearest time point. This can significantly improve performance because it cuts down on memory and processing requirements. However, be cautious about rounding from a small interval to a very coarse one, since this can lead to duplicate values in the index.
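The duplicate-index risk is easy to see in a small standard-library sketch. The helper `round_to_hour` is hypothetical and rounds half hours up, which may differ from the rounding rule tstoolbox (or pandas) applies at exact midpoints.

```python
from datetime import datetime, timedelta


def round_to_hour(ts):
    """Round a datetime to the nearest hour (exact half hours round up)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored


# Three 10- to 30-minute-spaced stamps rounded to a one-hour interval.
index = [
    datetime(2020, 1, 1, 0, 10),
    datetime(2020, 1, 1, 0, 20),
    datetime(2020, 1, 1, 0, 50),
]
rounded = [round_to_hour(t) for t in index]
has_duplicates = len(set(rounded)) < len(rounded)
print(rounded, has_duplicates)
```

The first two stamps both collapse to midnight, so the rounded index is no longer unique.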
clean –
[optional, default is False, input filter]
The ‘clean’ command will repair an input index, removing duplicate index values and sorting.
index_type (str) –
[optional, default is ‘datetime’, output format]
Can be either ‘number’ or ‘datetime’. Use ‘number’ with index values that are Julian dates, or other epoch reference.
source_units (str) –
[optional, default is None, transformation]
If unit is specified for the column as the second field of a ‘:’ delimited column name, then the specified units and the ‘source_units’ must match exactly.
Any unit string compatible with the ‘pint’ library can be used.
target_units (str) –
[optional, default is None, transformation]
The purpose of this option is to specify target units for unit conversion. The source units are specified in the header line of the input or using the ‘source_units’ keyword.
The units of the input time-series or values are specified as the second field of a ‘:’ delimited name in the header line of the input or in the ‘source_units’ keyword.
Any unit string compatible with the ‘pint’ library can be used.
This option will also add the ‘target_units’ string to the column names.
tablefmt (str) –
[optional, default is ‘csv’, output format]
The table format. Can be one of ‘csv’, ‘tsv’, ‘plain’, ‘simple’, ‘grid’, ‘pipe’, ‘orgtbl’, ‘rst’, ‘mediawiki’, ‘latex’, ‘latex_raw’ and ‘latex_booktabs’.
float_format –
[optional, output format]
Format for float numbers.
kge_sr (float) –
[optional, defaults to 1.0]
Scaling factor for kge09 and kge12 correlation.
kge09_salpha (float) –
[optional, defaults to 1.0]
Scaling factor for kge09 alpha.
kge12_sgamma (float) –
[optional, defaults to 1.0]
Scaling factor for kge12 gamma.
kge_sbeta (float) –
[optional, defaults to 1.0]
Scaling factor for kge09 and kge12 beta.
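One common way to apply such scaling factors is the weighted Euclidean distance form of the Kling-Gupta efficiency, sketched below with the standard library. This assumes the scaling factors multiply each component's deviation from 1 before squaring; whether tstoolbox applies them in exactly this way is not confirmed here, and the function names are illustrative.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    """Population standard deviation."""
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def _pearson_r(x, y):
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (_std(x) * _std(y))


def kge09(obs, sim, sr=1.0, salpha=1.0, sbeta=1.0):
    """KGE (2009): correlation r, variability ratio alpha, bias ratio beta."""
    r = _pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((sr * (r - 1)) ** 2
                           + (salpha * (alpha - 1)) ** 2
                           + (sbeta * (beta - 1)) ** 2)


def kge12(obs, sim, sr=1.0, sgamma=1.0, sbeta=1.0):
    """KGE (2012): alpha is replaced by gamma, the ratio of the
    coefficients of variation (std / mean) of sim and obs."""
    r = _pearson_r(obs, sim)
    gamma = (_std(sim) / _mean(sim)) / (_std(obs) / _mean(obs))
    beta = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt((sr * (r - 1)) ** 2
                           + (sgamma * (gamma - 1)) ** 2
                           + (sbeta * (beta - 1)) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated, but doubled
print(kge09(obs, sim), kge09(obs, sim, salpha=0.0), kge12(obs, sim))
```

With the simulated series exactly twice the observed, the 2009 form is penalised for both variability and bias, while the 2012 form is penalised only for bias because the coefficient of variation is unchanged. Setting a scaling factor to 0 removes that component from the score.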
input_ts (str) –
[DEPRECATED] [optional, defaults to None]
The older approach was implicit by using the input_ts and columns to create a two-column dataset where the first column was the observed values and the second column was the simulated values.
The new approach is to specify the input time series file using obs_col and sim_col parameters.
OLD WAY:
... --input_ts=blah.csv ...
# Where blah.csv contains two columns, the first
# is the observed values and the second
# is the simulated values.
NEW WAY:
... --obs_col=blah.csv,2 --sim_col=blah.csv,5 ...
# Where blah.csv contains at least 5 columns,
# the second column is the observed values
# and the fifth column is the simulated values.