tstoolbox.tstoolbox.plot

tstoolbox.tstoolbox.plot(input_ts='', columns=None, start_date=None, end_date=None, clean=False, skiprows=None, index_type='datetime', names=None, ofilename='plot.png', type='time', xtitle='', ytitle='', title='', figsize='10, 6.0', legend=None, legend_names=None, subplots=False, sharex=True, sharey=False, colors='auto', linestyles='auto', markerstyles=' ', style='auto', logx=False, logy=False, xaxis='arithmetic', yaxis='arithmetic', xlim=None, ylim=None, secondary_y=False, mark_right=True, scatter_matrix_diagonal='kde', bootstrap_size=50, bootstrap_samples=500, norm_xaxis=False, norm_yaxis=False, lognorm_xaxis=False, lognorm_yaxis=False, xy_match_line='', grid=False, label_rotation=None, label_skip=1, force_freq=None, drawstyle='default', por=False, invert_xaxis=False, invert_yaxis=False, round_index=None, plotting_position='weibull', prob_plot_sort_values='descending', source_units=None, target_units=None, lag_plot_lag=1)
Plot data.
 Parameters
input_ts (str) –
[optional, required if using Python API, default is ‘‘ (stdin)]
Whether from a file or standard input, data requires a header of column names. The default header is the first line of the input, but this can be changed using the ‘skiprows’ option.
Most separators will be automatically detected. Most common date formats can be used, but the closer to ISO 8601 date/time standard the better.
Command line:
input_ts=filename.csv  to read 'filename.csv'
input_ts=''            to read from standard input (stdin).
In many cases it is better to use redirection rather than use `input_ts=filename.csv`. The following are identical:
From a file:
    command subcmd input_ts=filename.csv
From standard input:
    command subcmd input_ts= < filename.csv
The BEST way, since you don't have to include `input_ts=` because that is the default:
    command subcmd < filename.csv
Commands can also be combined by piping:
    command subcmd < filename.csv | command subcmd1 > fileout.csv
As Python Library:
You MUST use the `input_ts=...` option where `input_ts` can be one of a [pandas DataFrame, pandas Series, dict, tuple, list, StringIO, or file name]. If result is a time series, returns a pandas DataFrame.
ofilename (str) –
[optional, defaults to ‘plot.png’]
Output filename for the plot. Extension defines the type, for example ‘filename.png’ will create a PNG file.
Within the Python API, if ofilename is None the function returns the Matplotlib figure, which can then be changed or added to as needed.
type (str) –
[optional, defaults to ‘time’]
The plot type.
Can be one of the following:
 time
Standard time series plot. Time is the index, and plots each column of data.
 xy
An (x,y) plot, also known as a scatter plot. Data must be organized as x1,y1,x2,y2,x3,y3,….
 double_mass
An (x,y) plot of the cumulative sum of x and y. Data must be organized as x1,y1,x2,y2,x3,y3,….
 boxplot
Box extends from lower to upper quartile, with a line at the median. Depending on the statistics, the whiskers represent the range of the data or 1.5 times the interquartile range (Q3 - Q1). Data should be organized as y1,y2,y3,….
 scatter_matrix
Plots all columns against each other in a matrix, with the diagonal plots either histogram or KDE probability distribution depending on scatter_matrix_diagonal keyword.
 lag_plot
Indicates structure in the data. Only available for a single timeseries.
 autocorrelation
Plot autocorrelation. Only available for a single timeseries.
 bootstrap
Visually assess aspects of a data set by plotting random selections of values. Only available for a single timeseries.
 histogram
Calculate and create a histogram plot. See ‘kde’ for a smooth representation of a histogram.
 kde
This plot is an estimation of the probability density function based on the data called kernel density estimation (KDE).
 kde_time
This plot is an estimation of the probability density function based on the data called kernel density estimation (KDE) combined with a timeseries plot.
 bar
Column plot.
 barh
A horizontal bar plot.
 bar_stacked
A stacked column plot.
 barh_stacked
A horizontal stacked bar plot.
 heatmap
Create a 2D heatmap of daily data, day of year xaxis, and year for yaxis. Only available for a single, daily timeseries.
 norm_xaxis
Sort, calculate probabilities, and plot data against an x axis normal distribution.
 norm_yaxis
Sort, calculate probabilities, and plot data against a y axis normal distribution.
 lognorm_xaxis
Sort, calculate probabilities, and plot data against an x axis lognormal distribution.
 lognorm_yaxis
Sort, calculate probabilities, and plot data against a y axis lognormal distribution.
 weibull_xaxis
Sort, calculate and plot data against an x axis weibull distribution.
 weibull_yaxis
Sort, calculate and plot data against a y axis weibull distribution.
 taylor
Creates a taylor diagram that compares three goodness of fit statistics on one plot. The three goodness of fit statistics calculated and displayed are standard deviation, correlation coefficient, and centered root mean square deviation. The data columns have to be organized as ‘observed,simulated1,simulated2,simulated3,…etc.’
 target
Creates a target diagram that compares three goodness of fit statistics on one plot. The three goodness of fit statistics calculated and displayed are bias, root mean square deviation, and centered root mean square deviation. The data columns have to be organized as ‘observed,simulated1,simulated2,simulated3,…etc.’
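The x1,y1,x2,y2,… column layout expected by ‘xy’ and ‘double_mass’, and the cumulative sums behind a double-mass plot, can be sketched in plain Python. This only illustrates the documented behavior and is not tstoolbox's implementation; `split_xy_pairs` and `double_mass` are hypothetical helper names and the data is made up.

```python
from itertools import accumulate


def split_xy_pairs(columns):
    """Group a flat list of columns [x1, y1, x2, y2, ...] into (x, y) pairs."""
    if len(columns) % 2 != 0:
        raise ValueError("xy-style plots need an even number of columns")
    return list(zip(columns[0::2], columns[1::2]))


def double_mass(x, y):
    """Return the cumulative sums of x and y, the two axes of a double-mass plot."""
    return list(accumulate(x)), list(accumulate(y))


# Hypothetical data: two (x, y) pairs stored as four columns.
columns = [[1, 2, 3], [2, 2, 2], [10, 20, 30], [5, 5, 5]]
pairs = split_xy_pairs(columns)
cum_x, cum_y = double_mass(*pairs[0])
print(cum_x, cum_y)
```

Each pair becomes one plotted series, so an odd number of data columns cannot be interpreted under this layout.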
lag_plot_lag –
[optional, default to 1]
The lag used if type “lag_plot” is chosen.
xtitle (str) –
[optional, default depends on type]
Title of xaxis.
ytitle (str) –
[optional, default depends on type]
Title of yaxis.
title (str) –
[optional, defaults to ‘’]
Title of chart.
figsize (str) –
[optional, defaults to ‘10,6.0’]
The ‘width,height’ of plot in inches.
legend –
[optional, defaults to True]
Whether to display the legend.
legend_names (str) –
[optional, defaults to None]
Legend would normally use the timeseries names associated with the input data. The ‘legend_names’ option allows you to override the names in the data set. You must supply a comma separated list of strings for each timeseries in the data set.
subplots –
[optional, defaults to False]
Make separate subplots for each time series.
sharex –
[optional, default to True]
In case subplots=True, share x axis.
sharey –
[optional, default to False]
In case subplots=True, share y axis.
colors –
[optional, default is ‘auto’]
The default ‘auto’ will cycle through matplotlib colors. Otherwise, at the command line supply a comma separated list of matplotlib color codes, or for the Python API a list of color code strings.
Separated ‘colors’, ‘linestyles’, and ‘markerstyles’ instead of using the ‘style’ keyword.
Code    Color
b       blue
g       green
r       red
c       cyan
m       magenta
y       yellow
k       black

Number  Color
0.75    0.75 gray
…etc.

HTML Color Names
red
burlywood
chartreuse
…etc.
Color reference: http://matplotlib.org/api/colors_api.html
linestyles –
[optional, default to ‘auto’]
If ‘auto’ will iterate through the available matplotlib line types. Otherwise on the command line a comma separated list, or a list of strings if using the Python API.
To not display lines use a space (‘ ‘) as the linestyle code.
Separated ‘colors’, ‘linestyles’, and ‘markerstyles’ instead of using the ‘style’ keyword.
Code    Lines
-       solid
--      dashed
-.      dash_dot
:       dotted
None    draw nothing
‘ ‘     draw nothing
‘’      draw nothing
Line reference: http://matplotlib.org/api/artist_api.html
markerstyles –
[optional, default to ‘ ‘]
The default ‘ ‘ will not plot a marker. If ‘auto’ will iterate through the available matplotlib marker types. Otherwise on the command line a comma separated list, or a list of strings if using the Python API.
Separated ‘colors’, ‘linestyles’, and ‘markerstyles’ instead of using the ‘style’ keyword.
Code    Markers
.       point
o       circle
v       triangle down
^       triangle up
<       triangle left
>       triangle right
1       tri_down
2       tri_up
3       tri_left
4       tri_right
8       octagon
s       square
p       pentagon
*       star
h       hexagon1
H       hexagon2
+       plus
x       x
D       diamond
d       thin diamond
_       hline
None    nothing
‘ ‘     nothing
‘’      nothing
Marker reference: http://matplotlib.org/api/markers_api.html
style –
[optional, default is None]
Still available, but if None it is replaced by the ‘colors’, ‘linestyles’, and ‘markerstyles’ options. Currently the ‘style’ option will override the others.
Comma separated matplotlib style strings, one per timeseries. Just combine codes in ‘ColorMarkerLine’ order, for example ‘r*--’ is a red dashed line with star marker.
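As a small illustration of the ‘ColorMarkerLine’ order, the sketch below joins one code from each of the color, marker, and line tables into a style string and then builds the comma separated value used on the command line. `make_style` is a hypothetical helper, not part of tstoolbox.

```python
def make_style(color="", marker="", line=""):
    """Join matplotlib codes in 'ColorMarkerLine' order."""
    return f"{color}{marker}{line}"


# One style string per timeseries.
styles = [
    make_style("r", "*", "--"),  # red, star marker, dashed line
    make_style("b", "o", "-"),   # blue, circle marker, solid line
    make_style("k", "", ":"),    # black, no marker, dotted line
]

# Comma separated form for the command line.
style_option = ",".join(styles)
print(style_option)
```

For the Python API the same strings would be used per timeseries; whether they are passed as a list or as the joined string is not stated here, so check the behavior before relying on either form.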
logx – DEPRECATED: use ‘--xaxis="log"’ instead.
logy – DEPRECATED: use ‘--yaxis="log"’ instead.
xlim –
[optional, default is based on range of x values]
Comma separated lower and upper limits for the xaxis of the plot. For example, ‘--xlim 1,1000’ would limit the plot from 1 to 1000, whereas ‘--xlim ,1000’ would base the lower limit on the data and set the upper limit to 1000.
ylim –
[optional, default is based on range of y values]
Comma separated lower and upper limits for the yaxis of the plot. See xlim for examples.
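The ‘lower,upper’ convention for xlim and ylim, where an empty side means "take this limit from the data", could be interpreted as in the following sketch. `parse_limits` is a hypothetical helper written for illustration; it is not the parser tstoolbox uses.

```python
def parse_limits(text):
    """Split 'lower,upper' into floats; an empty side becomes None (use the data)."""
    lower, upper = (part.strip() for part in text.split(","))
    return (
        float(lower) if lower else None,
        float(upper) if upper else None,
    )


both = parse_limits("1,1000")       # both limits fixed
upper_only = parse_limits(",1000")  # lower limit left to the data
lower_only = parse_limits("5,")     # upper limit left to the data
print(both, upper_only, lower_only)
```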
xaxis (str) –
[optional, default is ‘arithmetic’]
Defines the type of the xaxis. One of ‘arithmetic’, ‘log’.
yaxis (str) –
[optional, default is ‘arithmetic’]
Defines the type of the yaxis. One of ‘arithmetic’, ‘log’.
secondary_y –
[optional, default is False]
Whether to plot on the secondary yaxis. If a list/tuple, which timeseries to plot on secondary yaxis.
mark_right –
[optional, default is True]
When using a secondary_y axis, whether the legend should automatically mark which timeseries are plotted on the secondary (right) axis.
scatter_matrix_diagonal (str) –
[optional, defaults to ‘kde’]
If plot type is ‘scatter_matrix’, this specifies the plot along the diagonal. One of ‘kde’ for Kernel Density Estimation or ‘hist’ for a histogram.
bootstrap_size (int) –
[optional, defaults to 50]
The size of the random subset for ‘bootstrap’ plot.
bootstrap_samples –
[optional, defaults to 500]
The number of random subsets of ‘bootstrap_size’.
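How ‘bootstrap_size’ and ‘bootstrap_samples’ relate can be shown with a simple resampling loop: draw ‘bootstrap_samples’ random subsets, each of ‘bootstrap_size’ values, and summarize each subset. The sketch assumes subsets are drawn without replacement and uses the mean as the summary statistic; both choices, the helper name `bootstrap_means`, and the data are assumptions for illustration.

```python
import random
import statistics


def bootstrap_means(values, size=50, samples=500, seed=0):
    """Mean of each of `samples` random subsets holding `size` values."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [statistics.mean(rng.sample(values, size)) for _ in range(samples)]


# Hypothetical single timeseries with 200 values.
values = [float(v) for v in range(200)]
means = bootstrap_means(values, size=50, samples=500)
print(len(means), min(means), max(means))
```

The spread of the resulting statistics is what a bootstrap plot lets you judge visually. Note that ‘bootstrap_size’ cannot exceed the number of values when sampling without replacement.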
norm_xaxis – DEPRECATED: use ‘--type="norm_xaxis"’ instead.
norm_yaxis – DEPRECATED: use ‘--type="norm_yaxis"’ instead.
lognorm_xaxis – DEPRECATED: use ‘--type="lognorm_xaxis"’ instead.
lognorm_yaxis – DEPRECATED: use ‘--type="lognorm_yaxis"’ instead.
xy_match_line (str) –
[optional, default is ‘’]
Will add a match line where x == y. Set to a line style code.
grid –
[optional, default is False]
Whether to plot grid lines on the major ticks.
label_rotation (int) –
[optional]
Rotation for major labels for bar plots.
label_skip (int) –
[optional]
Skip for major labels for bar plots.
drawstyle (str) –
[optional, default is ‘default’]
‘default’ connects the points with lines. The steps variants produce step-plots. ‘steps’ is equivalent to ‘steps-pre’ and is maintained for backward-compatibility.
ACCEPTS:
['default' | 'steps' | 'steps-pre' | 'steps-mid' | 'steps-post']
por –
[optional]
Plot from first good value to last good value. Strips NANs from beginning and end.
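The trimming that ‘por’ describes, dropping missing values only at the two ends while keeping gaps in the middle, can be sketched as follows. `strip_missing_ends` is a hypothetical helper operating on a plain list and is not the tstoolbox implementation.

```python
import math


def strip_missing_ends(values):
    """Keep values from the first non-NaN entry through the last non-NaN entry."""
    is_good = [not math.isnan(v) for v in values]
    if not any(is_good):
        return []
    first = is_good.index(True)
    last = len(is_good) - 1 - is_good[::-1].index(True)
    return values[first:last + 1]


nan = float("nan")
trimmed = strip_missing_ends([nan, nan, 1.0, nan, 3.0, nan])
print(trimmed)  # the interior NaN is kept, the leading and trailing ones are dropped
```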
force_freq (str) –
[optional, output format]
Force this frequency for the files. Typically you will only want to enforce a smaller interval, where tstoolbox will insert missing values as needed. WARNING: you may lose data if not careful with this option. In general, letting the algorithm determine the frequency should always work, but this option will override it. Use pandas offset codes.
invert_xaxis –
[optional, default is False]
Invert the xaxis.
invert_yaxis –
[optional, default is False]
Invert the yaxis.
plotting_position (str) –
[optional, default is ‘weibull’]
Name                      a     Equation (i-a)/(n+1-2*a)  Description
weibull                   0     i/(n+1)                   mean of sampling distribution (default)
benard and bos levenbach  0.3   (i-0.3)/(n+0.4)           approx. median of sampling distribution
tukey                     1/3   (i-1/3)/(n+1/3)           approx. median of sampling distribution
gumbel                    1     (i-1)/(n-1)               mode of sampling distribution
hazen                     1/2   (i-1/2)/n                 midpoints of n equal intervals
cunnane                   2/5   (i-2/5)/(n+1/5)           subjective
california                NA    i/n
Where ‘i’ is the sorted rank of the y value, and ‘n’ is the total number of values to be plotted.
Only used for norm_xaxis, norm_yaxis, lognorm_xaxis, lognorm_yaxis, weibull_xaxis, and weibull_yaxis.
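The general plotting position formula (i-a)/(n+1-2*a) from the table above can be evaluated directly. The sketch below uses exact fractions so the results match the per-row equations; `A_COEFFICIENTS` and `plotting_positions` are hypothetical names, and the ‘california’ row is handled separately because it has no ‘a’ value.

```python
from fractions import Fraction

# 'a' coefficients copied from the table above.
A_COEFFICIENTS = {
    "weibull": Fraction(0),
    "benard and bos levenbach": Fraction(3, 10),
    "tukey": Fraction(1, 3),
    "gumbel": Fraction(1),
    "hazen": Fraction(1, 2),
    "cunnane": Fraction(2, 5),
}


def plotting_positions(n, name="weibull"):
    """Return the plotting position of each rank i = 1..n as exact fractions."""
    if name == "california":
        return [Fraction(i, n) for i in range(1, n + 1)]
    a = A_COEFFICIENTS[name]
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


weibull = plotting_positions(4)         # i/(n+1)
hazen = plotting_positions(4, "hazen")  # (i-1/2)/n
print(weibull, hazen)
```

With a = 1 (gumbel) the denominator is n-1, so the formula is undefined for a single value.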
prob_plot_sort_values (str) –
[optional, default is ‘descending’]
How to sort the values for the probability plots.
Only used for norm_xaxis, norm_yaxis, lognorm_xaxis, lognorm_yaxis, weibull_xaxis, and weibull_yaxis.
columns –
[optional, defaults to all columns, input filter]
Columns to select out of input. Can use column names from the first line header or column numbers. If using numbers, column number 1 is the first data column. To pick multiple columns, separate them with commas and no spaces. As used in the tstoolbox pick command.
This solves a big problem: you don’t have to create a data set with a certain column order, because you can rearrange columns when the data is read in.
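The selection rule (names or 1-based numbers, comma separated, in the order you want them) can be sketched as below. `pick_columns` is a hypothetical helper that only illustrates the numbering convention; it is not the tstoolbox pick command.

```python
def pick_columns(header, spec):
    """Resolve a spec like 'rain,1' into 0-based positions within the data columns."""
    picked = []
    for token in spec.split(","):
        if token.isdigit():
            picked.append(int(token) - 1)  # column number 1 is the first data column
        else:
            picked.append(header.index(token))  # look the name up in the header
    return picked


# Hypothetical header of data columns (the index column is not counted).
header = ["flow", "stage", "rain"]
order = pick_columns(header, "rain,1")
print(order)  # 'rain' first, then the first data column
```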
start_date (str) –
[optional, defaults to first date in timeseries, input filter]
The start_date of the series in ISO 8601 datetime format, or ‘None’ for beginning.
end_date (str) –
[optional, defaults to last date in timeseries, input filter]
The end_date of the series in ISO 8601 datetime format, or ‘None’ for end.
clean –
[optional, default is False, input filter]
The ‘clean’ command will repair an index, removing duplicate index values and sorting.
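A rough picture of what repairing an index means is sketched below on (timestamp, value) rows: duplicate index values are dropped and the rows are sorted. Which duplicate survives is not stated above, so keeping the first occurrence is an assumption, and `clean_index` is a hypothetical helper.

```python
def clean_index(rows):
    """Drop duplicate index values (keeping the first seen) and sort by index."""
    unique = {}
    for stamp, value in rows:
        unique.setdefault(stamp, value)  # later duplicates are ignored
    return sorted(unique.items())


# Hypothetical rows: out of order, with 2000-01-03 appearing twice.
rows = [
    ("2000-01-03", 3.0),
    ("2000-01-01", 1.0),
    ("2000-01-03", 9.0),
    ("2000-01-02", 2.0),
]
cleaned = clean_index(rows)
print(cleaned)
```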
skiprows (list-like or integer or callable) –
[optional, default is None which will infer header from first line, input filter]
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
index_type (str) –
[optional, default is ‘datetime’, output format]
Can be either ‘number’ or ‘datetime’. Use ‘number’ with index values that are Julian dates, or other epoch reference.
names –
[optional, default is None, input filter]
If None, the column names are taken from the first row after ‘skiprows’ from the input dataset.
source_units –
[optional, default is None, transformation]
If unit is specified for the column as the second field of a ‘:’ delimited column name, then the specified units and the ‘source_units’ must match exactly.
Any unit string compatible with the ‘pint’ library can be used.
target_units –
[optional, default is None, transformation]
The main purpose of this option is to convert units from those specified in the header line of the input into ‘target_units’.
The units of the input timeseries or values are specified as the second field of a ‘:’ delimited name in the header line of the input or in the ‘source_units’ keyword.
Any unit string compatible with the ‘pint’ library can be used.
This option will also add the ‘target_units’ string to the column names.
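The ‘name:units’ header convention and the exact-match rule for ‘source_units’ can be sketched as follows. `units_of` and `check_source_units` are hypothetical helpers, and the column names and unit strings are made-up examples; no unit conversion is attempted here, since that is the job of the ‘pint’ library.

```python
def units_of(column_name):
    """Return the second ':' delimited field of a column name, or None if absent."""
    fields = column_name.split(":")
    return fields[1] if len(fields) > 1 and fields[1] else None


def check_source_units(column_name, source_units):
    """Raise if the header declares units that differ from 'source_units'."""
    declared = units_of(column_name)
    if declared is not None and declared != source_units:
        raise ValueError(
            f"header units {declared!r} do not match source_units {source_units!r}"
        )
    return source_units


print(units_of("flow:cfs"))                    # units taken from the header
print(units_of("flow"))                        # no units declared
print(check_source_units("flow:cfs", "cfs"))   # exact match is accepted
```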
round_index –
[optional, default is None which will do nothing to the index, output format]
Round the index to the nearest time point. This can significantly improve performance because it cuts down on memory and processing requirements. However, be cautious about rounding from a small interval to a very coarse one, since this could lead to duplicate values in the index.
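The duplicate-index risk is easy to reproduce. The sketch below rounds a few timestamps to the nearest hour using only the standard library; `round_to_hour` is a hypothetical helper and the timestamps are made up.

```python
from datetime import datetime, timedelta


def round_to_hour(stamp):
    """Round a datetime to the nearest hour (half hours round up)."""
    floor = stamp.replace(minute=0, second=0, microsecond=0)
    if stamp - floor >= timedelta(minutes=30):
        return floor + timedelta(hours=1)
    return floor


# Hypothetical index at irregular sub-hourly times.
stamps = [
    datetime(2020, 1, 1, 0, 10),
    datetime(2020, 1, 1, 0, 20),
    datetime(2020, 1, 1, 0, 50),
]
rounded = [round_to_hour(s) for s in stamps]
has_duplicates = len(set(rounded)) < len(rounded)
print(rounded, has_duplicates)
```

The first two timestamps collapse onto the same hour, which is the situation the ‘clean’ option is meant to repair.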