tstoolbox.tstoolbox.plot

tstoolbox.tstoolbox.plot(input_ts='-', columns=None, start_date=None, end_date=None, clean=False, skiprows=None, index_type='datetime', names=None, ofilename='plot.png', type='time', xtitle='', ytitle='', title='', figsize='10, 6.0', legend=None, legend_names=None, subplots=False, sharex=True, sharey=False, colors='auto', linestyles='auto', markerstyles=' ', style='auto', logx=False, logy=False, xaxis='arithmetic', yaxis='arithmetic', xlim=None, ylim=None, secondary_y=False, mark_right=True, scatter_matrix_diagonal='kde', bootstrap_size=50, bootstrap_samples=500, norm_xaxis=False, norm_yaxis=False, lognorm_xaxis=False, lognorm_yaxis=False, xy_match_line='', grid=False, label_rotation=None, label_skip=1, force_freq=None, drawstyle='default', por=False, invert_xaxis=False, invert_yaxis=False, round_index=None, plotting_position='weibull', prob_plot_sort_values='descending', source_units=None, target_units=None, lag_plot_lag=1)

Plot data.

Parameters
  • input_ts (str) –

    [optional, required if using Python API, default is ‘-‘ (stdin)]

    Whether from a file or standard input, data requires a header of column names. The default header is the first line of the input, but this can be changed using the ‘skiprows’ option.

    Most separators will be automatically detected. Most common date formats can be used, but the closer to the ISO 8601 date/time standard the better.

    Command line:

    +-------------------------+------------------------+
    | --input_ts=filename.csv | to read 'filename.csv' |
    +-------------------------+------------------------+
    | --input_ts='-'          | to read from standard  |
    |                         | input (stdin).         |
    +-------------------------+------------------------+
    
    In many cases it is better to use redirection rather than use
    `--input_ts=filename.csv`.  The following are identical:
    
    From a file:
    
        command subcmd --input_ts=filename.csv
    
    From standard input:
    
        command subcmd --input_ts=- < filename.csv
    
    The BEST way since you don't have to include `--input_ts=-` because
    that is the default:
    
        command subcmd < filename.csv
    
    You can also combine commands by piping:
    
        command subcmd < filename.csv | command subcmd1 > fileout.csv
    

    As Python Library:

    You MUST use the `input_ts=...` option where `input_ts` can be one
    of a [pandas DataFrame, pandas Series, dict, tuple,
    list, StringIO, or file name].
    
    If result is a time series, returns a pandas DataFrame.
    

  • ofilename (str) –

    [optional, defaults to ‘plot.png’]

    Output filename for the plot. Extension defines the type, for example ‘filename.png’ will create a PNG file.

    Within the Python API, if ofilename is None the function returns the Matplotlib figure, which can then be changed or added to as needed.

  • type (str) –

    [optional, defaults to ‘time’]

    The plot type.

    Can be one of the following:

    time

    Standard time series plot. Time is the index, and plots each column of data.

    xy

    An (x,y) plot, also known as a scatter plot. Data must be organized as x1,y1,x2,y2,x3,y3,….

    double_mass

    An (x,y) plot of the cumulative sum of x and y. Data must be organized as x1,y1,x2,y2,x3,y3,….

    boxplot

    Box extends from lower to upper quartile, with a line at the median. Depending on the statistics, the whiskers represent the range of the data or 1.5 times the inter-quartile range (Q3 - Q1). Data should be organized as y1,y2,y3,….

    scatter_matrix

    Plots all columns against each other in a matrix, with the diagonal plots either histogram or KDE probability distribution depending on scatter_matrix_diagonal keyword.

    lag_plot

    Indicates structure in the data. Only available for a single time-series.

    autocorrelation

    Plot autocorrelation. Only available for a single time-series.

    bootstrap

    Visually assess aspects of a data set by plotting random selections of values. Only available for a single time-series.

    histogram

    Calculate and create a histogram plot. See ‘kde’ for a smooth representation of a histogram.

    kde

    This plot is an estimation of the probability density function based on the data called kernel density estimation (KDE).

    kde_time

    This plot is an estimation of the probability density function based on the data called kernel density estimation (KDE) combined with a time-series plot.

    bar

    Column plot.

    barh

    A horizontal bar plot.

    bar_stacked

    A stacked column plot.

    barh_stacked

    A horizontal stacked bar plot.

    heatmap

    Create a 2D heatmap of daily data, day of year x-axis, and year for y-axis. Only available for a single, daily time-series.

    norm_xaxis

    Sort, calculate probabilities, and plot data against an x axis normal distribution.

    norm_yaxis

    Sort, calculate probabilities, and plot data against a y axis normal distribution.

    lognorm_xaxis

    Sort, calculate probabilities, and plot data against an x axis lognormal distribution.

    lognorm_yaxis

    Sort, calculate probabilities, and plot data against a y axis lognormal distribution.

    weibull_xaxis

    Sort, calculate, and plot data against an x axis Weibull distribution.

    weibull_yaxis

    Sort, calculate, and plot data against a y axis Weibull distribution.

    taylor

    Creates a Taylor diagram that compares three goodness of fit statistics on one plot. The three goodness of fit statistics calculated and displayed are standard deviation, correlation coefficient, and centered root mean square deviation. The data columns have to be organized as ‘observed,simulated1,simulated2,simulated3,…etc.’

    target

    Creates a target diagram that compares three goodness of fit statistics on one plot. The three goodness of fit statistics calculated and displayed are bias, root mean square deviation, and centered root mean square deviation. The data columns have to be organized as ‘observed,simulated1,simulated2,simulated3,…etc.’
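
The three statistics named for the target diagram can be sketched with their textbook definitions. This is an illustration only: the function name `target_statistics` and the sample data are hypothetical, and the normalization tstoolbox applies internally may differ.

```python
import math


def target_statistics(observed, simulated):
    """Return (bias, crmsd, rmsd) for one simulated series against observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    # Bias: difference between the means of the two series.
    bias = mean_sim - mean_obs
    # Centered RMSD: RMSD after removing each series' own mean.
    crmsd = math.sqrt(
        sum(((s - mean_sim) - (o - mean_obs)) ** 2
            for o, s in zip(observed, simulated)) / n
    )
    # Total RMSD of the raw differences.
    rmsd = math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
    return bias, crmsd, rmsd


# Columns organized as 'observed,simulated1' (made-up values).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 2.5, 5.5]
bias, crmsd, rmsd = target_statistics(observed, simulated)
print(bias, crmsd, rmsd)
```

With these definitions the statistics are linked by rmsd**2 = bias**2 + crmsd**2, which is why the three can share one diagram.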

  • lag_plot_lag

    [optional, default to 1]

    The lag used if type “lag_plot” is chosen.
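
A lag plot pairs each value with the value `lag` steps later. The sketch below shows that pairing only; `lag_pairs` is a hypothetical helper, not part of tstoolbox, and the axis assignment is the usual convention rather than something stated above.

```python
def lag_pairs(values, lag=1):
    """Pair each value y(t) with y(t + lag), the points drawn in a lag plot."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    # values[:-lag] holds y(t); values[lag:] holds y(t + lag).
    return list(zip(values[:-lag], values[lag:]))


series = [1, 2, 3, 4]
print(lag_pairs(series, lag=1))
print(lag_pairs(series, lag=2))
```

Random data gives a structureless cloud of such points, while autocorrelated data clusters along a line or curve.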

  • xtitle (str) –

    [optional, default depends on type]

    Title of x-axis.

  • ytitle (str) –

    [optional, default depends on type]

    Title of y-axis.

  • title (str) –

    [optional, defaults to ‘’]

    Title of chart.

  • figsize (str) –

    [optional, defaults to ‘10,6.0’]

    The ‘width,height’ of plot in inches.

  • legend

    [optional, defaults to True]

    Whether to display the legend.

  • legend_names (str) –

    [optional, defaults to None]

    Legend would normally use the time-series names associated with the input data. The ‘legend_names’ option allows you to override the names in the data set. You must supply a comma separated list of strings for each time-series in the data set.

  • subplots

    [optional, defaults to False]

    Make separate subplots for each time series.

  • sharex

    [optional, default to True]

    In case subplots=True, share x axis.

  • sharey

    [optional, default to False]

    In case subplots=True, share y axis.

  • colors

    [optional, default is ‘auto’]

    The default ‘auto’ will cycle through matplotlib colors. Otherwise supply a comma-separated list of matplotlib color codes on the command line, or a list of color code strings if using the Python API.

    The ‘colors’, ‘linestyles’, and ‘markerstyles’ options are separate keywords that can be used instead of the ‘style’ keyword.

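    +------+---------+
    | Code | Color   |
    +======+=========+
    | b    | blue    |
    +------+---------+
    | g    | green   |
    +------+---------+
    | r    | red     |
    +------+---------+
    | c    | cyan    |
    +------+---------+
    | m    | magenta |
    +------+---------+
    | y    | yellow  |
    +------+---------+
    | k    | black   |
    +------+---------+

    +--------+-----------+
    | Number | Color     |
    +========+===========+
    | 0.75   | 0.75 gray |
    +--------+-----------+
    | …etc.  |           |
    +--------+-----------+

    +------------------+
    | HTML Color Names |
    +==================+
    | red              |
    +------------------+
    | burlywood        |
    +------------------+
    | chartreuse       |
    +------------------+
    | …etc.            |
    +------------------+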

    Color reference: http://matplotlib.org/api/colors_api.html

  • linestyles

    [optional, default to ‘auto’]

    If ‘auto’, the plot will iterate through the available matplotlib line types. Otherwise supply a comma-separated list on the command line, or a list of strings if using the Python API.

    To not display lines use a space (‘ ‘) as the linestyle code.

    The ‘colors’, ‘linestyles’, and ‘markerstyles’ options are separate keywords that can be used instead of the ‘style’ keyword.

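    +------+--------------+
    | Code | Lines        |
    +======+==============+
    | -    | solid        |
    +------+--------------+
    | --   | dashed       |
    +------+--------------+
    | -.   | dash_dot     |
    +------+--------------+
    | :    | dotted       |
    +------+--------------+
    | None | draw nothing |
    +------+--------------+
    | ' '  | draw nothing |
    +------+--------------+
    | ''   | draw nothing |
    +------+--------------+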

    Line reference: http://matplotlib.org/api/artist_api.html

  • markerstyles

    [optional, default to ‘ ‘]

    The default ‘ ‘ will not plot a marker. If ‘auto’, the plot will iterate through the available matplotlib marker types. Otherwise supply a comma-separated list on the command line, or a list of strings if using the Python API.

    The ‘colors’, ‘linestyles’, and ‘markerstyles’ options are separate keywords that can be used instead of the ‘style’ keyword.

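    +------+----------------+
    | Code | Markers        |
    +======+================+
    | .    | point          |
    +------+----------------+
    | o    | circle         |
    +------+----------------+
    | v    | triangle down  |
    +------+----------------+
    | ^    | triangle up    |
    +------+----------------+
    | <    | triangle left  |
    +------+----------------+
    | >    | triangle right |
    +------+----------------+
    | 1    | tri_down       |
    +------+----------------+
    | 2    | tri_up         |
    +------+----------------+
    | 3    | tri_left       |
    +------+----------------+
    | 4    | tri_right      |
    +------+----------------+
    | 8    | octagon        |
    +------+----------------+
    | s    | square         |
    +------+----------------+
    | p    | pentagon       |
    +------+----------------+
    | *    | star           |
    +------+----------------+
    | h    | hexagon1       |
    +------+----------------+
    | H    | hexagon2       |
    +------+----------------+
    | +    | plus           |
    +------+----------------+
    | x    | x              |
    +------+----------------+
    | D    | diamond        |
    +------+----------------+
    | d    | thin diamond   |
    +------+----------------+
    | _    | hline          |
    +------+----------------+
    | None | nothing        |
    +------+----------------+
    | ' '  | nothing        |
    +------+----------------+
    | ''   | nothing        |
    +------+----------------+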

    Marker reference: http://matplotlib.org/api/markers_api.html

  • style

    [optional, default is None]

    Still available, but if None it is replaced by the ‘colors’, ‘linestyles’, and ‘markerstyles’ options. Currently the ‘style’ option will override the others.

    Comma-separated matplotlib style strings, one per time-series. Just combine codes in ‘ColorMarkerLine’ order, for example ‘r*--’ is a red dashed line with star marker.
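
As a small illustration of the ‘ColorMarkerLine’ order, the sketch below joins per-series color, marker, and line codes into one comma-separated style string. The helper `build_styles` is hypothetical and only shows the string layout; it is not a tstoolbox function.

```python
def build_styles(colors, markers, lines):
    """Combine codes in Color, Marker, Line order, one style string per series."""
    return ",".join(c + m + l for c, m, l in zip(colors, markers, lines))


# Red dashed line with star markers, then a blue solid line with circle markers.
style = build_styles(["r", "b"], ["*", "o"], ["--", "-"])
print(style)
```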

  • logx – DEPRECATED: use ‘--xaxis="log"’ instead.

  • logy – DEPRECATED: use ‘--yaxis="log"’ instead.

  • xlim

    [optional, default is based on range of x values]

    Comma-separated lower and upper limits for the x-axis of the plot. For example, ‘--xlim 1,1000’ would limit the plot from 1 to 1000, whereas ‘--xlim ,1000’ would base the lower limit on the data and set the upper limit to 1000.
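
One way to read the limit string is sketched below: an empty field becomes None, meaning "take this limit from the data". The helper `parse_limits` is hypothetical and is not how tstoolbox necessarily parses the option.

```python
def parse_limits(text):
    """Parse 'lower,upper' into a pair; an empty field means 'use the data'."""
    lower, upper = (part.strip() for part in text.split(","))
    return (
        float(lower) if lower else None,
        float(upper) if upper else None,
    )


print(parse_limits("1,1000"))
print(parse_limits(",1000"))
```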

  • ylim

    [optional, default is based on range of y values]

    Comma separated lower and upper limits for the y-axis of the plot. See xlim for examples.

  • xaxis (str) –

    [optional, default is ‘arithmetic’]

    Defines the type of the xaxis. One of ‘arithmetic’, ‘log’.

  • yaxis (str) –

    [optional, default is ‘arithmetic’]

    Defines the type of the yaxis. One of ‘arithmetic’, ‘log’.

  • secondary_y

    [optional, default is False]

    Whether to plot on the secondary y-axis. If a list/tuple, which time-series to plot on secondary y-axis.

  • mark_right

    [optional, default is True]

    When using a secondary_y axis, should the legend label the axis of the various time-series automatically.

  • scatter_matrix_diagonal (str) –

    [optional, defaults to ‘kde’]

    If plot type is ‘scatter_matrix’, this specifies the plot along the diagonal. One of ‘kde’ for Kernel Density Estimation or ‘hist’ for a histogram.

  • bootstrap_size (int) –

    [optional, defaults to 50]

    The size of the random subset for ‘bootstrap’ plot.

  • bootstrap_samples

    [optional, defaults to 500]

    The number of random subsets of ‘bootstrap_size’.
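
The interplay of ‘bootstrap_size’ and ‘bootstrap_samples’ can be sketched as follows: draw `samples` random subsets of `size` values each and record a statistic for every subset. The function name, the choice of mean and median, and drawing without replacement are assumptions made for this illustration.

```python
import random
import statistics


def bootstrap_statistics(values, size=50, samples=500, seed=None):
    """Return the mean and median of each of `samples` random subsets."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(samples):
        # One random subset of `size` values (drawn without replacement here).
        subset = rng.sample(values, size)
        means.append(statistics.mean(subset))
        medians.append(statistics.median(subset))
    return means, medians


# Made-up series of 200 values between 0 and 16.
series = [float(i % 17) for i in range(200)]
means, medians = bootstrap_statistics(series, size=50, samples=500, seed=0)
print(len(means), min(means), max(means))
```

The spread of the collected statistics is what a bootstrap plot displays; a larger subset size narrows that spread.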

  • norm_xaxis – DEPRECATED: use ‘--type="norm_xaxis"’ instead.

  • norm_yaxis – DEPRECATED: use ‘--type="norm_yaxis"’ instead.

  • lognorm_xaxis – DEPRECATED: use ‘--type="lognorm_xaxis"’ instead.

  • lognorm_yaxis – DEPRECATED: use ‘--type="lognorm_yaxis"’ instead.

  • xy_match_line (str) –

    [optional, default is ‘’]

    Will add a match line where x == y. Set to a line style code.

  • grid

    [optional, default is False]

    Whether to plot grid lines on the major ticks.

  • label_rotation (int) –

    [optional]

    Rotation for major labels for bar plots.

  • label_skip (int) –

    [optional]

    Skip for major labels for bar plots.

  • drawstyle (str) –

    [optional, default is ‘default’]

    ‘default’ connects the points with lines. The steps variants produce step-plots. ‘steps’ is equivalent to ‘steps-pre’ and is maintained for backward-compatibility.

    ACCEPTS:

    ['default' | 'steps' | 'steps-pre' | 'steps-mid' | 'steps-post']
    

  • por

    [optional]

    Plot from first good value to last good value. Strips NANs from beginning and end.
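
The trimming described for ‘por’ can be sketched in a few lines: find the first and last non-NaN positions and keep everything between them, including interior NaNs. `strip_nan_ends` is a hypothetical helper used only for illustration.

```python
import math


def strip_nan_ends(values):
    """Keep the span from the first good value to the last good value."""
    good = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not good:
        return []
    # Interior NaNs are kept; only leading and trailing NaNs are dropped.
    return values[good[0]:good[-1] + 1]


nan = float("nan")
trimmed = strip_nan_ends([nan, nan, 1.0, nan, 2.0, nan])
print(trimmed)
```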

  • force_freq (str) –

    [optional, output format]

    Force this frequency for the files. Typically you will only want to enforce a smaller interval where tstoolbox will insert missing values as needed. WARNING: you may lose data if not careful with this option. In general, letting the algorithm determine the frequency should always work, but this option will override. Use PANDAS offset codes.

  • invert_xaxis

    [optional, default is False]

    Invert the x-axis.

  • invert_yaxis

    [optional, default is False]

    Invert the y-axis.

  • plotting_position (str) –

    [optional, default is ‘weibull’]

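    +--------------------------+-----+-----------------+-----------------------------------------+
    | Name                     | a   | Equation        | Description                             |
    |                          |     | (i-a)/(n+1-2*a) |                                         |
    +==========================+=====+=================+=========================================+
    | weibull                  | 0   | i/(n+1)         | mean of sampling distribution (default) |
    +--------------------------+-----+-----------------+-----------------------------------------+
    | benard and bos-levenbach | 0.3 | (i-0.3)/(n+0.4) | approx. median of sampling distribution |
    +--------------------------+-----+-----------------+-----------------------------------------+
    | tukey                    | 1/3 | (i-1/3)/(n+1/3) | approx. median of sampling distribution |
    +--------------------------+-----+-----------------+-----------------------------------------+
    | gumbel                   | 1   | (i-1)/(n-1)     | mode of sampling distribution           |
    +--------------------------+-----+-----------------+-----------------------------------------+
    | hazen                    | 1/2 | (i-1/2)/n       | midpoints of n equal intervals          |
    +--------------------------+-----+-----------------+-----------------------------------------+
    | cunnane                  | 2/5 | (i-2/5)/(n+1/5) | subjective                              |
    +--------------------------+-----+-----------------+-----------------------------------------+
    | california               | NA  | i/n             |                                         |
    +--------------------------+-----+-----------------+-----------------------------------------+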

    Where ‘i’ is the sorted rank of the y value, and ‘n’ is the total number of values to be plotted.

    Only used for norm_xaxis, norm_yaxis, lognorm_xaxis, lognorm_yaxis, weibull_xaxis, and weibull_yaxis.
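
The general plotting-position formula (i-a)/(n+1-2*a) can be evaluated directly for any value of ‘a’ from the table. The sketch below is illustrative only; `plotting_positions` is a hypothetical helper, and the ‘california’ method (i/n) does not fit the general formula, so it is left out.

```python
def plotting_positions(n, a):
    """Return (i - a) / (n + 1 - 2*a) for sorted ranks i = 1..n."""
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


# weibull (a = 0) reduces to i / (n + 1).
weibull = plotting_positions(4, 0)
# hazen (a = 1/2) reduces to (i - 1/2) / n.
hazen = plotting_positions(4, 0.5)
print(weibull)
print(hazen)
```

Note that gumbel (a = 1) places the first and last ranks at probabilities 0 and 1, and is undefined for n = 1.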

  • prob_plot_sort_values (str) –

    [optional, default is ‘descending’]

    How to sort the values for the probability plots.

    Only used for norm_xaxis, norm_yaxis, lognorm_xaxis, lognorm_yaxis, weibull_xaxis, and weibull_yaxis.

  • columns

    [optional, defaults to all columns, input filter]

    Columns to select out of input. Can use column names from the first line header or column numbers. If using numbers, column number 1 is the first data column. To pick multiple columns, separate them by commas with no spaces, as in the tstoolbox pick command.

    This means you do not have to create a data set with a particular column order; you can rearrange columns when the data is read in.
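
The selection rule can be sketched as below: each comma-separated token is either a 1-based data column number or a column name, and the output follows the order of the tokens. `pick_columns` is a hypothetical stand-in for illustration, and it assumes no column is itself named with digits only.

```python
def pick_columns(header, columns):
    """Resolve a 'columns' string against the data column names in `header`."""
    picked = []
    for token in columns.split(","):
        if token.isdigit():
            # Column number 1 is the first data column.
            picked.append(header[int(token) - 1])
        elif token in header:
            picked.append(token)
        else:
            raise KeyError(f"unknown column: {token!r}")
    return picked


header = ["flow", "stage", "temp"]
print(pick_columns(header, "3,flow"))
```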

  • start_date (str) –

    [optional, defaults to first date in time-series, input filter]

    The start_date of the series in ISOdatetime format, or ‘None’ for beginning.

  • end_date (str) –

    [optional, defaults to last date in time-series, input filter]

    The end_date of the series in ISOdatetime format, or ‘None’ for end.

  • clean

    [optional, default is False, input filter]

    The ‘clean’ command will repair an index, removing duplicate index values and sorting.
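
The repair can be pictured as dropping repeated index values and then sorting by index. In the sketch below the first occurrence of a duplicate is kept; which occurrence tstoolbox keeps is not stated above, so treat that choice, and the helper name `clean_index`, as assumptions.

```python
def clean_index(rows):
    """Drop rows with a repeated index value, then sort by index."""
    first_seen = {}
    for index, value in rows:
        # setdefault keeps the first value recorded for each index.
        first_seen.setdefault(index, value)
    return sorted(first_seen.items())


rows = [
    ("2020-01-03", 3.0),
    ("2020-01-01", 1.0),
    ("2020-01-03", 9.0),  # duplicate index value
]
print(clean_index(rows))
```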

  • skiprows (list-like or integer or callable) –

    [optional, default is None which will infer header from first line, input filter]

    Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

    If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be

    lambda x: x in [0, 2].
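
The three accepted forms of ‘skiprows’ can be illustrated with a small reader built on the standard library. The helpers `should_skip` and `read_rows` and the sample text are hypothetical; they only mimic the behavior described above.

```python
import csv


def should_skip(skiprows, row_index):
    """Decide whether the 0-indexed row should be skipped."""
    if skiprows is None:
        return False
    if callable(skiprows):
        return bool(skiprows(row_index))
    if isinstance(skiprows, int):
        # An integer skips that many lines at the start of the file.
        return row_index < skiprows
    # Otherwise a list-like of line numbers to skip.
    return row_index in skiprows


def read_rows(text, skiprows=None):
    kept = [
        line for i, line in enumerate(text.splitlines())
        if not should_skip(skiprows, i)
    ]
    return list(csv.reader(kept))


text = "# station 42\nDatetime,flow\n# units: cfs\n2020-01-01,1.0\n"
rows = read_rows(text, skiprows=lambda x: x in [0, 2])
print(rows)
```

Here the callable drops lines 0 and 2, so the header is taken from the line that was originally second.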

  • index_type (str) –

    [optional, default is ‘datetime’, output format]

    Can be either ‘number’ or ‘datetime’. Use ‘number’ with index values that are Julian dates, or other epoch reference.

  • names

    [optional, default is None, input filter]

    If None, the column names are taken from the first row after ‘skiprows’ from the input dataset.

  • source_units

    [optional, default is None, transformation]

    If unit is specified for the column as the second field of a ‘:’ delimited column name, then the specified units and the ‘source_units’ must match exactly.

    Any unit string compatible with the ‘pint’ library can be used.
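
The matching rule for ‘name:units’ column names can be sketched without any unit library. The helpers `split_name_units` and `resolve_source_units` are hypothetical, and the plain string comparison stands in for the "must match exactly" requirement.

```python
def split_name_units(column_name):
    """Split 'name:units' into (name, units); units is None if absent."""
    fields = column_name.split(":")
    units = fields[1] if len(fields) > 1 and fields[1] else None
    return fields[0], units


def resolve_source_units(column_name, source_units=None):
    """Return (name, units), raising if header units and source_units differ."""
    name, header_units = split_name_units(column_name)
    if (header_units is not None and source_units is not None
            and header_units != source_units):
        raise ValueError(
            f"units {header_units!r} in column name do not match {source_units!r}"
        )
    return name, header_units if header_units is not None else source_units


print(resolve_source_units("flow:cfs", "cfs"))
print(resolve_source_units("flow", "cfs"))
```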

  • target_units

    [optional, default is None, transformation]

    The main purpose of this option is to convert units from those specified in the header line of the input into ‘target_units’.

    The units of the input time-series or values are specified as the second field of a ‘:’ delimited name in the header line of the input or in the ‘source_units’ keyword.

    Any unit string compatible with the ‘pint’ library can be used.

    This option will also add the ‘target_units’ string to the column names.

  • round_index

    [optional, default is None which will do nothing to the index, output format]

    Round the index to the nearest time point. This can significantly improve performance because it cuts down on memory and processing requirements; however, be cautious about rounding from a small interval to a very coarse one, which could lead to duplicate values in the index.
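
The duplicate-index risk is easy to reproduce with the standard library. The sketch below rounds made-up 10-minute timestamps to the nearest hour; `round_to_hour` is a hypothetical helper, and its round-half-up behavior is an assumption that may differ from how tstoolbox or pandas treats exact ties.

```python
from datetime import datetime, timedelta


def round_to_hour(timestamp):
    """Round a datetime to the nearest hour (ties go up in this sketch)."""
    floor = timestamp.replace(minute=0, second=0, microsecond=0)
    if timestamp - floor >= timedelta(minutes=30):
        return floor + timedelta(hours=1)
    return floor


index = [
    datetime(2020, 1, 1, 0, 10),
    datetime(2020, 1, 1, 0, 20),
    datetime(2020, 1, 1, 0, 40),
]
rounded = [round_to_hour(ts) for ts in index]
# Two distinct timestamps collapse onto 00:00, so the index is no longer unique.
print(rounded)
```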