Tide Signal Filters

There are three filters aimed at removing the tidal signal from hourly water elevation or current speed series. The “tide_doodson” and “tide_usgs” filters are convolutions against fixed kernels of hourly weights (39 values for “tide_doodson” and 33 for “tide_usgs”). The “tide_fft” filter is a low-pass filter in the frequency domain, completely damping periods shorter than 30 hours and transitioning smoothly to pass all periods longer than 40 hours.
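The shape of that transition can be sketched directly with NumPy. The snippet below is not tstoolbox's implementation, and the linear ramp between the 40 hour and 30 hour periods is only an assumed shape; it simply illustrates a low-pass response that blocks tidal periods and passes longer ones.

import numpy as np

def lowpass_taper_response(n_hours):
    """Per-frequency weights for an hourly series: 0 below a 30 hour period,
    1 above a 40 hour period, with a linear ramp (assumed shape) in between."""
    freqs = np.fft.rfftfreq(n_hours, d=1.0)   # cycles per hour
    periods = np.full_like(freqs, np.inf)
    periods[1:] = 1.0 / freqs[1:]             # hours per cycle
    return freqs, np.clip((periods - 30.0) / (40.0 - 30.0), 0.0, 1.0)

def lowpass_taper_filter(hourly_series):
    """Apply the taper with a forward/inverse real FFT."""
    _, weights = lowpass_taper_response(len(hourly_series))
    return np.fft.irfft(np.fft.rfft(hourly_series) * weights, n=len(hourly_series))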

[27]:
%matplotlib inline
import tstoolbox
[28]:
help(tstoolbox.filter)
Help on function filter in module tstoolbox.functions.filter:

filter(filter_types: Union[Literal['bartlett', 'blackman', 'butterworth', 'fft', 'flat', 'hamming', 'hanning', 'kalman', 'lecolazet1', 'lecolazet2', 'tide_doodson', 'tide_fft', 'tide_usgs'], List[Literal['bartlett', 'blackman', 'butterworth', 'fft', 'flat', 'hamming', 'hanning', 'kalman', 'lecolazet1', 'lecolazet2', 'tide_doodson', 'tide_fft', 'tide_usgs']]], filter_pass: Literal['lowpass', 'highpass', 'bandpass', 'bandstop'], butterworth_order: typing.Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1)])] = 10, lowpass_cutoff: Optional[Annotated[float, Gt(gt=0)]] = None, highpass_cutoff: Optional[Annotated[float, Gt(gt=0)]] = None, window_len: typing.Annotated[int, Gt(gt=0)] = 3, pad_mode: Optional[Literal['edge', 'maximum', 'mean', 'median', 'minimum', 'reflect', 'symmetric', 'wrap']] = 'reflect', input_ts='-', start_date=None, end_date=None, columns=None, dropna='no', skiprows=None, index_type='datetime', names=None, clean=False, print_input=False, source_units=None, target_units=None, round_index=None, float_format='g', tablefmt='csv')
    Apply different filters to the time-series.

    Parameters
    ----------
    filter_types
        One or more of
        bartlett, blackman, butterworth, fft, flat, hamming, hanning, kalman, lecolazet1, lecolazet2, tide_doodson, tide_fft, tide_usgs

        The "fft" and "butterworth" types are configured by cutoff frequencies
        `lowpass_cutoff`, and `highpass_cutoff`, by process defined in
        `filter_pass`. The "fft" is the Fast Fourier Transform filter in the
        frequency domain.

        Doodson filter

        The Doodson X0 filter is a simple filter designed to damp out
        the main tidal frequencies. It takes hourly values, 19 values
        either side of the central one. A weighted average is taken
        with the following weights

        (1010010110201102112 0 2112011020110100101)/30.

        In "Data Analysis and Methods in Oceanography":

        "The cosine-Lanczos filter, the transform filter, and the
        Butterworth filter are often preferred to the Godin filter,
        to earlier Doodson filter, because of their superior ability
        to remove tidal period variability from oceanic signals."
    filter_pass
        OneOf("lowpass", "highpass", "bandpass", "bandstop")
        Indicates what frequencies to block for the "fft" and "butterworth"
        filters.
    butterworth_order
        [optional, default is 10]

        The order of the butterworth filter.
    lowpass_cutoff
        [optional, default is None, used only if `filter_types` includes
        "fft" or "butterworth"; required if `filter_pass` equals "lowpass",
        "bandpass", or "bandstop"]

        The low frequency cutoff when `filter_pass` equals "lowpass",
        "bandpass", or "bandstop".
    highpass_cutoff
        [optional, default is None, used only if `filter_types` includes
        "fft" or "butterworth"; required if `filter_pass` equals "highpass",
        "bandpass", or "bandstop"]

        The high frequency cutoff when `filter_pass` equals "highpass",
        "bandpass", or "bandstop".
    window_len
        [optional, default is 3]

        "flat", "hanning", "hamming", "bartlett", "blackman"
        Time-series is padded by one half the window length on each end.  The
        `window_len` is then used for the length of the convolution kernel.

        "fft"
        Will soften the edges of the "fft" filter in the frequency domain.
        The larger the number, the softer the filter edges.  A value of 1
        gives a brick-wall step function, which may introduce spurious
        frequencies into the filtered output.

        "tide_usgs", "tide_doodson"
        The `window_len` is set to 33 for "tide_usgs" and 39 for "tide_doodson".
    pad_mode
        [optional, default is "reflect"]

        The method used to pad the time-series.  Uses some of the methods in
        numpy.pad.

        The pad methods "edge", "maximum", "mean", "median", "minimum",
        "reflect", "symmetric", "wrap" are available because they require no
        extra arguments.
    input_ts : str
        [optional though required if using within Python, default is '-'
        (stdin)]

        Whether from a file or standard input, data requires a single line
        header of column names.  The default header is the first line of
        the input, but this can be changed for CSV files using the
        'skiprows' option.

        Most common date formats can be used, but the closer to ISO 8601
        date/time standard the better.

        Comma-separated values (CSV) files or tab-separated values (TSV)::

            File separators will be automatically detected.

            Columns can be selected by name or index, where the index for
            data columns starts at 1.

        Command line examples:

            +---------------------------------+---------------------------+
            | Keyword Example                 | Description               |
            +=================================+===========================+
            | --input_ts=fn.csv               | read all columns from     |
            |                                 | 'fn.csv'                  |
            +---------------------------------+---------------------------+
            | --input_ts=fn.csv,2,1           | read data columns 2 and 1 |
            |                                 | from 'fn.csv'             |
            +---------------------------------+---------------------------+
            | --input_ts=fn.csv,2,skiprows=2  | read data column 2 from   |
            |                                 | 'fn.csv', skipping first  |
            |                                 | 2 rows so header is read  |
            |                                 | from third row            |
            +---------------------------------+---------------------------+
            | --input_ts=fn.xlsx,2,Sheet21    | read all data from 2nd    |
            |                                 | sheet and all data from   |
            |                                 | "Sheet21" of 'fn.xlsx'    |
            +---------------------------------+---------------------------+
            | --input_ts=fn.hdf5,Table12,T2   | read all data from table  |
            |                                 | "Table12" then all data   |
            |                                 | from table "T2" of        |
            |                                 | 'fn.hdf5'                 |
            +---------------------------------+---------------------------+
            | --input_ts=fn.wdm,210,110       | read DSNs 210, then 110   |
            |                                 | from 'fn.wdm'             |
            +---------------------------------+---------------------------+
            | --input_ts='-'                  | read all columns from     |
            |                                 | standard input (stdin)    |
            +---------------------------------+---------------------------+
            | --input_ts='-' --columns=4,1    | read column 4 and 1 from  |
            |                                 | standard input (stdin)    |
            +---------------------------------+---------------------------+

        If working with CSV or TSV files you can use redirection rather
        than use `--input_ts=fname.csv`.  The following are identical:

        From a file:

            command subcmd --input_ts=fname.csv

        From standard input (since '--input_ts=-' is the default):

            command subcmd < fname.csv

        Can also combine commands by piping:

            command subcmd < filein.csv | command subcmd1 > fileout.csv

        Python library examples::

            You must use the `input_ts=...` option, where `input_ts` can
            be a pandas DataFrame, pandas Series, dict, tuple, list,
            StringIO, or file name.
    start_date : str
        [optional, defaults to first date in time-series, input filter]

        The start_date of the series in ISOdatetime format, or 'None' for
        beginning.
    end_date : str
        [optional, defaults to last date in time-series, input filter]

        The end_date of the series in ISOdatetime format, or 'None' for
        end.
    columns
        [optional, defaults to all columns, input filter]

        Columns to select out of input.  Can use column names from the
        first line header or column numbers.  If using numbers, column
        number 1 is the first data column.  To pick multiple columns,
        separate them by commas with no spaces, as in the
        `toolbox_utils pick` command.

        This means you do not have to create a data set with a particular
        column order; columns can be rearranged as the data is read in.
    dropna : str
        [optional, default is 'no', input filter]

        Set `dropna` to 'any' to have records dropped that have NA value in
        any column, or 'all' to have records dropped that have NA in all
        columns. Set to 'no' to not drop any records.  The default is 'no'.
    skiprows: list-like or integer or callable
        [optional, default is None which will infer header from first line,
        input filter]

        Line numbers to skip (0-indexed) if a list or number of lines to
        skip at the start of the file if an integer.

        If used in Python can be a callable, the callable function will be
        evaluated against the row indices, returning True if the row should
        be skipped and False otherwise.  An example of a valid callable
        argument would be

        ``lambda x: x in [0, 2]``.
    index_type : str
        [optional, default is 'datetime', output format]

        Can be either 'number' or 'datetime'.  Use 'number' with index
        values that are Julian dates, or other epoch reference.
    names: str
        [optional, default is None, transformation]

        If None, the column names are taken from the first row after
        'skiprows' from the input dataset.

        MUST include a name for all columns in the input dataset, including
        the index column.
    clean
        [optional, default is False, input filter]

        The 'clean' command will repair an input index, removing duplicate
        index values and sorting.
    print_input
        [optional, default is False, output format]

        If set to 'True' will include the input columns in the output
        table.
    source_units: str
        [optional, default is None, transformation]

        If unit is specified for the column as the second field of a ':'
        delimited column name, then the specified units and the
        'source_units' must match exactly.

        Any unit string compatible with the 'pint' library can be used.
    target_units: str
        [optional, default is None, transformation]

        The purpose of this option is to specify target units for unit
        conversion.  The source units are specified in the header line of
        the input or using the 'source_units' keyword.

        The units of the input time-series or values are specified as the
        second field of a ':' delimited name in the header line of the
        input or in the 'source_units' keyword.

        Any unit string compatible with the 'pint' library can be used.

        This option will also add the 'target_units' string to the
        column names.
    round_index
        [optional, default is None which will do nothing to the index,
        output format]

        Round the index to the nearest time point.  This can significantly
        improve performance since it cuts down on memory and processing
        requirements; however, be cautious about rounding from a small
        interval to a very coarse one, since this could lead to duplicate
        values in the index.
    float_format
        [optional, output format]

        Format for float numbers.
    tablefmt : str
        [optional, default is 'csv', output format]

        The table format.  Can be one of 'csv', 'tsv', 'plain', 'simple',
        'grid', 'pipe', 'orgtbl', 'rst', 'mediawiki', 'latex', 'latex_raw'
        and 'latex_booktabs'.
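The Doodson X0 weighted average listed in the help text above can be written out explicitly to see what that filter does. The following is a sketch rather than tstoolbox's own code: the 39 hourly weights (19 either side of a zero-weighted centre value) are divided by 30 and applied as a symmetric convolution.

import numpy as np

# Doodson X0 kernel from the help text: 19 weights either side of a
# zero-weighted centre value, divided by 30 so the kernel sums to one.
_HALF = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 2, 0, 1, 1, 0, 2, 1, 1, 2]
DOODSON_X0 = np.array(_HALF + [0] + _HALF[::-1], dtype=float) / 30.0

def doodson_x0(hourly_values):
    # "valid" convolution drops 19 values at each end; tstoolbox instead
    # pads the series first (see the `pad_mode` option above).
    return np.convolve(np.asarray(hourly_values, dtype=float), DOODSON_X0, mode="valid")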

[29]:
tdf = tstoolbox.filter(
    ["tide_doodson", "tide_usgs", "tide_fft"],
    "lowpass",
    input_ts="data_mayport_8720220_water_level.csv,1",
    print_input=True,
)
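The ",1" suffix on the file name selects the first data column. Because `input_ts` also accepts in-memory objects such as a pandas DataFrame, an equivalent call can be made on data read with pandas first. The read step below assumes the first column of the CSV is the datetime index, which is an assumption about the file layout rather than something stated above.

import pandas as pd

# Read the gauge file ourselves, then hand the DataFrame to tstoolbox.filter.
wl = pd.read_csv(
    "data_mayport_8720220_water_level.csv",
    index_col=0,
    parse_dates=True,
)
tdf = tstoolbox.filter(
    ["tide_doodson", "tide_usgs", "tide_fft"],
    "lowpass",
    input_ts=wl.iloc[:, [0]],  # first data column, as the ",1" suffix did above
    print_input=True,
)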
[30]:
tdf.plot()
[30]:
<Axes: xlabel='Datetime'>
../../_images/tstoolbox_notebooks_tide_signal_filters_4_1.png
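The plot method used above is pandas' DataFrame.plot, so tdf can be handled like an ordinary DataFrame afterwards; for example, the original and the three filtered series can be written back out to a CSV (the output file name below is arbitrary).

# Save the original and filtered series for later use.
tdf.to_csv("mayport_8720220_water_level_filtered.csv", float_format="%.3f")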