
Usage - Command Line

Run ‘wdmtoolbox’ to get a list of subcommands:

usage: wdmtoolbox [-h] [-v]
                  {copydsnlabel,copydsn,cleancopywdm,renumberdsn,deletedsn,wdmtoswmm5rdii,extract,wdmtostd,describedsn,listdsns,createnewwdm,createnewdsn,hydhrseqtowdm,stdtowdm,csvtowdm,setattrib}
                  ...

positional arguments:
  {copydsnlabel,copydsn,cleancopywdm,renumberdsn,deletedsn,wdmtoswmm5rdii,extract,wdmtostd,describedsn,listdsns,createnewwdm,createnewdsn,hydhrseqtowdm,stdtowdm,csvtowdm,setattrib}
    copydsnlabel        Make a copy of a DSN label (no data).
    copydsn             Make a copy of a DSN.
    cleancopywdm        Make a clean copy of a WDM file.
    renumberdsn         Renumber olddsn to newdsn.
    deletedsn           Delete DSN.
    wdmtoswmm5rdii      Print out DSN data to the screen in SWMM5 RDII format.
    extract             Print out DSN data to the screen with ISO-8601 dates.
    wdmtostd            DEPRECATED: New scripts use 'extract'. Will be removed
                        in the future.
    describedsn         Print out attributes of a single DSN
    listdsns            Print out a table describing all DSNs in the WDM.
    createnewwdm        Create a new WDM file, optional to overwrite.
    createnewdsn        Create a new DSN.
    hydhrseqtowdm       Write HYDHR sequential file to a DSN.
    stdtowdm            DEPRECATED: Use 'csvtowdm'.
    csvtowdm            Write data from a CSV file to a DSN.
    setattrib           Set an attribute value for the DSN. See WDM
                        documentation for full list

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

By default, every subcommand that accepts time-series data reads it from stdin (typically a pipe or redirection). If a subcommand accepts an input file as an argument, you can use “… --input_ts=input_file_name.csv …” or redirection, “… < input_file_name.csv”.

A WDM file stores time-series associated with a Data Set Number (DSN). A DSN is a number between 1 and 32000, though HSPF can only use DSNs below 1000 for input and output. DSNs of 1000 and above should be used for calculation and observed time-series. The DSN must exist before being used.
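
The range rule above can be checked before a DSN is passed to any subcommand. The following is a minimal sketch; `check_dsn` is a hypothetical helper written for this illustration and is not part of wdmtoolbox.

```python
# Hypothetical helper, not part of wdmtoolbox: validate a DSN before use.
MIN_DSN = 1
MAX_DSN = 32000


def check_dsn(dsn):
    """Return dsn as an int if it lies in the valid WDM range, else raise."""
    number = int(dsn)
    if not MIN_DSN <= number <= MAX_DSN:
        raise ValueError(f"DSN {number} is outside {MIN_DSN}..{MAX_DSN}")
    return number


print(check_dsn("101"))
```

The check only covers the numeric range; whether the DSN already exists in a given WDM file still has to be confirmed with ‘listdsns’.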

Typical usage:

wdmtoolbox createnewwdm met.wdm

wdmtoolbox createnewdsn met.wdm 101 --tcode=3 --constituent=HPCP --tstype=HPCP --location=12345678 --description='NWS STATION 1' --scenario=INPUT

wdmtoolbox csvtowdm met.wdm 101 < nws_station_1.csv

To look at the DSN table:

wdmtoolbox listdsns met.wdm

You can also use “tsgettoolbox” to populate the DSN with data from various online sources. See the “tsgettoolbox” documentation for particulars on installation, but it may be as easy as “pip install tsgettoolbox”.

“tsgettoolbox” examples:

# Make a new wdm.
wdmtoolbox createnewwdm obs.wdm

# Create new DSN.
wdmtoolbox createnewdsn obs.wdm 10 --scenario SIMULATE --location 02232000 --constituent FLOW

# Download flow data for USGS station 02232000 and pipe into DSN.
# The --startDT option is required otherwise only the latest value is
# returned.
tsgettoolbox nwis --sites=02232000 --parameterCd=00060 --startDT 2000-01-01 | wdmtoolbox csvtowdm obs.wdm 10

# List DSNs.
wdmtoolbox listdsns obs.wdm

# Plot the flow data in DSN 10.
wdmtoolbox extract obs.wdm 10 | tstoolbox plot

Sub-command Detail

cleancopywdm

usage: wdmtoolbox cleancopywdm [-h] [--overwrite] inwdmpath outwdmpath

Make a clean copy of a WDM file.

positional arguments:
  inwdmpath    Path and WDM filename of the input WDM file.

  outwdmpath   Path and WDM filename of the output WDM file.


options:
  -h | --help
      show this help message and exit
  --overwrite
      Whether to overwrite the target DSN if it exists.

copydsn

usage: wdmtoolbox copydsn [-h] [--overwrite] inwdmpath indsn outwdmpath outdsn

Make a copy of a DSN.

positional arguments:
  inwdmpath    Path and WDM filename of the input WDM file.

  indsn        Source DSN.

  outwdmpath   Path and WDM filename of the output WDM file.

  outdsn       Target DSN.


options:
  -h | --help
      show this help message and exit
  --overwrite
      Whether to overwrite the target DSN if it exists.

createnewdsn

usage: wdmtoolbox createnewdsn [-h] [--tstype TSTYPE] [--base_year BASE_YEAR]
  [--tcode TCODE] [--tsstep TSSTEP] [--statid STATID] [--scenario SCENARIO]
  [--location LOCATION] [--description DESCRIPTION] [--constituent
  CONSTITUENT] [--tsfill TSFILL] wdmpath dsn

Create a new DSN.

positional arguments:
  wdmpath               Path and WDM filename.

  dsn                   The Data Set Number (DSN) for the time series in the WDM file.
    This number must be greater or equal to 1 and less than or equal to 32000.
    HSPF can only use for input or output DSNs of 1 to 9999, inclusive.


options:
  -h | --help
      show this help message and exit
  --tstype TSTYPE
      [optional, default to first 4 characters of 'constituent']
      Time series type. Can be any 4 character string, but if not specified
      defaults to first 4 characters of 'constituent'. Must match what is
      used in HSPF UCI file.
      Limited to 4 characters.
  --base_year BASE_YEAR
      [optional, defaults to 1900]
      Base year of time series. The DSN will not accept any time-series before
      this date and with the default settings of TGROUP=6 (i.e. yearly)
      would allow time-series up to 2199.
  --tcode TCODE
      [optional, defaults to 4=daily time series]
      Time series code, (1=second, 2=minute, 3=hour, 4=day, 5=month, 6=year)
  --tsstep TSSTEP
      [optional, defaults to 1]
      Time series steps, defaults (and almost always is) 1.
  --statid STATID
      [optional, defaults to '']
      The station name, limited to 16 characters.
  --scenario SCENARIO
      [optional, defaults to '']
      The name of the scenario. Can be anything, but typically, 'OBSERVED' for
      calibration and input time-series and 'SIMULATE' for model results.
      Limited to 8 characters.
  --location LOCATION
      [optional, defaults to '']
      The location name.
      Limited to 8 characters.
  --description DESCRIPTION
      [optional, defaults to '']
      Descriptive text.
      Limited to 48 characters.
  --constituent CONSTITUENT
      [optional, defaults to '']
      The constituent that the time series represents.
      Limited to 8 characters.
  --tsfill TSFILL
      [optional, defaults to -999]
      A time-series in a WDM file must have a value for every time interval. The
      "tsfill" number is used as a placeholder for missing values.
      Change to a number that is guaranteed to not be a valid number in your
      time-series.
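
The character limits and time series codes listed above can be checked before calling ‘createnewdsn’. This is a sketch only: `check_dsn_attributes` is a hypothetical helper, and the limits are copied from the option descriptions rather than read from wdmtoolbox itself.

```python
# Character limits taken from the createnewdsn option descriptions.
ATTRIBUTE_LIMITS = {
    "tstype": 4,
    "statid": 16,
    "scenario": 8,
    "location": 8,
    "description": 48,
    "constituent": 8,
}

# Time series codes listed under --tcode.
TCODES = {1: "second", 2: "minute", 3: "hour", 4: "day", 5: "month", 6: "year"}


def check_dsn_attributes(tcode=4, **attributes):
    """Hypothetical pre-check: return a list of problems (empty if none)."""
    problems = []
    if tcode not in TCODES:
        problems.append(f"tcode {tcode} is not one of {sorted(TCODES)}")
    for name, value in attributes.items():
        limit = ATTRIBUTE_LIMITS.get(name)
        if limit is not None and len(value) > limit:
            problems.append(f"{name} '{value}' exceeds {limit} characters")
    return problems


# Attributes from the "Typical usage" example: no problems expected.
print(check_dsn_attributes(tcode=3, constituent="HPCP", location="12345678",
                           description="NWS STATION 1", scenario="INPUT"))
```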

createnewwdm

usage: wdmtoolbox createnewwdm [-h] [--overwrite] wdmpath

Create a new WDM file, optional to overwrite.

positional arguments:
  wdmpath      Path and WDM filename.


options:
  -h | --help
      show this help message and exit
  --overwrite
      Whether to overwrite the target DSN if it exists.

csvtowdm

usage: wdmtoolbox csvtowdm [-h] [--start_date START_DATE]
  [--end_date END_DATE] [--columns COLUMNS] [--force_freq FORCE_FREQ] [--groupby
  GROUPBY] [--round_index ROUND_INDEX] [--clean] [--target_units TARGET_UNITS]
  [--source_units SOURCE_UNITS] [--input_ts INPUT_TS] wdmpath dsn

File can have comma separated 'year', 'month', 'day', 'hour', 'minute',
'second', 'value' OR 'date/time string', 'value'
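
As an illustration of the 'date/time string', 'value' layout, the sketch below builds a small hourly input with a single header line. The column names `Datetime` and `HPCP` and the helper `hourly_csv` are assumptions made for this example, not names required by wdmtoolbox.

```python
import csv
import io
from datetime import datetime, timedelta


def hourly_csv(start, values, column="HPCP"):
    """Return CSV text: one header line, then one 'date/time,value' row per hour."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Datetime", column])
    for step, value in enumerate(values):
        stamp = start + timedelta(hours=step)
        # ISO 8601 style date/time string with a space separator.
        writer.writerow([stamp.isoformat(sep=" "), value])
    return buffer.getvalue()


text = hourly_csv(datetime(2000, 1, 1), [0.0, 0.12, 0.05])
print(text)
```

The resulting text could be saved to a file and supplied through ‘--input_ts’ or redirection, writing to a DSN that was created with an hourly time step (‘--tcode=3’).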

positional arguments:
  wdmpath               Path and WDM filename.

  dsn                   The Data Set Number (DSN) for the time series in the WDM file.
    This number must be greater or equal to 1 and less than or equal to 32000.
    HSPF can only use for input or output DSNs of 1 to 9999, inclusive.


options:
  -h | --help
      show this help message and exit
  --start_date START_DATE
      [optional, defaults to first date in time-series, input filter]
      The start_date of the series in ISOdatetime format, or 'None' for
      beginning.
  --end_date END_DATE
      [optional, defaults to last date in time-series, input filter]
      The end_date of the series in ISOdatetime format, or 'None' for end.
  --columns COLUMNS
      [optional, defaults to all columns, input filter]
      Columns to select out of input. Can use column names from the first line
      header or column numbers. If using numbers, column number 1 is the
      first data column. To pick multiple columns; separate by commas with
      no spaces. As used in toolbox_utils pick command.
      This solves a big problem so that you don't have to create a data set with
      a certain column order, you can rearrange columns when data is read
      in.
  --force_freq FORCE_FREQ
      [optional, output format]
      Force this frequency for the output. Typically you will only want to
      enforce a smaller interval where toolbox_utils will insert missing
      values as needed. WARNING: you may lose data if not careful with
      this option. In general, letting the algorithm determine the
      frequency should always work, but this option will override. Use
      PANDAS offset codes.
  --groupby GROUPBY
      [optional, default is None, transformation]
      The pandas offset code to group the time-series data into. A special code
      is also available to group 'months_across_years' that will group
      into twelve monthly categories across the entire time-series.
  --round_index ROUND_INDEX
      [optional, default is None which will do nothing to the index, output
      format]
      Round the index to the nearest time point. Can significantly improve the
      performance since can cut down on memory and processing
      requirements, however be cautious about rounding to a very coarse
      interval from a small one. This could lead to duplicate values in
      the index.
  --clean
      [optional, default is False, input filter]
      The 'clean' command will repair an input index, removing duplicate index
      values and sorting.
  --target_units TARGET_UNITS
      [optional, default is None, transformation]
      The purpose of this option is to specify target units for unit conversion.
      The source units are specified in the header line of the input or
      using the 'source_units' keyword.
      The units of the input time-series or values are specified as the second
      field of a ':' delimited name in the header line of the input or in
      the 'source_units' keyword.
      Any unit string compatible with the 'pint' library can be used.
      This option will also add the 'target_units' string to the column names.
  --source_units SOURCE_UNITS
      [optional, default is None, transformation]
      If unit is specified for the column as the second field of a ':' delimited
      column name, then the specified units and the 'source_units' must
      match exactly.
      Any unit string compatible with the 'pint' library can be used.
  --input_ts INPUT_TS
      [optional though required if using within Python, default is '-' (stdin)]
      Whether from a file or standard input, data requires a single line header
      of column names. The default header is the first line of the input,
      but this can be changed for CSV files using the 'skiprows' option.
      Most common date formats can be used, but the closer to ISO 8601 date/time
      standard the better.
      Comma-separated values (CSV) files or tab-separated values (TSV):
      File separators will be automatically detected.
      
      Columns can be selected by name or index, where the index for
      data columns starts at 1.

      Command line examples:
        ┌─────────────────────────────────┬───────────────────────────┐
        │ Keyword Example                 │ Description               │
        ╞═════════════════════════════════╪═══════════════════════════╡
        │ --input_ts=fn.csv               │ read all columns from     │
        │                                 │ 'fn.csv'                  │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.csv,2,1           │ read data columns 2 and 1 │
        │                                 │ from 'fn.csv'             │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.csv,2,skiprows=2  │ read data column 2 from   │
        │                                 │ 'fn.csv', skipping first  │
        │                                 │ 2 rows so header is read  │
        │                                 │ from third row            │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.xlsx,2,Sheet21    │ read all data from 2nd    │
        │                                 │ sheet, then all data from │
        │                                 │ "Sheet21" of 'fn.xlsx'    │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.hdf5,Table12,T2   │ read all data from table  │
        │                                 │ "Table12" then all data   │
        │                                 │ from table "T2" of        │
        │                                 │ 'fn.hdf5'                 │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.wdm,210,110       │ read DSNs 210, then 110   │
        │                                 │ from 'fn.wdm'             │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts='-'                  │ read all columns from     │
        │                                 │ standard input (stdin)    │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts='-' --columns=4,1    │ read column 4 and 1 from  │
        │                                 │ standard input (stdin)    │
        ╘═════════════════════════════════╧═══════════════════════════╛

      If working with CSV or TSV files you can use redirection rather than use
      --input_ts=fname.csv. The following are identical:
      From a file:
        command subcmd --input_ts=fname.csv
      From standard input (since '--input_ts=-' is the default):
        command subcmd < fname.csv
      Can also combine commands by piping:
        command subcmd < filein.csv | command subcmd1 > fileout.csv
      Python library examples:
      You must use the `input_ts=...` option where `input_ts` can be
      one of a [pandas DataFrame, pandas Series, dict, tuple, list,
      StringIO, or file name].
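
The numbering rule for ‘--columns’ (column number 1 is the first data column) can be sketched as follows. `pick_columns` is a hypothetical function written for illustration; the actual selection is done by toolbox_utils and may differ in detail. The sketch assumes the date/time column comes first in the header and is not counted.

```python
def pick_columns(header, spec):
    """Resolve a --columns style spec against a header line.

    header: all column names, with the date/time column first.
    spec:   comma-separated column names or 1-based data column numbers.
    """
    data_columns = header[1:]  # column number 1 is the first data column
    picked = []
    for item in spec.split(","):
        if item.isdigit():
            number = int(item)
            if not 1 <= number <= len(data_columns):
                raise IndexError(f"no data column number {number}")
            picked.append(data_columns[number - 1])
        elif item in data_columns:
            picked.append(item)
        else:
            raise KeyError(f"no data column named '{item}'")
    return picked


header = ["Datetime", "flow", "stage", "temp", "precip"]
print(pick_columns(header, "4,1"))
```

Because the selection keeps the order given in the spec, the same mechanism rearranges columns as well as filtering them.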

deletedsn

usage: wdmtoolbox deletedsn [-h] wdmpath dsn

Delete DSN.

positional arguments:
  wdmpath     Path and WDM filename.

  dsn         DSN to delete.


options:
  -h | --help
      show this help message and exit

describedsn

usage: wdmtoolbox describedsn [-h] [--attrs ATTRS] [--tablefmt TABLEFMT]
  wdmpath dsn

Print out attributes of a single DSN

positional arguments:
  wdmpath              Path and WDM filename.

  dsn                  The Data Set Number (DSN) for the time series in the WDM file.
    This number must be greater or equal to 1 and less than or equal to 32000.
    HSPF can only use for input or output DSNs of 1 to 9999, inclusive.


options:
  -h | --help
      show this help message and exit
  --attrs ATTRS
      [optional, default to "default"]
      Attributes to retrieve from the DSN.
      ┌────────────────────┬─────────────────────────────────────────────┐
      │ attrs              │ Attributes Retrieved                        │
      ╞════════════════════╪═════════════════════════════════════════════╡
      │ default            │ DSN, TSSTEP, TCODE, TSFILL, IDLOCN, IDSCEN, │
      │                    │ IDCONS, TSBYR, STANAM, TSTYPE               │
      ├────────────────────┼─────────────────────────────────────────────┤
      │ all                │ All attributes set of the 450 total         │
      ├────────────────────┼─────────────────────────────────────────────┤
      │ comma separated    │ Specific attributes named in the list       │
      │ list of attribute  │                                             │
      │ names              │                                             │
      ╘════════════════════╧═════════════════════════════════════════════╛

  --tablefmt TABLEFMT
      [optional, default is 'csv', output format]
      The table format. Can be one of 'csv', 'tsv', 'plain', 'simple', 'grid',
      'pipe', 'orgtbl', 'rst', 'mediawiki', 'latex', 'latex_raw' and
      'latex_booktabs'.

hydhrseqtowdm

usage: wdmtoolbox hydhrseqtowdm [-h] [--input_ts INPUT_TS]
  [--start_century START_CENTURY] wdmpath dsn

Write HYDHR sequential file to a DSN.

positional arguments:
  wdmpath               Path and WDM filename.

  dsn                   The Data Set Number (DSN) for the time series in the WDM file.
    This number must be greater or equal to 1 and less than or equal to 32000.
    HSPF can only use for input or output DSNs of 1 to 9999, inclusive.


options:
  -h | --help
      show this help message and exit
  --input_ts INPUT_TS
      [optional though required if using within Python, default is '-' (stdin)]
      Whether from a file or standard input, data requires a single line header
      of column names. The default header is the first line of the input,
      but this can be changed for CSV files using the 'skiprows' option.
      Most common date formats can be used, but the closer to ISO 8601 date/time
      standard the better.
      Comma-separated values (CSV) files or tab-separated values (TSV):
      File separators will be automatically detected.
      
      Columns can be selected by name or index, where the index for
      data columns starts at 1.

      Command line examples:
        ┌─────────────────────────────────┬───────────────────────────┐
        │ Keyword Example                 │ Description               │
        ╞═════════════════════════════════╪═══════════════════════════╡
        │ --input_ts=fn.csv               │ read all columns from     │
        │                                 │ 'fn.csv'                  │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.csv,2,1           │ read data columns 2 and 1 │
        │                                 │ from 'fn.csv'             │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.csv,2,skiprows=2  │ read data column 2 from   │
        │                                 │ 'fn.csv', skipping first  │
        │                                 │ 2 rows so header is read  │
        │                                 │ from third row            │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.xlsx,2,Sheet21    │ read all data from 2nd    │
        │                                 │ sheet, then all data from │
        │                                 │ "Sheet21" of 'fn.xlsx'    │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.hdf5,Table12,T2   │ read all data from table  │
        │                                 │ "Table12" then all data   │
        │                                 │ from table "T2" of        │
        │                                 │ 'fn.hdf5'                 │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts=fn.wdm,210,110       │ read DSNs 210, then 110   │
        │                                 │ from 'fn.wdm'             │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts='-'                  │ read all columns from     │
        │                                 │ standard input (stdin)    │
        ├─────────────────────────────────┼───────────────────────────┤
        │ --input_ts='-' --columns=4,1    │ read column 4 and 1 from  │
        │                                 │ standard input (stdin)    │
        ╘═════════════════════════════════╧═══════════════════════════╛

      If working with CSV or TSV files you can use redirection rather than use
      --input_ts=fname.csv. The following are identical:
      From a file:
        command subcmd --input_ts=fname.csv
      From standard input (since '--input_ts=-' is the default):
        command subcmd < fname.csv
      Can also combine commands by piping:
        command subcmd < filein.csv | command subcmd1 > fileout.csv
      Python library examples:
      You must use the `input_ts=...` option where `input_ts` can be
      one of a [pandas DataFrame, pandas Series, dict, tuple, list,
      StringIO, or file name].

  --start_century START_CENTURY
      Since 2 digit years are used, need century, defaults to 1900.

listdsns

usage: wdmtoolbox listdsns [-h] wdmpath

Print out a table describing all DSNs in the WDM.

positional arguments:
  wdmpath     Path and WDM filename.


options:
  -h | --help
      show this help message and exit

renumberdsn

usage: wdmtoolbox renumberdsn [-h] wdmpath olddsn newdsn

Renumber olddsn to newdsn.

positional arguments:
  wdmpath     Path and WDM filename.

  olddsn      Old DSN to renumber.

  newdsn      New DSN to change old DSN to.


options:
  -h | --help
      show this help message and exit

extract

usage: wdmtoolbox extract [-h] [--start_date START_DATE] [--end_date END_DATE]
  [wdmpath ...]

Print out DSN data to the screen with ISO-8601 dates.

positional arguments:
  wdmpath               Path and WDM filename, followed by a space-separated
    list of DSNs. For example:
    'file.wdm 234 345 456'
    
    OR
    `wdmpath` can be space separated sets of 'wdmpath,dsn'.
    
    'file.wdm,101 file2.wdm,104 file.wdm,227'



options:
  -h | --help
      show this help message and exit
  --start_date START_DATE
      [optional, defaults to first date in time-series, input filter]
      The start_date of the series in ISOdatetime format, or 'None' for
      beginning.
  --end_date END_DATE
      [optional, defaults to last date in time-series, input filter]
      The end_date of the series in ISOdatetime format, or 'None' for end.
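
The two forms accepted for the `wdmpath` positional arguments of ‘extract’ can be reduced to the same list of (file, DSN) pairs. The sketch below shows one way to do that; `parse_targets` is a hypothetical function for illustration, not wdmtoolbox's own parser, and it assumes the two forms are not mixed in one call.

```python
def parse_targets(args):
    """Hypothetical parser returning a list of (wdmpath, dsn) pairs."""
    if any("," in arg for arg in args):
        # Form 2: space separated sets of 'wdmpath,dsn'.
        pairs = []
        for arg in args:
            path, dsn = arg.rsplit(",", 1)
            pairs.append((path, int(dsn)))
        return pairs
    # Form 1: one WDM file followed by a space separated list of DSNs.
    path, *dsns = args
    return [(path, int(dsn)) for dsn in dsns]


print(parse_targets("file.wdm 234 345 456".split()))
print(parse_targets("file.wdm,101 file2.wdm,104 file.wdm,227".split()))
```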

wdmtostd

usage: wdmtoolbox wdmtostd [-h] wdmpath [dsns ...] kwds

DEPRECATED: New scripts use 'extract'. Will be removed in the future.

positional arguments:
  wdmpath dsns kwds

options:
  -h | --help
      show this help message and exit

wdmtoswmm5rdii

usage: wdmtoolbox wdmtoswmm5rdii [-h] wdmpath [dsns ...] kwds

Print out DSN data to the screen in SWMM5 RDII format.

positional arguments:
  wdmpath     Path and WDM filename.

  dsns kwds

options:
  -h | --help
      show this help message and exit

Usage - API

You can use all of the command line subcommands as functions. The function signature is identical to the command line subcommands.

Returns:

  • wdmtoolbox.extract returns a PANDAS DataFrame.

  • wdmtoolbox.listdsns returns a Python dictionary.

  • Almost all of the remaining functions do not return anything.

Input can be a CSV or TAB separated file or a PANDAS DataFrame, and is supplied to the function via the ‘input_ts’ keyword.

Simply import wdmtoolbox:

from wdmtoolbox import wdmtoolbox

# Then you could call the functions
ntsd = wdmtoolbox.extract('test.wdm', 4)

# Once you have a PANDAS DataFrame you can use that as input.
# For example, use 'tstoolbox' to aggregate...
from tstoolbox import tstoolbox
ntsd = tstoolbox.aggregate(statistic='mean', agg_interval='daily', input_ts=ntsd)
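
The command line workflow from "Typical usage" could be written with the API along these lines. This is a sketch under the assumption, taken from the statement above, that each function accepts the same arguments and option names as its subcommand; the keyword names have not been checked against a particular wdmtoolbox release, and `build_met_wdm` is a hypothetical wrapper.

```python
def build_met_wdm(wdmpath, csvpath, dsn=101):
    """Create a WDM file, add one hourly DSN, load a CSV, and list the DSNs."""
    # Imported here so this sketch can be loaded without wdmtoolbox installed.
    from wdmtoolbox import wdmtoolbox

    # Keyword names below are assumed to mirror the command line options.
    wdmtoolbox.createnewwdm(wdmpath, overwrite=True)
    wdmtoolbox.createnewdsn(
        wdmpath,
        dsn,
        tcode=3,
        constituent="HPCP",
        tstype="HPCP",
        location="12345678",
        description="NWS STATION 1",
        scenario="INPUT",
    )
    wdmtoolbox.csvtowdm(wdmpath, dsn, input_ts=csvpath)
    return wdmtoolbox.listdsns(wdmpath)
```

A call such as `build_met_wdm('met.wdm', 'nws_station_1.csv')` would then mirror the three shell commands followed by ‘listdsns’.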