tstoolbox.tstoolbox.convert_index
- tstoolbox.tstoolbox.convert_index(to, interval=None, epoch='julian', input_ts='-', columns=None, start_date=None, end_date=None, round_index=None, dropna='no', clean=False, names=None, source_units=None, target_units=None, skiprows=None)
Convert datetime to/from Julian dates from different epochs.
- Parameters:
to (str) – One of ‘number’ or ‘datetime’. If ‘number’, the source time-series should have a datetime index to convert to a number. If ‘datetime’, source data should be a number and the converted index will be datetime.
interval –
[optional, defaults to None, transformation]
The interval parameter defines the unit of time and is one of the pandas offset codes. The default of ‘None’ sets the unit of time to daily for all defined epochs except ‘unix’, which defaults to seconds.
For all defined epochs except ‘unix’ you can give any unit of time smaller than daily; ‘unix’ requires an interval smaller than seconds. For an epoch that begins with an arbitrary date, you can use any interval equal to or smaller than the frequency of the time-series.
=====  =============
Alias  Description
=====  =============
N      Nanoseconds
U      microseconds
L      milliseconds
S      Secondly
T      Minutely
H      Hourly
D      calendar Day
W      Weekly
M      Month end
MS     Month Start
Q      Quarter end
QS     Quarter Start
A      Annual end
AS     Annual Start
=====  =============
Business offset codes.
=====  ==================================
Alias  Description
=====  ==================================
B      Business day
BM     Business Month end
BMS    Business Month Start
BQ     Business Quarter end
BQS    Business Quarter Start
BA     Business Annual end
BAS    Business Annual Start
C      Custom business day (experimental)
CBM    Custom Business Month end
CBMS   Custom Business Month Start
=====  ==================================
Weekly has the following anchored frequencies:
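=====  ===========  =============================
Alias  Equivalents  Description
=====  ===========  =============================
W-SUN  W            Weekly frequency (SUNdays)
W-MON               Weekly frequency (MONdays)
W-TUE               Weekly frequency (TUEsdays)
W-WED               Weekly frequency (WEDnesdays)
W-THU               Weekly frequency (THUrsdays)
W-FRI               Weekly frequency (FRIdays)
W-SAT               Weekly frequency (SATurdays)
=====  ===========  =============================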
Quarterly frequencies (Q, BQ, QS, BQS) and annual frequencies (A, BA, AS, BAS) take the following anchoring suffixes, where the “x” in the “Alias” column stands for the frequency code:
=====  =========================  ===========  ==========================
Alias  Examples                   Equivalents  Description
=====  =========================  ===========  ==========================
x-DEC  A-DEC Q-DEC AS-DEC QS-DEC  A Q AS QS    year ends end of DECember
x-JAN                                          year ends end of JANuary
x-FEB                                          year ends end of FEBruary
x-MAR                                          year ends end of MARch
x-APR                                          year ends end of APRil
x-MAY                                          year ends end of MAY
x-JUN                                          year ends end of JUNe
x-JUL                                          year ends end of JULy
x-AUG                                          year ends end of AUGust
x-SEP                                          year ends end of SEPtember
x-OCT                                          year ends end of OCTober
x-NOV                                          year ends end of NOVember
=====  =========================  ===========  ==========================
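As a rough illustration of what "unit time" means, the sketch below counts how many units of a given interval have elapsed since the start of an epoch. The `UNIT` mapping and the `elapsed_units` helper are hypothetical names used only for this example; they are not part of tstoolbox and cover only a few fixed-length aliases from the table above.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from a few fixed-length offset aliases to durations.
UNIT = {
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
    "T": timedelta(minutes=1),
    "S": timedelta(seconds=1),
}


def elapsed_units(moment, epoch_start, alias):
    """Return the number of `alias` units between epoch_start and moment."""
    return (moment - epoch_start) / UNIT[alias]


epoch_start = datetime(2000, 1, 1)   # an arbitrary epoch given as a date
moment = datetime(2000, 1, 2, 12)    # one and a half days later

days = elapsed_units(moment, epoch_start, "D")
hours = elapsed_units(moment, epoch_start, "H")
print(days, hours)
```

The same instant maps to 1.5 with a daily interval and 36.0 with an hourly interval, which is why the interval has to be known to interpret a numeric index.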
- epoch: str
[optional, defaults to ‘julian’, transformation]
Can be one of, ‘julian’, ‘reduced’, ‘modified’, ‘truncated’, ‘dublin’, ‘cnes’, ‘ccsds’, ‘lop’, ‘lilian’, ‘rata_die’, ‘mars_sol_date’, ‘unix’, or a date and time.
If supplying a date and time, most formats are recognized; however, the closer the format is to ISO 8601 the better. Check that the date was parsed as expected. If supplying only a date, the epoch starts at midnight at the beginning of that date.
The ‘unix’ epoch uses a default interval of seconds, and all other defined epochs use a default interval of ‘daily’.
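=========  ============================================  ========================  ================================================
epoch      Epoch                                         Calculation               Notes
=========  ============================================  ========================  ================================================
julian     4713-01-01:12 BCE                             JD
reduced    1858-11-16:12                                 JD - 2400000
modified   1858-11-17:00                                 JD - 2400000.5            SAO 1957
truncated  1968-05-24:00                                 floor (JD - 2440000.5)    NASA 1979, integer
dublin     1899-12-31:12                                 JD - 2415020              IAU 1955
cnes       1950-01-01:00                                 JD - 2433282.5            CNES [3]
ccsds      1958-01-01:00                                 JD - 2436204.5            CCSDS [3]
lop        1992-01-01:00                                 JD - 2448622.5            LOP [3]
lilian     1582-10-15 [13]                               floor (JD - 2299159.5)    Count of days of the Gregorian calendar, integer
rata_die   0001-01-01 [13] proleptic Gregorian calendar  floor (JD - 1721424.5)    Count of days of the Common Era, integer
mars_sol   1873-12-29:12                                 (JD - 2405522) / 1.02749  Count of Martian days
unix       1970-01-01 T00:00:00                          JD - 2440587.5            seconds
=========  ============================================  ========================  ================================================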
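The "Calculation" column can be checked by hand with the standard library. The following is a minimal sketch, not the tstoolbox implementation: it derives the Julian date (JD) from a UTC timestamp using the ‘unix’ offset in the table, then applies a few of the listed offsets. The names `julian_date` and `CALCULATIONS` are made up for this example.

```python
import math
from datetime import datetime, timezone

# Julian date of 1970-01-01T00:00:00 UTC, taken from the 'unix' row.
UNIX_EPOCH_JD = 2440587.5


def julian_date(moment):
    """Julian date, in days, of a timezone-aware datetime."""
    return UNIX_EPOCH_JD + moment.timestamp() / 86400.0


# Offsets copied from the "Calculation" column for a subset of epochs.
CALCULATIONS = {
    "julian": lambda jd: jd,
    "reduced": lambda jd: jd - 2400000,
    "modified": lambda jd: jd - 2400000.5,
    "truncated": lambda jd: math.floor(jd - 2440000.5),
    "lilian": lambda jd: math.floor(jd - 2299159.5),
    "rata_die": lambda jd: math.floor(jd - 1721424.5),
}

moment = datetime(2000, 1, 1, 12, tzinfo=timezone.utc)
jd = julian_date(moment)
numbers = {name: calc(jd) for name, calc in CALCULATIONS.items()}

# Going back from a number to a datetime reverses the same arithmetic.
back = datetime.fromtimestamp((jd - UNIX_EPOCH_JD) * 86400.0, tz=timezone.utc)

for name, value in numbers.items():
    print(name, value)
```

For noon UTC on 2000-01-01 the Julian date is 2451545.0, and the ‘rata_die’ value agrees with Python's own `date.toordinal()` for that day, which is a convenient cross-check.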
- input_ts: str
[optional though required if using within Python, default is ‘-’ (stdin)]
Whether from a file or standard input, data requires a single line header of column names. The default header is the first line of the input, but this can be changed for CSV files using the ‘skiprows’ option.
Most common date formats can be used, but the closer to ISO 8601 date/time standard the better.
Comma-separated values (CSV) files or tab-separated values (TSV):
File separators will be automatically detected. Columns can be selected by name or index, where the index for data columns starts at 1.
Command line examples:
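==============================  ==============================================================================================
Keyword Example                 Description
==============================  ==============================================================================================
--input_ts=fn.csv               read all columns from ‘fn.csv’
--input_ts=fn.csv,2,1           read data columns 2 and 1 from ‘fn.csv’
--input_ts=fn.csv,2,skiprows=2  read data column 2 from ‘fn.csv’, skipping the first 2 rows so the header is read from the third row
--input_ts=fn.xlsx,2,Sheet21    read all data from the 2nd sheet, then all data from “Sheet21” of ‘fn.xlsx’
--input_ts=fn.hdf5,Table12,T2   read all data from table “Table12”, then all data from table “T2” of ‘fn.hdf5’
--input_ts=fn.wdm,210,110       read DSNs 210, then 110 from ‘fn.wdm’
--input_ts='-'                  read all columns from standard input (stdin)
--input_ts='-' --columns=4,1    read columns 4 and 1 from standard input (stdin)
==============================  ==============================================================================================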
If working with CSV or TSV files you can use redirection rather than ‘--input_ts=fname.csv’. The following are identical:
From a file:
command subcmd --input_ts=fname.csv
From standard input (since ‘--input_ts=-’ is the default):
command subcmd < fname.csv
You can also combine commands by piping:
command subcmd < filein.csv | command subcmd1 > fileout.csv
Python library examples:
You must use the `input_ts=...` option where `input_ts` can be one of a [pandas DataFrame, pandas Series, dict, tuple, list, StringIO, or file name].
- columns
[optional, defaults to all columns, input filter]
Columns to select out of input. Can use column names from the first line header or column numbers. If using numbers, column number 1 is the first data column. To pick multiple columns, separate them with commas and no spaces, as in the toolbox_utils pick command.
This means you do not have to create a data set with a particular column order; you can rearrange columns when the data is read in.
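The selection rule can be sketched in plain Python. The `pick` helper below is hypothetical and only illustrates the numbering described above; it assumes that position 0 of every row holds the datetime index, so data column 1 sits at list position 1.

```python
def pick(header, rows, columns):
    """Select and reorder data columns by header name or 1-based number.

    Assumption: position 0 of `header` and of each row is the datetime
    index, which is always kept.
    """
    positions = [0]
    for item in columns.split(","):
        if item in header:
            positions.append(header.index(item))  # selected by name
        else:
            positions.append(int(item))           # selected by number
    new_header = [header[p] for p in positions]
    new_rows = [[row[p] for p in positions] for row in rows]
    return new_header, new_rows


header = ["Datetime", "flow", "stage", "temp"]
rows = [
    ["2000-01-01", 1.0, 2.0, 3.0],
    ["2000-01-02", 4.0, 5.0, 6.0],
]

# Data column 3 ("temp") first, then "flow" selected by name.
new_header, new_rows = pick(header, rows, "3,flow")
print(new_header)
print(new_rows)
```

Mixing names and numbers in one comma-separated string, with no spaces, mirrors the `--columns=4,1` form shown in the command line examples.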
- start_date: str
[optional, defaults to first date in time-series, input filter]
The start_date of the series in ISO 8601 datetime format, or ‘None’ for beginning.
- end_date: str
[optional, defaults to last date in time-series, input filter]
The end_date of the series in ISO 8601 datetime format, or ‘None’ for end.
- round_index
[optional, default is None which will do nothing to the index, output format]
Round the index to the nearest time point. This can significantly improve performance because it cuts down on memory and processing requirements; however, be cautious about rounding from a small interval to a very coarse one, which could lead to duplicate values in the index.
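The duplicate-index caution is easy to reproduce. This is a small standard-library sketch with a hypothetical `round_to` helper, not the rounding code used by tstoolbox: three sub-hourly timestamps are rounded to the nearest hour, and two of them collapse onto the same value.

```python
from datetime import datetime, timedelta


def round_to(moment, step):
    """Round a naive datetime to the nearest multiple of `step`."""
    origin = datetime.min
    count = round((moment - origin) / step)
    return origin + count * step


index = [
    datetime(2000, 1, 1, 0, 10),
    datetime(2000, 1, 1, 0, 20),
    datetime(2000, 1, 1, 1, 5),
]

rounded = [round_to(stamp, timedelta(hours=1)) for stamp in index]
has_duplicates = len(set(rounded)) < len(rounded)
print(rounded, has_duplicates)
```

Here 00:10 and 00:20 both round to 00:00, so the rounded index is no longer unique.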
- dropna: str
[optional, default is ‘no’, input filter]
Set dropna to ‘any’ to have records dropped that have NA value in any column, or ‘all’ to have records dropped that have NA in all columns. Set to ‘no’ to not drop any records. The default is ‘no’.
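The three settings can be illustrated without pandas. In this sketch `None` stands in for an NA value and `drop_records` is a hypothetical helper written for this example; the first element of each record is the index and is not tested for NA.

```python
def drop_records(records, dropna="no"):
    """Filter records according to the 'any', 'all', or 'no' setting."""
    if dropna == "any":
        # Drop a record if any data value is missing.
        return [r for r in records if all(v is not None for v in r[1:])]
    if dropna == "all":
        # Drop a record only if every data value is missing.
        return [r for r in records if any(v is not None for v in r[1:])]
    return list(records)


records = [
    ("2000-01-01", 1.0, 2.0),
    ("2000-01-02", None, 2.0),
    ("2000-01-03", None, None),
]

kept_any = drop_records(records, "any")
kept_all = drop_records(records, "all")
kept_no = drop_records(records, "no")
print(len(kept_any), len(kept_all), len(kept_no))
```

With ‘any’ only the first record survives, with ‘all’ the first two survive, and with ‘no’ nothing is dropped.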
- clean
[optional, default is False, input filter]
The ‘clean’ command will repair an input index, removing duplicate index values and sorting.
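A minimal sketch of that kind of repair is shown below. The source does not say which of several duplicate records is kept, so keeping the first occurrence is an assumption of this example, and `clean_index` is a hypothetical helper rather than the tstoolbox code.

```python
def clean_index(records):
    """Remove duplicate index values, then sort by index.

    Assumption: when an index value repeats, the first record is kept.
    """
    first_seen = {}
    for stamp, value in records:
        first_seen.setdefault(stamp, value)
    return sorted(first_seen.items())


records = [
    ("2000-01-03", 3.0),
    ("2000-01-01", 1.0),
    ("2000-01-03", 99.0),  # duplicate index value
    ("2000-01-02", 2.0),
]

cleaned = clean_index(records)
print(cleaned)
```

ISO 8601 date strings sort chronologically as plain text, which is why the example can sort them without parsing.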
- skiprows: list-like or integer or callable
[optional, default is None which will infer header from first line, input filter]
Line numbers to skip (0-indexed) if a list or number of lines to skip at the start of the file if an integer.
If used in Python, this can also be a callable. The callable is evaluated against the row indices and returns True if the row should be skipped and False otherwise. An example of a valid callable argument would be:
lambda x: x in [0, 2]
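The three accepted forms can be compared side by side. The `apply_skiprows` function below is a hypothetical stand-in that mimics the behavior described above on a list of lines; it is a sketch of the semantics, not the reader that tstoolbox uses.

```python
def apply_skiprows(lines, skiprows=None):
    """Return the lines that remain after applying `skiprows`."""
    if skiprows is None:
        return list(lines)
    if callable(skiprows):
        # Callable: skip rows whose 0-based index evaluates to True.
        return [ln for i, ln in enumerate(lines) if not skiprows(i)]
    if isinstance(skiprows, int):
        # Integer: skip that many lines at the start of the file.
        return list(lines[skiprows:])
    # List-like: skip the listed 0-based line numbers.
    skip = set(skiprows)
    return [ln for i, ln in enumerate(lines) if i not in skip]


lines = ["# comment", "Datetime,flow", "# units", "2000-01-01,1.0"]

by_int = apply_skiprows(lines, 1)
by_list = apply_skiprows(lines, [0, 2])
by_callable = apply_skiprows(lines, lambda x: x in [0, 2])
print(by_int)
print(by_list)
print(by_callable)
```

The list and the callable give the same result here, and in each case the first remaining line is the one read as the header.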
- names: str
[optional, default is None, transformation]
If None, the column names are taken from the first row after ‘skiprows’ from the input dataset.
MUST include a name for all columns in the input dataset, including the index column.
- source_units: str
[optional, default is None, transformation]
If unit is specified for the column as the second field of a ‘:’ delimited column name, then the specified units and the ‘source_units’ must match exactly.
Any unit string compatible with the ‘pint’ library can be used.
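The ‘:’ delimited naming convention can be sketched as follows. Both helpers are hypothetical, and raising `ValueError` on a mismatch is an assumption of this example; the source states only that the units must match exactly, not what happens when they do not.

```python
def split_name(column_name):
    """Split 'name:units' into (name, units); units is None when absent."""
    fields = column_name.split(":")
    units = fields[1] if len(fields) > 1 and fields[1] else None
    return fields[0], units


def check_source_units(column_name, source_units):
    """Check header units against source_units using exact string equality."""
    name, header_units = split_name(column_name)
    if header_units is not None and header_units != source_units:
        # Assumed behavior for illustration only.
        raise ValueError(f"{header_units!r} does not match {source_units!r}")
    return name, source_units


with_units = split_name("flow:ft^3/s")
without_units = split_name("flow")
checked = check_source_units("flow:ft^3/s", "ft^3/s")
print(with_units, without_units, checked)
```

Because the comparison is an exact string match, two spellings of the same unit would not be treated as equal in this sketch.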
- target_units: str
[optional, default is None, transformation]
The purpose of this option is to specify target units for unit conversion. The source units are specified in the header line of the input or using the ‘source_units’ keyword.
The units of the input time-series or values are specified as the second field of a ‘:’ delimited name in the header line of the input or in the ‘source_units’ keyword.
Any unit string compatible with the ‘pint’ library can be used.
This option will also add the ‘target_units’ string to the column names.
- tablefmt: str
[optional, default is ‘csv’, output format]
The table format. Can be one of ‘csv’, ‘tsv’, ‘plain’, ‘simple’, ‘grid’, ‘pipe’, ‘orgtbl’, ‘rst’, ‘mediawiki’, ‘latex’, ‘latex_raw’ and ‘latex_booktabs’.