alphapy package
Submodules
alphapy.alias module
- class alphapy.alias.Alias(name, expr, replace=False)
Bases:
object
Create a new alias as a key-value pair. All aliases are stored in Alias.aliases. Duplicate keys or values are not allowed, unless the replace parameter is True.
- Parameters:
name (str) – Alias key.
expr (str) – Alias value.
replace (bool, optional) – Replace the current key-value pair if it already exists.
- Variables:
Alias.aliases (dict) – Class variable for storing all known aliases
Examples
>>> Alias('atr', 'ma_truerange')
>>> Alias('hc', 'higher_close')
- aliases = {}
- alphapy.alias.get_alias(alias)
Find an alias value with the given key.
- Parameters:
alias (str) – Key for finding the alias value.
- Returns:
alias_value – Value for the corresponding key.
- Return type:
str
Examples
>>> alias_value = get_alias('atr')
>>> alias_value = get_alias('hc')
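The registry behavior described for Alias and get_alias can be sketched with a class-level dictionary. This is a minimal illustration, not the AlphaPy source: the names AliasSketch and get_alias_sketch are hypothetical, only duplicate keys (not duplicate values) are checked, and returning None for an unknown key is an assumption.

```python
class AliasSketch:
    """Hypothetical stand-in for an alias registry (not the AlphaPy class)."""

    aliases = {}  # class-level store shared by every instance

    def __init__(self, name, expr, replace=False):
        # Assumption: an existing key is left untouched unless replace is True.
        if name in AliasSketch.aliases and not replace:
            return
        AliasSketch.aliases[name] = expr


def get_alias_sketch(alias):
    """Return the stored value for a key, or None if the key is unknown."""
    return AliasSketch.aliases.get(alias)


AliasSketch('atr', 'ma_truerange')
AliasSketch('atr', 'something_else')                 # ignored: key already exists
AliasSketch('atr', 'ma_truerange_14', replace=True)  # overwrites the pair
```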
alphapy.alphapy_main module
alphapy.calendrical module
Package : calendrical
Created : July 11, 2017
Reference : Calendrical Calculations, Cambridge Press, 2002
Copyright 2020 ScottFree Analytics LLC Mark Conway & Robert D. Scott II
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- alphapy.calendrical.biz_day_month(rdate)
Calculate the business day of the month.
- Parameters:
rdate (int) – RDate date format.
- Returns:
bdm – Business day of month.
- Return type:
int
- alphapy.calendrical.biz_day_week(rdate)
Calculate the business day of the week.
- Parameters:
rdate (int) – RDate date format.
- Returns:
bdw – Business day of week.
- Return type:
int
- alphapy.calendrical.christmas_day(gyear, observed)
Get Christmas Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns:
xmas – Christmas Day in RDate format.
- Return type:
int
- alphapy.calendrical.cinco_de_mayo(gyear)
Get Cinco de Mayo for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
cinco_de_mayo – Cinco de Mayo in RDate format.
- Return type:
int
- alphapy.calendrical.day_of_week(rdate)
Get the ordinal day of the week.
- Parameters:
rdate (int) – RDate date format.
- Returns:
dw – Ordinal day of the week.
- Return type:
int
- alphapy.calendrical.day_of_year(gyear, gmonth, gday)
Calculate the day number of the given calendar year.
- Parameters:
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns:
dy – Day number of year in RDate format.
- Return type:
int
- alphapy.calendrical.days_left_in_year(gyear, gmonth, gday)
Calculate the number of days remaining in the calendar year.
- Parameters:
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns:
days_left – Calendar days remaining in RDate format.
- Return type:
int
- alphapy.calendrical.easter_day(gyear)
Get Easter Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
ed – Easter Day in RDate format.
- Return type:
int
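As an illustration of how an Easter date can be computed, the sketch below uses the widely published anonymous Gregorian (Meeus/Jones/Butcher) algorithm. This is a swapped-in technique and may differ from what the module implements; the function name is hypothetical, and the result is returned as a Python proleptic Gregorian ordinal on the assumption that it matches the RDate day count.

```python
from datetime import date


def easter_day_sketch(gyear):
    """Return the day number of Gregorian Easter Sunday for a given year."""
    a = gyear % 19
    b, c = divmod(gyear, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    gmonth, gday = divmod(h + l - 7 * m + 114, 31)
    return date(gyear, gmonth, gday + 1).toordinal()


easter_2017 = date.fromordinal(easter_day_sketch(2017))
# Good Friday falls two days before Easter Sunday.
good_friday_2017 = date.fromordinal(easter_day_sketch(2017) - 2)
```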
- alphapy.calendrical.expand_dates(date_list)
- alphapy.calendrical.fathers_day(gyear)
Get Father’s Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
fathers_day – Father’s Day in RDate format.
- Return type:
int
- alphapy.calendrical.first_kday(k, gyear, gmonth, gday)
Calculate the first kday in RDate format.
- Parameters:
k (int) – Day of the week.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns:
fkd – first-kday in RDate format.
- Return type:
int
- alphapy.calendrical.gdate_to_rdate(gyear, gmonth, gday)
Convert Gregorian date to RDate format.
- Parameters:
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns:
rdate – RDate date format.
- Return type:
int
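A minimal sketch of the conversion, assuming RDate is the fixed-date (R.D.) day count from the referenced book, in which day 1 is January 1 of year 1. The function name is hypothetical; under that assumption the result agrees with Python's date.toordinal().

```python
import calendar
from datetime import date


def gdate_to_rdate_sketch(gyear, gmonth, gday):
    """Fixed-date day number for a Gregorian date (day 1 is 0001-01-01)."""
    if gmonth <= 2:
        correction = 0
    elif calendar.isleap(gyear):
        correction = -1
    else:
        correction = -2
    prior = gyear - 1  # number of complete years before gyear
    return (365 * prior + prior // 4 - prior // 100 + prior // 400
            + (367 * gmonth - 362) // 12 + correction + gday)


rdate = gdate_to_rdate_sketch(2017, 7, 11)
same_as_stdlib = rdate == date(2017, 7, 11).toordinal()
```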
- alphapy.calendrical.get_holiday_names()
Get the list of defined holidays.
- Returns:
holidays – List of holiday names.
- Return type:
list of str
- alphapy.calendrical.get_nth_kday_of_month(gday, gmonth, gyear)
Determine which occurrence of its weekday a given Gregorian date is within its month.
- Parameters:
gday (int) – Gregorian day.
gmonth (int) – Gregorian month.
gyear (int) – Gregorian year.
- Returns:
nth – Ordinal number of a given day’s occurrence within the month, for example, the third Friday of the month.
- Return type:
int
- alphapy.calendrical.get_rdate(row)
Extract RDate from a dataframe.
- Parameters:
row (pandas.DataFrame) – Row of a dataframe containing year, month, and day.
- Returns:
rdate – RDate date format.
- Return type:
int
- alphapy.calendrical.good_friday(gyear)
Get Good Friday for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
gf – Good Friday in RDate format.
- Return type:
int
- alphapy.calendrical.halloween(gyear)
Get Halloween for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
halloween – Halloween in RDate format.
- Return type:
int
- alphapy.calendrical.independence_day(gyear, observed)
Get Independence Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns:
d4j – Independence Day in RDate format.
- Return type:
int
- alphapy.calendrical.kday_after(rdate, k)
Calculate the k-day after a given RDate, i.e., the first date strictly after it that falls on day of the week k.
- Parameters:
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns:
kda – kday-after in RDate format.
- Return type:
int
- alphapy.calendrical.kday_before(rdate, k)
Calculate the k-day before a given RDate, i.e., the last date strictly before it that falls on day of the week k.
- Parameters:
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns:
kdb – kday-before in RDate format.
- Return type:
int
- alphapy.calendrical.kday_nearest(rdate, k)
Calculate the k-day nearest a given RDate, i.e., the closest date that falls on day of the week k.
- Parameters:
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns:
kdn – kday-nearest in RDate format.
- Return type:
int
- alphapy.calendrical.kday_on_after(rdate, k)
Calculate the k-day on or after a given RDate, i.e., the first date on or after it that falls on day of the week k.
- Parameters:
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns:
kdoa – kday-on-or-after in RDate format.
- Return type:
int
- alphapy.calendrical.kday_on_before(rdate, k)
Calculate the k-day on or before a given RDate, i.e., the last date on or before it that falls on day of the week k.
- Parameters:
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns:
kdob – kday-on-or-before in RDate format.
- Return type:
int
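The five k-day functions above can all be derived from kday-on-or-before, following the formulas in the referenced book. The sketch below assumes that RDate matches Python's date ordinal and that the day of the week is rdate modulo 7 with Sunday as 0 (so Friday is 5); all function names are hypothetical.

```python
from datetime import date


def day_of_week_sketch(rdate):
    return rdate % 7  # assumption: 0 = Sunday ... 6 = Saturday


def kday_on_before_sketch(rdate, k):
    return rdate - day_of_week_sketch(rdate - k)


def kday_on_after_sketch(rdate, k):
    return kday_on_before_sketch(rdate + 6, k)


def kday_nearest_sketch(rdate, k):
    return kday_on_before_sketch(rdate + 3, k)


def kday_before_sketch(rdate, k):
    return kday_on_before_sketch(rdate - 1, k)


def kday_after_sketch(rdate, k):
    return kday_on_before_sketch(rdate + 7, k)


FRIDAY = 5
tuesday = date(2017, 7, 11).toordinal()  # July 11, 2017 was a Tuesday
prior_friday = date.fromordinal(kday_on_before_sketch(tuesday, FRIDAY))
next_friday = date.fromordinal(kday_on_after_sketch(tuesday, FRIDAY))
```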
- alphapy.calendrical.labor_day(gyear)
Get Labor Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
lday – Labor Day in RDate format.
- Return type:
int
- alphapy.calendrical.last_kday(k, gyear, gmonth, gday)
Calculate the last kday in RDate format.
- Parameters:
k (int) – Day of the week.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns:
lkd – last-kday in RDate format.
- Return type:
int
- alphapy.calendrical.leap_year(gyear)
Determine if this is a Gregorian leap year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
leap_year – True if a Gregorian leap year, else False.
- Return type:
bool
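For reference, the Gregorian leap year rule is: divisible by 4, except century years, which must be divisible by 400. A minimal sketch with a hypothetical function name:

```python
def leap_year_sketch(gyear):
    """Return True if gyear is a Gregorian leap year."""
    return gyear % 4 == 0 and (gyear % 100 != 0 or gyear % 400 == 0)


checks = {year: leap_year_sketch(year) for year in (1900, 2000, 2017, 2020)}
```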
- alphapy.calendrical.memorial_day(gyear)
Get Memorial Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
md – Memorial Day in RDate format.
- Return type:
int
- alphapy.calendrical.mlk_day(gyear)
Get Martin Luther King Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
mlkday – Martin Luther King Day in RDate format.
- Return type:
int
- alphapy.calendrical.mothers_day(gyear)
Get Mother’s Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
mothers_day – Mother’s Day in RDate format.
- Return type:
int
- alphapy.calendrical.new_years_day(gyear, observed)
Get New Year’s day for a given year.
- Parameters:
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns:
nyday – New Year’s Day in RDate format.
- Return type:
int
- alphapy.calendrical.next_event(rdate, events)
Find the next event after a given date.
- Parameters:
rdate (int) – RDate date format.
events (list of RDate (int)) – Monthly events in RDate format.
- Returns:
event – Next event in RDate format.
- Return type:
RDate (int)
- alphapy.calendrical.next_holiday(rdate, holidays)
Find the next holiday after a given date.
- Parameters:
rdate (int) – RDate date format.
holidays (dict of RDate (int)) – Holidays in RDate format.
- Returns:
holiday – Next holiday in RDate format.
- Return type:
RDate (int)
- alphapy.calendrical.nth_bizday(n, gyear, gmonth)
Calculate the nth business day in a month.
- Parameters:
n (int) – Number of the business day to get.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
- Returns:
bizday – Nth business day of a given month in RDate format.
- Return type:
int
- alphapy.calendrical.nth_kday(n, k, gyear, gmonth, gday)
Calculate the nth-kday in RDate format.
- Parameters:
n (int) – Occurrence of a given day counting in either direction.
k (int) – Day of the week.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns:
nthkday – nth-kday in RDate format.
- Return type:
int
- alphapy.calendrical.presidents_day(gyear)
Get President’s Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
prezday – President’s Day in RDate format.
- Return type:
int
- alphapy.calendrical.previous_event(rdate, events)
Find the previous event before a given date.
- Parameters:
rdate (int) – RDate date format.
events (list of RDate (int)) – Monthly events in RDate format.
- Returns:
event – Previous event in RDate format.
- Return type:
RDate (int)
- alphapy.calendrical.previous_holiday(rdate, holidays)
Find the previous holiday before a given date.
- Parameters:
rdate (int) – RDate date format.
holidays (dict of RDate (int)) – Holidays in RDate format.
- Returns:
holiday – Previous holiday in RDate format.
- Return type:
RDate (int)
- alphapy.calendrical.rdate_to_gdate(rdate)
Convert RDate format to Gregorian date format.
- Parameters:
rdate (int) – RDate date format.
- Returns:
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- alphapy.calendrical.rdate_to_gyear(rdate)
Convert RDate format to Gregorian year.
- Parameters:
rdate (int) – RDate date format.
- Returns:
gyear – Gregorian year.
- Return type:
int
- alphapy.calendrical.saint_patricks_day(gyear)
Get Saint Patrick’s day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
patricks – Saint Patrick’s Day in RDate format.
- Return type:
int
- alphapy.calendrical.set_events(n, k, gyear, gday)
Define monthly events for a given year.
- Parameters:
n (int) – Occurrence of a given day counting in either direction.
k (int) – Day of the week.
gyear (int) – Gregorian year for the events.
gday (int) – Gregorian day representing the first day to consider.
- Returns:
events – Monthly events in RDate format.
- Return type:
list of RDate (int)
Example
>>> # Options Expiration (Third Friday of every month)
>>> set_events(3, 5, 2017, 1)
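To make the "third Friday of every month" example concrete, here is a standalone sketch using only the standard library. It does not call set_events; the function name is hypothetical, and it returns datetime.date objects rather than RDate integers. Note that Python's date.weekday() numbers Monday as 0, so Friday is 4 here, whereas the example above passes 5 for Friday.

```python
from datetime import date, timedelta


def third_fridays_sketch(gyear):
    """Return the third Friday of each month of gyear as date objects."""
    events = []
    for gmonth in range(1, 13):
        first = date(gyear, gmonth, 1)
        # Days from the first of the month to the first Friday.
        offset = (4 - first.weekday()) % 7
        events.append(first + timedelta(days=offset + 14))
    return events


events_2017 = third_fridays_sketch(2017)
```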
- alphapy.calendrical.set_holidays(gyear, observe)
Determine the holidays for a given year.
- Parameters:
gyear (int) – Gregorian year.
observe (bool) – True to get the observed date, otherwise False.
- Returns:
holidays – Set of holidays in RDate format for a given year.
- Return type:
dict of int
- alphapy.calendrical.subtract_dates(gyear1, gmonth1, gday1, gyear2, gmonth2, gday2)
Calculate the difference between two Gregorian dates.
- Parameters:
gyear1 (int) – Gregorian year of first date.
gmonth1 (int) – Gregorian month of first date.
gday1 (int) – Gregorian day of first date.
gyear2 (int) – Gregorian year of successive date.
gmonth2 (int) – Gregorian month of successive date.
gday2 (int) – Gregorian day of successive date.
- Returns:
delta_days – Difference in days in RDate format.
- Return type:
int
- alphapy.calendrical.thanksgiving_day(gyear)
Get Thanksgiving Day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
tday – Thanksgiving Day in RDate format.
- Return type:
int
- alphapy.calendrical.valentines_day(gyear)
Get Valentine’s day for a given year.
- Parameters:
gyear (int) – Gregorian year.
- Returns:
valentines – Valentine’s Day in RDate format.
- Return type:
int
- alphapy.calendrical.veterans_day(gyear, observed)
Get Veteran’s day for a given year.
- Parameters:
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns:
veterans – Veteran’s Day in RDate format.
- Return type:
int
alphapy.data module
alphapy.estimators module
- class alphapy.estimators.Estimator(algorithm, model_type, estimator, grid)
Bases:
object
Store information about each estimator.
- Parameters:
algorithm (str) – Abbreviation representing the given algorithm.
model_type (enum ModelType) – The machine learning task for this algorithm.
estimator (function) – A scikit-learn, TensorFlow, or XGBoost function.
grid (dict) – The dictionary of hyperparameters for grid search.
- alphapy.estimators.find_optional_packages()
Find optional machine learning packages.
- Parameters:
None
- Return type:
None
- alphapy.estimators.get_algos_config(cfg_dir)
Read the algorithms configuration file.
- Parameters:
cfg_dir (str) – The directory where the configuration file algos.yml is stored.
- Returns:
specs – The specifications for determining which algorithms to run.
- Return type:
dict
- alphapy.estimators.get_estimators(alphapy_specs, model)
Define all the AlphaPy estimators based on the contents of the algos.yml file.
- Parameters:
alphapy_specs (dict) – The specifications for controlling the AlphaPy pipeline.
model (alphapy.Model) – The model object containing global AlphaPy parameters.
- Returns:
estimators – All of the estimators required for running the pipeline.
- Return type:
dict
alphapy.features module
alphapy.frame module
- class alphapy.frame.Frame(name, space, df)
Bases:
object
Create a new Frame that points to a dataframe in memory. All frames are stored in Frame.frames. Names must be unique.
- Parameters:
name (str) – Frame key.
space (alphapy.Space) – Namespace of the given frame.
df (pandas.DataFrame) – The contents of the actual dataframe.
- Variables:
frames (dict) – Class variable for storing all known frames
Examples
>>> Frame('tech', Space('stock', 'prices', '5m'), df)
- frames = {}
- alphapy.frame.dump_frames(group, directory, extension, separator)
Save a group of data frames to disk.
- Parameters:
group (alphapy.Group) – The collection of frames to be saved to the file system.
directory (str) – Full directory specification.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
- Returns:
None
- Return type:
None
- alphapy.frame.frame_name(name, space)
Get the frame name for the given name and space.
- Parameters:
name (str) – Group name.
space (alphapy.Space) – Context or namespace for the given group name.
- Returns:
fname – Frame name.
- Return type:
str
Examples
>>> fname = frame_name('tech', Space('stock', 'prices', '1d')) # 'tech_stock_prices_1d'
- alphapy.frame.load_frames(group, directory, extension, separator, splits=False)
Read a group of dataframes into memory.
- Parameters:
group (alphapy.Group) – The collection of frames to be read into memory.
directory (str) – Full directory specification.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
splits (bool, optional) – If True, then all the members of the group are stored in separate files corresponding with each member. If False, then the data are stored in a single file.
- Returns:
all_frames – The list of pandas dataframes loaded from the file location. If the files cannot be located, then None is returned.
- Return type:
list
- alphapy.frame.read_frame(directory, filename, extension, separator, index_col=False)
Read a delimiter-separated file into a data frame.
- Parameters:
directory (str) – Full directory specification.
filename (str) – Name of the file to read, excluding the extension.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
index_col (str, optional) – Column to use as the row labels in the dataframe.
- Returns:
df – The pandas dataframe loaded from the file location. If the file cannot be located, then None is returned.
- Return type:
pandas.DataFrame
- alphapy.frame.sequence_frame(df, target, date_id, forecast_period=1, n_lags=1, leaders=[], group_id=None)
Create sequences of lagging and leading values, with lagging applied within groups.
- Parameters:
df (pandas.DataFrame) – The original dataframe.
target (str) – The target variable for prediction.
date_id (str) – The datetime column.
forecast_period (int) – The period for forecasting the target of the analysis.
n_lags (int) – The number of lagged rows for prediction.
leaders (list) – The features that are contemporaneous with the target.
group_id (str, optional) – The grouping column.
- Returns:
new_frame – The transformed dataframe with variable sequences.
- Return type:
pandas.DataFrame
- alphapy.frame.write_frame(df, directory, filename, extension, separator, tag='', index=False, index_label=None, columns=None)
Write a dataframe into a delimiter-separated file.
- Parameters:
df (pandas.DataFrame) – The pandas dataframe to save to a file.
directory (str) – Full directory specification.
filename (str) – Name of the file to write, excluding the extension.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
tag (str, optional) – An additional tag to add to the file name.
index (bool, optional) – If True, write the row names (index).
index_label (str, optional) – A column label for the index.
columns (str, optional) – A list of column names.
- Returns:
None
- Return type:
None
alphapy.globals module
- class alphapy.globals.BarType(*values)
Bases:
Enum
Bar Types.
Bar Types for running models, usually translated from a normal OHLC bar to a weighted bar based on volume, dollar amount, etc.
- dollar = 2
- heikinashi = 3
- time = 1
- class alphapy.globals.Encoders(*values)
Bases:
Enum
AlphaPy Encoders.
These are the encoders used in AlphaPy, as configured in the model.yml file (features:encoding:type). You can learn more about encoders here [ENC].
- backdiff = 1
- basen = 2
- binary = 3
- catboost = 4
- hashing = 5
- helmert = 6
- jstein = 7
- leaveone = 8
- mestimate = 9
- onehot = 10
- ordinal = 11
- polynomial = 12
- sum = 13
- target = 14
- woe = 15
- class alphapy.globals.ModelType(*values)
Bases:
Enum
AlphaPy Model Types.
Note: Multiclass classification (multiclass) is not yet implemented.
- classification = 1
- multiclass = 3
- ranking = 2
- regression = 4
- system = 5
- class alphapy.globals.Objective(*values)
Bases:
Enum
Scoring Function Objectives.
Best model selection is based on the scoring or Objective function, which must be either maximized or minimized. For example, roc_auc is maximized, while neg_log_loss is minimized.
- maximize = 1
- minimize = 2
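How an objective might drive best-model selection can be sketched as follows. The enum mirrors the two members listed above, but the class name, the helper best_model_sketch, and the sample scores are hypothetical and are not taken from AlphaPy.

```python
from enum import Enum


class ObjectiveSketch(Enum):
    maximize = 1
    minimize = 2


def best_model_sketch(scores, objective):
    """Pick the key whose score is best under the given objective."""
    pick = max if objective is ObjectiveSketch.maximize else min
    return pick(scores, key=scores.get)


# Illustrative scores only: a higher-is-better metric and a lower-is-better one.
auc_scores = {'model_a': 0.71, 'model_b': 0.84}
error_scores = {'model_a': 0.52, 'model_b': 0.61}
best_by_auc = best_model_sketch(auc_scores, ObjectiveSketch.maximize)
best_by_error = best_model_sketch(error_scores, ObjectiveSketch.minimize)
```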
- class alphapy.globals.Orders
Bases:
object
System Order Types.
- Variables:
le (str) – long entry
se (str) – short entry
lx (str) – long exit
sx (str) – short exit
lh (str) – long exit at the end of the holding period
sh (str) – short exit at the end of the holding period
- le = 'le'
- lh = 'lh'
- lx = 'lx'
- se = 'se'
- sh = 'sh'
- sx = 'sx'
alphapy.group module
- class alphapy.group.Group(name, space=<alphapy.space.Space object>, dynamic=True, members={})
Bases:
object
Create a new Group that contains common members. All defined groups are stored in Group.groups. Group names must be unique.
- Parameters:
name (str) – Group name.
space (alphapy.Space, optional) – Namespace for the given group.
dynamic (bool, optional, default True) – Flag for defining whether or not the group membership can change.
members (set, optional) – The initial members of the group, especially if the new group is fixed, i.e., not dynamic.
- Variables:
groups (dict) – Class variable for storing all known groups
Examples
>>> Group('tech')
- add(newlist)
Add new members to the group.
- Parameters:
newlist (list) – New members or identifiers to add to the group.
- Returns:
None
- Return type:
None
Notes
New members cannot be added to a fixed or non-dynamic group.
- groups = {}
- member(item)
Find a member in the group.
- Parameters:
item (str) – The member to find in the group.
- Returns:
member_exists – Flag indicating whether or not the member is in the group.
- Return type:
bool
- remove(remlist)
Remove members from the group.
- Parameters:
remlist (list) – The list of members to remove from the group.
- Returns:
None
- Return type:
None
Notes
Members cannot be removed from a fixed or non-dynamic group.
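The dynamic versus fixed membership rules can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the AlphaPy implementation: GroupSketch is a hypothetical name, the space argument is omitted, a duplicate name raises ValueError, and changes to a fixed group are silently ignored.

```python
class GroupSketch:
    """Hypothetical stand-in showing dynamic vs. fixed group membership."""

    groups = {}  # class-level registry keyed by group name

    def __init__(self, name, dynamic=True, members=None):
        if name in GroupSketch.groups:
            # Assumption: duplicate names are rejected with an exception.
            raise ValueError(f"Group {name} already exists")
        self.name = name
        self.dynamic = dynamic
        self.members = set(members or ())
        GroupSketch.groups[name] = self

    def add(self, newlist):
        if self.dynamic:  # fixed groups ignore additions
            self.members.update(newlist)

    def member(self, item):
        return item in self.members

    def remove(self, remlist):
        if self.dynamic:  # fixed groups ignore removals
            self.members.difference_update(remlist)


tech = GroupSketch('tech')
tech.add(['aapl', 'msft'])
tech.remove(['msft'])
fixed = GroupSketch('fixed_pair', dynamic=False, members={'ibm', 'orcl'})
fixed.add(['sap'])  # no effect on a fixed group
```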
alphapy.metalabel module
alphapy.mflow_main module
alphapy.model module
alphapy.nlp module
alphapy.optimize module
- alphapy.optimize.grid_report(results, n_top=3)
Report the top grid search scores.
- Parameters:
results (dict of numpy arrays) – Mean test scores for each grid search iteration.
n_top (int, optional) – The number of grid search results to report.
- Returns:
None
- Return type:
None
- alphapy.optimize.hyper_grid_search(model, estimator)
Return the best hyperparameters for a grid search.
- Parameters:
model (alphapy.Model) – The model object with grid search parameters.
estimator (alphapy.Estimator) – The estimator containing the hyperparameter grid.
- Returns:
model – The model object with the grid search estimator.
- Return type:
alphapy.Model
Notes
To reduce the time required for grid search, use either randomized grid search with a fixed number of iterations or a full grid search with subsampling. AlphaPy uses the scikit-learn Pipeline with feature selection to reduce the feature space.
References
For more information about grid search, refer to [GRID].
To learn about pipelines, refer to [PIPE].
- alphapy.optimize.rfecv_search(model, algo)
Return the best feature set using recursive feature elimination with cross-validation.
- Parameters:
model (alphapy.Model) – The model object with RFE parameters.
algo (str) – Abbreviation of the algorithm to run.
- Returns:
model – The model object with the RFE support vector and the best estimator.
- Return type:
alphapy.Model
Notes
If a scoring function is available, then AlphaPy can perform RFE with Cross-Validation (CV), as in this function; otherwise, it just does RFE without CV.
References
For more information about Recursive Feature Elimination, refer to [RFECV].
alphapy.plots module
alphapy.portfolio module
alphapy.space module
- class alphapy.space.Space(subject='stock', source='prices', fractal='1d')
Bases:
object
Create a new namespace.
- Parameters:
subject (str) – An identifier for a group of related items.
source (str) – The data source of the subject.
fractal (str) – The time fractal of the data, e.g., “5m” or “1d”.
- alphapy.space.space_name(subject, source, fractal)
Get the namespace string.
- Parameters:
subject (str) – An identifier for a group of related items.
source (str) – The data source of the subject.
fractal (str) – The time fractal of the data, e.g., “5m” or “1d”.
- Returns:
name – The joined namespace string.
- Return type:
str
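Based on the frame_name example in the alphapy.frame module, which yields 'tech_stock_prices_1d', the namespace string appears to be the three parts joined by underscores. The sketch below reproduces that format with hypothetical function names; the separator is inferred from that single example.

```python
def space_name_sketch(subject, source, fractal):
    """Join the namespace parts with underscores (inferred format)."""
    return '_'.join([subject, source, fractal])


def frame_name_sketch(name, subject, source, fractal):
    """Prefix the namespace string with a group name."""
    return '_'.join([name, space_name_sketch(subject, source, fractal)])


fname = frame_name_sketch('tech', 'stock', 'prices', '1d')
```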
alphapy.system module
alphapy.transforms module
Package : AlphaPy
Module : transforms
Created : March 14, 2020
Copyright 2024 ScottFree Analytics LLC Mark Conway & Robert D. Scott II
- alphapy.transforms.adx(df, p=14)
Calculate the Average Directional Index (ADX).
- Parameters:
df (pandas.DataFrame) – Dataframe with all columns required for calculation. If you are applying ADX through vapply, then these columns are calculated automatically.
p (int) – The period over which to calculate the ADX.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
The Average Directional Movement Index (ADX) was invented by J. Welles Wilder in 1978 [WIKI_ADX]. Its value reflects the strength of trend in any given instrument.
- alphapy.transforms.bbands(df, c='close', p=20, sd=2.0, low_band=True)
Calculate the Bollinger Bands.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Simple Moving Average.
sd (float) – The number of standard deviations.
low_band (bool, optional) – If set to True, then calculate the lower band, else the upper band.
- Returns:
bband – The series for the selected Bollinger Band.
- Return type:
pandas.Series
- alphapy.transforms.bblower(df, c='close', p=20, sd=1.5)
Calculate the lower Bollinger Band.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Simple Moving Average.
sd (float) – The number of standard deviations.
- Returns:
lower_band – The series containing the lower Bollinger Band.
- Return type:
pandas.Series
- alphapy.transforms.bbupper(df, c='close', p=20, sd=1.5)
Calculate the upper Bollinger Band.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Simple Moving Average.
sd (float) – The number of standard deviations.
- Returns:
upper_band – The series containing the upper Bollinger Band.
- Return type:
pandas.Series
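The band calculation can be sketched without pandas as a rolling mean plus or minus sd rolling standard deviations. The function name is hypothetical, plain lists replace the dataframe column, and the use of the sample standard deviation and of None for incomplete windows are assumptions rather than documented behavior.

```python
from statistics import mean, stdev


def bbands_sketch(values, p=20, sd=2.0, low_band=True):
    """Return the lower (or upper) band for a list of prices; needs p >= 2."""
    bands = []
    for i in range(len(values)):
        if i + 1 < p:
            bands.append(None)  # not enough history for a full window
            continue
        window = values[i + 1 - p:i + 1]
        center = mean(window)
        offset = sd * stdev(window)  # assumption: sample standard deviation
        bands.append(center - offset if low_band else center + offset)
    return bands


closes = [10, 11, 12, 11, 10, 12, 13]
lower = bbands_sketch(closes, p=3, sd=2.0)
upper = bbands_sketch(closes, p=3, sd=2.0, low_band=False)
```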
- alphapy.transforms.bizday(df, c)
Extract business day of month and week.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
date_features – The dataframe containing the date features.
- Return type:
pandas.DataFrame
- alphapy.transforms.c2max(df, c1, c2)
Take the maximum value between two columns in a dataframe.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe df.
c2 (str) – Name of the second column in the dataframe df.
- Returns:
max_val – The maximum value of the two columns.
- Return type:
float
- alphapy.transforms.c2min(df, c1, c2)
Take the minimum value between two columns in a dataframe.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe df.
c2 (str) – Name of the second column in the dataframe df.
- Returns:
min_val – The minimum value of the two columns.
- Return type:
float
- alphapy.transforms.dateparts(df, c)
Extract date into its components: year, month, day, dayofweek.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
date_features – The dataframe containing the date features.
- Return type:
pandas.DataFrame
- alphapy.transforms.diff(df, c, n=1)
Calculate the n-th order difference for the given variable.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
n (int) – The number of times that the values are differenced.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.diminus(df, p=14)
Calculate the Minus Directional Indicator (-DI).
- Parameters:
df (pandas.DataFrame) – Dataframe with columns high and low.
p (int) – The period over which to calculate the -DI.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
A component of the average directional index (ADX) that is used to measure the presence of a downtrend. When the -DI is sloping downward, it is a signal that the downtrend is getting stronger [IP_NDI].
- alphapy.transforms.diplus(df, p=14)
Calculate the Plus Directional Indicator (+DI).
- Parameters:
df (pandas.DataFrame) – Dataframe with columns high and low.
p (int) – The period over which to calculate the +DI.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
A component of the average directional index (ADX) that is used to measure the presence of an uptrend. When the +DI is sloping upward, it is a signal that the uptrend is getting stronger [IP_PDI].
- alphapy.transforms.dminus(df)
Calculate the Minus Directional Movement (-DM).
- Parameters:
df (pandas.DataFrame) – Dataframe with high and low columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
Directional movement is negative (minus) when the prior low minus the current low is greater than the current high minus the prior high. This so-called Minus Directional Movement (-DM) equals the prior low minus the current low, provided it is positive. A negative value would simply be entered as zero [SC_ADX].
- alphapy.transforms.dmplus(df)
Calculate the Plus Directional Movement (+DM).
- Parameters:
df (pandas.DataFrame) – Dataframe with high and low columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
Directional movement is positive (plus) when the current high minus the prior high is greater than the prior low minus the current low. This so-called Plus Directional Movement (+DM) then equals the current high minus the prior high, provided it is positive. A negative value would simply be entered as zero [SC_ADX].
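The +DM and -DM rules quoted above translate directly into code. The sketch below works on plain lists of highs and lows instead of a dataframe; the function name is hypothetical, and filling the first row (which has no prior bar) with zero is an assumption.

```python
def directional_movement_sketch(highs, lows):
    """Return (+DM, -DM) lists computed bar by bar."""
    plus_dm, minus_dm = [0.0], [0.0]  # assumption: zero where no prior bar exists
    for i in range(1, len(highs)):
        up_move = highs[i] - highs[i - 1]    # current high minus prior high
        down_move = lows[i - 1] - lows[i]    # prior low minus current low
        plus_dm.append(up_move if up_move > down_move and up_move > 0 else 0.0)
        minus_dm.append(down_move if down_move > up_move and down_move > 0 else 0.0)
    return plus_dm, minus_dm


highs = [10, 12, 11, 11.5]
lows = [9, 10, 8, 9]
plus_dm, minus_dm = directional_movement_sketch(highs, lows)
```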
- alphapy.transforms.ema(df, c, p=20)
Calculate the Exponential Moving Average (EMA) on a rolling basis.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average (EMA).
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data [IP_EMA].
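The weighting toward recent data can be seen in the usual EMA recursion. The sketch below is an assumption about the smoothing convention, not the module's code: it uses alpha = 2 / (p + 1), seeds the average with the first value, and operates on a plain list under a hypothetical function name.

```python
def ema_sketch(values, p=20):
    """Exponential moving average with smoothing factor 2 / (p + 1)."""
    alpha = 2.0 / (p + 1)
    out = []
    for value in values:
        if not out:
            out.append(float(value))  # seed with the first observation
        else:
            out.append(alpha * value + (1 - alpha) * out[-1])
    return out


smoothed = ema_sketch([1, 2, 3], p=3)  # alpha is 0.5 for p = 3
```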
- alphapy.transforms.gap(df)
Calculate the gap percentage between the current open and the previous close.
- Parameters:
df (pandas.DataFrame) – Dataframe with open and close columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].
- alphapy.transforms.gapbadown(df)
Determine whether or not there has been a breakaway gap down.
- Parameters:
df (pandas.DataFrame) – Dataframe with open and low columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
References
A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume [IP_BAGAP].
- alphapy.transforms.gapbaup(df)
Determine whether or not there has been a breakaway gap up.
- Parameters:
df (pandas.DataFrame) – Dataframe with open and high columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
References
A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume [IP_BAGAP].
- alphapy.transforms.gapdown(df)
Determine whether or not there has been a gap down.
- Parameters:
df (pandas.DataFrame) – Dataframe with open and close columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].
- alphapy.transforms.gapup(df)
Determine whether or not there has been a gap up.
- Parameters:
df (pandas.DataFrame) – Dataframe with open and close columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].
- alphapy.transforms.gtval(df, c1, c2)
Determine whether or not the first column of a dataframe is greater than the second.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe df.
c2 (str) – Name of the second column in the dataframe df.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.gtval0(df, c1, c2)
For positive values in the first column of the dataframe that are greater than the second column, get the value in the first column, otherwise return zero.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe df.
c2 (str) – Name of the second column in the dataframe df.
- Returns:
new_val – A positive value or zero.
- Return type:
float
- alphapy.transforms.haclose(df)
Calculate the Heikin-Ashi Close.
- Parameters:
df (pandas.DataFrame) – Dataframe with OHLC columns.
- Returns:
haclose_ds – The series containing the Heikin-Ashi Close.
- Return type:
pandas.Series
- alphapy.transforms.hahigh(df)
Calculate the Heikin-Ashi High.
- Parameters:
df (pandas.DataFrame) – Dataframe with OHLC columns.
- Returns:
hahigh_ds – The series containing the Heikin-Ashi High.
- Return type:
pandas.Series
- alphapy.transforms.halow(df)
Calculate the Heikin-Ashi Low.
- Parameters:
df (pandas.DataFrame) – Dataframe with OHLC columns.
- Returns:
halow_ds – The series containing the Heikin-Ashi Low.
- Return type:
pandas.Series
- alphapy.transforms.haopen(df)
Calculate the Heikin-Ashi Open.
- Parameters:
df (pandas.DataFrame) – Dataframe with OHLC columns.
- Returns:
haopen – The series containing the Heikin-Ashi Open.
- Return type:
pandas.Series
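The four Heikin-Ashi functions above can be pictured together with the standard Heikin-Ashi definitions: the HA close is the average of the bar's OHLC values, the HA open is the midpoint of the previous HA open and HA close, and the HA high and low extend the bar's high and low to include the HA open and close. The sketch below is an illustration under those definitions; the name `heikin_ashi_sketch`, the lowercase column names, and the seed for the first HA open are assumptions, not the library's code.

```python
import pandas as pd

def heikin_ashi_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper combining the four Heikin-Ashi series.
    ha = pd.DataFrame(index=df.index)
    ha["close"] = (df["open"] + df["high"] + df["low"] + df["close"]) / 4

    # Seed the first HA open with the midpoint of the first open and close
    # (an assumption), then apply the recursion bar by bar.
    ha_open = [(df["open"].iloc[0] + df["close"].iloc[0]) / 2]
    for i in range(1, len(df)):
        ha_open.append((ha_open[i - 1] + ha["close"].iloc[i - 1]) / 2)
    ha["open"] = ha_open

    ha["high"] = pd.concat([df["high"], ha["open"], ha["close"]], axis=1).max(axis=1)
    ha["low"] = pd.concat([df["low"], ha["open"], ha["close"]], axis=1).min(axis=1)
    return ha

ohlc = pd.DataFrame({
    "open": [10.0, 11.0],
    "high": [12.0, 13.0],
    "low": [9.0, 10.0],
    "close": [11.0, 12.0],
})
ha_bars = heikin_ashi_sketch(ohlc)
print(ha_bars)
```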
- alphapy.transforms.higher(df, c, o=1)
Determine whether or not a series value is higher than the value o periods back.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
o (int, optional) – Offset value for shifting the series.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.highest(df, c, p=20)
Calculate the highest value on a rolling basis.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the rolling maximum.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.hlrange(df, p=1)
Calculate the Range, the difference between High and Low.
- Parameters:
df (pandas.DataFrame) – Dataframe with columns high and low.
p (int) – The period over which the range is calculated.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.keltner(df, c='close', p=20, atrs=1.5, channel='midline')
Calculate the Keltner Channels.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
atrs (float) – The multiple of Average True Range.
- Returns:
kc – The series containing the Keltner Channel.
- Return type:
pandas.Series
- alphapy.transforms.keltnerlb(df, c='close', p=20, atrs=1.5)
Calculate the lower Keltner Channel.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
atrs (float) – The multiple of Average True Range.
- Returns:
kclb – The series containing the lower Keltner Channel.
- Return type:
pandas.Series
- alphapy.transforms.keltnerml(df, c='close', p=20, atrs=1.5)
Calculate the midline Keltner Channel.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
atrs (float) – The multiple of Average True Range.
- Returns:
kcml – The series containing the midline Keltner Channel.
- Return type:
pandas.Series
- alphapy.transforms.keltnerub(df, c='close', p=20, atrs=1.5)
Calculate the upper Keltner Channel.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
atrs (float) – The multiple of Average True Range.
- Returns:
kcub – The series containing the upper Keltner Channel.
- Return type:
pandas.Series
- alphapy.transforms.lower(df, c, o=1)
Determine whether or not a series value is lower than the value o periods back.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
o (int, optional) – Offset value for shifting the series.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.lowest(df, c, p=20)
Calculate the lowest value on a rolling basis.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the rolling minimum.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.ma(df, c='close', p=20)
Calculate the mean on a rolling basis.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the rolling mean.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set [WIKI_MA].
- alphapy.transforms.maabove(df, c, p=50)
Determine those values of the dataframe that are above the moving average.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period of the moving average.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.mabelow(df, c, p=50)
Determine those values of the dataframe that are below the moving average.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period of the moving average.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.maratio(df, c, p1=1, p2=10)
Calculate the ratio of two moving averages.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p1 (int) – The period of the first moving average.
p2 (int) – The period of the second moving average.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.negval0(df, c)
Get the negative value, otherwise zero.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
new_column – Negative value or zero.
- Return type:
pandas.Series (float)
- alphapy.transforms.negvals(df, c)
Find the negative values in the series.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.net(df, c='close', o=1)
Calculate the net change of a given column.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
o (int, optional) – Offset value for shifting the series.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
Net change is the difference between the closing price of a security on the day’s trading and the previous day’s closing price. Net change can be positive or negative and is quoted in terms of dollars [IP_NET].
- alphapy.transforms.netreturn(df, c, o=1)
Calculate the net return, or Return On Investment (ROI).
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
o (int, optional) – Offset value for shifting the series.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
ROI measures the amount of return on an investment relative to the original cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment, and the result is expressed as a percentage or a ratio [IP_ROI].
- alphapy.transforms.pchange(df, c, o=1)
Calculate the percentage change within the same variable.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
o (int) – Offset to the previous value.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.pchange2(df, c1, c2)
Calculate the percentage change between two variables.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe df.
c2 (str) – Name of the second column in the dataframe df.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
- alphapy.transforms.posval0(df, c)
Get the positive value, otherwise zero.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
new_column – Positive value or zero.
- Return type:
pandas.Series (float)
- alphapy.transforms.posvals(df, c)
Find the positive values in the series.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
- alphapy.transforms.rebalancesignal(df)
Shannon’s Demon rebalance signal for ML target.
- Parameters:
df (pandas.DataFrame) – Dataframe with Shannon signal columns.
- Returns:
new_column – 1 for rebalancing periods, 0 for hold periods.
- Return type:
pandas.Series (int)
- alphapy.transforms.rindex(df, ci, ch, cl, p=1)
Calculate the range index spanning a given period p.
The range index is a number between 0 and 100 that relates the value of the index column ci to the high column ch and the low column cl. For example, if the low value of the range is 10 and the high value is 20, then the range index for a value of 15 would be 50%. The range index for 18 would be 80%.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the columns ci, ch, and cl.
ci (str) – Name of the index column in the dataframe df.
ch (str) – Name of the high column in the dataframe df.
cl (str) – Name of the low column in the dataframe df.
p (int) – The period over which the range index of column ci is calculated.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
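The worked figures in the description (15 maps to 50%, 18 maps to 80% for a range of 10 to 20) can be reproduced with a short sketch. The helper name `rindex_sketch` and the use of a rolling highest high and lowest low over p bars are assumptions about how the period is applied.

```python
import pandas as pd

def rindex_sketch(df: pd.DataFrame, ci: str, ch: str, cl: str, p: int = 1) -> pd.Series:
    # Hypothetical helper: position of ci within the high-low range of the
    # last p bars, scaled to 0..100.
    highest_high = df[ch].rolling(p).max()
    lowest_low = df[cl].rolling(p).min()
    return 100 * (df[ci] - lowest_low) / (highest_high - lowest_low)

bars = pd.DataFrame({
    "close": [15.0, 18.0],
    "high": [20.0, 20.0],
    "low": [10.0, 10.0],
})
ri = rindex_sketch(bars, "close", "high", "low", p=1)
print(ri.tolist())
```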
- alphapy.transforms.rsi(df, p=14)
Calculate the Relative Strength Index (RSI).
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column net.
p (int) – The period over which to calculate the RSI.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
Developed by J. Welles Wilder, the Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements [SC_RSI].
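A sketch of the RSI calculation, assuming the dataframe carries a net column of period-to-period changes as the parameter description states. It uses Wilder's smoothing, expressed here as an exponentially weighted mean with alpha = 1/p; the helper name `rsi_sketch` and that smoothing choice are assumptions, and the library may average gains and losses differently.

```python
import pandas as pd

def rsi_sketch(df: pd.DataFrame, p: int = 14) -> pd.Series:
    # Hypothetical helper: RSI = 100 - 100 / (1 + average gain / average loss).
    net = df["net"]
    gain = net.clip(lower=0)
    loss = (-net).clip(lower=0)
    avg_gain = gain.ewm(alpha=1 / p, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / p, adjust=False).mean()
    rs = avg_gain / avg_loss  # becomes inf when there are no losses
    return 100 - 100 / (1 + rs)

rising = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0]})
rising["net"] = rising["close"].diff()
rsi_rising = rsi_sketch(rising, p=3)

mixed = pd.DataFrame({"close": [10.0, 11.0, 10.5, 11.5, 11.0, 12.0]})
mixed["net"] = mixed["close"].diff()
rsi_mixed = rsi_sketch(mixed, p=3)
print(rsi_mixed.round(2).tolist())
```

A series that only rises has no losses, so its RSI sits at the upper bound of 100; mixed series fall between 0 and 100.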
- alphapy.transforms.runs(df, c='close', w=20)
Calculate the total number of runs.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
w (int) – The rolling period.
- Returns:
runs_value – The total number of distinct runs in the rolling window.
- Return type:
int
Example
>>> runs(df, c, 20)
- alphapy.transforms.runstest(df, c='close', w=20, wfuncs='all')
Perform a runs test on binary series.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
w (int) – The rolling period.
wfuncs (list) – The set of runs test functions to apply to the column:
'all': Run all of the functions below.
'rtotal': The running total over the window period.
'runs': Total number of runs in window.
'streak': The length of the latest streak.
'zscore': The Z-Score over the window period.
- Returns:
new_features – The dataframe containing the runs test features.
- Return type:
pandas.DataFrame
References
For more information about runs tests for detecting non-randomness, refer to [RUNS].
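To make the 'runs' and 'streak' quantities concrete, the sketch below counts them for a single window of binary values. It works on a plain list rather than a rolling dataframe window, and the function names `count_runs` and `latest_streak` are hypothetical, chosen only for this example.

```python
from itertools import groupby

def count_runs(values):
    """Number of maximal blocks of identical consecutive values."""
    return sum(1 for _ in groupby(values))

def latest_streak(values):
    """Length of the final block of identical values (0 for empty input)."""
    length = 0
    for _, block in groupby(values):
        length = sum(1 for _ in block)
    return length

window = [1, 1, 0, 1, 1, 1]  # blocks: [1, 1], [0], [1, 1, 1]
print(count_runs(window), latest_streak(window))
```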
- alphapy.transforms.runtotal(df, c='close', w=50)
Calculate the running total.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
w (int) – The rolling period.
- Returns:
running_total – The final running total.
- Return type:
int
Example
>>> runtotal(df, c, 50)
- alphapy.transforms.shannhold(df)
Shannon’s Demon hold signal.
- Parameters:
df (pandas.DataFrame) – Dataframe with weight_deviation column.
- Returns:
new_column – True when weight deviation is low.
- Return type:
pandas.Series (bool)
- alphapy.transforms.shannlong(df)
Shannon’s Demon long signal.
- Parameters:
df (pandas.DataFrame) – Dataframe with weight_deviation column.
- Returns:
new_column – True when high weight deviation and positive weight deviation.
- Return type:
pandas.Series (bool)
- alphapy.transforms.shannshort(df)
Shannon’s Demon short signal.
- Parameters:
df (pandas.DataFrame) – Dataframe with weight_deviation column.
- Returns:
new_column – True when high weight deviation and negative weight deviation.
- Return type:
pandas.Series (bool)
- alphapy.transforms.split2letters(df, c)
Separate text into distinct characters.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the text column in the dataframe df.
- Returns:
new_feature – The array containing the new feature.
- Return type:
pandas.Series
Example
The value ‘abc’ becomes ‘a b c’.
- alphapy.transforms.streak(df, c='close', w=20)
Determine the length of the latest streak.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
w (int) – The rolling period.
- Returns:
latest_streak – The length of the latest streak.
- Return type:
int
Example
>>> streak(df, c, 20)
- alphapy.transforms.tdseqbuy(df, c='close', high='high', low='low')
Calculate Tom DeMark’s Sequential Buy indicator.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the columns c, high, and low.
c (str, optional) – Name of the column in the dataframe df representing the close prices.
high (str, optional) – Name of the column in the dataframe df representing the high prices.
low (str, optional) – Name of the column in the dataframe df representing the low prices.
- Returns:
tdbuy – The array containing the Sequential Buy count.
- Return type:
pandas.Series
References
Tom DeMark’s Sequential indicator is used to identify a potential reversal of the current trend by comparing the closing price to previous closing prices over a fixed period [WIKI_TDSEQ].
- alphapy.transforms.tdseqsell(df, c='close', high='high', low='low')
Calculate Tom DeMark’s Sequential Sell indicator.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the columns c, high, and low.
c (str, optional) – Name of the column in the dataframe df representing the close prices.
high (str, optional) – Name of the column in the dataframe df representing the high prices.
low (str, optional) – Name of the column in the dataframe df representing the low prices.
- Returns:
tdsell – The array containing the Sequential Sell count.
- Return type:
pandas.Series
References
Tom DeMark’s Sequential indicator is used to identify a potential reversal of the current trend by comparing the closing price to previous closing prices over a fixed period [WIKI_TDSEQ].
- alphapy.transforms.texplode(df, c)
Get dummy values for a text column.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the text column in the dataframe df.
- Returns:
dummies – The dataframe containing the dummy variables.
- Return type:
pandas.DataFrame
Example
This function is useful for columns that appear to have separate character codes but are consolidated into a single column. Here, the column c is transformed into five dummy variables.
c    0_a  1_x  1_b  2_x  2_z
abz  1    0    1    0    1
abz  1    0    1    0    1
axx  1    1    0    1    0
abz  1    0    1    0    1
axz  1    1    0    0    1
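The dummy-variable example for texplode can be approximated with pandas as follows. This is a sketch under assumptions: the helper name `texplode_sketch` is hypothetical, every value is assumed to have the same length, and the column order produced by `pandas.get_dummies` (sorted within each position) may differ from the order used by the library.

```python
import pandas as pd

def texplode_sketch(df: pd.DataFrame, c: str) -> pd.DataFrame:
    # Hypothetical helper: one column per character position, then one
    # dummy column per (position, character) pair, named "<position>_<char>".
    letters = pd.DataFrame([list(s) for s in df[c]], index=df.index)
    letters.columns = [str(i) for i in letters.columns]
    return pd.get_dummies(letters, dtype=int)

codes = pd.DataFrame({"c": ["abz", "abz", "axx", "abz", "axz"]})
dummies = texplode_sketch(codes, "c")
print(dummies)
```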
- alphapy.transforms.timeparts(df, c)
Extract time into its components: hour, minute, second.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
- Returns:
time_features – The dataframe containing the time features.
- Return type:
pandas.DataFrame
- alphapy.transforms.truehigh(df)
Calculate the True High value.
- Parameters:
df (pandas.DataFrame) – Dataframe with high and low columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
Today’s high, or the previous close, whichever is higher [TS_TR].
- alphapy.transforms.truelow(df)
Calculate the True Low value.
- Parameters:
df (pandas.DataFrame) – Dataframe with high and low columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
Today’s low, or the previous close, whichever is lower [TS_TR].
- alphapy.transforms.truerange(df)
Calculate the True Range value.
- Parameters:
df (pandas.DataFrame) – Dataframe with high and low columns.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (float)
References
True High - True Low [TS_TR].
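The three definitions above (true high, true low, and their difference) can be sketched in a few lines of pandas. The helper name `true_range_sketch` and the lowercase column names are assumptions; note that the previous close is needed in addition to the high and low columns.

```python
import pandas as pd

def true_range_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper. On the first row there is no previous close, so
    # max/min fall back to the bar's own high and low (NaN is skipped).
    prev_close = df["close"].shift(1)
    out = pd.DataFrame(index=df.index)
    out["truehigh"] = pd.concat([df["high"], prev_close], axis=1).max(axis=1)
    out["truelow"] = pd.concat([df["low"], prev_close], axis=1).min(axis=1)
    out["truerange"] = out["truehigh"] - out["truelow"]
    return out

bars = pd.DataFrame({
    "high": [12.0, 15.0],
    "low": [10.0, 14.0],
    "close": [11.0, 14.5],
})
tr = true_range_sketch(bars)
print(tr)
```

In the second bar the price gapped up from a close of 11, so the true low is the previous close rather than the bar's low, and the true range widens from 1 to 4.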
- alphapy.transforms.ttmsqueeze(df, c='close', p=20)
Calculate the TTM Squeeze momentum oscillator.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
- Returns:
ttmosc – The value of the TTM Squeeze Indicator.
- Return type:
float
- alphapy.transforms.ttmsqueezelong(df, c='close', p=20, sd=2.0, atrs=1.5)
Signal a TTM Squeeze Long Entry.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
sd (float) – The number of standard deviations.
atrs (float) – The multiple of Average True Range.
- Returns:
squeezelong – True if there is a TTM Squeeze Long Entry.
- Return type:
bool
- alphapy.transforms.ttmsqueezeoff(df, c='close', p=20, sd=2.0, atrs=1.5)
Determine the TTM Squeeze Off condition.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
sd (float) – The number of standard deviations.
atrs (float) – The multiple of Average True Range.
- Returns:
squeezeoff – The status of the TTM Squeeze Off Indicator.
- Return type:
bool
- alphapy.transforms.ttmsqueezeon(df, c='close', p=20, sd=2.0, atrs=1.5)
Determine the TTM Squeeze On condition.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
sd (float) – The number of standard deviations.
atrs (float) – The multiple of Average True Range.
- Returns:
squeezeon – The status of the TTM Squeeze On Indicator.
- Return type:
bool
- alphapy.transforms.ttmsqueezeshort(df, c='close', p=20, sd=2.0, atrs=1.5)
Signal a TTM Squeeze Short Entry.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
p (int) – The period over which to calculate the Exponential Moving Average.
sd (float) – The number of standard deviations.
atrs (float) – The multiple of Average True Range.
- Returns:
squeezeshort – True if there is a TTM Squeeze Short Entry.
- Return type:
bool
- alphapy.transforms.vwap(df, c='close', granularity='day', anchor_dates=None)
Adjusted VWAP calculation using Unix timestamps for compatibility with np.digitize.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
granularity (str) – The calendrical period over which to calculate VWAP.
anchor_dates (list) – The set of dates over which to calculate VWAP.
- Returns:
vwap_value – The calculated Volume-Weighted Average Price (VWAP).
- Return type:
float
- alphapy.transforms.wdev(df)
Calculate weight deviation based on price movements.
Simulates portfolio weight deviation by tracking price changes from a 50/50 rebalanced portfolio assumption.
- Parameters:
df (pandas.DataFrame) – Dataframe with OHLCV data.
- Returns:
new_column – The simulated weight deviation values.
- Return type:
pandas.Series (float)
- alphapy.transforms.wdevhigh(df)
Determine if weight deviation is high (>= 0.2).
- Parameters:
df (pandas.DataFrame) – Dataframe with weight_deviation column.
- Returns:
new_column – True when absolute weight deviation >= 0.2.
- Return type:
pandas.Series (bool)
- alphapy.transforms.wdevlow(df)
Determine if weight deviation is low (<= 0.05).
- Parameters:
df (pandas.DataFrame) – Dataframe with weight_deviation column.
- Returns:
new_column – True when absolute weight deviation <= 0.05.
- Return type:
pandas.Series (bool)
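The weight-deviation thresholds and the Shannon's Demon signals documented in this module can be combined in one small sketch. It assumes a weight_deviation column is already present, as the parameter descriptions state; the helper name `shannon_signals_sketch` and the output column names are hypothetical.

```python
import pandas as pd

def shannon_signals_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper using the documented thresholds:
    # high when |deviation| >= 0.2, low when |deviation| <= 0.05.
    wd = df["weight_deviation"]
    out = pd.DataFrame(index=df.index)
    out["wdevhigh"] = wd.abs() >= 0.2
    out["wdevlow"] = wd.abs() <= 0.05
    out["hold"] = out["wdevlow"]
    out["long"] = out["wdevhigh"] & (wd > 0)
    out["short"] = out["wdevhigh"] & (wd < 0)
    return out

weights = pd.DataFrame({"weight_deviation": [0.25, -0.30, 0.01, 0.10]})
signals = shannon_signals_sketch(weights)
print(signals)
```

A deviation of 0.10 falls between the two thresholds, so none of the signals fires for the last row.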
- alphapy.transforms.xmadown(df, c='close', pfast=20, pslow=50)
Determine those values of the dataframe that cross below the moving average.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str, optional) – Name of the column in the dataframe df.
pfast (int, optional) – The period of the fast moving average.
pslow (int, optional) – The period of the slow moving average.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
References
In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross [WIKI_XMA].
- alphapy.transforms.xmaup(df, c='close', pfast=20, pslow=50)
Determine those values of the dataframe that cross above the moving average.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str, optional) – Name of the column in the dataframe df.
pfast (int, optional) – The period of the fast moving average.
pslow (int, optional) – The period of the slow moving average.
- Returns:
new_column – The array containing the new feature.
- Return type:
pandas.Series (bool)
References
In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross [WIKI_XMA].
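A moving-average crossover can be detected by looking for a sign change in the difference between the fast and slow averages. The sketch below assumes simple rolling means and treats a bar as a cross when the difference changes sign relative to the previous bar; the helper name `xma_sketch` and these choices are assumptions rather than the library's code.

```python
import pandas as pd

def xma_sketch(df: pd.DataFrame, c: str = "close", pfast: int = 20, pslow: int = 50) -> pd.DataFrame:
    # Hypothetical helper. Comparisons against NaN are False, so no signal
    # is emitted until both averages and the previous difference exist.
    diff = df[c].rolling(pfast).mean() - df[c].rolling(pslow).mean()
    prev = diff.shift(1)
    return pd.DataFrame({
        "xmaup": (diff > 0) & (prev <= 0),
        "xmadown": (diff < 0) & (prev >= 0),
    })

prices = pd.DataFrame({"close": [5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
crosses = xma_sketch(prices, pfast=2, pslow=3)
print(crosses)
```

In this V-shaped series the fast average overtakes the slow one two bars after the low, which is the only bar flagged as an upward cross.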
- alphapy.transforms.zscore(df, c='close', w=20)
Calculate the Z-Score.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe df.
w (int) – The rolling period.
- Returns:
zscore – The value of the Z-Score.
- Return type:
float
References
To calculate the Z-Score, you can find more information here [ZSCORE].
Example
>>> zscore(df, c, 20)
alphapy.utilities module
- alphapy.utilities.datetime_stamp()
Returns today’s datetime stamp.
- Returns:
dtstamp – The valid datetime string in YYYYmmdd_hhmmss format.
- Return type:
str
- alphapy.utilities.ensure_dir(directory_path)
- alphapy.utilities.get_web_content(url)
Use the requests package to get data over HTTP.
- Parameters:
url (str) – The URL for making the request over HTTP.
- Returns:
response – The results returned from the request.
- Return type:
str
- alphapy.utilities.most_recent_file(directory, file_spec)
Find the most recent file in a directory.
- Parameters:
directory (str) – Full directory specification.
file_spec (str) – Wildcard search string for the file to locate.
- Returns:
file_name – Name of the file to read, excluding the extension.
- Return type:
str
- alphapy.utilities.np_store_data(data, dir_name, file_name, extension, separator)
Store NumPy data in a file.
- Parameters:
data (numpy array) – The model component to store
dir_name (str) – Full directory specification.
file_name (str) – Name of the file to read, excluding the extension.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
- Returns:
None
- Return type:
None
- alphapy.utilities.remove_list_items(elements, alist)
Remove one or more items from the given list.
- Parameters:
elements (list) – The items to remove from the list alist.
alist (list) – Any object of any type can be a list item.
- Returns:
sublist – The subset of items after removal.
- Return type:
list
Examples
>>> test_list = ['a', 'b', 'c', test_func]
>>> remove_list_items([test_func], test_list)  # ['a', 'b', 'c']
- alphapy.utilities.run_command(cmd_with_args, cwd)
Run a subprocess based on the command with arguments.
- Parameters:
cmd_with_args (str) – The command to run as a subprocess.
cwd (str) – The current working directory.
- Returns:
result – The result returned from running the subprocess.
- Return type:
str
- alphapy.utilities.split_duration(duration_str)
Split a duration string into its numeric value and its unit.
- Parameters:
duration_str (str) – An alphanumeric string in the format of a Pandas offset alias.
- Returns:
value (int) – The duration value.
unit (str) – The scale of the period.
Examples
>>> split_duration('5min') # 5, 'min'
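One way to split such a string is a regular expression that captures the leading digits and the trailing letters. The sketch below is an illustration only; the name `split_duration_sketch` and the decision to raise ValueError on malformed input are assumptions.

```python
import re

def split_duration_sketch(duration_str: str):
    # Hypothetical helper: "5min" -> (5, "min").
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", duration_str.strip())
    if match is None:
        raise ValueError(f"Not a valid duration: {duration_str!r}")
    return int(match.group(1)), match.group(2)

value, unit = split_duration_sketch("5min")
print(value, unit)
```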
- alphapy.utilities.subtract_days(date_string, ndays)
Subtract a number of days from a given date.
- Parameters:
date_string (str) – An alphanumeric string in the format %Y-%m-%d.
ndays (int) – Number of days to subtract.
- Returns:
new_date_string – The adjusted date string in the format %Y-%m-%d.
- Return type:
str
Examples
>>> subtract_days('2017-11-10', 31) # '2017-10-10'
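The date arithmetic in the example can be reproduced with the standard library. This is a sketch with a hypothetical name, `subtract_days_sketch`, and is not taken from the library source.

```python
from datetime import datetime, timedelta

def subtract_days_sketch(date_string: str, ndays: int) -> str:
    # Parse the %Y-%m-%d string, step back ndays, and format it again.
    date = datetime.strptime(date_string, "%Y-%m-%d")
    return (date - timedelta(days=ndays)).strftime("%Y-%m-%d")

new_date = subtract_days_sketch("2017-11-10", 31)
print(new_date)
```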
- alphapy.utilities.valid_date(date_string)
Determine whether or not the given string is a valid date.
- Parameters:
date_string (str) – An alphanumeric string in the format %Y-%m-%d.
- Returns:
date_string – The valid date string.
- Return type:
str
- Raises:
ValueError – Not a valid date.
Examples
>>> valid_date('2016-7-1')  # datetime.datetime(2016, 7, 1, 0, 0)
>>> valid_date('345')  # ValueError: Not a valid date
- alphapy.utilities.valid_name(name)
Determine whether or not the given string is a valid alphanumeric string.
- Parameters:
name (str) – An alphanumeric identifier.
- Returns:
result – True if the name is valid, else False.
- Return type:
bool
Examples
>>> valid_name('alpha')  # True
>>> valid_name('!alpha')  # False
alphapy.variables module
- class alphapy.variables.Variable(name, expr, replace=False)
Bases: object
Create a new variable as a key-value pair. All variables are stored in Variable.variables. Duplicate keys or values are not allowed, unless the replace parameter is True.
- Parameters:
name (str) – Variable key.
expr (str) – Variable value.
replace (bool, optional) – Replace the current key-value pair if it already exists.
- Variables:
variables (dict) – Class variable for storing all known variables
Examples
>>> Variable('rrunder', 'rr_3_20 <= 0.9')
>>> Variable('hc', 'higher_close')
- variables = {}
- alphapy.variables.allvars(expr, match_fractal=True, match_lag=True)
Get the list of valid names in the expression.
- Parameters:
expr (str) – A valid expression conforming to the Variable Definition Language.
match_fractal (bool) – Flag to match fractal special character.
match_lag (bool) – Flag to match the lag special character.
- Returns:
vlist – List of valid variable names.
- Return type:
list
- alphapy.variables.get_daily_dollar_vol(df, p=60)
Calculate daily dollar volume.
- Parameters:
df (pandas.DataFrame) – Frame containing the close and volume values.
p (int) – The lookback period for computing daily dollar volume.
- Returns:
ds_dv – The array of dollar volumes.
- Return type:
pandas.Series (float)
- alphapy.variables.map_bar_type(df, bar_type, fractal, p=100, pv_factor=1.0)
Map time bars to a different bar type.
- Parameters:
df (pandas.DataFrame) – The dataframe to convert to a different bar type.
bar_type (Enum.BarType) – The bar type for conversion (Dollar Bar, Heikin-Ashi, et al).
fractal (str) – Pandas offset alias.
p (int) – The period over which to calculate dollar volume.
pv_factor (float) – The multiple of daily dollar volume for the dollar bar threshold.
- Returns:
df – The converted dataframe for the target bar type.
- Return type:
pandas.DataFrame
- alphapy.variables.map_dollar_bars(df, cols, fractal, p=100, pv_factor=1.0)
Map time bars to dollar bars.
- Parameters:
df (pandas.DataFrame) – The dataframe to convert to a different bar type.
cols (list) – List of column names in the price dataframe.
fractal (str) – Pandas offset alias.
p (int) – The period over which to calculate dollar volume.
pv_factor (float) – The multiple of daily dollar volume for the dollar bar threshold.
- Returns:
dollar_bars – The list of dollar bar records.
- Return type:
list
- alphapy.variables.vapply(group, market_specs, vfuncs=None)
Apply a set of variables to multiple dataframes.
- Parameters:
group (alphapy.Group) – The input group.
market_specs (dict) – The specifications for controlling the MarketFlow pipeline.
vfuncs (dict, optional) – Dictionary of external modules and functions.
- Returns:
dfs – The list of pandas dataframes to analyze.
- Return type:
list
- Other Parameters:
Frame.frames (dict) – Global dictionary of dataframes
- alphapy.variables.vexec(f, v, vfuncs=None)
Add a variable to the given dataframe.
This is the core function for adding a variable to a dataframe. The default variable functions are already defined locally in alphapy.transforms; however, you may want to define your own variable functions. If so, then the vfuncs parameter will contain the list of modules and functions to be imported and applied by the vexec function.
To write your own variable function, your function must have a pandas DataFrame as an input parameter and must return a pandas DataFrame with the new variable(s).
- Parameters:
f (pandas.DataFrame) – Dataframe to contain the new variable.
v (str) – Variable to add to the dataframe.
vfuncs (dict, optional) – Dictionary of external modules and functions.
- Returns:
f – Dataframe with the new variable.
- Return type:
pandas.DataFrame
- Other Parameters:
Variable.variables (dict) – Global dictionary of variables
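A user-defined variable function that satisfies the contract described above (a pandas DataFrame in, a pandas DataFrame with the new variable out) might look like the sketch below. The function name `close_position`, the column names, and the choice to return only the new column are assumptions made for illustration; how the function is registered through vfuncs is not shown.

```python
import pandas as pd


def close_position(df):
    """Hypothetical variable function: where the close sits in the bar's range (0 to 1)."""
    span = df["high"] - df["low"]
    # Avoid division by zero on bars with no range; those rows become NaN.
    span = span.where(span != 0)
    out = pd.DataFrame(index=df.index)
    out["close_position"] = (df["close"] - df["low"]) / span
    return out


prices = pd.DataFrame({
    "high": [12.0, 10.0],
    "low": [8.0, 10.0],
    "close": [11.0, 10.0],
})
result = close_position(prices)
```

The function leaves its input untouched and keeps the input index, so the result can be joined back onto the original frame.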
- alphapy.variables.vexpr(f, v)
Get the expanded expression for a variable.
- Parameters:
f (pandas.DataFrame) – Dataframe containing the variables.
v (str) – Variable to add to the dataframe.
- Returns:
expr_new – Expanded expression for evaluation.
- Return type:
str
- Other Parameters:
Variable.variables (dict) – Global dictionary of variables
- alphapy.variables.vfunc(f, v, vfuncs)
Find a function for defining a variable.
- Parameters:
f (pandas.DataFrame) – Dataframe to contain the new variable.
v (str) – Variable representing a function.
vfuncs (dict, optional) – Dictionary of external modules and functions.
- Returns:
func (function) – Function to execute for defining the variable.
newlist (list) – Function parameter list.
- Other Parameters:
Variable.variables (dict) – Global dictionary of variables
- alphapy.variables.vparse(vname)
Parse a variable name into its respective components.
- Parameters:
vname (str) – The name of the variable.
- Returns:
vxlag (str) – Original variable name without the lag component.
root (str) – The base variable name without the parameters.
valias (str) – Expanded name with alias substitution.
plist (list) – The parameter list.
lag (int) – The offset starting with the current value [0] and counting back, e.g., an offset [1] means the previous value of the variable.
Notes
AlphaPy makes feature creation easy. The syntax of a variable name maps to a function call:
xma_20_50 => xma(20, 50)
Examples
>>> vparse('lmin_5[2]') # (0, 'lmin_5', 'lmin', 'lowest_low', ['5'], 2)
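The name-to-call mapping can be illustrated with a small standalone parser. This is a sketch only: `parse_name_sketch` is a hypothetical helper, it assumes the root never contains an underscore, and it skips alias expansion, so it returns fewer fields than vparse.

```python
import re


def parse_name_sketch(vname):
    """Split a name such as 'lmin_5[2]' into (name without lag, root, params, lag)."""
    match = re.fullmatch(r"(?P<body>[^\[\]]+)(?:\[(?P<lag>\d+)\])?", vname)
    if match is None:
        raise ValueError(f"cannot parse variable name: {vname!r}")
    body = match.group("body")
    # A missing [n] suffix means the current value, i.e. a lag of 0.
    lag = int(match.group("lag") or 0)
    # Assumption: the first underscore-separated token is the root,
    # and the remaining tokens are the parameters (kept as strings).
    root, *plist = body.split("_")
    return body, root, plist, lag


parsed_lag = parse_name_sketch("lmin_5[2]")
parsed_xma = parse_name_sketch("xma_20_50")
```

Under these assumptions, `xma_20_50` yields the root `xma` with parameters `['20', '50']`, which corresponds to the call `xma(20, 50)`.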
- alphapy.variables.vsub(v, expr)
Substitute the variable parameters into the expression.
This function performs the parameter substitution when applying features to a dataframe. It is a mechanism for the user to override the default values in any given expression when defining a feature, instead of having to programmatically call a function with new values.
- Parameters:
v (str) – Variable name.
expr (str) – The expression for substitution.
- Returns:
newexpr – The expression with the new, substituted values.
- Return type:
str
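One way to picture this kind of substitution is positional replacement of numeric literals. The sketch below assumes that the numbers embedded in the variable name replace the numbers in the expression in order, and that the expression is left unchanged when the counts differ. Both the rule and the helper name `substitute_sketch` are assumptions, not a description of how vsub is implemented.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def substitute_sketch(v, expr):
    """Replace numeric defaults in expr with the numbers found in v, in order."""
    new_values = NUMBER.findall(v)
    defaults = NUMBER.findall(expr)
    # Assumption: substitute only when the name supplies exactly one
    # value for every numeric default in the expression.
    if not new_values or len(new_values) != len(defaults):
        return expr
    values = iter(new_values)
    return NUMBER.sub(lambda match: next(values), expr)


overridden = substitute_sketch("xma_10_30", "xma(20, 50)")
unchanged = substitute_sketch("hc", "higher_close")
```

Here `xma_10_30` overrides the defaults in `xma(20, 50)`, while a name without numbers leaves its expression as written.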
- alphapy.variables.vtree(vname)
Get all of the antecedent variables.
Before applying a variable to a dataframe, we have to recursively get all of the child variables, beginning with the starting variable’s expression. Then, we have to extract the variables from all the subsequent expressions. This process continues until all antecedent variables are obtained.
- Parameters:
vname (str) – A valid variable stored in Variable.variables.
- Returns:
all_variables – The variables that need to be applied before vname.
- Return type:
list
- Other Parameters:
Variable.variables (dict) – Global dictionary of variables
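The recursive collection described above can be sketched with a plain dictionary standing in for Variable.variables. The helper name `antecedents_sketch`, the name-to-expression mapping, and the identifier regex are assumptions for illustration; the real registry may store richer objects.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def antecedents_sketch(vname, variables):
    """Return the variables that vname depends on, dependencies first."""
    ordered = []
    seen = {vname}  # guards against revisiting names and against cycles

    def visit(name):
        for token in IDENTIFIER.findall(variables[name]):
            # Only tokens that are themselves defined variables are followed.
            if token in variables and token not in seen:
                seen.add(token)
                visit(token)           # collect the child's own antecedents first
                ordered.append(token)

    visit(vname)
    return ordered


# Hypothetical variable definitions: name -> expression string.
definitions = {
    "signal": "trend and breakout",
    "trend": "fast > slow",
    "breakout": "close > ceiling",
    "fast": "ma_close_10",
    "slow": "ma_close_50",
    "ceiling": "highest_high_20",
}
order = antecedents_sketch("signal", definitions)
```

Appending a name only after its own dependencies have been visited gives an order in which each variable can be applied once everything it needs is already in the dataframe.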