Market Prediction Tutorial
==========================

*MarketFlow Running Time: Approximately 6 minutes*

.. image:: amzn.png
   :alt: Amazon Candlestick Chart
   :width: 80%
   :align: center

Machine learning subsumes *technical analysis* because collectively,
technical analysis is just a set of features for market prediction.
We can use machine learning as a feature blender for moving averages,
indicators such as RSI and ADX, and even representations of chart
formations such as double tops and head-and-shoulder patterns.

We are not directly predicting net return in our models, although
that is the ultimate goal. By characterizing the market with models,
we can increase the Return On Investment (ROI). We have a wide range
of dependent or target variables from which to choose, not just net
return. There is more power in building a classifier rather than a
more traditional regression model, so we want to define binary
conditions such as whether or not today is going to be a trend day,
rather than a numerical prediction of today’s return.

In this tutorial, we will train a model that predicts whether or
not the next day will have a larger-than-average range. This is
important for deciding which system to deploy on the prediction
day. If our model gives us predictive power, then we can filter
out those days where trading a given system is a losing strategy.

**Step 1**: From the ``examples`` directory, change your
directory::

    cd "Trading Model"

Before running MarketFlow, let's briefly review the configuration
files in the ``config`` directory:

``market.yml``:
    The MarketFlow configuration file

``model.yml``:
    The AlphaPy configuration file

In ``market.yml``, we limit our model to six stocks in the target
group ``test``, going back 2000 trading days. You can define any
group of stock symbols in the ``groups`` section, and then set
the ``target_group`` attribute in the ``market`` section to the
name of that group.

This is a 1-day forecast, but we also use those features that can
be calculated at the market open, such as gap information in the
``leaders`` section. In the ``features`` section, we define many
variables for moving averages, historical range, RSI, volatility,
and volume.

.. literalinclude:: rrover_market.yml
   :language: yaml
   :caption: **market.yml**

In each of the tutorials, we experiment with different options in
``model.yml`` to run AlphaPy. Here, we first apply univariate feature
selection and then run a random forest classifier with Recursive
Feature Elimination, including Cross-Validation (RFECV). When you
choose RFECV, the process takes much longer, so if you want to see
more logging, then increase the ``verbosity`` level in the ``pipeline``
section.

Since stock prices are time series data, we apply the ``runs_test``
function to twelve features in the ``treatments`` section. Treatments
are powerful because you can write any function to extrapolate new
features from existing ones. AlphaPy provides some of these functions
in the ``alphapy.features`` module, but it can also import external
functions as well.

Our target variable is ``rrover``, the ratio of the 1-day range to
the 10-day average high/low range. If that ratio is greater than
or equal to 1.0, then the value of ``rrover`` is True. This is
what we are trying to predict.

.. literalinclude:: rrover_model.yml
   :language: yaml
   :caption: **model.yml**

**Step 2**: Now, let's run MarketFlow::

    mflow --pdate 2017-10-01

As ``mflow`` runs, you will see the progress of the workflow,
and the logging output is saved in ``market_flow.log``. When the
workflow completes, your project structure will look like this,
with a different datestamp::

    Trading Model
    ├── market_flow.log
    ├── config
        ├── algos.yml
        ├── market.yml
        ├── model.yml
    └── data
    └── input
        ├── test_20170420.csv
        ├── test.csv
        ├── train_20170420.csv
        ├── train.csv
    └── model
        ├── feature_map_20170420.pkl
        ├── model_20170420.pkl
    └── output
        ├── predictions_20170420.csv
        ├── probabilities_20170420.csv
        ├── rankings_20170420.csv
    └── plots
        ├── calibration_test.png
        ├── calibration_train.png
        ├── confusion_test_RF.png
        ├── confusion_train_RF.png
        ├── feature_importance_train_RF.png
        ├── learning_curve_train_RF.png
        ├── roc_curve_test.png
        ├── roc_curve_train.png

Let's look at the results in the ``plots`` directory. Since our
scoring function was ``roc_auc``, we examine the ROC Curve first.
The AUC is approximately 0.61, which is not very high but in the
context of the stock market, we may still be able to derive
some predictive power. Further, we are running the model on a
relatively small sample of stocks, as denoted by the jittery
line of the ROC Curve.

.. image:: rrover_roc_curve.png
   :alt: ROC Curve
   :width: 100%
   :align: center

We can benefit from more samples, as the learning curve shows
that the training and cross-validation lines have yet to converge.

.. image:: rrover_learning_curve.png
   :alt: ROC Curve
   :width: 100%
   :align: center

The good news is that even with a relatively small number of
testing points, the Reliability Curve slopes upward from left
to right, with the dotted line denoting a perfect classifier.

.. image:: rrover_calibration.png
   :alt: ROC Curve
   :width: 100%
   :align: center

To get better accuracy, we can raise our threshold to find the
best candidates, since they are ranked by probability, but this
also means limiting our pool of stocks. Let's take a closer
look at the rankings file.

**Step 3**: From the command line, enter::

    jupyter notebook

**Step 4**: Click on the notebook named::

    A Trading Model.ipynb

**Step 5**: Run the commands in the notebook, making sure that
when you read in the rankings file, change the date to match
the result from the ``ls`` command.

``Conclusion`` We can predict large-range days with some confidence,
but only at a higher probability threshold. This is important for
choosing the correct system on any given day. We can achieve
better results with more data, so we recommend expanding the
stock universe, e.g., a group with at least 100 members going
five years back.