Kaggle Tutorial =============== *AlphaPy Running Time: Approximately 2 minutes* .. image:: titanic.jpg :alt: RMS Titanic :width: 50% :align: center The most popular introductory project on Kaggle is Titanic_, in which you apply machine learning to predict which passengers were most likely to survive the sinking of the famous ship. In this tutorial, we will run AlphaPy to train a model, generate predictions, and create a submission file so you can see where you land on the Kaggle leaderboard. .. _Titanic: https://www.kaggle.com/c/titanic .. note:: AlphaPy is a good starter for most Kaggle competitions. We also use it for other competitions such as the crowd-sourced hedge fund Numerai_. .. _Numerai: https://numer.ai/leaderboard **Step 1**: From the ``examples`` directory, change your directory:: cd Kaggle Before running AlphaPy, let's briefly review the ``model.yml`` file in the ``config`` directory. We will submit the actual predictions (1 vs. 0) instead of the probabilities, so ``submit_probas`` is set to ``False``. All features will be included except for the ``PassengerId``. The target variable is ``Survived``, the label we are trying to accurately predict. We'll compare random forests and XGBoost, run recursive feature elimination and a grid search, and select the best model. Note that a blended model of all the algorithms is a candidate for best model. The details of each algorithm are located in the ``algos.yml`` file. .. literalinclude:: titanic.yml :language: yaml :caption: **model.yml** **Step 2**: Now, we are ready to run AlphaPy. Enter the following command:: alphapy As ``alphapy`` runs, you will see the progress of the workflow, and the logging output is saved in ``alphapy.log``. When the workflow completes, your project structure will look like this, with a different datestamp:: Kaggle ├── alphapy.log ├── config ├── algos.yml ├── model.yml └── data └── input ├── test.csv ├── train.csv └── model ├── feature_map_20170420.pkl ├── model_20170420.pkl └── output ├── predictions_20170420.csv ├── probabilities_20170420.csv ├── rankings_20170420.csv ├── submission_20170420.csv └── plots ├── calibration_train.png ├── confusion_train_RF.png ├── confusion_train_XGB.png ├── feature_importance_train_RF.png ├── feature_importance_train_XGB.png ├── learning_curve_train_RF.png ├── learning_curve_train_XGB.png ├── roc_curve_train.png **Step 3**: To see how your model ranks on the Kaggle leaderboard, upload the submission file from the ``output`` directory to the Web site https://www.kaggle.com/c/titanic/submit. .. image:: kaggle.png :alt: Kaggle Submission :width: 100% :align: center