
serifsonerserbest/RecoMovie

 
 


Netflix Recommender System

This project is part of the Machine Learning course at the Swiss Federal Institute of Technology Lausanne (EPFL). Details about the course can be found here.

Team Members

CrowdAI Best Score

Name: Soner
Submission ID: 23831

We finished in the top 20 with an RMSE of 1.018; the best score in the competition was an RMSE of 1.016. You can find the competition link here.
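For reference, RMSE (root mean squared error) over a set of held-out ratings is usually defined as below. The symbols are our own notation rather than the report's: $r_{ui}$ is the true rating of item $i$ by user $u$, $\hat{r}_{ui}$ is the predicted rating, and $\mathcal{T}$ is the set of evaluated (user, item) pairs. Lower is better.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} \left( \hat{r}_{ui} - r_{ui} \right)^{2}}
```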

Project Report

You can find the detailed project report here.

Required Libraries and Setting Up the Environment

  • We used a Python environment (Anaconda) for our project.

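Install libraries with pip

  • scikit-learn
  • Pandas
  • NumPy
  • scipy

The project also uses pickle and os. Both are part of the Python standard library and do not need to be installed.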

Install custom libraries

 git clone https://github.com/NicolasHug/surprise.git
 cd surprise
 python setup.py install
 pip install git+https://github.com/coreylynch/pyFM
  • PySpark

    • You can find a detailed installation guide for PySpark here.
  • Data Sets:

    • Data sets are taken from here. Note that you need an EPFL e-mail address to access the website.

Files

  • Data Files :
    • data_train.csv : train set
    • data_test.csv : test set provided (originally sampleSubmissionn.csv)
    • tmp_train.csv : train file obtained from train-test split of train set
    • tmp_test.csv : test file obtained from train-test split of train set
  • Python files :
    • model_surprise.py : Contains the models from the Surprise library: BaselineOnly, SlopeOne, KNN, SVD, SVD++
    • model_pyfm.py : Contains the model from the PyFM library (FM refers to Factorization Machine)
    • model_pyspark.py : Contains the model from the PySpark library: ALS
    • model_matrixfactorization.py : Contains the models we implemented ourselves using the Exercise 10 template: SGD, ALS
    • model_means.py : Contains the models we implemented ourselves: Global Mean, User Mean, Item Mean
    • matrix_fact_helpers : Helper functions for the models we implemented from Exercise 10
    • hyperparameter_tuning.py : Contains functions for hyperparameter tuning for most of the models.
    • blend.py : Contains the blending (voting) function used to ensemble different models.
    • implementations.py : Contains helper functions, e.g. reading CSV files and transforming data frames.
  • Pickle Object :
    • linreg.pkl : linear regression model for blending
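To make the mean baselines and the blending step more concrete, here is a minimal sketch in plain Python. It is not the code from model_means.py or blend.py: the toy ratings and the names RATINGS, fit_means, rmse and fit_blend_weights are hypothetical, and a two-weight least-squares fit without an intercept stands in for the linear regression blender stored in linreg.pkl.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy ratings as (user, item, rating) triples.
RATINGS = [
    ("u1", "i1", 5), ("u1", "i2", 3), ("u1", "i3", 3),
    ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i2", 4), ("u3", "i3", 1),
]


def fit_means(ratings):
    """Compute the global mean and the per-user and per-item mean ratings."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append(rating)
        by_item[item].append(rating)
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    user_means = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_means = {i: sum(v) / len(v) for i, v in by_item.items()}
    return global_mean, user_means, item_means


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sqrt(sum(squared) / len(squared))


def fit_blend_weights(pred_a, pred_b, targets):
    """Least-squares weights (no intercept) for blending two models.

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s_aa = sum(a * a for a in pred_a)
    s_ab = sum(a * b for a, b in zip(pred_a, pred_b))
    s_bb = sum(b * b for b in pred_b)
    s_ay = sum(a * y for a, y in zip(pred_a, targets))
    s_by = sum(b * y for b, y in zip(pred_b, targets))
    det = s_aa * s_bb - s_ab * s_ab
    w_a = (s_ay * s_bb - s_ab * s_by) / det
    w_b = (s_aa * s_by - s_ab * s_ay) / det
    return w_a, w_b


global_mean, user_means, item_means = fit_means(RATINGS)
targets = [r for _, _, r in RATINGS]

# Base model predictions; unseen users or items fall back to the global mean.
user_pred = [user_means.get(u, global_mean) for u, _, _ in RATINGS]
item_pred = [item_means.get(i, global_mean) for _, i, _ in RATINGS]

# Blend the two base models with the fitted weights.
w_user, w_item = fit_blend_weights(user_pred, item_pred, targets)
blend_pred = [w_user * a + w_item * b for a, b in zip(user_pred, item_pred)]

print("user mean RMSE:", rmse(user_pred, targets))
print("item mean RMSE:", rmse(item_pred, targets))
print("blended RMSE:  ", rmse(blend_pred, targets))
```

The sketch fits and scores the blend on the same ratings only to stay short. In practice the weights would be fitted on predictions for held-out ratings, so that the blend is not rewarded for overfitting.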
