This repository contains the notebooks from an intensive six-week course that I taught in 2018. The goal was to study "idiomatic Pandas", but we first reviewed Python and NumPy, and then focused on data understanding and data preparation (following the CRISP-DM methodology).
Later, I will share notes on other topics that I consider relevant but that, unfortunately, we could not cover in the course.
- Basic Programming in Python and Basic Topics of NumPy: We reviewed Python and its collections (dicts, lists, tuples, sets). For NumPy, we went from an introduction (vectors and matrices) to specific operations (broadcasting, matrix operations, vectorization, etc.).
- Pandas and the environment in Jupyter Notebook: In the first part, we examined Pandas and its objects (DataFrames and Series, and their functionality). We also looked at Jupyter features such as the magic commands and how the notebook itself works.
- DataFrames and Series / Relation between SQL and Pandas 1: We reviewed many ways to create DataFrames and Series, and how to operate on them and between them. Using examples, we examined how basic SQL queries can be expressed in Pandas.
- Pipeline / Relation between SQL and Pandas 2 / GroupBy and Pivot Tables: We encouraged the use of pipelines in Pandas, working step by step through examples. Then we explored the GroupBy operation on DataFrames with further examples. Finally, we finished reviewing how to write basic SQL queries in Pandas.
- Methods in Pandas / Merges and Joins / Structure of the Data Analysis Project: In earlier lessons we had already used some DataFrame methods. In this notebook we explored the most useful DataFrame methods with the intention of working with pipelines. I only briefly mentioned merge, join, concat and append, but I showed many examples of how to work with pipelines and visualizations.
- Two Data Analysis mini-projects: In this last notebook we built two data analysis mini-projects using the knowledge from the previous lesson. We also reviewed the topics covered in the course and commented on the omitted ones. For the first mini-project, we fitted some models to show the problems that can arise in a Machine Learning project.
- Categories and strings in Pandas.
- Tidy DataFrames.
- Time Series in Pandas.
- Brief introduction to Dask.
- Unfortunately, this course was taught in Spanish, so the comments in the notebooks are in Spanish. But if you know and like Python, I think you won't have problems with that.
- You can run these notebooks in Colaboratory, the platform on which the course was taught.
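
To give a flavor of the "pipeline" style and the SQL-to-Pandas relationship described in the lessons above, here is a minimal sketch. It is not taken from the notebooks: the `sales` table, its column names, and its values are hypothetical, and the SQL in the comment is only a rough equivalent of the chained calls.

```python
import pandas as pd

# Hypothetical toy data, invented for this example (not from the course notebooks).
sales = pd.DataFrame(
    {
        "city": ["Lima", "Lima", "Cusco", "Cusco", "Quito"],
        "units": [3, 5, 2, 7, 4],
        "price": [10.0, 12.0, 9.0, 11.0, 8.0],
    }
)

# Rough SQL equivalent of the pipeline below:
#   SELECT city, SUM(units * price) AS revenue
#   FROM sales
#   WHERE units > 2
#   GROUP BY city
#   ORDER BY revenue DESC;
revenue_by_city = (
    sales
    .query("units > 2")                                     # WHERE
    .assign(revenue=lambda df: df["units"] * df["price"])   # computed column
    .groupby("city", as_index=False)["revenue"]             # GROUP BY
    .sum()                                                  # SUM(...)
    .sort_values("revenue", ascending=False)                # ORDER BY ... DESC
    .reset_index(drop=True)
)

print(revenue_by_city)
```

Each method returns a new DataFrame, so the steps can be chained inside one parenthesized expression and read top to bottom, much like the clauses of the SQL query.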