Advanced Python for Data Science

Instructor: Dr. Gregory Watson

Python is now widely used in data science and scientific computing. It is easy to learn, and it has a large number of libraries that do everything from web scraping to image manipulation to accessing databases. Two powerful libraries for manipulating data and performing numerical computations are pandas and NumPy, which provide a significant performance boost over pure Python methods. However, when data sets become very large or very computationally intensive operations need to be performed, the limitations of Python and these libraries become apparent.

In this course, we will examine a range of advanced techniques for improving the performance of Python programs, including the use of parallel computation and GPU acceleration. We will also investigate how Python can be used for big data analysis using frameworks such as Apache Hadoop and Apache Spark. Students will have the opportunity to employ these techniques and gain hands-on experience developing advanced Python applications.

The course will be based on the excellent Software Carpentry curriculum and will incorporate pair programming and live coding. It will take a student-centered, active-learning approach to teaching this material. Classes will typically consist of short introductions to programming techniques, followed by hands-on computing exercises.