Machine Learning

Installing TensorFlow GPU Tip – follow the instructions in the referenced blog!

Ok, how hard should this actually be, I mean seriously?

If you are learning machine learning, TensorFlow should be one of your main tools. TensorFlow comes in two main versions – one that runs on the CPUs in your computer, and one that runs on GPUs if your computer has “CUDA-enabled GPU cards”. GPUs have clear benefits over CPUs for this work – they are specialized for the matrix operations and mathematical transformations that neural networks rely on, and they run them much, much faster.

However, the GPU version of TensorFlow is not that easy to install, in my opinion. I was unable to get it to work at all on my laptop – a Microsoft Surface Book with an i7-6600U CPU and an NVIDIA GeForce GTX 965M GPU. I could never take advantage of the GPU, because I could not get all of the dependencies for TensorFlow GPU installed and working correctly, despite spending many hours over several days on it. I was stuck using the slower and less efficient CPU whenever I used TensorFlow.

I just purchased a new desktop – an iBUYPOWER – running an i7-8700 CPU and an NVIDIA GeForce RTX 2070 GPU. Today I tried to install the GPU version of TensorFlow with no success, until I found a blog post whose instructions let me install it easily and quickly. If I were you, I would skip the instructions posted on the TensorFlow site, go straight to the blog post, and follow those instructions.
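Once the install finishes, a quick check that TensorFlow can actually see the GPU is worthwhile. The snippet below is a minimal sketch, not part of the referenced blog post: it assumes TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available), and the helper name `visible_gpus` is my own.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumption: TensorFlow 2.1 or later, where list_physical_devices exists
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("TensorFlow can see:", gpus)
else:
    print("TensorFlow is installed but found no GPU; it will fall back to the CPU")
```

An empty list usually means the CPU-only build is installed or the GPU dependencies are not being picked up.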

The GPU version of TensorFlow markedly improved performance on my desktop. Using the code example in the post to train LeNet-5 on the MNIST digits data using Keras, the CPU version took 55-59 seconds to complete each individual epoch, while the GPU version took just 4 seconds to complete an epoch – roughly a 14-fold increase in speed.
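The speed-up is simply the ratio of the two epoch times. As a small worked check using the timings above (the function name is just for illustration):

```python
def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU epoch is than the CPU epoch."""
    return cpu_seconds / gpu_seconds


# Epoch times quoted above: 55-59 seconds on the CPU, 4 seconds on the GPU
low = speedup(55, 4)
high = speedup(59, 4)
print(f"GPU speed-up: {low:.2f}x to {high:.2f}x")  # GPU speed-up: 13.75x to 14.75x
```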

Here is the post:  https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187/

Thank you Dr. Donald Kinghorn!!

TensorFlow GPU

Data Science, Deep Learning, Machine Learning, Neural Networks

Neural Networks, Deep Learning, Machine Learning resources

I have come across a few great resources that I wanted to share.  For students taking a machine learning class (like Northwestern University’s MSDS 422 Practical Machine Learning) these are great references, and a way to learn about them before, during, or after the class.  This is not a comprehensive list, just a starter.

Textbook

There is a free online textbook, Neural Networks and Deep Learning.

Videos

There is a great math visualization site called 3Blue1Brown and they have a YouTube channel.  There are 4 videos on neural networks/deep learning which are really informative and a good introduction.

  1.  But what *is* a Neural Network? Chapter 1, deep learning
  2.  Gradient Descent, how neural networks learn. Chapter 2, deep learning
  3.  What is backpropagation really doing? Chapter 3, deep learning
  4.  Backpropagation calculus. Appendix to deep learning chapter 3.
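For a feel of what the gradient descent video covers, here is a minimal sketch in plain Python. It is my own toy example, not taken from the videos: it fits a single weight `w` in `y = w * x` to made-up data by repeatedly stepping against the gradient of the mean squared error.

```python
def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data lying exactly on y = 2x
w = fit_slope(xs, ys)
print(w)
```

A real neural network does the same thing over many weights at once, with backpropagation supplying the gradients.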

There is also a great playlist, Essence of linear algebra, which is an excellent review and explanation of linear algebra and matrix operations. I wish I had seen this when I was learning it.

Scikit-Learn Tutorials

There are tutorials on the Scikit-Learn site.

TensorFlow tutorials

The TensorFlow site links to Google’s “Machine Learning Crash Course”, a fast-paced, practical introduction to machine learning.

The TensorFlow site has a Tutorials page.  There are tutorials for Images, Sequences, Data Representation, and a few other things.

 

Google AI

Google has its own education site (which also has the Machine Learning Crash Course referenced above).

 

Blog sites

Adventures in Machine Learning, Andy Thomas’s blog.

This is a must-view site, and worth visiting several times over. Andy does a great job explaining the topics and has some great visuals as well. These are fantastic tutorials. I have listed only a few below.

Neural Networks Tutorial – A Pathway to Deep Learning

Python TensorFlow Tutorial – Build a Neural Network

Convolutional Neural Networks Tutorial in TensorFlow

Word2Vec word embedding tutorial in Python and TensorFlow

Recurrent neural networks and LSTM tutorial in Python and TensorFlow

 

colah’s blog – Christopher Olah’s blog

Another great blog, with lots of good postings.  A few are listed below.

Deep Learning, NLP, and Representations

Neural Networks, Types and Functional Programming

 

Courses

DataCamp – one of my favorite learning sites.  It does require a subscription.

DataCamp currently has 9 Python machine learning courses, which are listed below.  They also have 9 R machine learning courses.

Machine Learning with the Experts: School Budgets

Deep Learning in Python

Building Chatbots in Python

Natural Language Processing Fundamentals in Python

Unsupervised Learning in Python

Linear Classifiers in Python

Extreme Gradient Boosting with XGBoost

HR Analytics in Python: Predicting Employee Churn

Supervised Learning with Scikit-Learn

 

Udemy courses

Udemy is also a favorite learning site.  You can generally get the course for about $10.

My favorite Udemy learning series is from Lazy Programmer Inc. They have a variety of courses, and their blog site explains what order to take the courses in. There are many other courses from different instructors as well.

Deep Learning Prerequisites: The Numpy stack in Python

Deep Learning Prerequisites: Linear Regression in Python

Deep Learning Prerequisites: Logistic Regression in Python

Data Science: Deep Learning in Python

Modern Deep Learning in Python

Convolutional Neural Networks in Python

Recurrent Neural Networks in Python

Deep Learning with Natural Language Processing in Python

Advanced AI: Deep Reinforcement Learning in Python

Plus many other courses on Supervised and Unsupervised Learning, Bayesian ML, Ensemble ML, Cluster Analysis, and a few others.

 

If you have other favorite machine learning resources, please let me know.

 

 

Machine Learning, Northwestern University MSDS Program, Northwestern University MSPA

Northwestern University MSDS (formerly MSPA) 422 – Practical Machine Learning Course Review

This course was taught by Dr. Thomas Miller, who is the faculty director of the Data Science program (formerly known as the Predictive Analytics program – I am going to post an article discussing the program name change from the Master of Science in Predictive Analytics (MSPA) to the Master of Science in Data Science (MSDS)).  Overall, this was an excellent review of machine learning, and is a required core course for all students in the program.  It is most definitely a foundational course for any student of data science in today’s world.  It is also a foundational course for the Artificial Intelligence and Deep Learning specialization, which is currently being developed (more on this in a subsequent post as well).  The course covers the following topics:

  • Supervised, Unsupervised, and Semi-supervised learning
  • Regression versus Classification
  • Decision Trees and Random Forests
  • Dimensionality Reduction techniques
  • Clustering Techniques
  • Feature Engineering
  • Artificial Neural Networks
  • Deep Neural Networks
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)

This course uses Python and the Python libraries Scikit-Learn and TensorFlow. In addition to using Jupyter Notebooks to run my code, I also learned how to run TensorFlow from the command line, which is a faster way of running neural networks through a large number of epochs. The course is currently offered in R as well, but the R version will be discontinued, and only the Python/TensorFlow version will be offered starting in the fall semester. Dr. Miller commented that they will be using Python much more extensively going forward, especially in the AI/Deep Learning specialization courses. R apparently will still be offered in the Analytics/Modeling courses – 410 (Regression Analysis) and 411 (Generalized Linear Models). I did learn to use Python/Scikit-Learn/TensorFlow at an intermediate level, and feel like I have a great foundation to build upon, in terms of programming.
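Running training from the command line mostly comes down to putting the code in a script that takes its settings as arguments. The sketch below is illustrative only and is not the course's code: the option names are my own, and `train` is a hypothetical placeholder where a real script would build and fit a TensorFlow model.

```python
import argparse


def parse_args(argv=None):
    """Read training options from the command line."""
    parser = argparse.ArgumentParser(description="Train a model for a fixed number of epochs")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of passes over the training data")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="examples per gradient update")
    return parser.parse_args(argv)


def train(epochs, batch_size):
    """Hypothetical placeholder: a real script would build and fit a model here."""
    for epoch in range(1, epochs + 1):
        print(f"epoch {epoch}/{epochs} (batch size {batch_size})")
    return epochs


if __name__ == "__main__":
    args = parse_args()
    train(args.epochs, args.batch_size)
```

Saved under a name of your choosing, say `train.py`, it would be launched with something like `python train.py --epochs 200`, and a long run can be left going in a terminal without keeping a notebook open.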

Course Structure

There is required reading every week, mainly from the two required textbooks, although there are a few articles to read as well. There were a total of 5 sync sessions which reviewed various topics. I wish the sync sessions had been a little more robust and had covered the current assignments and the coding required to complete them; I found that kind of coverage very helpful in previous courses. There were weekly discussion board assignments, which covered basic concepts and turned out to be very informative, especially since many of the topics on the final exam were covered in these discussions. There were also weekly assignments, in which you either developed the code yourself or built upon a skeletal code base that was provided. These ranged from very easy to very difficult, especially as the course moved into artificial neural networks. There was a non-proctored final exam and a proctored final exam.

Textbooks

Primary Textbooks:

Géron, A. 2017. Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, Calif.: O’Reilly. [ISBN-13 978-1-491-96229-9] Source code available at https://github.com/ageron/handson-ml  This was the primary textbook for most of the course.  It is an excellent text with lots of great coding examples.

Müller, A. C. and Guido, S. 2017. Introduction to Machine Learning with Python: A Guide for Data Scientists. Sebastopol, Calif.: O’Reilly. [ISBN-13: 978-1449369415] Code examples at https://github.com/amueller/introduction_to_ml_with_python

Reference Textbook:

Izenman, A. J. 2008. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. New York: Springer. [ISBN-13: 978-0-387-78188-4] This was used very little.

Learning Outcomes (from syllabus):

Practical Machine Learning is a survey course with a long list of learning outcomes:

  • Explain the learning algorithm trade-offs, balancing performance within training data and robustness on unobserved test data.
  • Distinguish between supervised and unsupervised learning methods.
  • Distinguish between regression and classification problems
  • Explain bootstrap and cross-validation procedures
  • Explore and visualize data and perform basic statistical analysis
  • List alternative methods for evaluating classifiers.
  • List alternative methods for evaluating regression
  • Demonstrate the application of traditional statistical methods for classification and regression
  • Demonstrate the application of trees and random forests for classification and regression
  • Demonstrate principal components for dimension reduction.
  • Demonstrate principal components regression
  • Describe hierarchical and non-hierarchical clustering techniques
  • Describe how semi-supervised learning may be utilized in addressing classification and regression problems
  • Explain how measurement and feature engineering are relevant to modeling
  • Describe how artificial neural networks are constructed from logical connections of artificial neurons and activation functions
  • Demonstrate the use of artificial neural networks (including deep neural networks) in classification and regression
  • Describe how convolutional neural networks are constructed
  • Describe how recurrent neural networks are constructed
  • Distinguish between autoencoders and other forms of unsupervised learning
  • Describe applications of autoencoders
  • Explain how the results of machine learning can be useful to business managers
  • Transform data and research results into actionable insights

 

Weekly Assignments

Here are the weekly learning titles and assignments:

Week 1.  Introduction to Machine Learning

  • Assignment 1. Exploring and Visualizing Data

Week 2.  Supervised Learning for Classification

  • Assignment 2. Evaluating Classification Models

Week 3.  Supervised Learning for Regression

  • Assignment 3. Evaluating Regression Models

Week 4. Trees and Random Forests

  • Assignment 4. Random Forests

Week 5.  Unsupervised Learning

  • Assignment 5. Principal Components Analysis

Week 6. Neural Networks

  • Assignment 6. Neural Networks

Week 7.  Deep Learning for Computer Vision

  • Assignment 7. Deep Learning

Week 8.  Deep Learning for Natural Language Processing

  • Assignment 8. Natural Language Processing

Week 9.  Neural Networks Autoencoders

  • No assignment

 

Final Examinations

There were 2 final examinations, one being non-proctored and the other proctored.  The non-proctored exam was open book, and tested your ability to look at data and the various analytical techniques, and interpret the results of the analyses.  The proctored final exam was closed book and covered general concepts.

Final Thoughts

This was a great overview of some of the more important topics in machine learning.  I was able to get a good theoretical background in these topics, and learned the coding necessary to perform these.   This is a great foundation upon which to add more advanced and in-depth use of these techniques.  This course really challenged me to rethink what analytical techniques I should be learning and applying in the future, to the point that I am going to change my specialization to Artificial Intelligence and Deep Learning.

 

Data Science, Northwestern University MSPA

Python Tops KDNuggets 2017 Data Science Software Poll

The results of KDNuggets’ 18th annual Software Poll should be fascinating reading for anyone involved in data science and analytics. Some highlights – Python (52.6%) finally overtook R (52.1%), SQL remained at about 35%, and Spark and TensorFlow have increased to above 20%.

[Image: KDNuggets 2017 software poll results]

(Graph taken from http://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html/2)

I am about halfway through Northwestern University’s Master of Science in Predictive Analytics (MSPA) program. I am very thankful that the program has made learning different languages a priority. I have already learned Python, Jupyter Notebooks, R, SQL, some NoSQL (MongoDB), and SAS. In my current class in Generalized Linear Models, I have also started to learn Angoss, SAS Enterprise Miner, and Microsoft Azure machine learning. However, it looks like you can never stop learning new things, and I am going to have to learn Spark and TensorFlow, to name just two more.

I highly recommend you read this article.

 

Data Science, Northwestern University MSPA, Uncategorized

DataCamp’s Importing Data in Python Part 1 and Part 2.

I recently finished these DataCamp  courses and really liked them.  I highly recommend them to students in general and especially to the students in Northwestern University’s Master of Science in Predictive Analytics (MSPA) program.

Importing Data in Python Part 1 is described as:

As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before doing any of these, however, you will need to know how to get data into Python. In this course, you’ll learn the many ways to import data into Python: (i) from flat files such as .txts and .csvs; (ii) from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) from relational databases such as SQLite & PostgreSQL.
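To give a flavor of what Part 1 covers, here is a minimal sketch using only the Python standard library. It is my own illustration rather than course material: the data is made up, an in-memory string stands in for a flat file, and an in-memory SQLite database stands in for a relational database.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a flat file on disk
flat_file = io.StringIO("name,score\nAda,91\nGrace,88\n")
rows = list(csv.DictReader(flat_file))  # one dict per row, keyed by the header

# An in-memory SQLite database standing in for a relational database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(row["name"], int(row["score"])) for row in rows],
)
top = conn.execute("SELECT name, score FROM scores ORDER BY score DESC").fetchall()
conn.close()
print(top)
```

With a real file you would pass `open("yourfile.csv", newline="")` to `csv.DictReader` instead of the `StringIO` object.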

Importing Data in Python Part 2 is described as:

As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before doing any of these, however, you will need to know how to get data into Python. In the prequel to this course, you have already learnt many ways to import data into Python: (i) from flat files such as .txts and .csvs; (ii) from files native to other software such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) from relational databases such as SQLite & PostgreSQL. In this course, you’ll extend this knowledge base by learning to import data (i) from the web and (ii) a special and essential case of this: pulling data from Application Programming Interfaces, also known as APIs, such as the Twitter streaming API, which allows us to stream real-time tweets.
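Much of the work with web APIs is decoding the JSON they send back. The sketch below is my own illustration, not course material: the payload is made up and shaped like newline-delimited JSON (one object per line), and no network call is made. In practice the raw text would come from an HTTP request.

```python
import json

# Made-up payload: one JSON object per line, with a blank keep-alive line in between
payload = '{"user": "a", "text": "hello"}\n\n{"user": "b", "text": "data"}\n'


def parse_stream(raw):
    """Decode one JSON object per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


messages = parse_stream(payload)
texts = [message["text"] for message in messages]
print(texts)
```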

 

Data Science, Northwestern University MSPA

Learning to Use Python’s SQLAlchemy in DataCamp’s “Introduction to Databases in Python” – useful for students taking Northwestern’s MSPA Predict 420.

I just completed DataCamp’s course titled “Introduction to Databases in Python”. This is a very informative course, and is actually one of the few tutorials on SQLAlchemy that I have run across.

I just finished Northwestern University’s MSPA (Master of Science in Predictive Analytics) Predict 420 class – Database Systems and Data Preparation Review, and I wish I had taken DataCamp’s course first. It would have helped tremendously. You have the opportunity to use SQLAlchemy to interact with SQL databases in Predict 420, but I looked and could not find a really good tutorial on it until I ran across DataCamp’s course, after I had finished Predict 420. I highly recommend this DataCamp course to other MSPA students.

Introduction to Databases in Python is divided up into 5 sections, with the course’s description of each section attached.

  1.  Basics of Relational Database.  In this chapter, you will become acquainted with the fundamentals of Relational Databases and the Relational Model. You will learn how to connect to a database and then interact with it by writing basic SQL queries, both in raw SQL as well as with SQLAlchemy, which provides a Pythonic way of interacting with databases.
  2. Applying Filtering, Ordering, and Grouping to Queries.  In this chapter, you will build on the database knowledge you began acquiring in the previous chapter by writing more nuanced queries that allow you to filter, order, and count your data, all within the Pythonic framework provided by SQLAlchemy!
  3. Advanced SQLAlchemy Queries.  Herein, you will learn to perform advanced – and incredibly useful – queries that will enable you to interact with your data in powerful ways.
  4. Creating and Manipulating your own Databases.  In the previous chapters, you interacted with existing databases and queried them in various different ways. Now, you will learn how to build your own databases and keep them updated!
  5. Putting it all together.  Here, you will bring together all of the skills you acquired in the previous chapters to work on a real life project! From connecting to a database, to populating it, to reading and querying it, you will have a chance to apply all the key concepts you learned in this course.
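The chapters above map onto a fairly small amount of SQLAlchemy code. The following is a rough sketch of my own, not taken from the course: it assumes SQLAlchemy 1.4 or later is installed, and the `census` table and its rows are made up. It creates a table, populates it, then filters, groups, and orders the data.

```python
try:
    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, func, insert, select)
    HAVE_SQLALCHEMY = True
except ImportError:  # SQLAlchemy is a third-party package
    HAVE_SQLALCHEMY = False


def population_by_state():
    """Build a tiny in-memory table, then filter, group, and order it."""
    if not HAVE_SQLALCHEMY:
        return None
    engine = create_engine("sqlite:///:memory:")
    metadata = MetaData()
    # Hypothetical table, defined only for this example
    census = Table(
        "census", metadata,
        Column("state", String(30)),
        Column("age", Integer),
        Column("pop", Integer),
    )
    metadata.create_all(engine)
    rows = [
        {"state": "Illinois", "age": 30, "pop": 100},
        {"state": "Illinois", "age": 40, "pop": 150},
        {"state": "Texas", "age": 30, "pop": 300},
        {"state": "Texas", "age": 10, "pop": 50},
    ]
    total = func.sum(census.c.pop).label("total")
    stmt = (
        select(census.c.state, total)
        .where(census.c.age >= 18)   # filter
        .group_by(census.c.state)    # group
        .order_by(total.desc())      # order
    )
    with engine.begin() as conn:
        conn.execute(insert(census), rows)
        return [tuple(row) for row in conn.execute(stmt)]


result = population_by_state()
print(result)
```

The query is built from Python objects rather than a SQL string, which is the “Pythonic” style the course description refers to.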

 

 

 

 

Data Science, Northwestern University MSPA

Northwestern University MSPA Program – learning R and Python resources page creation

For new students coming into Northwestern University’s Master of Science in Predictive Analytics (MSPA) program, there is often considerable apprehension about learning the programming languages (mainly R and Python, and some SAS).   I have created a page on my blog site – Northwestern University MSPA Program – Learning R and Python resources – that lists some of the resources that are available, and my favorites.

I would encourage students to take the programming courses for whatever language a class requires before that class starts. There is enough time between the official courses to fit some of these in. That way you don’t have to learn the course content and the programming language at the same time (it is still doable if you don’t, but it will take more effort).