Data Science, Machine Learning

The world of machine learning algorithms – a summary infographic.

This is a very nice infographic that shows the basic categories of machine learning algorithms. It is also informative to follow the path of how the infographic got posted on Twitter, where I saw it. The trail was somewhat misleading (although not intentionally, I believe) about who actually created the infographic. To me this highlights the importance of making sure we credit our information sources correctly. This topic was also broached in the FiveThirtyEight article “Who Will Debunk The Debunkers” by Daniel Engber. The article discusses many myths, one of them being the myth of how spinach came to be credited with an exaggerated iron content. It mentions that an unscholarly and unsourced article became “the ultimate authority for all the citations that followed”. I have run across this as well, when I was trying to find the source of a quotation defining a “Learning Health System”. The definition was cited by at least twenty scholarly articles, but there was no reference for the original source, only circular references to the other articles that used the same definition. This is why we need to cite the source of information correctly: so that it can be critically analyzed by other people interested in using the data.

I noticed this infographic after it had been tweeted by Evan Sinar (@EvanSinar). The tweet cited an article in @DataScienceCentral. That article, “12 Algorithms Every Data Scientist Should Know” by Emmanuelle Rieuf, mentions an article posted by Mark van Rijmenan with the same title, “12 Algorithms Every Data Scientist Should Know”, and then shows the infographic, giving the impression that this was the source of the infographic. That article mentions that the “guys from Think Big Data developed the infographic” and provides a link. The link leads to the article “Which are the best known machine learning algorithms? Infographic” by Anubhav Srivastava. It “mentioned over a dozen algorithms, segregated by their application intent, that should be in the repertoire of every data scientist”. The bottom line: be careful with your source citations, so that it is not hard for people to trace a source backwards in time. I was able to do so in this case, although it took a little while. But there are many times when it is impossible.

Now, for the infographic.