
Altair: A Declarative Statistical Visualization Library for Python, Unveiled at SciPy 2016 Keynote Speech by Brian Granger

You should check out Altair, an API designed to make data visualization much easier in Python.  Altair was introduced today in a keynote speech by Brian Granger on the opening day of SciPy 2016 (Scientific Computing with Python). Brian is the leader of the IPython project and a co-founder of Project Jupyter (Jupyter notebooks are my favorite way to code in Python or R).

Matplotlib has been the cornerstone of data visualization in Python. As Brian Granger pointed out, you can do anything you want in matplotlib, but the price you pay is time and effort.

Altair is designed as “a declarative statistical visualization library for Python”.  Here is the link to Brian Granger’s GitHub site which houses the Altair files.  Altair is designed to be a very simple API, with minimal coding required to produce really nice visualizations.  A point Brian made in his talk was that Altair is a declarative API: you specify what should be done, but not how it should be done.  The source of the data is a pandas DataFrame in a “tidy format”.  The end result is a JSON data structure that follows the Vega-Lite specification.
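
To make the “tidy format” idea concrete, here is a small sketch in plain Python. It assumes “tidy” means one observation per row and one variable per column; the city temperatures and the `to_tidy` helper are made up for illustration and are not part of Altair.

```python
# Made-up "wide" data: one row per city, one column per year.
wide = [
    {"city": "Austin", "2015": 21.1, "2016": 21.6},
    {"city": "Boston", "2015": 10.9, "2016": 11.5},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into tidy records (hypothetical helper)."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


# One row per (city, year) observation, one column per variable.
tidy = to_tidy(wide, "city", "year", "mean_temp_c")
for record in tidy:
    print(record)
```

With pandas, the same reshape is usually done with `pandas.melt`, which gives you a DataFrame in the shape described above.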

Here is my understanding of this relationship from a very high level: Altair to Vega-Lite to Vega to D3.  (For more information, follow this link.)  D3 (Data-Driven Documents) is a web-based visualization tool, but it is a low-level system.  Vega is a higher-level visualization specification language built on top of D3.  Vega-Lite is a high-level visualization grammar that sits one level above Vega; it provides a concise JSON syntax that can be compiled to Vega specifications (link).  Altair is higher-level still, and emits JSON data structures following the Vega-Lite specification.  The idea is that as you move up the stack, the complexity and difficulty of producing a graphic goes down.
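
To give a feel for what a Vega-Lite specification looks like, here is a hand-written sketch of a small scatter plot spec, built as a Python dictionary and serialized to JSON. The data values are made up, and this is my own illustration of the general shape (data, mark, encoding), not the exact output Altair produces.

```python
import json

# A minimal Vega-Lite-style specification: data, a mark, and encodings.
spec = {
    "data": {
        "values": [
            {"horsepower": 130, "mpg": 18.0},
            {"horsepower": 95, "mpg": 24.0},
            {"horsepower": 110, "mpg": 25.0},
        ]
    },
    "mark": "point",
    "encoding": {
        "x": {"field": "horsepower", "type": "quantitative"},
        "y": {"field": "mpg", "type": "quantitative"},
    },
}

# The JSON text is what gets handed down the stack for rendering.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Notice that nothing here says how to draw axes or place points; the spec only states which field maps to which visual channel.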

On the GitHub site there are a number of Jupyter notebook tutorials.  There is a somewhat restricted library of data visualizations available, and they currently list scatter charts, bar charts, line charts, area charts, layered charts, and grouped regression charts.

The fundamental object in Altair is the “Chart”, which takes a pandas DataFrame as its single argument.  You then specify what you want: what kind of “mark” and which visual encodings (X, Y, Color, Opacity, Shape, Size, etc.).  A variety of data transformations are available, including aggregation functions such as values, count, valid, missing, distinct, sum, average, variance, stdev, median, min, max, etc.  It is also easy to export the charts and publish them on the web as Vega-Lite plots.

This looks like a very exciting and much easier to use data visualization API, and I look forward to exploring it more soon.
