Introduction to Data Visualization in Python (2024)

Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed.

Python offers several great graphing libraries that come packed with features. Whether you want to create interactive, live or highly customized plots, Python has an excellent library for you.

To give a brief overview, a few popular plotting libraries are Matplotlib, Pandas Visualization, Seaborn and Plotly.

In this article, we will learn how to create basic plots using Matplotlib, Pandas visualization and Seaborn as well as how to use some specific features of each library. This article will focus on the syntax and not on interpreting the graphs, which I will cover in another blog post.

In further articles, I will go over interactive plotting tools like Plotly, which is built on D3 and can also be used with JavaScript.

Matplotlib is the most popular Python plotting library. It is a low-level library with a MATLAB-like interface, which offers lots of freedom at the cost of having to write more code.

Matplotlib can be installed using either pip or conda.

pip install matplotlib
or
conda install matplotlib

Matplotlib is especially good for creating basic graphs like line charts, bar charts, histograms and many more. It can be imported by typing:

import matplotlib.pyplot as plt

Scatter Plot

To create a scatter plot in Matplotlib we can use the scatter method. We will also create a figure and an axis using plt.subplots so we can give our plot a title and labels.
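
The steps above can be sketched roughly as follows. The small `iris` dataframe is a hypothetical stand-in for the Iris dataset, built inline so the snippet runs on its own; the column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Iris dataset (a few illustrative rows).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# Create a figure and an axis to draw on.
fig, ax = plt.subplots()
# Scatter sepal_length against sepal_width.
ax.scatter(iris["sepal_length"], iris["sepal_width"])
# The title and axis labels are set on the axis object.
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```

Calling plt.show() afterwards displays the figure when running as a script.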

We can give the graph more meaning by coloring in each data-point by its class. This can be done by creating a dictionary which maps from class to color and then scattering each point on its own using a for-loop and passing the respective color.
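
A minimal sketch of this idea, again on a hypothetical inline stand-in for the Iris data. The color choices and the `class` column name are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["setosa", "setosa", "versicolor",
              "versicolor", "virginica", "virginica"],
})

# Dictionary mapping each class to a color.
colors = {"setosa": "red", "versicolor": "green", "virginica": "blue"}

fig, ax = plt.subplots()
# Scatter each point on its own, passing the color of its class.
for x, y, label in zip(iris["sepal_length"], iris["sepal_width"], iris["class"]):
    ax.scatter(x, y, color=colors[label])
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```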

Line Chart

In Matplotlib we can create a line chart by calling the plot method. We can also plot multiple columns in one graph, by looping through the columns we want and plotting each column on the same axis.
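
As a rough sketch, looping over the numeric columns could look like this. The dataframe is a hypothetical stand-in with only two numeric columns to keep it short.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.2],
    "class": ["setosa", "setosa", "versicolor", "versicolor"],
})

# Every column except the class label is plotted.
columns = iris.columns.drop("class")
# Use the row position as the x-axis.
x_data = range(len(iris))

fig, ax = plt.subplots()
# Plot each column on the same axis.
for column in columns:
    ax.plot(x_data, iris[column], label=column)
ax.set_title("Iris Dataset")
ax.legend()
```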

Histogram

In Matplotlib we can create a histogram using the hist method. If we pass it discrete data such as the points column from the wine-review dataset, it groups the values into bins and counts how many fall into each one.
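
A small sketch of the hist call. The `wine` dataframe below is a made-up stand-in for the wine-review dataset, and the number of bins is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the wine-review dataset.
wine = pd.DataFrame({"points": [87, 87, 88, 90, 90, 90, 92, 95]})

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges and the drawn bars.
counts, bin_edges, bars = ax.hist(wine["points"], bins=8)
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```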

Bar Chart

A bar chart can be created using the bar method. The bar chart doesn’t automatically calculate the frequency of a category, so we are going to use the pandas value_counts function to do this. The bar chart is useful for categorical data that doesn’t have a lot of different categories (fewer than 30), because otherwise it can get quite messy.
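
A sketch of counting with value_counts and passing the result to bar, using a made-up stand-in for the wine-review data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the wine-review dataset.
wine = pd.DataFrame({"points": [87, 87, 88, 90, 90, 90, 92, 95]})

# Count how often each score occurs.
counts = wine["points"].value_counts()

fig, ax = plt.subplots()
# The index holds the categories, the values hold their frequencies.
ax.bar(counts.index, counts.values)
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```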

Pandas is an open-source, high-performance, easy-to-use library providing data structures, such as dataframes, and data analysis tools like the visualization tools we will use in this article.

Pandas Visualization makes it really easy to create plots out of a pandas dataframe and series. It also has a higher level API than Matplotlib and therefore we need less code for the same results.

Pandas can be installed using either pip or conda.

pip install pandas
or
conda install pandas

Scatter Plot

To create a scatter plot in Pandas we can call <dataset>.plot.scatter() and pass it two arguments, the name of the x-column as well as the name of the y-column. Optionally we can also pass it a title.
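
For instance, with a hypothetical inline stand-in for the Iris dataframe, the call might look like this:

```python
import pandas as pd

# Hypothetical stand-in for the Iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# Pass the x-column name, the y-column name and an optional title.
ax = iris.plot.scatter(x="sepal_length", y="sepal_width", title="Iris Dataset")
```

The call returns a Matplotlib axis, so further customization works the same way as in plain Matplotlib.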

Pandas automatically sets the x and y labels to the column names.

Line Chart

To create a line chart in Pandas we can call <dataframe>.plot.line(). While in Matplotlib we needed to loop through each column we wanted to plot, in Pandas we don’t need to do this because it automatically plots all available numeric columns (unless we specify particular columns).
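
A short sketch on a hypothetical stand-in dataframe. The class column is dropped explicitly here, which is an extra precaution rather than a requirement.

```python
import pandas as pd

# Hypothetical stand-in for the Iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.2],
    "class": ["setosa", "setosa", "versicolor", "versicolor"],
})

# One line is drawn per remaining numeric column; no loop is needed.
ax = iris.drop(columns="class").plot.line(title="Iris Dataset")
```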

If we have more than one feature, Pandas automatically creates a legend for us.

Histogram

In Pandas, we can create a histogram with the plot.hist method. There aren’t any required arguments, but we can optionally pass some, like the number of bins.
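
A minimal sketch on a made-up stand-in for the wine-review data, with an arbitrary number of bins:

```python
import pandas as pd

# Hypothetical stand-in for the wine-review dataset.
wine = pd.DataFrame({"points": [87, 87, 88, 90, 90, 90, 92, 95]})

# bins is optional; it controls how many bars the histogram has.
ax = wine["points"].plot.hist(bins=8)
```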

It’s also really easy to create multiple histograms.

The subplots argument specifies that we want a separate plot for each feature and the layout specifies the number of plots per row and column.
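
As a sketch, with four numeric columns in a hypothetical stand-in dataframe, a 2x2 grid of histograms could be requested like this (the figsize and bins values are arbitrary):

```python
import pandas as pd

# Hypothetical stand-in for the four numeric Iris columns.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# subplots=True gives each column its own plot;
# layout=(2, 2) arranges them in two rows and two columns.
axes = iris.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)
```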

Bar Chart

To plot a bar chart we can use the plot.bar() method, but before we can call it we need to prepare our data. We first count the occurrences of each value using the value_counts() method and then order the result by value, from smallest to largest, using the sort_index() method.
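
A sketch of that chain of calls on a made-up stand-in for the wine-review data:

```python
import pandas as pd

# Hypothetical stand-in for the wine-review dataset.
wine = pd.DataFrame({"points": [87, 87, 88, 90, 90, 90, 92, 95]})

# Count each score, then order the result by the score itself (the index).
counts = wine["points"].value_counts().sort_index()
ax = counts.plot.bar()
```

Swapping plot.bar() for plot.barh() on the same series gives the horizontal variant.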

It’s also really simple to make a horizontal bar-chart using the plot.barh() method.

We can also plot data other than the number of occurrences.

For example, we can group the data by country, take the mean of the wine prices, sort the result, and plot the 5 countries with the highest average wine price.
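
A rough sketch of the group, mean, sort and plot steps. The countries and prices below are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the wine-review dataset (invented prices).
wine = pd.DataFrame({
    "country": ["US", "US", "France", "France", "Italy",
                "Spain", "Spain", "Chile", "Germany"],
    "price": [30, 54, 60, 80, 40, 20, 30, 15, 45],
})

# Average price per country, highest first, limited to the top five.
top = (
    wine.groupby("country")["price"]
    .mean()
    .sort_values(ascending=False)
    .head(5)
)
ax = top.plot.bar()
```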

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for creating attractive graphs.

Seaborn has a lot to offer. You can create graphs in one line that would take dozens of lines in Matplotlib. Its default styles look great and it also has a nice interface for working with pandas dataframes.

It can be imported by typing:

import seaborn as sns

Scatter plot

We can use the .scatterplot method for creating a scatterplot, and just as in Pandas we need to pass it the column names of the x and y data, but now we also need to pass the data as an additional argument because we aren’t calling the function on the data directly as we did in Pandas.
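
A minimal sketch, assuming a hypothetical inline stand-in for the Iris dataframe:

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the Iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# Column names for x and y, plus the dataframe itself via data=.
ax = sns.scatterplot(x="sepal_length", y="sepal_width", data=iris)
```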

We can also highlight the points by class using the hue argument, which is a lot easier than in Matplotlib.
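
A sketch of the hue argument on a hypothetical stand-in dataframe. The `class` column name is an assumption.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the Iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["setosa", "setosa", "versicolor",
              "versicolor", "virginica", "virginica"],
})

# hue colors the points by the values of the given column
# and adds a matching legend.
ax = sns.scatterplot(x="sepal_length", y="sepal_width", hue="class", data=iris)
```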

Line chart

To create a line chart the sns.lineplot method can be used. The only required argument is the data, which in our case is the four numeric columns from the Iris dataset. We could also use the sns.kdeplot method, which plots a smoothed kernel density estimate of each column’s distribution rather than the raw values and can therefore look cleaner if you have a lot of outliers in your dataset.

Histogram

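To create a histogram in Seaborn we can use the sns.distplot method. We need to pass it the column we want to plot and it will calculate the occurrences itself. We can also pass it the number of bins, and whether we want to draw a Gaussian kernel density estimate on top of the histogram. Note that distplot has been deprecated since Seaborn 0.11; sns.histplot (with kde=True for the density curve) is its replacement.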

Bar chart

In Seaborn a bar-chart can be created using the sns.countplot method and passing it the data.
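
A minimal sketch on a made-up stand-in for the wine-review data. Passing the column name through x= is one common way to call it.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the wine-review dataset.
wine = pd.DataFrame({"points": [87, 87, 88, 90, 90, 90, 92, 95]})

# countplot counts the occurrences of each value itself.
ax = sns.countplot(x="points", data=wine)
```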

Now that you have a basic understanding of the Matplotlib, Pandas Visualization and Seaborn syntax, I want to show you a few other graph types that are useful for extracting insights.

For most of them, Seaborn is the go-to library because of its high-level interface that allows for the creation of beautiful graphs in just a few lines of code.

Box plots

A box plot is a graphical method of displaying the five-number summary. We can create box plots using Seaborn’s sns.boxplot method and passing it the data as well as the x and y column names.
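
A sketch of a box plot of price per score. The scores and prices are invented for illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the wine-review dataset (invented prices).
wine = pd.DataFrame({
    "points": [95, 95, 95, 95, 96, 96, 96, 96],
    "price": [40, 55, 60, 120, 50, 70, 90, 200],
})

# One box per distinct x value, summarizing the y values in that group.
ax = sns.boxplot(x="points", y="price", data=wine)
```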

Box plots, just like bar charts, are great for data with only a few categories but can get messy really quickly.

Heatmap

A Heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. Heatmaps are perfect for exploring the correlation of features in a dataset.

To get the correlation of the features inside a dataset we can call <dataset>.corr(), which is a Pandas dataframe method. This will give us the correlation matrix.

We can now use either Matplotlib or Seaborn to create the heatmap.

Matplotlib:

To add annotations to the heatmap we need to add two for loops:
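
A sketch of the Matplotlib version, drawing the correlation matrix with imshow and then writing each value into its cell with two nested loops. The dataframe is a hypothetical stand-in, and the tick rotation and text color are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the numeric Iris columns.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

# Correlation matrix of the numeric columns.
corr = iris.select_dtypes("number").corr()
n = len(corr.columns)

fig, ax = plt.subplots()
# Draw the matrix as an image; each cell becomes a colored square.
im = ax.imshow(corr.values)

# Label the ticks with the column names.
ax.set_xticks(range(n))
ax.set_yticks(range(n))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

# Two loops: one over rows, one over columns, writing each value.
for i in range(n):
    for j in range(n):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}",
                ha="center", va="center", color="black")
```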

Seaborn makes it way easier to create a heatmap and add annotations:
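
In Seaborn the same result is roughly a single call. The dataframe below is again a hypothetical stand-in.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the numeric Iris columns.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

corr = iris.select_dtypes("number").corr()
# annot=True writes the value of each cell into the heatmap.
ax = sns.heatmap(corr, annot=True)
```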

Faceting

Faceting is the act of breaking data variables up across multiple subplots and combining those subplots into a single figure.

Faceting is really helpful if you want to quickly explore your dataset.

To use one kind of faceting in Seaborn we can use the FacetGrid. First of all, we need to define the FacetGrid and pass it our data as well as a row or column, which will be used to split the data. Then we need to call the map function on our FacetGrid object and define the plot type we want to use, as well as the column we want to graph.

You can make plots a lot bigger and more complicated than the example above. You can find a few examples here.

Pairplot

Lastly, I will show you Seaborn’s pairplot and Pandas’ scatter_matrix, which enable you to plot a grid of pairwise relationships in a dataset.

Both techniques always plot two features against each other. The diagonal of the grid is filled with histograms and the other plots are scatter plots.

Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed.

Python offers multiple great graphing libraries that come packed with lots of different features. In this article, we looked at Matplotlib, Pandas visualization and Seaborn.

If you liked this article, consider subscribing to my YouTube channel and following me on social media.

The code covered in this article is available as a GitHub repository.

If you have any questions, recommendations or critiques, I can be reached via Twitter or the comment section.
