Python Data Visualization with Matplotlib: Complete Tutorial

Master the essential chart types in Python: bar charts, line charts, scatter plots, and pie charts with step-by-step examples and code samples.

Prerequisites and Setup

Before creating any charts, install the two libraries this tutorial uses: Matplotlib for plotting and NumPy for numerical operations such as fitting trend lines.

pip3 install matplotlib
pip3 install numpy
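
To confirm that both installs succeeded, a quick check like the following can help. It only imports the two packages and prints their version strings; the versions you see depend on your environment.

```python
# Sanity check: both imports should succeed without errors
import matplotlib
import numpy as np

print("Matplotlib version:", matplotlib.__version__)
print("NumPy version:", np.__version__)
```

If either import raises ModuleNotFoundError, the package was probably installed for a different Python interpreter than the one running the script.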

Creating Bar Charts for Data Comparison

Bar charts excel at comparing discrete categories or values. They’re perfect for visualizing survey results, sales figures, or any data where you want to compare quantities across different groups. In this example, we’ll create a simple comparison of weekly walking distances.

import matplotlib.pyplot as plt

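# Create a list showing how many miles each person walked
values = [1, 2, 3]

# Create a list of names
names = ["Matt", "Sally", "John"]

# Give each person one x position: 0, 1, 2
positions = range(len(values))

# Declare bar chart: x positions first, bar heights second
plt.bar(positions, values)

# Associate the names with the bar positions
plt.xticks(positions, names)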

# Show the bar chart
plt.show()

The code above creates a simple bar chart comparing walking distances. The plt.bar() function creates the bars, while plt.xticks() labels each bar with the corresponding person’s name.

Bar chart comparing data values for Matt, Sally, and John with John having the highest value, visualizing statistical differences.
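
As a variation, plt.bar() and its Axes equivalent also accept the names directly as x values, which removes the need for plt.xticks(). The sketch below uses Matplotlib’s object-oriented interface and, assuming Matplotlib 3.4 or later, ax.bar_label() to print each value above its bar. The axis label and title are illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

names = ["Matt", "Sally", "John"]
values = [1, 2, 3]  # miles walked by each person

fig, ax = plt.subplots()

# Passing strings as x values creates one labeled bar per name
bars = ax.bar(names, values)

# Print each bar's height above it (assumes Matplotlib 3.4+)
ax.bar_label(bars)

ax.set_ylabel("Miles walked")            # illustrative label
ax.set_title("Weekly walking distance")  # illustrative title

# plt.show()  # uncomment when running with an interactive backend
```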

Line Charts for Trend Analysis

Line charts are ideal for displaying data changes over time or showing relationships between continuous variables. They help identify trends, patterns, and correlations in your data. The connected points create a visual flow that makes trends immediately apparent.

import matplotlib.pyplot as plt

# Declare line chart and pass in Y values for line
plt.plot([1, 23, 2, 4])

# Declare the X values for the chart and assign labels to each point
plt.xticks([0, 1, 2, 3], ["one", "two", "three", "four"])

# Assign a label to show on the left side of the chart
plt.ylabel('some numbers')

# Draw the chart
plt.show()

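This example demonstrates how plt.plot() creates a line connecting data points. Because only one list is passed, plt.plot() treats it as the y values and uses the indices 0 through 3 as x values, which is why plt.xticks() labels positions 0 to 3. The plt.ylabel() function adds a descriptive label to the y-axis, making the chart more informative.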

Line graph showing data trend over categories one to four, with a peak at two and a decline at three, illustrating variable changes.
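
Line charts are often used to compare more than one series. The following sketch passes explicit x values and draws two lines with a legend. The second series and the "Week" axis label are made up for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

# Explicit x values shared by both series
weeks = [1, 2, 3, 4]
series_a = [1, 23, 2, 4]    # same y values as the example above
series_b = [3, 8, 12, 15]   # hypothetical second series

fig, ax = plt.subplots()

# ax.plot() returns a list of Line2D objects; unpack the single line
line_a, = ax.plot(weeks, series_a, marker="o", label="Series A")
line_b, = ax.plot(weeks, series_b, marker="s", label="Series B")

ax.set_xlabel("Week")
ax.set_ylabel("some numbers")
ax.set_xticks(weeks)  # one tick per data point
ax.legend()           # uses the label= values given above

# plt.show()  # uncomment when running with an interactive backend
```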

Scatter Plots for Relationship Discovery

Scatter plots are powerful tools for exploring relationships between two variables. They help identify correlations, clusters, outliers, and patterns that might not be obvious in raw data. Each point represents a pair of values, making it easy to spot trends.

Individual Point Method

import matplotlib.pyplot as plt

# Draw individual points on the chart
plt.scatter(1, 2)
plt.scatter(2, 3)
plt.scatter(3, 5)
plt.scatter(4, 3)

# Show the scatterplot
plt.show()

Scatter plot displaying four colored data points, illustrating variable distribution across a coordinate grid.

For efficiency and cleaner code, it’s better to pass arrays of coordinates to the scatter plot function:

import matplotlib.pyplot as plt

# Declare arrays showing the X and Y coordinates
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 3, 2, 5, 7, 9]

# Pass all the points into the scatter plot
plt.scatter(x, y)

# Show the scatterplot on the screen
plt.show()

Scatter plot depicting an upward trend with data points from one to seven on the x-axis, showing increasing values on the y-axis.
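
Passing arrays also makes it possible to vary the appearance of each point. As a rough sketch, the s parameter sets marker sizes and the c parameter maps values to colors. The size scaling (20 times the y value) and the "viridis" colormap are arbitrary choices for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 3, 2, 5, 7, 9]

# Marker area grows with the y value (arbitrary scaling for illustration)
sizes = [20 * value for value in y]

fig, ax = plt.subplots()

# c=y colors each point by its y value using the chosen colormap
points = ax.scatter(x, y, s=sizes, c=y, cmap="viridis")

# Add a color scale so readers can decode the colors
fig.colorbar(points, ax=ax, label="y value")

# plt.show()  # uncomment when running with an interactive backend
```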

Adding Trend Lines to Scatter Plots

When scatter plot points show a pattern, adding a trend line helps visualize the relationship more clearly. NumPy provides powerful functions for calculating linear regression and creating best-fit lines through your data points.

💡 Pro Tip: NumPy’s polyfit() function calculates the best-fit line coefficients, while poly1d() creates a polynomial function for generating trend line coordinates.

import matplotlib.pyplot as plt
import numpy as np

# Declare arrays showing the X and Y coordinates
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 3, 2, 5, 7, 9]

# Create scatter plot
plt.scatter(x, y)

# Calculate trend line coefficients
m, b = np.polyfit(x, y, 1)

# Create trend line
plt.plot(x, [m*i + b for i in x], color='red', linestyle='--', linewidth=2)

# Add labels
plt.xlabel('X Values')
plt.ylabel('Y Values')
plt.title('Scatter Plot with Trend Line')

# Show the plot
plt.show()

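# Print the fitted line's slope and intercept
print(f"Slope: {m:.2f}, Intercept: {b:.2f}")

This enhanced version adds a red dashed trend line that clearly shows the data’s upward trend. The slope tells you how much y changes for each unit increase in x, and the intercept is the fitted value at x = 0. Neither number measures how tightly the points follow the line; for that, use a correlation coefficient such as the one np.corrcoef() returns.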

Scatter plot with a red dashed regression line showing an upward trend, with data points from one to seven on the x-axis.
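
The tip above mentions poly1d(), which the example does not use. A minimal sketch of how it could fit in is shown below: np.poly1d() wraps the coefficients from np.polyfit() in a callable, and np.corrcoef() gives the Pearson correlation coefficient as a measure of how closely the points follow a straight line. Evaluating the trend at x = 8 is just an illustrative choice.

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 3, 2, 5, 7, 9]

# Fit a degree-1 polynomial (a straight line) to the data
coefficients = np.polyfit(x, y, 1)
slope, intercept = coefficients

# Wrap the coefficients in a callable: trend(x) returns slope * x + intercept
trend = np.poly1d(coefficients)

# Pearson correlation coefficient between x and y (ranges from -1 to 1)
r = np.corrcoef(x, y)[0, 1]

print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
print(f"Fitted value at x = 8: {trend(8):.2f}")
print(f"Correlation coefficient r: {r:.2f}")
```

With poly1d, the trend line in the plot could be drawn as plt.plot(x, trend(x), ...) instead of using a list comprehension.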

Pie Charts for Proportional Data

Pie charts effectively show how individual parts contribute to a whole. They’re perfect for displaying percentages, market share, budget allocation, or any data where the total equals 100%. Each slice represents a proportion of the complete dataset.

Basic Pie Chart

import matplotlib.pyplot as plt

# Create list of values
values = [1, 2, 3]

# Create a list of names
names = ["Matt", "Sally", "John"]

# Declare pie chart
plt.pie(values, labels=names)

# Show pie chart
plt.show()

The basic pie chart automatically calculates proportions and assigns different colors to each slice. Labels are positioned around the chart for easy identification.

Pie chart illustrating data distribution among Sally, Matt, and John, with sections in green, blue, and red, respectively.

Exploded Pie Chart for Emphasis

To highlight specific data segments, you can “explode” slices by pulling them away from the center. This technique draws attention to important categories or outliers in your data.

# Exploded pie chart
import matplotlib.pyplot as plt

# Create list of values
values = [1, 2, 3]

# Create a list of names
names = ["Matt", "Sally", "John"]

# Define explosion distances (0 = no explosion, 0.1 = slight separation)
explode = (0, 0.1, 0)

# Create exploded pie chart
plt.pie(values, explode=explode, labels=names, autopct='%1.1f%%')

# Show pie chart
plt.show()

The exploded version separates Sally’s slice from the main chart, creating visual emphasis. The autopct parameter adds percentage labels to each slice.

Exploded pie chart showing data distribution among Sally, Matt, and John with segments in green, blue, and red on a gray background.
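
To make the proportions explicit, the sketch below computes each person’s share of the total by hand and puts it into the slice label. This is the same quantity that autopct displays. The label format and the startangle value are illustrative choices, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

values = [1, 2, 3]
names = ["Matt", "Sally", "John"]

# Each slice's share of the whole, as a percentage
total = sum(values)
shares = [100 * value / total for value in values]

# Combine name and percentage into one label per slice
labels = [f"{name} ({share:.1f}%)" for name, share in zip(names, shares)]

fig, ax = plt.subplots()

# startangle=90 rotates the chart so the first slice starts at the top
ax.pie(values, labels=labels, startangle=90)

# plt.show()  # uncomment when running with an interactive backend
```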

Key Takeaways and Next Steps

You now have the fundamental skills to create four essential chart types using Python and Matplotlib:

  • Bar charts for comparing discrete categories and values
  • Line charts for showing trends and changes over time
  • Scatter plots for exploring relationships between variables
  • Pie charts for displaying proportional data and percentages

These visualization techniques form the foundation for more advanced data analysis and presentation. As you continue developing your Python skills, consider exploring Python functions and project management with requirements.txt to build more sophisticated data visualization applications.

Frequently Asked Questions

Find answers to common questions

Choosing between Matplotlib, Seaborn, and Plotly depends on what you need from the plot:

  • Matplotlib: full control over every detail, static images saved to file, and basic plots (line, bar, scatter). Use it when exporting images for reports or papers, or when you need pixel-perfect control.
  • Seaborn: statistical plots (distributions, correlations, regression) with attractive defaults that are close to publication-ready without tweaking. It is built on Matplotlib, so you can drop down to Matplotlib for customization. Use it when exploring data or when you want plots to look good without effort.
  • Plotly: interactive plots (hover, zoom, pan), web dashboards, and 3D visualizations. Use it when building a dashboard, when users need to interact with the plot, or when sharing plots online.

A reasonable path is to start with Seaborn for quick exploration, use Matplotlib for final publication plots, and add Plotly only if you need interactivity. Avoid using all three in the same project; pick one and master it.

Five improvements make default Matplotlib plots look more professional:

  1) Use a style sheet: plt.style.use('seaborn-v0_8') or plt.style.use('ggplot') gives better defaults than base Matplotlib.
  2) Increase the figure size: plt.figure(figsize=(10,6)), because the default is small.
  3) Add labels: plt.xlabel(), plt.ylabel(), and plt.title() explain what the plot shows.
  4) Remove chart junk: plt.grid(alpha=0.3) gives a subtle grid, and ax.spines['top'].set_visible(False) removes an unnecessary border.
  5) Use color wisely: stick to 3-4 colors at most and use colorblind-friendly palettes.

These five changes take about five minutes and make a large difference in readability. Avoid default settings (plots look amateur), rainbow colors (hard to distinguish and not colorblind-friendly), and charts that are too small (text becomes unreadable). Instead, increase the size, reduce clutter, and add clear labels. Seaborn gives better defaults automatically if you don't want to customize Matplotlib.
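
The five improvements can be sketched together as follows. The month and sales data are hypothetical, 'ggplot' is used because it ships with Matplotlib, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

# 1) Style sheet with better defaults
plt.style.use("ggplot")

# 2) Larger figure
fig, ax = plt.subplots(figsize=(10, 6))

# Hypothetical data for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [12, 18, 9, 21]

# 5) A single, restrained color
ax.bar(months, sales, color="tab:blue")

# 3) Labels that explain what the plot shows
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly sales")

# 4) Subtle grid, no top or right border
ax.grid(alpha=0.3)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# plt.show()  # uncomment when running with an interactive backend
```

Note that plt.style.use() changes the style for every later plot in the same session.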

Pandas has built-in plotting that wraps Matplotlib: df.plot(kind='line') draws a line chart, df.plot(kind='bar') a bar chart, and df['column'].hist() a histogram, each in a single line. For example, df.groupby('category')['sales'].sum().plot(kind='bar') creates a bar chart of sales by category. The trade-offs are less control than direct Matplotlib and fewer plot types than Matplotlib or Seaborn.

Use Pandas plotting for quick data exploration (is there a trend? any outliers?) and for throwaway plots meant only for understanding the data. Use Matplotlib directly when you need customization, presentation-quality plots, or complex multi-panel figures. A practical workflow is to explore with Pandas plotting, which takes seconds, and then recreate the plots worth keeping in Matplotlib, spending a few minutes customizing them for presentation. Don't spend time making Pandas plots beautiful.
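
As a small illustration of the groupby example, the sketch below builds a hypothetical DataFrame and plots total sales per category. It assumes pandas is installed (it is not part of the setup section above), the column names and numbers are made up, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "category": ["A", "B", "A", "C", "B"],
    "sales": [10, 20, 5, 7, 3],
})

# Sum sales within each category
totals = df.groupby("category")["sales"].sum()

# Series.plot() returns the Matplotlib Axes it drew on
ax = totals.plot(kind="bar")

# Because the result is a regular Axes, Matplotlib methods still apply
ax.set_ylabel("Total sales")

# plt.show()  # uncomment when running with an interactive backend
```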

To save a plot, use plt.savefig(). For example, plt.savefig('plot.png', dpi=300, bbox_inches='tight') saves a high-resolution PNG. The key parameters are:

  • dpi=300: print quality. The default of 100 is too low for papers.
  • bbox_inches='tight': trims the white space around the plot.
  • File format, taken from the file extension: PNG for reports, PDF for papers, SVG for the web.

Common mistakes include calling plt.savefig() after plt.show() (the figure is closed by then, so there is nothing to save; save before showing), using a low DPI (blurry when printed; use 300), and forgetting transparent=True when a transparent background is needed, for example on slides. To save multiple formats, call savefig() once per file: plt.savefig('plot.png', dpi=300); plt.savefig('plot.pdf') creates both a PNG and a PDF. A PNG at dpi=300 can reach a few megabytes for dense plots, which is fine for most uses. Reduce the DPI if file size is a concern, or use a vector format (PDF, SVG) for scalability without a size penalty.
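
A minimal sketch of the save-before-show order is given below. It writes to a temporary directory so it does not clutter your project; the file names and the directory choice are illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 23, 2, 4])
ax.set_ylabel("some numbers")

# Illustrative output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "plot.png")
pdf_path = os.path.join(out_dir, "plot.pdf")

# Save first: high-resolution raster image with tight margins
fig.savefig(png_path, dpi=300, bbox_inches="tight")

# Same figure as a vector PDF (format is inferred from the extension)
fig.savefig(pdf_path, bbox_inches="tight")

# Show last, after every savefig() call
# plt.show()  # uncomment when running with an interactive backend

print("Saved:", png_path, pdf_path)
```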

Choose the chart type based on the question you are answering:

  • Trend over time: line plot.
  • Comparison between categories: bar chart.
  • Distribution: histogram for a single variable, or box plot to compare distributions across groups.
  • Relationship between variables: scatter plot, which shows correlation.
  • Proportions: a stacked bar chart is usually better than a pie chart, whose slices are hard to compare.
  • Matrix of values (correlation matrix, confusion matrix): heat map.

Avoid pie charts with more than 3-4 categories (use a bar chart instead), 3D plots (hard to read; use 2D with color instead), and line plots for categorical data (use a bar chart). The most common mistake is choosing a plot type that makes the data harder to understand: a line chart for categories looks odd, and a bar chart for a time series loses temporal information.
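
Histograms and box plots are not covered earlier in this tutorial, so here is a rough sketch of both for a single made-up variable. The data and bin edges are illustrative, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical measurements of one variable
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Two plots side by side
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: count how many values fall into each bin
bin_edges = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
counts, edges, patches = ax_hist.hist(data, bins=bin_edges)
ax_hist.set_title("Histogram")

# Box plot: median, quartiles, and range in one glyph
ax_box.boxplot(data)
ax_box.set_title("Box plot")

# plt.show()  # uncomment when running with an interactive backend
```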
