Pandas is one of the most powerful and popular Python libraries for working with data. It’s widely used by data analysts, scientists, and engineers to clean, explore, and manipulate structured data.
With Pandas, you can easily load data from files or the web, filter specific values, reshape datasets, summarize statistics, and even generate visualizations. Whether you’re working with spreadsheets, databases, or time series data, Pandas gives you the tools to make your work easier and faster.
In this tutorial, we’ll start by installing Pandas and then walk through beginner-friendly examples of common use cases—loading data, filtering, plotting, pivoting, and summarizing. No previous experience with Pandas is required.
What is Pandas?
Pandas is a powerful Python library used for data analysis and manipulation. It provides easy-to-use tools to work with structured data, such as tables or spreadsheets, directly within your Python code.
At the core of Pandas are two primary data structures:
- Series – A one-dimensional labeled array, similar to a column in a spreadsheet or a single list of values.
- DataFrame – A two-dimensional labeled data structure, like an Excel sheet or a SQL table, where each column can be a different data type (numbers, text, dates, etc.).
Example:
import pandas as pd
# A Series
series_example = pd.Series([10, 20, 30])
print(series_example)
# A DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)
With just a few lines of code, you’ve created structures that are ready for filtering, math, grouping, visualization, and more.
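For instance, picking up the df from the example above, a couple of one-liners show what those structures already support (a quick sketch; the column names match the example):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Filtering: keep only rows where Age is greater than 26
older = df[df['Age'] > 26]
print(older['Name'].tolist())  # ['Bob', 'Charlie']

# Math: compute the average age across all rows
print(df['Age'].mean())  # 30.0
```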
Pandas Works Well With:
- Tabular data (like CSVs, Excel files, and SQL tables)
- Time series data (stock prices, web traffic logs)
- Matrix-style data (with labeled rows and columns)
- Statistical or observational datasets (used in research or data science)
Pandas is built on top of NumPy, which means it's efficient and integrates easily with scientific libraries like matplotlib and scikit-learn. Even if you're just getting started with Python, Pandas gives you the power to analyze real-world data right away.
What is Numpy?
Before diving too deep into Pandas, it helps to understand NumPy, short for Numerical Python. Pandas is actually built on top of NumPy, and many of its core features depend on NumPy under the hood.
NumPy makes working with numbers and arrays in Python much faster and more efficient. It introduces a new kind of data structure called a NumPy array, which is like a supercharged version of a Python list.
Why is NumPy useful?
Here are a few things NumPy does really well:
- Efficient storage and manipulation of large numerical datasets
- Performing math on entire arrays at once (vectorization)
- Working with multi-dimensional data (like matrices or grids)
- Advanced indexing and filtering
Example:
import numpy as np
# A simple NumPy array
arr = np.array([1, 2, 3, 4, 5])
print(arr * 2) # Output: [ 2 4 6 8 10]
With NumPy, you don't need to write loops to multiply each number: just write arr * 2, and NumPy handles the rest. This kind of operation is not only simpler, it's much faster, especially with large datasets.
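The same idea extends to the "advanced indexing and filtering" mentioned above. As a small sketch: comparing an array to a value produces an array of booleans, which can then select elements.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Comparing an array to a scalar yields a boolean mask
mask = arr > 2
print(mask)       # [False False  True  True  True]

# Indexing with the mask keeps only the matching elements
print(arr[mask])  # [3 4 5]
```

This is exactly the mechanism Pandas builds on when you filter a DataFrame with a condition.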
While Pandas handles more complex, labeled data (like spreadsheets), it uses NumPy under the hood to do the heavy lifting. If you’re using Pandas, you’re already benefiting from NumPy—no need to master it first, but it’s good to know it’s there.
Installing Pandas
To get started with Pandas, you need to install it on your computer. The easiest way is to use Python’s built-in package manager called pip.
Install with pip
If you’re using Python 3, run this command in your terminal or command prompt:
pip3 install pandas
If you’re using Python 2 (which is not recommended, since it’s no longer supported), you would run:
pip install pandas
✅ Tip: If pip isn’t installed on your machine, check out our Python Basics guide for how to set it up.
Optional: Use Anaconda
Another popular option is to install Anaconda, a distribution that includes Pandas along with other useful tools for data science like NumPy, Jupyter Notebooks, and matplotlib.
If you’re working with large datasets or want an all-in-one setup, Anaconda is a great choice.
You can download it here: https://www.anaconda.com/products/distribution
Once Pandas is installed, you’re ready to start working with real data! Let’s jump into how to load and explore data using Pandas next. Ready?
Using Pandas
To help you learn Pandas through hands-on experience, we’ll walk through a real-world dataset: the 2016 U.S. presidential polling data published by FiveThirtyEight. This dataset includes poll results from various organizations across all 50 states.
You don’t need to download the data in advance—we’ll load it directly from the web using Pandas. However, if you plan to run the script multiple times or want faster performance, downloading the file locally is a good idea.
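Conveniently, read_csv accepts either a URL or a local file path, so switching to a downloaded copy later is a one-line change. Here is a self-contained sketch using an in-memory string in place of a file (the columns are illustrative stand-ins for the real polling data):

```python
import io
import pandas as pd

# A tiny stand-in for the real polling CSV
csv_text = """state,pollster,adjpoll_clinton,adjpoll_trump
California,YouGov,55.1,33.2
Texas,YouGov,38.4,50.7
"""

# read_csv works the same on URLs, local paths, and file-like objects
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 4)
```

In the real script you would pass the FiveThirtyEight URL, or the path to your saved copy.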
What You’ll Learn
In this example-driven section, we’ll explore some of the most common and useful features of Pandas:
- Filtering data – Focus on the rows you care about
- Summarizing data – Calculate averages, counts, and other stats
- Plotting data – Create basic visualizations to spot trends
- Pivoting data – Reshape your dataset for better insights
Let’s begin by importing the libraries we’ll use throughout the tutorial:
import pandas as pd # Core data analysis library
import matplotlib.pyplot as plt # For plotting graphs
import numpy as np # For numerical operations
Next, we’ll load the CSV file directly from the FiveThirtyEight website into a Pandas DataFrame:
# Load polling data into a DataFrame
df = pd.read_csv("http://projects.fivethirtyeight.com/general-model/president_general_polls_2016.csv")
Once loaded, df will contain all the polling data, including columns like state, pollster, population type (e.g. likely voters), candidate names, and adjusted poll numbers.
You can take a quick look at the first few rows using:
print(df.head())
This will give you a feel for the structure of the dataset before we dive into filtering and analyzing it.
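Besides head(), a few other quick inspection tools are worth knowing. A sketch on a toy frame (the column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['California', 'Texas', 'Ohio'],
    'adjpoll_clinton': [55.1, 38.4, 41.0]
})

print(df.shape)          # (3, 2): number of rows, number of columns
print(list(df.columns))  # ['state', 'adjpoll_clinton']
print(df.dtypes)         # the data type of each column
```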
Filtering Data in Pandas
Once you've loaded your dataset, you may notice that it contains a lot of information, more than you need all at once. Viewing everything with print(df) is rarely helpful, especially with large datasets. Instead, Pandas makes it easy to filter down to just the rows that matter.
Let’s say you’re only interested in polling data from California, conducted by a specific pollster like YouGov. You can apply filters to isolate those records using a simple Boolean condition.
Here’s how to do it:
# Filter the DataFrame to only include YouGov polls from California
df_filtered = df[(df["state"] == "California") & (df["pollster"] == "YouGov")]
What’s happening here:
df["state"] == "California"
filters rows where thestate
column equals"California"
df["pollster"] == "YouGov"
filters for rows where thepollster
column equals"YouGov"
- The
&
operator combines both conditions using a logical AND
Together, this returns a new DataFrame df_filtered
that only contains rows meeting both criteria.
View the Filtered Results
You can inspect the filtered results using .head() again:
print(df_filtered.head())
This gives you a quick look at the first few entries that match your filter.
Filtering is a foundational skill in data analysis. Once you’ve narrowed your data down to what matters, it becomes much easier to visualize trends, calculate summaries, or build insights.
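The same pattern generalizes: wrap each condition in parentheses and combine them with & (AND) or | (OR). A small self-contained sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['California', 'California', 'Texas'],
    'pollster': ['YouGov', 'SurveyMonkey', 'YouGov'],
})

# AND: both conditions must hold
both = df[(df['state'] == 'California') & (df['pollster'] == 'YouGov')]
print(len(both))    # 1

# OR: either condition may hold
either = df[(df['state'] == 'Texas') | (df['pollster'] == 'YouGov')]
print(len(either))  # 2
```

Note the parentheses around each condition: they are required, because & and | bind more tightly than == in Python.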
Plotting
Next, let’s plot the poll results of both Trump and Clinton:
df_filtered["adjpoll_clinton"].plot() df_filtered["adjpoll_trump"].plot() plt.show()
Your result should look something like this:
[Chart: adjusted poll numbers for Clinton and Trump, one unlabeled line each]
That is useful, but it would be more helpful if we could add some labels. We can add the legend parameter to identify each line:
df_filtered["adjpoll_clinton"].plot(legend=True) df_filtered["adjpoll_trump"].plot(legend=True)
Your chart should now look more like this:
[Chart: the same two lines, now with a legend identifying each series]
That looks even better. Once we go beyond this point, though, I think it is a lot easier to use matplotlib directly. Here is a similar plot done using matplotlib:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("http://projects.fivethirtyeight.com/general-model/president_general_polls_2016.csv")

# Convert startdate from text to real dates so sorting and the x-axis behave correctly
df['startdate'] = pd.to_datetime(df['startdate'])
df = df.sort_values('startdate', ascending=False)

# Label each line so plt.legend() has something to display
plt.plot(df['startdate'], df['adjpoll_clinton'], label='Clinton')
plt.plot(df['startdate'], df['adjpoll_trump'], label='Trump')
plt.legend()
plt.ylabel('Approval')
plt.xticks(rotation=45)
plt.show()
Here is the result:
[Chart: Clinton and Trump adjusted poll numbers over time, with a legend and rotated date labels]
As you can see above, we start by importing our libraries and reading our CSV file. We then sort the rows by the start date of each poll and plot both the Clinton and Trump approval ratings. We add a legend by calling plt.legend(), label the left side of the graph with plt.ylabel(), rotate the dates along the bottom by 45 degrees with plt.xticks(rotation=45), and finally show the graph with plt.show().
When you do plotting, Pandas is just using matplotlib anyway. So what we have done is stepped back and done it outside of pandas. But it is still using the same libraries.
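You can see that relationship directly: a Series' .plot() returns the matplotlib Axes it drew on, so the two styles mix freely. A minimal sketch, using the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; no window needed
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([1.0, 2.0, 1.5, 3.0])

# Pandas plotting returns a matplotlib Axes object
ax = s.plot()
ax.set_ylabel('Approval')  # a plain matplotlib call on the same Axes
print(len(ax.lines))       # 1 line drawn so far

plt.close('all')
```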
Pivoting
Pivoting data is when you take the columns and make them the rows and vice versa. It is a good way to get a different perspective on your data. And it is better than simply tilting your head to the left. We will use the same dataset as the previous section in our examples. Just like before, we will start by importing our libraries:
import pandas as pd
Next we read our CSV file and create our data frame:
df = pd.read_csv("http://projects.fivethirtyeight.com/general-model/president_general_polls_2016.csv")
Next we want to see what Registered Voters are saying vs Likely Voters in our samples. So we are going to Pivot using the population column as our column list:
df.pivot(columns='population',values='adjpoll_clinton')
Your output should look similar to this:
[Output: a pivot table with one column per population type (e.g. likely voters, registered voters), padded with NaN where a row does not match]
Using this pivot table you can see the approval ratings for Clinton among likely voters and registered voters. Those NaN’s get in the way, so let’s get the average of each column:
df.pivot(columns='population',values='adjpoll_clinton').mean(skipna=True)
In the above command we added the .mean() function with the skipna=True option. This takes the average of each column, but skips all of the NaN values.
Your output should look similar to this:
[Output: the mean adjpoll_clinton value for each population column]
Here is all of our pivot table code consolidated:
import pandas as pd

df = pd.read_csv("http://projects.fivethirtyeight.com/general-model/president_general_polls_2016.csv")

# Filter to only show data from the state of California
df = df[(df.state == 'California')]

# Pivot to show the lv/rv data as the columns
print(df.pivot(columns='population', values='adjpoll_clinton'))

# Show the averages for lv and rv (likely voters, registered voters)
print(df.pivot(columns='population', values='adjpoll_clinton').mean(skipna=True))
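To see what pivot is doing without downloading the full dataset, here is a tiny self-contained sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({
    'population': ['lv', 'rv', 'lv', 'rv'],
    'adjpoll_clinton': [50.0, 48.0, 52.0, 46.0],
})

# Each unique population value becomes its own column;
# rows keep their original index, so non-matching cells become NaN
pivoted = df.pivot(columns='population', values='adjpoll_clinton')
print(pivoted)

# Column means, skipping the NaN gaps
print(pivoted.mean(skipna=True))  # lv: 51.0, rv: 47.0
```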
Summarizing
It can be daunting to look at a large dataset. However, Pandas gives you some nice tools for summarizing the data so you don't have to take on the entire dataset at once.
To start, we have the min, max, and mean functions. These do as their names suggest and return the minimum, maximum, and average values. You can see examples of each below, using our pivot table from the previous section:
df.pivot(columns='population', values='adjpoll_clinton').mean(skipna=True)
df.pivot(columns='population', values='adjpoll_clinton').max(skipna=True)
df.pivot(columns='population', values='adjpoll_clinton').min(skipna=True)
Next it might be helpful to know the number of unique values you have in a dataset:
df.pivot(columns='population',values='adjpoll_clinton').nunique()
Or if you just want a quick summary, you can use the describe function:
df.pivot(columns='population',values='adjpoll_clinton').describe()
The output of the describe function is the most useful, as it combines many of the previous functions we talked about. Your output will look similar to this:
[Output of describe(): count, mean, std, min, quartiles, and max for each population column]
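On a small frame you can see exactly which statistics describe bundles together. A quick sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'adjpoll_clinton': [40.0, 45.0, 50.0]})

# describe() returns count, mean, std, min, the quartiles, and max
summary = df['adjpoll_clinton'].describe()
print(summary['count'])  # 3.0
print(summary['mean'])   # 45.0
print(summary['min'])    # 40.0
print(summary['max'])    # 50.0
```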
Summary
Pandas is an essential tool for anyone working with data in Python. In this beginner-friendly tutorial, you learned how to install Pandas, understand its core concepts like DataFrames and Series, and use it to load and explore real-world datasets.
We covered how to:
- Filter data to focus on what matters
- Create visualizations using Pandas and Matplotlib
- Use pivot tables to compare groups in your dataset
- Summarize key statistics using built-in Pandas functions
Whether you’re analyzing polling data, sales figures, or survey results, Pandas makes it easy to wrangle and understand your data. If you’re just starting out, continue practicing with different datasets—try filtering, pivoting, and visualizing to deepen your understanding.