If you're just starting your journey into data analysis with Python, you'll soon become familiar with the Pandas library. Pandas is an essential tool for data manipulation and analysis. While everybody wants to jump straight into analysis and visualization, knowing the components of your data is a crucial step toward understanding how to manipulate it.
In this article, I explain what Series, DataFrames, and CSV files are, with sample code to import and view our first data set.
Open Jupyter notebook from your terminal.
The first step in any Pandas project is importing Pandas (as pd). While you can import Pandas under any name you like, pd is the most common abbreviation. Using it lets anyone, anywhere, easily follow what you are doing when you work in a team, remotely or in person.
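The import itself is a single line. A minimal sketch follows; printing the version is only an optional sanity check added here for illustration:

```python
# Import pandas under its conventional alias.
import pandas as pd

# Optional sanity check: confirm the library loaded by printing its version.
print(pd.__version__)
```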
Series
A Series is one of the core data structures in Pandas. It can be thought of as a one-dimensional array that can hold various data types, such as integers, strings, and more. To create a Series, you can use the pd.Series() constructor. Here we create a Series of car makes: BMW, Toyota, and Honda.
series = pd.Series(["BMW", "Toyota", "Honda"])
Key points about Series:
- Series are one-dimensional.
- They are like lists or arrays but with additional features for data analysis.
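To make these points concrete, here is a small sketch that inspects the same Series of car makes. The choice of .iloc and .str.upper() is illustrative only; they stand in for the kind of "additional features" a plain list does not offer.

```python
import pandas as pd

# Same car makes as in the example above.
series = pd.Series(["BMW", "Toyota", "Honda"])

print(len(series))           # number of values: 3
print(list(series.index))    # default labels: [0, 1, 2]
print(series.iloc[0])        # first value by position: BMW

# A vectorized string method is applied to every element at once.
upper_makes = series.str.upper()
print(upper_makes.tolist())  # ['BMW', 'TOYOTA', 'HONDA']
```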
DataFrames
A DataFrame is a two-dimensional data structure that resembles a table or spreadsheet. It is the primary data structure used for data analysis in Pandas. You can create a DataFrame by passing a dictionary of Series objects to the pd.DataFrame() constructor, like this:
# Creating a DataFrame
colors = pd.Series(["Red", "Blue", "White"])
car_data = pd.DataFrame({"Car Make": series, "Color": colors})
car_data
Output:
  | Car Make | Color |
0 | BMW      | Red   |
1 | Toyota   | Blue  |
2 | Honda    | White |
Key points about DataFrames:
- DataFrames are two-dimensional.
- They are similar to tables in a relational database.
- DataFrames are highly versatile and can store various types of data.
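As a rough illustration of the two-dimensional structure, the sketch below rebuilds the same car_data DataFrame and checks its shape and columns. Selecting a single column gives back a Series, which shows how the two structures relate.

```python
import pandas as pd

series = pd.Series(["BMW", "Toyota", "Honda"])
colors = pd.Series(["Red", "Blue", "White"])
car_data = pd.DataFrame({"Car Make": series, "Color": colors})

print(car_data.shape)          # (rows, columns): (3, 2)
print(list(car_data.columns))  # ['Car Make', 'Color']

# Selecting one column returns a Series.
makes = car_data["Car Make"]
print(type(makes).__name__)    # Series
```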
CSVs (Comma-Separated Values)
CSV (Comma-Separated Values) is a widely used file format for storing tabular data. Pandas provides functions to read data from CSV files and export data to CSV files. To read data from a CSV file, you can use pd.read_csv(), and to export a DataFrame to a CSV file, you can use the to_csv() method.
# Importing data from a CSV file
car_sales = pd.read_csv("car-sales.csv")
print(car_sales)
# Exporting a DataFrame to a CSV file
car_sales.to_csv("exported-car-sales.csv")
Key points about CSVs in Pandas:
- CSV files are a common format for data storage and exchange.
- Use pd.read_csv() to import data from CSV files.
- Use the to_csv() method to export a DataFrame to a CSV file.
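If you do not have a CSV file on hand, the round trip can be tried with a temporary file. This is a sketch under stated assumptions: the cars DataFrame and the cars.csv file name are hypothetical stand-ins, not the car-sales.csv file used above. It also passes index=False, since to_csv() otherwise writes the row labels as an extra first column.

```python
import os
import tempfile

import pandas as pd

# Small stand-in DataFrame (hypothetical data, not the article's car-sales.csv).
cars = pd.DataFrame({
    "Car Make": ["BMW", "Toyota", "Honda"],
    "Color": ["Red", "Blue", "White"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "cars.csv")

    # index=False leaves the row labels out of the file.
    cars.to_csv(csv_path, index=False)

    # Read the file back into a new DataFrame.
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

Without index=False, reading the file back would produce an additional unnamed column holding the old row labels.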
Anatomy of a DataFrame
A Pandas DataFrame consists of rows and columns, with rows labeled from 0 by default. Rows run along the "index axis" (axis 0), while columns run along the "columns axis" (axis 1). Each individual value within a DataFrame is referred to as data, and the headings of the columns are called column names.
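These parts can be inspected directly. The sketch below uses a small car DataFrame and, as one illustrative choice, the drop() method to show what axis=0 and axis=1 refer to.

```python
import pandas as pd

car_data = pd.DataFrame({
    "Car Make": ["BMW", "Toyota", "Honda"],
    "Color": ["Red", "Blue", "White"],
})

print(list(car_data.index))      # row labels: [0, 1, 2]
print(list(car_data.columns))    # column names: ['Car Make', 'Color']
print(car_data.loc[1, "Color"])  # value at row 1, column "Color": Blue

# axis=0 targets row labels, axis=1 targets column names.
without_first_row = car_data.drop(0, axis=0)    # drops the row labeled 0
without_color = car_data.drop("Color", axis=1)  # drops the "Color" column

print(without_first_row.shape)   # (2, 2)
print(without_color.shape)       # (3, 1)
```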
Pandas is a powerful library for data analysis in Python, and understanding Series, DataFrames, and CSVs is fundamental for any data analyst. With these concepts, you can efficiently manipulate, analyze, and visualize data to gain valuable insights in your data analysis journey.