A DataFrame is a collection of Series that all share the same index.
The index doesn't have to be an integer, nor does it have to be unique.
# this sets the index to the "Candidate" column
elections.set_index("Candidate", inplace=True)
elections.index
'''
Index(['~', '~', ..., '~'], dtype='object', name='Candidate', length=182)
'''
# resets the index to be the default list of integers
elections.reset_index(inplace=True)
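To make the point about non-integer, non-unique indexes concrete, here is a minimal sketch on a small hand-built DataFrame. The name `toy` and its contents are assumptions for illustration, not the real elections table.

```python
import pandas as pd

# Hypothetical toy data standing in for the elections table.
toy = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828],
    "Candidate": ["Andrew Jackson", "John Quincy Adams",
                  "Andrew Jackson", "John Quincy Adams"],
    "Party": ["Democratic-Republican", "Democratic-Republican",
              "Democratic", "National Republican"],
})

# Use string labels as the index; each candidate appears twice.
by_candidate = toy.set_index("Candidate")
print(by_candidate.index.is_unique)  # False

# A repeated label selects every row that carries it.
jackson_rows = by_candidate.loc["Andrew Jackson"]
print(len(jackson_rows))  # 2

# reset_index restores the default 0..n-1 labels and puts the old
# index back as the FIRST column, so the column order changes.
restored = by_candidate.reset_index()
print(list(restored.columns))  # ['Candidate', 'Year', 'Party']
```

Without `inplace=True`, both methods return a new DataFrame and leave the original untouched, which is why the sketch assigns the results to new names.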
Slicing in DataFrames
The API for the DataFrame class is enormous.
Here we focus on the parts of the DataFrame API that allow us to extract subsets of data.
The simplest way to manipulate a DataFrame is to extract a subset of rows and columns, known as slicing. We can do this with:
- .loc
- .iloc
- []
Indexing with .loc
To grab data with .loc, we must specify the row and column label(s) where the data exists.
The row labels are the first argument inside the square brackets of .loc; the column labels are the second.
For example, we can select the row labeled 0 and the column labeled Candidate from the elections DataFrame.
elections.loc[0, 'Candidate']
'''
'Andrew Jackson'
'''
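# rows labeled 0 through 3 and columns 'Year' through 'Popular vote'
# (.loc slices include both endpoints)
elections.loc[0:3, 'Year':'Popular vote']
# every column value for the first four rows in elections
elections.loc[0:3, :]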
We can also pass lists of labels instead of slices, which lets us change the order of the columns.
elections.loc[[0, 1, 2, 3], ['Year', 'Candidate', 'Party', 'Popular vote']]
We can also mix list and slice notation.
elections.loc[[0, 1, 2, 3], :]
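The .loc patterns above can be tried on a small stand-in table. This is a sketch under assumptions: `toy` is a hypothetical DataFrame and the vote counts are made-up numbers, chosen only to show the shapes that come back.

```python
import pandas as pd

# Hypothetical toy data; the "Popular vote" values are made up.
toy = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828, 1832],
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
                  "John Quincy Adams", "Andrew Jackson"],
    "Party": ["Democratic-Republican", "Democratic-Republican", "Democratic",
              "National Republican", "Democratic"],
    "Popular vote": [100, 200, 300, 400, 500],
})

# One row label and one column label give back a single value.
single = toy.loc[0, "Candidate"]
print(single)  # Andrew Jackson

# Label slices include both endpoints: rows 0, 1, 2, 3 and
# columns Year, Candidate, Party.
block = toy.loc[0:3, "Year":"Party"]
print(block.shape)  # (4, 3)

# Lists of labels return rows and columns in the order listed.
reordered = toy.loc[[0, 1], ["Party", "Year"]]
print(list(reordered.columns))  # ['Party', 'Year']

# A list for rows combined with a full slice for columns.
mixed = toy.loc[[0, 1, 2, 3], :]
print(mixed.shape)  # (4, 4)
```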
Indexing with .iloc
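# elections.loc[0, "Candidate"] - Previous approach (by label)
# .iloc selects by integer position instead: row 0, column 1
elections.iloc[0, 1]
# 1824 (after the reset_index above, "Candidate" sits at position 0 and "Year" at position 1)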
# select the first four rows and columns using .iloc
# .iloc slices exclude the end position, so 0:4 covers positions 0, 1, 2, 3
# elections.loc[0:3, 'Year':'Popular vote'] - Previous approach
elections.iloc[0:4, 0:4]
# elections.loc[[0, 1, 2, 3], ['Year', 'Candidate', 'Party', 'Popular vote']]
elections.iloc[[0, 1, 2, 3], [0, 1, 2, 3]]
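The difference between label-based .loc and position-based .iloc is easiest to see side by side. The following is a minimal sketch on a hypothetical `toy` DataFrame (an assumption for illustration, not the elections table itself).

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0..3.
toy = pd.DataFrame({
    "Year": [1824, 1828, 1832, 1836],
    "Candidate": ["Andrew Jackson", "Andrew Jackson",
                  "Andrew Jackson", "Martin Van Buren"],
})

# .iloc slices exclude the end position; .loc slices include the end label.
by_position = toy.iloc[0:3]
by_label = toy.loc[0:3]
print(len(by_position), len(by_label))  # 3 4

# After sorting, row labels travel with their rows, so position 0 and
# label 0 no longer refer to the same row.
newest_first = toy.sort_values("Year", ascending=False)
print(newest_first.iloc[0, 0])       # 1836 (first row by position)
print(newest_first.loc[0, "Year"])   # 1824 (row whose label is 0)
```

When the index is the default range and the rows have not been reordered, the two accessors happen to agree on which row is "0", which is why the distinction is easy to overlook.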
Indexing with [ ]
The [ ] selection operator is the most baffling of all, yet the most commonly used. It takes only a single argument, which may be one of the following:
- A slice of row numbers
- A list of column labels
- A single column label
That is, [ ] is context-dependent: what it returns depends on the type of the argument.
# The first four rows of our elections DataFrame
elections[0:4]
# When we want the first four columns
elections[["Year", "Candidate", "Party", "Popular vote"]]
# A single column label - Candidate
elections["Candidate"]
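A quick way to see the context dependence of [ ] is to check what each form returns. This sketch uses a hypothetical `toy` DataFrame (an assumption, not the elections data).

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "Year": [1824, 1828, 1832, 1836],
    "Candidate": ["Andrew Jackson", "Andrew Jackson",
                  "Andrew Jackson", "Martin Van Buren"],
    "Party": ["Democratic-Republican", "Democratic",
              "Democratic", "Democratic"],
})

# A slice of row numbers selects rows (end position excluded).
rows = toy[0:2]
print(type(rows).__name__, rows.shape)  # DataFrame (2, 3)

# A list of column labels selects columns and returns a DataFrame.
cols = toy[["Year", "Candidate"]]
print(type(cols).__name__, cols.shape)  # DataFrame (4, 2)

# A single column label returns a Series.
one = toy["Candidate"]
print(type(one).__name__)  # Series

# A one-element list still counts as a list, so the result is a DataFrame.
one_as_frame = toy[["Candidate"]]
print(type(one_as_frame).__name__)  # DataFrame
```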