Python list:
Pandas:
The word "index" refers to the collection of labels for each row.
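As a small illustration of row labels, here is a sketch on toy data. The frame and its labels `a`, `b`, `c` are made up for this example and do not come from the notes.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame(
    {"Party": ["Democratic", "Republican", "Independent"],
     "%": [51.3, 46.9, 1.8]},
    index=["a", "b", "c"],  # explicit row labels
)

row_labels = list(df.index)    # the index holds one label per row
picked = df.loc["b", "Party"]  # .loc looks rows up by their index label
```

Without the `index=` argument, pandas would fall back to the default labels 0, 1, 2.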
groupby:
Harder Questions
- What was the most popular male name during each year in the data?
- What are the three states with the most babies born?
With groupby, questions like these become easy to approach.
# average of percent, grouped by Party
df['%'].groupby(df['Party']).mean()
# return minimum value, group by Party
df['%'].groupby(df['Party']).min()
# return the size, group by Party
df['%'].groupby(df['Party']).size()
# groupby multiple columns
(df['%']
.groupby([df['Party'], df['Result']])
.mean()
)
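The calls above can be tried on a tiny frame. This is a minimal sketch: the rows are invented, and only the column names `Party`, `Result`, and `%` are taken from the snippets.

```python
import pandas as pd

# Hypothetical toy data that reuses the column names from the notes.
df = pd.DataFrame({
    "Party": ["Democratic", "Democratic", "Republican", "Republican"],
    "Result": ["win", "loss", "win", "loss"],
    "%": [52.0, 48.0, 51.0, 45.0],
})

mean_by_party = df["%"].groupby(df["Party"]).mean()   # one mean per party
min_by_party = df["%"].groupby(df["Party"]).min()     # one minimum per party
size_by_party = df["%"].groupby(df["Party"]).size()   # number of rows per party

# Grouping by two keys gives a Series with a two-level (Party, Result) index.
mean_by_party_result = df["%"].groupby([df["Party"], df["Result"]]).mean()
```

Each result is indexed by the group key, so `mean_by_party.loc["Democratic"]` reads off one group's value.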
Reset Index:
# turn the group keys back into ordinary columns and restore a 0, 1, 2, ... index
(df
.groupby(['Party', 'Result'])
.mean()
.reset_index()
)
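To see what reset_index changes, compare the grouped result before and after. A minimal sketch on invented rows; only the column names come from the notes.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Party": ["Democratic", "Democratic", "Republican", "Republican"],
    "Result": ["win", "loss", "win", "loss"],
    "%": [52.0, 48.0, 51.0, 45.0],
})

# After groupby, Party and Result live in the index, not in the columns.
grouped = df.groupby(["Party", "Result"])[["%"]].mean()

# reset_index moves them back into columns and numbers the rows 0..n-1.
flat = grouped.reset_index()
```

Selecting `[["%"]]` before `.mean()` is a choice made for this sketch so that only the numeric column is averaged.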
Pivot Table:
# requires: import numpy as np
df.pivot_table(
index='Party',
columns='Result',
values='%',
aggfunc=np.mean,
)
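A pivot table holds the same numbers as a two-key groupby, laid out as a grid with one key on the rows and the other on the columns. A minimal sketch on invented rows, using the string `"mean"` as the aggregation instead of a NumPy function:

```python
import pandas as pd

# Hypothetical toy data with more than one row in some (Party, Result) cells.
df = pd.DataFrame({
    "Party": ["Democratic", "Democratic", "Democratic",
              "Republican", "Republican", "Republican"],
    "Result": ["win", "win", "loss", "win", "loss", "loss"],
    "%": [52.0, 54.0, 48.0, 51.0, 45.0, 47.0],
})

pivot = df.pivot_table(
    index="Party",     # one row per party
    columns="Result",  # one column per result
    values="%",        # the numbers to aggregate
    aggfunc="mean",    # how to combine rows that land in the same cell
)
```

Here `pivot.loc["Democratic", "win"]` is the mean of the two Democratic wins.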
groupby/filter:
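`groupby(...).filter(f)` calls `f` once per group and keeps the rows of every group for which it returns True. A minimal sketch on invented rows; the thresholds are arbitrary.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Party": ["Democratic", "Democratic", "Republican", "Independent"],
    "%": [52.0, 48.0, 51.0, 2.0],
})

# Keep only parties that appear at least twice.
at_least_two = df.groupby("Party").filter(lambda g: len(g) >= 2)

# Keep only parties whose best result is above 50 percent.
above_fifty = df.groupby("Party").filter(lambda g: g["%"].max() > 50)
```

Unlike an aggregation, the result has the original rows and columns, only with whole groups removed.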
isin:
df[(df["Party"] == "Democratic") | (df["Party"] == "Republican")] # Ugly
df[df["Party"].isin(["Republican", "Democratic"])] # Better
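Both forms build the same boolean mask, which can be checked directly. A minimal sketch on invented rows:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Party": ["Democratic", "Republican", "Independent", "Republican"],
    "%": [51.0, 47.0, 2.0, 49.0],
})

# Chained comparisons: each one needs its own parentheses around it.
mask_or = (df["Party"] == "Democratic") | (df["Party"] == "Republican")

# isin: a single membership test against a list of allowed values.
mask_isin = df["Party"].isin(["Republican", "Democratic"])

same_mask = mask_or.equals(mask_isin)
selected = df[mask_isin]
```

The `isin` form also scales to long lists of values without adding more `|` terms.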
Other:
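# top male name in each year (assumes rows are sorted by count, descending),
# then how many years each name spent on top
(ca.pivot_table(
index='Sex', columns='Year',
values='Name',
aggfunc=lambda ns: ns.iloc[0])
.loc['M']
.value_counts()
.plot(kind='barh')
)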
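Taking the first name in each group only gives the most popular name if the rows are already ordered by popularity. The sketch below makes that sort explicit and uses groupby instead of a pivot table. The data is invented, and the `Count` column name is an assumption about how `ca` is laid out.

```python
import pandas as pd

# Hypothetical stand-in for `ca`; the `Count` column is an assumed name.
ca = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Sex": ["M", "M", "F", "M", "M", "F"],
    "Name": ["Jose", "Daniel", "Emily", "Anthony", "Daniel", "Emily"],
    "Count": [3500, 4000, 3000, 3600, 3900, 2900],
})

top_male_by_year = (
    ca[ca["Sex"] == "M"]
    .sort_values("Count", ascending=False)  # most common names first
    .groupby("Year")["Name"]
    .first()                                # first row per year = top name
)

# How many years each name spent at the top.
years_on_top = top_male_by_year.value_counts()
```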
# number of name entries per year, plotted as one line per sex
(ca.pivot_table(
index='Year', columns='Sex',
values='Name',
aggfunc=len)
.plot()
)