GroupBy
# group all rows in babynames by year
babynames.groupby("Year")
# Output: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000022E30AA5880>
※ Why the strange output? Calling .groupby returns a GroupBy object rather than a DataFrame; nothing is computed until an aggregation is applied to it.
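To look inside a GroupBy object, here is a minimal sketch. The `toy` DataFrame is a hypothetical miniature stand-in for babynames, and its rows are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for babynames; the values are invented
toy = pd.DataFrame({
    "State": ["CA", "CA", "CA", "CA"],
    "Sex": ["F", "M", "F", "M"],
    "Year": [1910, 1910, 1911, 1911],
    "Name": ["Mary", "John", "Helen", "James"],
    "Count": [295, 237, 390, 250],
})

grouped = toy.groupby("Year")

# The object only records how the rows are split; no result exists yet
print(type(grouped).__name__)

# Iterating yields (key, sub-DataFrame) pairs, one per distinct year
for year, subframe in grouped:
    print(year, len(subframe))

# Number of rows that fall into each group
group_sizes = grouped.size()
print(group_sizes)
```

Each sub-DataFrame here is one of the "mini" frames that an aggregation later collapses into a single row.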
.agg
'''
The .agg method takes a function as its argument; this function is then applied to each
column of a "mini" grouped DataFrame. We end up with a new DataFrame with one
aggregated row per subframe.
'''
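The description above can be sketched by hand as split, apply, combine. This is only an illustration of the idea, not how pandas implements it internally; `toy` is a hypothetical two-column frame with invented values, and the string `"sum"` is used to name the aggregation.

```python
import pandas as pd

# Hypothetical data, invented for illustration
toy = pd.DataFrame({
    "Year": [1910, 1910, 1911, 1911],
    "Count": [295, 237, 390, 250],
})

# Manual version: split into sub-frames, reduce each column, collect one row per group
rows = {}
for year, subframe in toy.groupby("Year"):
    rows[year] = subframe[["Count"]].sum()  # one value per selected column
manual = pd.DataFrame(rows).T
manual.index.name = "Year"

# Built-in version: .agg performs the same reduction for every group
built_in = toy.groupby("Year")[["Count"]].agg("sum")

print(manual)
print(built_in)
```

Both frames hold one row per year, with the per-year total in the "Count" column.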
# return the number of babies born in each year
babynames.groupby("Year").agg(sum).head(5)
However, we can see that the "State", "Sex", and "Name" columns disappear, because it is not meaningful to sum the string data in these columns.
Older versions of pandas (before 2.0) simply omit these columns when performing the aggregation on the DataFrame; newer versions no longer drop them silently. Because the omission happens implicitly, without the user specifying that these columns should be ignored, it is easy to run into troubling situations where columns are removed without the programmer noticing. It is better coding practice to select only the columns we care about before performing the aggregation.
# Same result, but now we explicitly tell pandas to only consider the "Count" column when summing
babynames.groupby("Year")[["Count"]].agg(sum).head(5)
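A self-contained version of the same pattern, assuming a hypothetical `toy` frame with invented rows in place of babynames. It also shows the difference between double and single brackets when selecting the column.

```python
import pandas as pd

# Hypothetical stand-in for babynames; the values are invented
toy = pd.DataFrame({
    "State": ["CA", "CA", "CA", "CA"],
    "Sex": ["F", "M", "F", "M"],
    "Year": [1910, 1910, 1911, 1911],
    "Name": ["Mary", "John", "Helen", "James"],
    "Count": [295, 237, 390, 250],
})

# Double brackets keep the result a DataFrame with exactly the chosen column
counts_by_year = toy.groupby("Year")[["Count"]].agg("sum")
print(counts_by_year.columns.tolist())  # ['Count']

# Single brackets return a Series indexed by year instead
counts_series = toy.groupby("Year")["Count"].agg("sum")
print(type(counts_series).__name__)  # Series
```

Selecting the column first makes the result independent of how a given pandas version treats the string columns.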