Posts in the 'Computer Science 🌋/Machine Learning🐼' category (Page 3)

Aggregation in Pandas

2023.05.23

GroupBy(), Continued: As we learned last lecture, a groupby operation involves some combination of splitting a DataFrame into grouped subframes, applying a function, and combining the results. Organizes all rows with the same year into a subframe for that year. Creates a new DataFrame with one row representing each subframe year. Combines all integer rows in each subframe using the sum function. ..
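
The split, apply, and combine steps in the excerpt above can be sketched as follows. This is a minimal illustration on a made-up `babynames` table; the rows and the column names `Year`, `Name`, and `Count` are assumptions, not the post's actual dataset. The manual loop and `groupby(...).sum()` should produce the same per-year totals.

```python
import pandas as pd

# Hypothetical toy data; the post's real babynames dataset is not shown here.
babynames = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021, 2021],
    "Name": ["Bella", "Liam", "Bella", "Liam", "Noah"],
    "Count": [10, 20, 30, 40, 50],
})

# Split: iterate over one subframe per year.
# Apply: sum the integer column of each subframe.
manual_totals = {
    int(year): int(subframe["Count"].sum())
    for year, subframe in babynames.groupby("Year")
}

# Combine: groupby + sum performs all three steps and
# returns a new DataFrame with one row per year.
grouped_totals = babynames.groupby("Year")[["Count"]].sum()

print(manual_totals)
print(grouped_totals)
```

Selecting `[["Count"]]` before summing keeps the string column `Name` out of the aggregation, which matches the excerpt's point that only the integer rows are combined.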

Aggregating Data with GroupBy in Pandas

2023.05.23

GroupBy # aggregate all rows in babynames for a given year babynames.groupby("Year") # Output: ※ The reason for the strange output: calling .groupby has generated a GroupBy object! .agg ''' The .agg method takes in a function as its argument; this function is then applied to each column of a "mini" grouped DataFrame. We end up with a new DataFrame with one aggregated row per subframe ''' # return the numbe..
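
To make the GroupBy object and `.agg` behavior above concrete, here is a small sketch on assumed toy data (the rows are invented for illustration). It passes the aggregation as a string name, which pandas accepts in place of a function; the excerpt's own calls may differ.

```python
import pandas as pd

# Hypothetical toy data standing in for the post's babynames table.
babynames = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "Name": ["Bella", "Liam", "Bella", "Liam"],
    "Count": [10, 20, 30, 40],
})

# Calling .groupby alone only builds a GroupBy object, not a DataFrame,
# which is why printing it does not show a table.
by_year = babynames.groupby("Year")
print(type(by_year).__name__)

# .agg applies the aggregation to each selected column of every
# per-year subframe and returns one aggregated row per year.
yearly_sum = by_year[["Count"]].agg("sum")
yearly_max = by_year[["Count"]].agg("max")

print(yearly_sum)
print(yearly_max)
```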

Add & Remove Columns

2023.05.23

Add columns # specify the name of the new column -> dataframe["new_column"] # Add a column named "name_lengths" that includes the length of each name babynames["name_lengths"] = babynames["Name"].str.len() babynames.head(5) Sort by the temporary column # Sort by the temporary column babynames = babynames.sort_values(by="name_lengths", ascending=False) babynames.head() .map # First, define a ..
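
A short end-to-end sketch of the add, sort, and remove pattern, assuming a toy `babynames` table with a `Name` column. The `drop` call is one common way to remove a column; the excerpt is cut off before showing the post's own removal code, so treat that step as an assumption.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
babynames = pd.DataFrame({
    "Name": ["Bella", "Christopher", "Liam"],
    "Count": [10, 20, 30],
})

# Add a temporary column holding the length of each name.
babynames["name_lengths"] = babynames["Name"].str.len()

# Sort by the temporary column, longest names first.
babynames = babynames.sort_values(by="name_lengths", ascending=False)

# Remove the temporary column. drop returns a new DataFrame,
# so the result is assigned back to the same variable.
babynames = babynames.drop(columns="name_lengths")

print(babynames)
```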

Handy Utility Functions in Pandas

2023.05.23

NumPy bella_counts = babynames[babynames["Name"] == "Bella"]["Count"] # Average number of babies named Bella each year np.mean(bella_counts) # Max number of babies named Bella born in a given year max(bella_counts) .shape & .size # return a tuple containing the number of rows and columns babynames.shape # return the total number of elements in a structure, equivalent to the number of rows times ..
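
The utilities named above can be tried on a small table. This is a sketch with invented rows; only the column names follow the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the counts are made up.
babynames = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "Name": ["Bella", "Liam", "Bella", "Liam"],
    "Count": [10, 20, 30, 40],
})

bella_counts = babynames[babynames["Name"] == "Bella"]["Count"]

# NumPy functions and Python built-ins both accept a Series.
average = np.mean(bella_counts)
largest = max(bella_counts)

# .shape is a (rows, columns) tuple; .size is rows times columns.
n_rows, n_cols = babynames.shape
total_elements = babynames.size

print(average, largest, (n_rows, n_cols), total_elements)
```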

Conditional Selection in Pandas

2023.05.23

Conditional Selection # Ask yourself: why is :9 the correct slice to select the first 10 rows? babynames_first_10_rows = babynames.loc[:9, :] # Notice how we have exactly 10 elements in our boolean array argument babynames_first_10_rows[[True, False, True, False, True, False, True, False, True, False]] To make things easier, we can instead provide a logical condition as an input to .loc or [..
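
A compact sketch of the same ideas on four invented rows: label slicing with `.loc` (which includes the end label, so `:9` covers labels 0 through 9 in the excerpt), an explicit boolean list, and a logical condition that builds the mask for us. The data and the `Year >= 2021` condition are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0..3.
babynames = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2022],
    "Name": ["Bella", "Liam", "Bella", "Noah"],
    "Count": [10, 20, 30, 40],
})

# .loc slices by label and includes the end label,
# so :1 selects the rows labeled 0 and 1.
first_two = babynames.loc[:1, :]

# A boolean list with one entry per row keeps only the rows marked True.
picked = first_two[[True, False]]

# A logical condition produces a boolean Series that works the same way,
# both with [] and with .loc (which can also select columns).
mask = babynames["Year"] >= 2021
recent = babynames[mask]
recent_loc = babynames.loc[mask, ["Name", "Count"]]

print(picked)
print(recent)
print(recent_loc)
```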

Indexing in Pandas

2023.05.22

# elections.loc[0, "Candidate"] - Previous approach elections.iloc[0, 1] A DataFrame is a collection of Series that all share the same index. The index doesn't have to be an integer, nor does it have to be unique. # this sets the index to the "Candidate" column elections.set_index("Candidate", inplace=True) elections.index ''' Index(['~', '~',,,,'~'], dtype='object', name='Candidate', length=182) ''' # ..
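
The difference between label-based and position-based indexing, and a non-unique string index, can be sketched on a tiny invented `elections` table. The candidate names and values here are placeholders, not the post's data, and this version uses the non-mutating form of `set_index` instead of `inplace=True` so the original table stays available.

```python
import pandas as pd

# Hypothetical toy data; "Candidate" is deliberately the second column
# so that position 1 matches the excerpt's iloc[0, 1].
elections = pd.DataFrame({
    "Year": [2000, 2004, 2004],
    "Candidate": ["Alice", "Bob", "Alice"],
    "Party": ["X", "Y", "X"],
})

# .loc uses labels (row label 0, column label "Candidate");
# .iloc uses integer positions (row 0, column 1).
by_label = elections.loc[0, "Candidate"]
by_position = elections.iloc[0, 1]

# The index can hold strings and may contain duplicates.
indexed = elections.set_index("Candidate")
alice_rows = indexed.loc["Alice"]

print(by_label, by_position)
print(indexed.index)
print(alice_rows)
```

Because "Alice" appears twice in the index, `indexed.loc["Alice"]` returns a DataFrame with both rows rather than a single row.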

Basics in Pandas

2023.05.22

Data cleaning: Data cleaning corrects issues in the structure and formatting of data, including missing values and unit conversions. Exploratory data analysis (EDA): EDA describes the process of transforming raw data into insightful observations. It is an open-ended analysis that involves transforming, visualizing, and summarizing patterns in data. # 'pd' is the conventional alias for Pandas, as 'np' is for Num..

Data Science Lifecycle

2023.05.22

1. Ask a Question What do we want to know? A question that is too ambiguous may lead to confusion. What problems are we trying to solve? The goal of asking a question should be clear in order to justify your efforts to stakeholders. What are the hypotheses we want to test? This gives a clear perspective from which to analyze final results. What are the metrics for our success? This gives a clear ..
