GroupBy(), Continued As we learned in the last lecture, a groupby operation involves some combination of splitting a DataFrame into grouped subframes, applying a function, and combining the results. Organizes all rows with the same year into a subframe for that year. Creates a new DataFrame with one row representing each subframe year. Combines all integer rows in each subframe using the sum function. ..
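The three steps above (split, apply, combine) can be sketched by hand and compared with the built-in one-liner. This is a minimal sketch on a toy table whose values are made up for illustration; only the column names "Year", "Name", and "Count" follow the notes.

```python
import pandas as pd

# Toy data, made up for illustration; not the real babynames dataset.
babynames = pd.DataFrame({
    "Year": [1910, 1910, 1911, 1911],
    "Name": ["Mary", "Helen", "Mary", "Helen"],
    "Count": [295, 239, 390, 258],
})

# Split: one subframe per distinct Year.
subframes = {year: sub for year, sub in babynames.groupby("Year")}

# Apply: sum the integer column within each subframe.
totals = {year: int(sub["Count"].sum()) for year, sub in subframes.items()}

# Combine: build a new DataFrame with one row per year.
manual = pd.DataFrame({"Count": totals}).rename_axis("Year")

# The built-in version performs all three steps in one call.
builtin = babynames.groupby("Year")[["Count"]].sum()
```

Both `manual` and `builtin` hold one row per year with the summed counts.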
GroupBy # aggregate all rows in babynames for a given year babynames.groupby("Year") # Output: ※ The reason for the strange output: calling .groupby has generated a GroupBy object! .agg ''' The .agg method takes in a function as its argument; this function is then applied to each column of a "mini" grouped DataFrame. We end up with a new DataFrame with one aggregated row per subframe ''' # return the numbe..
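To make the GroupBy object and .agg concrete, here is a small sketch on made-up data. The string names "sum" and "max" are used here in place of function objects; that is a choice for this sketch, not necessarily what the original notes pass to .agg.

```python
import pandas as pd

# Toy data, made up for illustration.
babynames = pd.DataFrame({
    "Year": [1910, 1910, 1911, 1911],
    "Name": ["Mary", "Helen", "Mary", "Helen"],
    "Count": [295, 239, 390, 258],
})

# .groupby alone does no aggregation; it returns a lazy GroupBy object.
grouped = babynames.groupby("Year")
object_kind = type(grouped).__name__

# .agg applies the given aggregation to each selected column of every subframe
# and returns a DataFrame with one row per group.
total_per_year = grouped[["Count"]].agg("sum")
largest_per_year = grouped[["Count"]].agg("max")
```

Selecting `[["Count"]]` before aggregating keeps the non-numeric "Name" column out of the calculation.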
Add columns # specify the name of the new column -> dataframe["new_columns"] # Add a column named "name_lengths" that includes the length of each name babynames["name_lengths"] = babynames["Name"].str.len() babynames.head(5) Sort by the temporary column # Sort by the temporary column babynames = babynames.sort_values(by="name_lengths", ascending=False) babynames.head() .map # First, define a ..
Numpy bella_counts = babynames[babynames["Name"] == "Bella"]["Count"] # Average number of babies named Bella each year np.mean(bella_counts) # Max number of babies named Bella born in a given year max(bella_counts) .shape & .size # return a tuple containing the number of rows and columns babynames.shape # return the total number of elements in a structure, equivalent to the number of rows times ..
Conditional Selection # Ask yourself: why is :9 the correct slice to select the first 10 rows? babynames_first_10_rows = babynames.loc[:9, :] # Notice how we have exactly 10 elements in our boolean array argument babynames_first_10_rows[[True, False, True, False, True, False, True, False, True, False]] To make things easier, we can instead provide a logical condition as an input to .loc or [..
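A short sketch of selecting rows with a logical condition, using a made-up table. The "Sex" column and all values are assumptions for illustration. It also shows why a label slice such as :9 yields 10 rows: .loc slices include both endpoints.

```python
import pandas as pd

# Toy data, made up for illustration.
babynames = pd.DataFrame({
    "Sex": ["F", "M", "F", "M"],
    "Name": ["Mary", "John", "Bella", "Liam"],
    "Count": [295, 237, 12, 40],
})

# Label-based slices in .loc are inclusive, so :1 returns rows 0 and 1.
first_two_rows = babynames.loc[:1, :]

# A comparison on a column produces a boolean Series, one entry per row.
is_female = babynames["Sex"] == "F"

# The boolean Series can be passed to [] or to .loc; both keep the True rows.
females_bracket = babynames[is_female]
females_loc = babynames.loc[is_female, :]
```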
Today, I solved Missing Number. Description: In this question, we are asked to find the number missing from nums. Solution: In this method, we subtract the sum of nums from the sum of every number in the range [0, n], where n is the length of nums. For example 2: The length of nums: 2 Range: [0, 2] sum([0, 1, 2]) - sum([0, 1]) = 2 Space Complexity: O(1) Time Complexity: O(n)
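The subtraction idea can be sketched as follows. The function name missing_number is a hypothetical choice for this sketch, and the input is assumed to contain n distinct values from the range [0, n].

```python
def missing_number(nums):
    """Return the one value in [0, n] absent from nums, where n = len(nums)."""
    n = len(nums)
    # range() is lazy, so the expected total is computed in O(1) extra space.
    expected_total = sum(range(n + 1))
    # Whatever the actual sum falls short by is the missing value.
    return expected_total - sum(nums)


example_result = missing_number([0, 1])
```

For nums = [0, 1], the expected total is 0 + 1 + 2 = 3 and the actual sum is 1, so the result is 2.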
# elections.loc[0, "Candidate"] - Previous approach elections.iloc[0, 1] A DataFrame is a collection of Series that all share the same index. The index doesn't have to be an integer, nor does it have to be unique. # this sets the index to the "Candidate" column elections.set_index("Candidate", inplace=True) elections.index ''' Index(['~', '~',,,,'~'], dtype='object', name='Candidate', length=182) ''' # ..
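A minimal sketch of a non-integer, non-unique index on a made-up table. The candidate and party labels are placeholders, not real election data, and set_index is called without inplace=True here so the original frame stays available for comparison.

```python
import pandas as pd

# Placeholder data, made up for illustration.
elections = pd.DataFrame({
    "Year": [2016, 2016, 2020],
    "Candidate": ["Candidate A", "Candidate B", "Candidate A"],
    "Party": ["Party X", "Party Y", "Party X"],
})

# Integer-position access: row 0, column 1 (the "Candidate" column).
first_candidate = elections.iloc[0, 1]

# Use the string-valued "Candidate" column as the index.
by_candidate = elections.set_index("Candidate")

# The same label appears twice, so the index is not unique,
# and selecting that label returns every matching row.
index_is_unique = by_candidate.index.is_unique
rows_for_a = by_candidate.loc["Candidate A"]
```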
Data cleaning: Data cleaning corrects issues in the structure and formatting of data, including missing values and unit conversions. Exploratory data analysis (EDA): EDA describes the process of transforming raw data into insightful observations. It is an open-ended analysis that involves transforming, visualizing, and summarizing patterns in data. # 'pd' is the conventional alias for Pandas, as 'np' is for Num..
1. Ask a Question What do we want to know? A question that is too ambiguous may lead to confusion. What problems are we trying to solve? The goal of asking a question should be clear in order to justify your efforts to stakeholders. What are the hypotheses we want to test? This gives a clear perspective from which to analyze final results. What are the metrics for our success? This gives a clear ..
Today, I solved Counting Bits. Description: In this problem, we return an array of length "n+1" that shows how many 1 bits each number from 0 to n has in its binary representation. Solution: The dp array: [0] * (n + 1) offset = 1 (the offset tracks the most recent power of two, so each number falls between 2^(k-1) and 2^k) On each iteration of the for loop, we check whether the offset has reached the next of 1, 2, 4, 8 ... 0 -> 0000..
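One common way to fill in the dp array with an offset is sketched below. Since the walkthrough above is cut off, the recurrence dp[i] = 1 + dp[i - offset] is an assumption about the intended approach, and the function name count_bits is hypothetical.

```python
def count_bits(n):
    """Return a list where entry i is the number of 1 bits in i, for 0 <= i <= n."""
    dp = [0] * (n + 1)
    offset = 1  # most recent power of two that is <= i
    for i in range(1, n + 1):
        # When i reaches the next power of two, advance the offset (1, 2, 4, 8, ...).
        if offset * 2 == i:
            offset = i
        # i equals offset plus a smaller number, so it has one more 1 bit
        # than that smaller number.
        dp[i] = 1 + dp[i - offset]
    return dp


bits_up_to_five = count_bits(5)
```

Each entry reuses an earlier result, so the whole array is built in O(n) time.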