GroupBy(), Continued As we learned last lecture, a groupby operation involves some combination of splitting a DataFrame into grouped subframes, applying a function, and combining the results. Organizes all rows with the same year into a subframe for that year. Creates a new DataFrame with one row representing each subframe year. Combines all integer rows in each subframe using the sum function. ..
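The split, apply, and combine steps can be sketched on a tiny table. This is a minimal sketch, not the lecture's code: the rows below are made-up stand-ins for the real babynames data, and the name `totals_by_year` is only illustrative.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real babynames table
babynames = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cy"],
    "Year": [2000, 2000, 2001, 2001],
    "Count": [10, 20, 30, 40],
})

# Split: rows sharing a Year form one subframe
# Apply: sum the Count column inside each subframe
# Combine: the result has one row per Year, indexed by Year
totals_by_year = babynames.groupby("Year")[["Count"]].sum()
print(totals_by_year)
```

Selecting `[["Count"]]` before summing keeps the string column `Name` out of the aggregation, so only the integer column is combined.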
GroupBy # aggregate all rows in babynames for a given year babynames.groupby("Year") # Output: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x...> The reason for strange output: calling .groupby has generated a GroupBy object! .agg ''' .agg method takes in a function as its argument; this function is then applied to each column of a "mini" grouped DataFrame. We end up with a new DataFrame with one aggregated row per subframe ''' # return the numbe..
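To see the difference between the GroupBy object and the aggregated result, here is a small sketch. The data is invented for illustration, and the names `grouped`, `is_dataframe`, and `summary` are assumptions, not names from the lecture.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real babynames table
babynames = pd.DataFrame({
    "Name": ["Ann", "Bob", "Ann", "Cy"],
    "Year": [2000, 2000, 2001, 2001],
    "Count": [10, 20, 30, 40],
})

# .groupby alone only records how to split the rows; nothing is computed yet
grouped = babynames.groupby("Year")
is_dataframe = isinstance(grouped, pd.DataFrame)  # False: it is a GroupBy object

# .agg applies each function to the Count values of every subframe
# and returns one aggregated row per Year
summary = grouped["Count"].agg(["sum", "max"])
print(summary)
```

Passing a list of function names to `.agg` yields one output column per function, which is a convenient way to compare several aggregates side by side.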
Add columns # specify the name of the new column -> dataframe["new_columns"] # Add a column named "name_lengths" that includes the length of each name babynames["name_lengths"] = babynames["Name"].str.len() babynames.head(5) Sort by the temporary column # Sort by the temporary column babynames = babynames.sort_values(by = "name_lengths", ascending=False) babynames.head() .map # First, define a ..
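The add-a-column, sort, and `.map` steps can be put together as follows. This is a sketch under assumptions: the three rows are invented, the column is taken to be `"Name"` as in the other snippets, dropping the temporary column afterwards is my own addition, and `first_letter` is a hypothetical helper rather than the function the notes go on to define.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real babynames table
babynames = pd.DataFrame({
    "Name": ["Al", "Beatrice", "Cleo"],
    "Count": [5, 7, 9],
})

# Add a temporary column holding the length of each name
babynames["name_lengths"] = babynames["Name"].str.len()

# Sort on the temporary column, longest names first
babynames = babynames.sort_values(by="name_lengths", ascending=False)

# Assumption: once sorted, the helper column is no longer needed
babynames = babynames.drop(columns="name_lengths")


def first_letter(name):
    """Hypothetical helper: return the first character of a name."""
    return name[0]


# .map calls the function once per element of the Series
babynames["initial"] = babynames["Name"].map(first_letter)
print(babynames)
```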
Numpy bella_counts = babynames[babynames["Name"] == "Bella"]["Count"] # Average number of babies named Bella each year np.mean(bella_counts) # Max number of babies named Bella born in a given year max(bella_counts) .shape & .size # return a tuple containing the number of rows and columns babynames.shape # return the total number of elements in a structure, equivalent to the number of rows times ..
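A quick check of these calls on a toy table makes the numbers concrete. The rows are invented for illustration, and the variable names on the left-hand side are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real babynames table
babynames = pd.DataFrame({
    "Name": ["Bella", "Bella", "Max", "Bella"],
    "Year": [2000, 2001, 2001, 2002],
    "Count": [10, 30, 5, 20],
})

# Keep only the Count values of rows whose Name is "Bella"
bella_counts = babynames[babynames["Name"] == "Bella"]["Count"]

mean_count = np.mean(bella_counts)  # (10 + 30 + 20) / 3 = 20.0
max_count = max(bella_counts)       # 30

n_rows, n_cols = babynames.shape    # (4, 3)
n_elements = babynames.size         # 4 rows * 3 columns = 12
print(mean_count, max_count, babynames.shape, n_elements)
```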
Conditional Selection # Ask yourself: why is :9 the correct slice to select the first 10 rows? babynames_first_10_rows = babynames.loc[:9, :] # Notice how we have exactly 10 elements in our boolean array argument babynames_first_10_rows[[True, False, True, False, True, False, True, False, True, False]] To make things easier, we can instead provide a logical condition as an input to .loc or [..
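The sketch below walks through both selection styles on made-up data. It assumes a default integer index (labels 0, 1, 2, ...), which is why the label slice `:9` returns ten rows: `.loc` slices by label and includes the endpoint. The names `mask`, `every_other`, and `large_counts` are illustrative.

```python
import pandas as pd

# Hypothetical toy table with 12 rows and a default integer index
babynames = pd.DataFrame({
    "Name": [f"name{i}" for i in range(12)],
    "Count": list(range(12)),
})

# .loc slices by label and includes the endpoint, so labels 0..9 give 10 rows
babynames_first_10_rows = babynames.loc[:9, :]

# A boolean list must have one entry per row; True keeps the row
mask = [True, False] * 5
every_other = babynames_first_10_rows[mask]

# A logical condition builds the boolean Series for us
large_counts = babynames.loc[babynames["Count"] > 9, "Name"]
print(every_other.index.tolist(), large_counts.tolist())
```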
# elections.loc[0, "Candidate"] - Previous approach elections.iloc[0, 1] A DataFrame is a collection of Series that all share the same index. The index doesn't have to be an integer, nor does it have to be unique. # this sets the index to the "Candidate" column elections.set_index("Candidate", inplace=True) elections.index ''' Index(['~', '~',,,,'~"], dtype='object', name='Candidate', length=182) ''' # ..
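Here is a small sketch comparing label-based and position-based access, and showing what a non-unique index looks like. The three rows are invented placeholders, not the real elections data, and the names `by_label`, `by_position`, `indexed`, and `rows_for_a` are assumptions.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real elections table
elections = pd.DataFrame({
    "Year": [2016, 2016, 2020],
    "Candidate": ["A", "B", "A"],
    "Party": ["X", "Y", "X"],
})

# Same cell, two ways: by row/column label, then by integer position
by_label = elections.loc[0, "Candidate"]
by_position = elections.iloc[0, 1]

# set_index returns a new DataFrame unless inplace=True is passed
indexed = elections.set_index("Candidate")

# The index holds strings and repeats "A", so .loc["A"] returns two rows
rows_for_a = indexed.loc["A"]
print(by_label, by_position, indexed.index.is_unique, len(rows_for_a))
```

Returning a new DataFrame instead of using `inplace=True` keeps the original table available, which is convenient when experimenting in a notebook.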
Data cleaning: Data cleaning corrects issues in the structure and formatting of data, including missing values and unit conversions. Exploratory data analysis (EDA): EDA describes the process of transforming raw data into insightful observations. It is an open-ended analysis of transforming, visualizing, and summarizing patterns in data. # 'pd' is the conventional alias for Pandas, as 'np' is for Num..
1. Ask a Question What do we want to know? A question that is too ambiguous may lead to confusion. What problems are we trying to solve? The goal of asking a question should be clear in order to justify your efforts to stakeholders. What are the hypotheses we want to test? This gives a clear perspective from which to analyze final results. What are the metrics for our success? This gives a clear ..
Today, I learned about CSMA collisions, "taking turns" MAC protocols, and cable Internet. Compared to ALOHA, CSMA is a more polite protocol. CSMA: Carrier Sense Multiple Access CSMA: listen before transmit: If channel sensed idle: transmit entire frame. If channel sensed busy, defer transmission. In human analogy, "don't interrupt others!" Collision can still occur: Due to propagation..
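The listen-before-transmit rule, and why it does not rule out collisions, can be sketched with a deliberately simplified two-node model. Everything here is an assumption made for illustration: the function names, the time units, and the premise that the other node's frame is still being sent when the channel is sensed.

```python
def channel_sensed_idle(now, other_start, prop_delay):
    """Return True if the other node's signal has not reached us yet.

    Simplified model: a signal sent at other_start arrives prop_delay later
    and is assumed to still be on the channel from then on.
    """
    return now < other_start + prop_delay


def csma_decision(now, other_start, prop_delay):
    """Listen before transmit: send if the channel looks idle, else defer."""
    if channel_sensed_idle(now, other_start, prop_delay):
        return "transmit"
    return "defer"


# The other node started at t=0.0 and its signal needs 1.0 time units to arrive
early = csma_decision(now=0.5, other_start=0.0, prop_delay=1.0)
late = csma_decision(now=1.5, other_start=0.0, prop_delay=1.0)
print(early, late)
```

In the first call the node senses an idle channel only because the other signal is still in flight, so it transmits and the two frames collide. In the second call the signal has arrived, the channel is sensed busy, and the node defers.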