Bad Data All of these are commonly seen in the real world: Zeros replace missing values Spelling inconsistent(esp with human-entered data) Rows are duplicated Inconsistent date formats (e.g. 10/9/15 vs. 9/10/15) Units not specified Rectangular Data Easy to manipulate, visualize, and combine, Tables (DataFrames): Each labeled column has values of the same type. Manipulated using group, sort, join..
Python list: Pandas: The word "index" refers to the collection of labels for each row. groupby: Harder Question What was the most popular male name during each year in the data? What are the three states with the most babies born? By doing groupby, we can easily approach. # avarage of percent, group by Party df['%'].groupby(df['Party']).mean() # return minimum value, group by Party df['%'].group..
Today, I solved "Maximum Subarray." Description: Solution: As we can see, we don't have to include the first negative array. curSum is a cumulative sum of numbers. If curSum is negative, it means that subarray is not for maximum sum. I use max() to compare the curSum and maxSub.
SRS Review Q1: Suppose we have 6 people named A, B, C, D, E and F and we take an SRS of size 2. What is P(A in sample) = ? How about P(C and D in sample) = ? A: AB, AC, AD, AE, AF BC, BD, BE, BF CD, CE, CF DE, DF EF P(CD) = 1/15 P(A) = 1/3 Q2: We have two classrooms: D8 and D100. D8 has 10 students not named Sam. D100 has 4 students, one is named Sam. Suppose we flip a fair coin to pick a classr..
Today, I solved coin chnage. Description: In this problem, we should find the fewest number of coins that make up that amount. Solution: This is the brute force solution. Let's say coin is [1, 3, 4, 5], and amount = 7. dp[0] is always "0" (because no need to coin to get to 0) dp[1] = 1 dp[2] = 2 dp[3] = 1 -> one 3coin dp[4] = 1 -> one 4coin dp[5] = 1 -> one 5coin dp[6] = 2 -> one 1coin + one 5co..
Feature Engineering is the process of transforming the raw features into more informative features that can be used in modeling or EDA tasks. Feature Functions As number of features grows, we can capture arbitrarily complex relationships. Suppose we wish to develop a model to predict a vehicle's fuel efficiency ("mpg") as a function of its horsepower("hp"). Glancing at the plot below, we see tha..
Today, I solved the climbing staris. Description: Solution: Let's say the example(n = 5). On the 4th stair, we have only 1 way to get 5th stair. On the 3rd stair, we have (1 + 1) ways to get 5th stair. On the 2nd stair, we have (2 + 1) ways to get 5th stair. In my code, we set (n-1)th stair as "one", and nth stair as "two." We shift those one and two until we get to 0(start point). (n-1) and (n)..
import re text = "Moo" pattern = r"]+>" re.sub(pattern, '', text) # return 'Moo' Notice the r proceeding the regular expression pattern; this specifies the regular expression is a raw string. Raw string do not recognize escape sequences. This makes them useful for regular expressions, which often contain literal '\' chracters. data = {"HTML": ["Moo", \ "Link", \ "Bold text"]} html_data = pd.Data..