Computer Science 🌋/Machine Learning🐼

Gradient Descent

Gradient Derivatives: We can also interpret the slope as: if I nudge x, how does y change? Pretend that the multivariable function is univariate, then take the derivative as normal. This is a partial derivative. The gradient extends the derivative to multiple variables. It is a vector-valued function (it always outputs a vector). The gradient of f(θ) w.r.t. θ is a vector. Each element is the partial derivative for ..
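The "nudge one variable, hold the others fixed" reading of a partial derivative can be checked numerically. The following is a minimal sketch, not the post's own code: the function `f`, the helper `numerical_gradient`, and the step size `h` are all hypothetical names chosen for illustration.

```python
def numerical_gradient(f, theta, h=1e-6):
    """Approximate the gradient of f at theta with central differences."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h      # nudge only coordinate i upward
        down[i] -= h    # nudge only coordinate i downward
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def f(theta):
    # Hypothetical example: f(θ) = θ0² + 3·θ1
    return theta[0] ** 2 + 3 * theta[1]


# Analytically, the partials are 2·θ0 and 3, so the gradient at (2, 1) is (4, 3).
grad = numerical_gradient(f, [2.0, 1.0])
print(grad)
```

Each entry of `grad` is one partial derivative, so the result has one element per component of θ.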

Foundation of Machine Learning

Modeling: Making Predictions. To make a prediction, we choose a model. Constant Model: fθ(x) = θ (the recipe to compute the prediction). Simple Linear Model: fθ(x) = θ1x + θ0 (two model weights). The Constant Model: Start simple: with a constant model, how do we pick θ? Intuition: pick θ to be close to most of the values in the data. Model Loss: Use x to denote what we use to make predictions. Use y to d..
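To make the "pick θ close to most of the values" intuition concrete, here is a small sketch. It assumes mean squared error as the loss, which the excerpt does not confirm; under that assumption the best constant is the sample mean. The data values and the `mse` helper are hypothetical.

```python
from statistics import mean


def mse(theta, ys):
    """Mean squared error of the constant model f(x) = theta on targets ys."""
    return sum((y - theta) ** 2 for y in ys) / len(ys)


# Hypothetical observations, for illustration only.
ys = [2.0, 3.0, 3.0, 4.0, 8.0]

# Under squared loss (an assumption here), the minimizing constant is the mean.
theta_hat = mean(ys)

# Compare the loss at the mean with the loss one unit to either side.
candidates = [theta_hat - 1, theta_hat, theta_hat + 1]
losses = [mse(t, ys) for t in candidates]
print(theta_hat, losses)
```

A different loss would give a different answer; for example, absolute error is minimized by the median rather than the mean.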

SQL in Pandas Review

Schemas: A schema describes all relations and their attribute names and types. Granularity (what does one record in each table represent?). Primary and foreign keys. Representation: CREATE TABLE users( id INTEGER PRIMARY KEY, name TEXT ); CREATE TABLE orders( item TEXT PRIMARY KEY, price NUMERIC, name TEXT ); GROUP BY and HAVING: # SQL SELECT max(name), legs, weight FROM animals GROUP BY legs, weight HAVIN..
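As a rough illustration of how GROUP BY and HAVING interact, the sketch below runs a grouped query against an in-memory SQLite database using Python's standard `sqlite3` module. The `animals` rows and the `HAVING count(*) > 1` condition are hypothetical; the HAVING clause in the excerpt above is truncated and is not reproduced here.

```python
import sqlite3

# In-memory database with a hypothetical animals table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, legs INTEGER, weight NUMERIC)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [("cat", 4, 10), ("dog", 4, 10), ("bird", 2, 1), ("ant", 6, 0.1)],
)

# GROUP BY forms one group per (legs, weight) pair;
# HAVING then keeps only the groups that satisfy the condition.
rows = conn.execute(
    """
    SELECT max(name), legs, weight
    FROM animals
    GROUP BY legs, weight
    HAVING count(*) > 1
    """
).fetchall()
conn.close()
print(rows)
```

WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation, which is why the aggregate `count(*)` is allowed in it.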

Text Fields Review

Text Fields and Data Cleaning / EDA Extract quantitative values from text: dates, times, positions, etc. Determine if missing values are denoted # split time_str = first.split('[')[1].split(' ', 1)[0] # '26/Jan/2014:10:47:58' day, month, rest = time_str.split('/') # ['26', 'Jan', '2014:10:47:58'] year, hour, minute, second = rest.split(':') # ['2014', '10', '47', '58'] year, month, day, hour, mi..
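The split-based extraction above can also be done with the standard library's `datetime.strptime`. This is a sketch under assumptions: the log line stored in `first` is a made-up example shaped like a common web-server log entry, and the `%b` month code assumes an English locale.

```python
from datetime import datetime

# Hypothetical log line; the real contents of `first` are not shown in the excerpt.
first = '127.0.0.1 - - [26/Jan/2014:10:47:58 -0800] "GET / HTTP/1.1" 200 512'

# Same slicing as the excerpt: text after '[' and before the first space.
time_str = first.split('[')[1].split(' ', 1)[0]

# Parse day/month/year:hour:minute:second in one step instead of repeated splits.
parsed = datetime.strptime(time_str, "%d/%b/%Y:%H:%M:%S")
print(time_str, parsed.year, parsed.month, parsed.day)
```

Parsing into a `datetime` gives numeric fields directly (the month becomes an integer), whereas the split approach leaves every piece as a string.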

EDA Review

Goals of EDA Data Types: What kinds of data do we have? Granularity: How fine/coarse is each datum? Scope: How (in)complete are the data? Temporality: How are the data situated in time? Faithfulness: How accurately do the data describe the world? Data Type: Nominal Data: categories without natural ordering Ordinal Data: categories with natural ordering Numerical Data: amounts or quantities Compu..

Data Cleaning Review

Bad Data: All of these are commonly seen in the real world: Zeros replace missing values. Spelling is inconsistent (esp. with human-entered data). Rows are duplicated. Inconsistent date formats (e.g. 10/9/15 vs. 9/10/15). Units not specified. Rectangular Data: Easy to manipulate, visualize, and combine. Tables (DataFrames): Each labeled column has values of the same type. Manipulated using group, sort, join..

Pandas part 2 Review

Python list: Pandas: The word "index" refers to the collection of labels for each row. groupby: Harder questions: What was the most popular male name during each year in the data? What are the three states with the most babies born? With groupby, these are easy to approach. # average of percent, grouped by Party df['%'].groupby(df['Party']).mean() # return minimum value, grouped by Party df['%'].group..

Pandas part 1 Review

Question: weird = pd.DataFrame({1:["topdog","botdog"], "1":["topcat","botcat"]}) weird Try to predict the output of the following: weird[1] weird["1"] weird[1:] Name --> [ ] --> Series (Single Column Selection) List --> [ ] --> DataFrame (Multiple Column Selection) Numeric Slice --> [ ] --> DataFrame (Multiple Row Selection) Answer: weird[1] weird["1"], weird[['1']], weird['1'] weird[1: ] # bool..

Life Cycle and Design Review

SRS Review. Q1: Suppose we have 6 people named A, B, C, D, E and F and we take an SRS of size 2. What is P(A in sample)? How about P(C and D in sample)? A: The possible samples are AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF. P(CD) = 1/15, P(A) = 1/3. Q2: We have two classrooms: D8 and D100. D8 has 10 students not named Sam. D100 has 4 students, one is named Sam. Suppose we flip a fair coin to pick a classr..
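The Q1 answers can be verified by brute-force enumeration. The sketch below is illustrative only: it lists every size-2 sample with `itertools.combinations` and counts the favorable ones, treating each sample as equally likely (which is what a simple random sample means). The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

people = "ABCDEF"

# Every unordered sample of size 2; an SRS makes each one equally likely.
samples = list(combinations(people, 2))

# P(A in sample): fraction of samples that contain A.
p_a = Fraction(sum("A" in s for s in samples), len(samples))

# P(C and D in sample): fraction of samples that contain both C and D.
p_cd = Fraction(sum({"C", "D"} <= set(s) for s in samples), len(samples))

print(len(samples), p_a, p_cd)
```

There are 15 samples in total; 5 of them contain A and exactly one is the pair CD, matching 1/3 and 1/15.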

Feature Engineering

Feature Engineering is the process of transforming raw features into more informative features that can be used in modeling or EDA tasks. Feature Functions: As the number of features grows, we can capture arbitrarily complex relationships. Suppose we wish to develop a model to predict a vehicle's fuel efficiency ("mpg") as a function of its horsepower ("hp"). Glancing at the plot below, we see tha..

KB0129
List of posts in the 'Computer Science 🌋/Machine Learning🐼' category