Data cleaning:
Data cleaning corrects issues in the structure and formatting of data, including missing values and unit conversions.
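Here is a minimal sketch of both kinds of fixes. It assumes a hypothetical table with one missing temperature and a column recorded in Fahrenheit. The gap is filled with `fillna`, and the units are converted with vectorized arithmetic. Mean imputation is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing temperature, values recorded in Fahrenheit
raw = pd.DataFrame({"city": ["A", "B", "C"], "temp_f": [68.0, np.nan, 86.0]})

# Fill the missing value with the column mean (one illustrative strategy)
raw["temp_f"] = raw["temp_f"].fillna(raw["temp_f"].mean())

# Unit conversion: Fahrenheit to Celsius, applied to the whole column at once
raw["temp_c"] = (raw["temp_f"] - 32) * 5 / 9
print(raw)
```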
Exploratory data analysis (EDA):
EDA describes the process of turning raw data into insightful observations. It is an open-ended process of transforming, visualizing, and summarizing patterns in data.
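A first EDA pass often starts with quick numeric and categorical summaries. The sketch below uses a small made-up dataset with hypothetical column names. It calls the built-in `describe` and `value_counts` methods:

```python
import pandas as pd

# Small made-up dataset for illustration only
df = pd.DataFrame({
    "party": ["Red", "Blue", "Red", "Green"],
    "votes": [120, 95, 150, 30],
})

# Summary statistics (count, mean, std, min, quartiles, max) for a numeric column
summary = df["votes"].describe()
print(summary)

# How often each category appears
counts = df["party"].value_counts()
print(counts)
```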
# 'pd' is the conventional alias for Pandas, as 'np' is for NumPy
import pandas as pd
1. Series: a 1-dimensional labeled array
import pandas as pd
s = pd.Series([-1, 10, 2])
print(s)
'''
0 -1
1 10
2 2
dtype: int64
'''
s.array  # Data contained within the Series
'''
<PandasArray>
[-1, 10, 2]
Length: 3, dtype: int64
'''
s.index # the Index of the Series
'''
RangeIndex(start=0, stop=3, step=1)
'''
Custom index labels
s = pd.Series([-1, 10, 2], index = ["a", "b", "c"])
print(s)
'''
a -1
b 10
c 2
dtype: int64
'''
ser = pd.Series([4, -2, 0, 6], index = ["a", "b", "c", "d"])
print(ser)
'''
a 4
b -2
c 0
d 6
dtype: int64
'''
print(ser["a"])
# 4
ser[["a", "c"]] # Selecting with a list of labels returns another Series
'''
a 4
c 0
dtype: int64
'''
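A single label returns a scalar. A list of labels returns a Series, even when the list has only one element. Here is a quick check that rebuilds `ser` from above:

```python
import pandas as pd

ser = pd.Series([4, -2, 0, 6], index=["a", "b", "c", "d"])

single = ser["a"]           # a scalar value
subset = ser[["a", "c"]]    # a new Series with labels "a" and "c"
one_item = ser[["a"]]       # still a Series, of length 1

print(type(single), type(subset), type(one_item))
```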
Filtering with a Condition
ser > 0 # Boolean mask: True where the element is greater than 0
'''
a True
b False
c False
d True
dtype: bool
'''
ser[ser > 0]
'''
a 4
d 6
dtype: int64
'''
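The comparison produces a boolean Series with the same labels as `ser`. Indexing with that mask keeps only the `True` positions, and they retain their original labels. A short self-contained check:

```python
import pandas as pd

ser = pd.Series([4, -2, 0, 6], index=["a", "b", "c", "d"])

mask = ser > 0          # boolean Series sharing ser's index
positive = ser[mask]    # keep only the entries where mask is True

print(mask.dtype, positive.to_dict())
```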
DataFrames
import pandas as pd
elections = pd.read_csv("data/elections.csv")
elections
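The file `data/elections.csv` may not be available locally. The sketch below therefore reads a tiny inline CSV through `io.StringIO`. Its columns and values are made up for illustration and are not the real elections data:

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for data/elections.csv
csv_text = """Year,Candidate,Party,Votes
2000,Candidate A,Party X,100
2004,Candidate B,Party Y,200
"""

# read_csv accepts any file-like object, not just a path
elections = pd.read_csv(io.StringIO(csv_text))
print(elections.shape)
print(elections.head())
```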
Creating a DataFrame
1. Using a list and column names
2. From a dictionary
3. From a Series
df_list = pd.DataFrame([1, 2, 3], columns=["Numbers"])
df_list
'''
Numbers
0 1
1 2
2 3
'''
df_list = pd.DataFrame([[1, "one"], [2, "two"]], columns = ["Number", "Description"])
df_list
'''
Number Description
0 1 one
1 2 two
'''
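Each inner list becomes one row, and `columns` supplies the column labels in order. Checking the shape and labels confirms this:

```python
import pandas as pd

df_list = pd.DataFrame([[1, "one"], [2, "two"]], columns=["Number", "Description"])

# Two inner lists -> two rows; two names in `columns` -> two columns
print(df_list.shape)
print(df_list.dtypes)
```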
From a Dictionary
df_dict = pd.DataFrame({"Fruit": ["Strawberry", "Orange"], "Price": [5.49, 3.99]})
df_dict
'''
Fruit Price
0 Strawberry 5.49
1 Orange 3.99
'''
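With a dictionary, each key becomes a column label and each list supplies that column's values. The lists must therefore have the same length. The second construction in this short sketch is deliberately malformed to show the error pandas raises:

```python
import pandas as pd

df_dict = pd.DataFrame({"Fruit": ["Strawberry", "Orange"], "Price": [5.49, 3.99]})

# Keys become column labels, in insertion order
print(list(df_dict.columns))

# Lists of unequal length are rejected with a ValueError
raised = False
try:
    pd.DataFrame({"Fruit": ["Strawberry", "Orange"], "Price": [5.49]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```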
From a Series
# Notice how our indices, or row labels, are the same
s_a = pd.Series(["a1", "a2", "a3"], index = ["r1", "r2", "r3"])
s_b = pd.Series(["b1", "b2", "b3"], index = ["r1", "r2", "r3"])
pd.DataFrame({"A-column": s_a, "B-column": s_b})
'''
A-column B-column
r1 a1 b1
r2 a2 b2
r3 a3 b3
'''
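The shared row labels matter because pandas aligns Series on their index when building a DataFrame. In the sketch below, the hypothetical `s_c` lacks label `r3` and adds `r4`. The result therefore contains the union of labels, with `NaN` wherever a value is absent:

```python
import pandas as pd

s_a = pd.Series(["a1", "a2", "a3"], index=["r1", "r2", "r3"])
# Hypothetical Series whose labels only partly overlap with s_a
s_c = pd.Series(["c1", "c2", "c4"], index=["r1", "r2", "r4"])

aligned = pd.DataFrame({"A-column": s_a, "C-column": s_c})
print(aligned)
```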