Gather more data: Census
import pandas as pd

# 2010s census data
# header=3: column names sit on the 4th line of the file; thousands=",": parse "1,234" as 1234
census_2010s_df = pd.read_csv("data/nst-est2019-01.csv", header=3, thousands=",")
census_2010s_df = (
    census_2010s_df
    .reset_index()
    .drop(columns=["index", "Census", "Estimates Base"])
    .rename(columns={"Unnamed: 0": "Geographic Area"})
    .convert_dtypes()  # "smart" converting of columns, use at your own risk
    .dropna()          # we'll introduce this next time
)
# state names in the file start with "." (e.g. ".Alabama"); strip the dots
census_2010s_df['Geographic Area'] = census_2010s_df['Geographic Area'].str.strip('.')

# with pd.option_context('display.min_rows', 30):  # shows more rows
#     display(census_2010s_df)

census_2010s_df.head(5)
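The `read_csv` options and the `str.strip('.')` step are easier to see on a tiny input. The sketch below uses a hypothetical, hand-made CSV (made-up numbers, not real census figures) laid out like the census file: title rows above the header, quoted comma-grouped numbers, and state names prefixed with `.`.

```python
import io

import pandas as pd

# Hypothetical miniature of the census layout (made-up numbers):
# three title rows, then the header row, then data rows whose state
# names start with "." and whose numbers use comma grouping.
csv_text = (
    "Table title,,\n"
    "Subtitle,,\n"
    "Source note,,\n"
    "Geographic Area,2018,2019\n"
    'United States,"300,000,000","301,000,000"\n'
    '.Alabama,"5,000,000","5,010,000"\n'
    '.Alaska,"700,000","701,000"\n'
)

# header=3: the 4th line (index 3) holds the column names; the lines above it are skipped.
# thousands=",": "5,010,000" is parsed as the integer 5010000 instead of a string.
df = pd.read_csv(io.StringIO(csv_text), header=3, thousands=",")

# str.strip(".") removes leading and trailing dots, so ".Alabama" becomes "Alabama".
df["Geographic Area"] = df["Geographic Area"].str.strip(".")
print(df)
```

Note that the year headers come back as strings (`"2019"`, not `2019`), which is why the merge code later selects columns like `"2019"` with quotes.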
# census 2020s data
census_2020s_df = pd.read_csv("data/NST-EST2022-POP.csv", header=3, thousands=",")
census_2020s_df = (
    census_2020s_df
    .reset_index()
    .drop(columns=["index", "Unnamed: 1"])
    .rename(columns={"Unnamed: 0": "Geographic Area"})
    .convert_dtypes()  # "smart" converting of columns, use at your own risk
    .dropna()          # we'll introduce this next time
)
census_2020s_df['Geographic Area'] = census_2020s_df['Geographic Area'].str.strip('.')

census_2020s_df.head(5)
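What `.convert_dtypes()` and `.dropna()` do in these chains can be sketched on a toy frame. Assumption: the rows that `.dropna()` removes are text-only rows (such as footnotes) with no numbers; the frame below is hypothetical. A numeric column containing a missing value is read as `float64`, and `convert_dtypes()` turns it into the nullable integer type `Int64` when every present value is a whole number.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a footnote-style row has no number, so the year
# column is float64 with a NaN in it.
raw = pd.DataFrame({
    "Geographic Area": ["Alabama", "Alaska", "Note: illustrative footnote"],
    "2020": [5_000_000.0, 700_000.0, np.nan],
})

# convert_dtypes(): whole-number floats -> nullable Int64 (NaN becomes <NA>)
# dropna(): drop every row that still has a missing value
cleaned = raw.convert_dtypes().dropna()
print(cleaned.dtypes)
print(cleaned)
```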
Join data on primary keys
# merge TB dataframe with two US census dataframes
tb_census_df = (
    tb_df
    .merge(right=census_2010s_df,
           left_on="U.S. jurisdiction", right_on="Geographic Area")
    .merge(right=census_2020s_df,
           left_on="U.S. jurisdiction", right_on="Geographic Area")
)
tb_census_df.head(5)
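A toy reproduction of the chained merge above shows where the clutter comes from (the frames and numbers are hypothetical). Because `left_on` and `right_on` name different columns, pandas keeps the right-hand key, so the second merge finds `Geographic Area` on both sides and renames the two copies with its default suffixes `_x` and `_y`.

```python
import pandas as pd

# Hypothetical stand-ins for tb_df and the two census tables
tb = pd.DataFrame({"U.S. jurisdiction": ["Alabama", "Alaska"], "TB cases 2019": [10, 20]})
c10 = pd.DataFrame({"Geographic Area": ["Alabama", "Alaska"], "2018": [1, 2], "2019": [3, 4]})
c20 = pd.DataFrame({"Geographic Area": ["Alabama", "Alaska"], "2020": [5, 6], "2021": [7, 8]})

messy = (
    tb
    .merge(right=c10, left_on="U.S. jurisdiction", right_on="Geographic Area")
    .merge(right=c20, left_on="U.S. jurisdiction", right_on="Geographic Area")
)
# The key shows up twice (suffixed), and every year column is carried along
print(messy.columns.tolist())
```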
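This works, but the result is messy: it keeps every year column from both census tables, and because both merges use a right-hand key named `Geographic Area`, that key column appears twice.
A cleaner version selects only the key and the needed year columns from each census table, and drops the redundant key after each merge: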
# try merging again, but cleaner this time:
# keep only the key plus the needed year columns, and drop the duplicate key after each merge
tb_census_df = (
    tb_df
    .merge(right=census_2010s_df[["Geographic Area", "2019"]],
           left_on="U.S. jurisdiction", right_on="Geographic Area")
    .drop(columns="Geographic Area")
    .merge(right=census_2020s_df[["Geographic Area", "2020", "2021"]],
           left_on="U.S. jurisdiction", right_on="Geographic Area")
    .drop(columns="Geographic Area")
)
tb_census_df.head(5)
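Both merges above are inner joins (the `merge` default), so a jurisdiction whose name does not match exactly in both tables is silently dropped. A sketch of two checks on hypothetical toy frames: `validate="one_to_one"` makes pandas raise `pd.errors.MergeError` if a key repeats on either side, and a set difference lists the names that found no match. The names and numbers are made up; "Total" stands in for any label with no census counterpart.

```python
import pandas as pd

# Hypothetical toy frames; "Total" has no match in the census table
tb = pd.DataFrame({
    "U.S. jurisdiction": ["Total", "Alabama", "Alaska"],
    "TB cases 2019": [30, 10, 20],
})
census = pd.DataFrame({
    "Geographic Area": ["United States", "Alabama", "Alaska"],
    "2019": [300, 100, 200],
})

# validate="one_to_one" raises MergeError if a key repeats on either side
merged = (
    tb.merge(right=census[["Geographic Area", "2019"]],
             left_on="U.S. jurisdiction", right_on="Geographic Area",
             validate="one_to_one")
    .drop(columns="Geographic Area")
)

# The inner join dropped unmatched keys without warning; list them explicitly
unmatched = set(tb["U.S. jurisdiction"]) - set(merged["U.S. jurisdiction"])
print(merged)
print("unmatched:", unmatched)

# A repeated census key trips the validation
duplicated_census = pd.concat([census, census.iloc[[1]]], ignore_index=True)
try:
    tb.merge(right=duplicated_census, left_on="U.S. jurisdiction",
             right_on="Geographic Area", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print("duplicate key detected:", raised)
```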