Row 0 is a rollup record.
Its granularity differs from the rest of the records: row 0 is a total, while every other row is a single state.
# the sum of all state cases
tb_df.sum(axis=0)
# If we sum over all rows, each TB-cases-by-year column should come out to 2x its total, because row 0 already holds that total
# check out the column types
tb_df.dtypes
The commas cause all TB case counts to be read as the object datatype, or storage type (close to the Python string datatype), so pandas is concatenating strings instead of adding integers.
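To see the concatenation in isolation, here is a minimal sketch on a made-up three-row CSV. The column names and numbers are hypothetical, not the CDC data. It reads the same text twice, once with default settings and once with the `thousands=','` argument of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical miniature CSV: counts are quoted and use thousands separators.
csv_text = 'State,Cases\nTotal,"8,900"\nAlabama,"1,200"\nAlaska,"7,700"\n'

# Default read: "8,900" cannot be parsed as a number, so the column holds strings.
raw_df = pd.read_csv(io.StringIO(csv_text))
# astype(object) changes nothing on older pandas; it is kept so the sketch
# behaves the same if a newer version infers a dedicated string dtype.
raw_total = raw_df["Cases"].astype(object).sum()
print(repr(raw_total))  # the strings are glued together, not added

# Telling the parser that ',' is a thousands separator yields integers.
clean_df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(clean_df["Cases"].dtype)
print(clean_df["Cases"].sum())
```

The first sum is a single long string, which is the same symptom `tb_df.sum(axis=0)` shows on the real file.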
# improve readability: chaining method calls with outer parentheses/line breaks
tb_df = (
    pd.read_csv("data/cdc_tuberculosis.csv", header=1, thousands=',')
    .rename(columns=rename_dict)
)
tb_df.head(5)
tb_df.sum()
state_tb_df = tb_df[1:]
state_tb_df.head(5)
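As a sanity check on the slicing step, the sketch below uses a small invented table to confirm two things: the state rows sum to the rollup row, and summing every row doubles the total. The column names and counts are assumptions for illustration only, not values from the CDC file.

```python
import pandas as pd

# Toy stand-in for tb_df: row 0 is the rollup, the rest are states.
toy_df = pd.DataFrame({
    "U.S. jurisdiction": ["Total", "Alabama", "Alaska"],
    "TB cases 2019": [145, 87, 58],
    "TB cases 2020": [130, 72, 58],
})
case_cols = ["TB cases 2019", "TB cases 2020"]

rollup = toy_df.loc[0, case_cols].astype(int)  # the national total row
state_toy_df = toy_df[1:]                      # same slicing as state_tb_df
state_totals = state_toy_df[case_cols].sum()   # totals over state rows only
all_totals = toy_df[case_cols].sum()           # totals over every row

# State rows alone reproduce the rollup; including row 0 double counts.
print((state_totals == rollup).all())
print((all_totals == 2 * state_totals).all())
```

Running the same comparison on `tb_df` and `state_tb_df` would show whether the real rollup row agrees with the per-state counts.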