CSV files and field names

# Check out the first three lines:
with open("data/cdc_tuberculosis.csv", "r") as f:
    i = 0
    for row in f:
        print(row)
        i += 1
        if i >= 3:  # stop once three lines have been printed
            break

※ Each line read from the file already ends with a newline, and Python's print() adds another newline on top of that, so a blank line appears between rows.
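
The doubled newline can be reproduced without the CSV file. The snippet below is a minimal sketch using a made-up in-memory sample (the `sample` text is an assumption, not the CDC data); it shows two common ways to avoid the blank lines.

```python
import io

# Hypothetical two-line sample standing in for the real CSV file
sample = io.StringIO("a,b\n1,2\n")

lines = list(sample)  # each line keeps its trailing "\n"

for line in lines:
    print(line, end="")  # end="" stops print() from adding a second newline

# Alternative: strip the trailing newline before printing
stripped = [line.rstrip("\n") for line in lines]
```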

# We can use the repr() function to return the raw string with all special characters
with open("data/cdc_tuberculosis.csv", "r") as f:
    i = 0
    for row in f:
        print(repr(row))  # print raw strings
        i += 1
        if i >= 3:  # stop once three lines have been printed
            break
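
As a small illustration of what repr() does, here is a sketch on a made-up string (not taken from the CDC file): special characters such as tabs and newlines are shown as escape sequences instead of being rendered.

```python
# Hypothetical string containing a tab and a trailing newline
text = "x\t1\n"

raw = repr(text)  # quoted form with escapes written out
print(raw)        # prints 'x\t1\n' on a single line
```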

import pandas as pd

# Loading the file as-is gives messy data that needs cleaning:
# pandas treats the first row as the header, which can cause "Unnamed" column names.
tb_df = pd.read_csv("data/cdc_tuberculosis.csv")
tb_df.head()

# Identify the row with the right header.
tb_df = pd.read_csv("data/cdc_tuberculosis.csv", header=1)  # 0-based row index of the header
tb_df.head(5)

In the second result, however, we can't differentiate between the year columns for "Number of TB cases" and those for "TB incidence".
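
The ambiguity comes from how pandas handles repeated header labels: later duplicates get a numeric suffix such as `.1`. The sketch below uses a made-up miniature file (the row values and the exact header text are assumptions, not the real CDC data) to show the effect.

```python
import io
import pandas as pd

# Made-up miniature file: a title row, then a header row with repeated year labels
sample = io.StringIO(
    ",No. of TB cases,,TB incidence,\n"
    "U.S. jurisdiction,2019,2020,2019,2020\n"
    "Total,100,90,2.5,2.1\n"
)

demo_df = pd.read_csv(sample, header=1)  # use the second row as the header
print(list(demo_df.columns))
# → ['U.S. jurisdiction', '2019', '2020', '2019.1', '2020.1']
```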

# We can do this manually with df.rename()
rename_dict = {'2019': 'TB cases 2019',
               '2020': 'TB cases 2020',
               '2021': 'TB cases 2021',
               '2019.1': 'TB incidence 2019',
               '2020.1': 'TB incidence 2020',
               '2021.1': 'TB incidence 2021'}
tb_df = tb_df.rename(columns=rename_dict)
tb_df.head(5)
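
With more year columns, the same mapping could be built in a loop instead of typed by hand. This is only a sketch: the empty `demo_df` and the `years` list are stand-ins based on the column names above, not the real dataset.

```python
import pandas as pd

# Hypothetical empty frame that only mimics the column names seen above
demo_df = pd.DataFrame(
    columns=["U.S. jurisdiction", "2019", "2020", "2021", "2019.1", "2020.1", "2021.1"]
)

years = ["2019", "2020", "2021"]

# Unsuffixed labels are case counts; ".1" labels are incidence
rename_dict = {y: f"TB cases {y}" for y in years}
rename_dict.update({f"{y}.1": f"TB incidence {y}" for y in years})

demo_df = demo_df.rename(columns=rename_dict)
print(list(demo_df.columns))
```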
