1. Ask a Question
- What do we want to know?
- A question that is too ambiguous may lead to confusion.
- What problems are we trying to solve?
- A clear goal makes it possible to justify our efforts to stakeholders.
- What are the hypotheses we want to test?
- This gives a clear perspective from which to analyze final results.
- What are the metrics for our success?
- This gives a clear criterion for deciding when the project is finished.
2. Obtain Data (Data acquisition, data cleaning)
- What data do we have and what data do we need?
- Define the units of the data (people, cities, points in time, etc.) and the features to measure.
- How will we sample more data?
- Scrape the web, collect manually, etc.
- Is our data representative of the population we want to study?
- If our data is not representative of our population of interest, then we can come to incorrect conclusions.
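A minimal pandas sketch of the representativeness check follows. It assumes each row is one unit (a person) and compares the share of each `city` in a hypothetical survey against hypothetical population shares. The data, column names, and the helper `representation_gap` are illustrative only, not taken from any real study.

```python
import pandas as pd

# Hypothetical survey: each row is one unit (a person), each column a feature we chose to measure.
survey = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "city": ["Seoul", "Seoul", "Seoul", "Seoul", "Seoul", "Seoul", "Busan", "Incheon"],
    "age": [23, 35, 41, 29, 52, 38, 47, 31],
})

# Hypothetical population shares for the same categories (in practice, e.g., from a census).
population_share = pd.Series({"Seoul": 0.50, "Busan": 0.25, "Incheon": 0.25})


def representation_gap(df: pd.DataFrame, column: str, population: pd.Series) -> pd.DataFrame:
    """Compare category proportions in the sample with known population proportions."""
    sample_share = df[column].value_counts(normalize=True)
    # Align both Series on the category labels; a category missing from the sample gets share 0.
    comparison = pd.DataFrame({"sample": sample_share, "population": population}).fillna(0.0)
    comparison["gap"] = comparison["sample"] - comparison["population"]
    return comparison.sort_values("gap")


gap = representation_gap(survey, "city", population_share)
print(gap)
```

A large positive `gap` means a group is over-represented in the sample and a large negative one means it is under-represented. Here Seoul makes up 75% of the sample but, by assumption, only 50% of the population, which is a signal to sample more data from the other cities before drawing conclusions.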
3. Understand the Data (Exploratory data analysis, data visualization)
- How is our data organized and what does it contain?
- Knowing how the data is structured (what each row represents, what each column measures) tells us what it can and cannot say about the world.
- Do we have relevant data?
- If the data we have collected is not useful for the question at hand, then we must collect more data.
- What are the biases, anomalies, or other issues with the data?
- These can lead to many false conclusions if ignored, so data scientists must always be aware of these issues.
- How do we transform the data to enable effective analysis?
- Data is not always easy to interpret at first glance, so a data scientist may need to transform it (e.g., clean, aggregate, or rescale it) to reveal insights that are hidden in its raw form.
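The checks in this step can be sketched with a few standard pandas calls. This is a minimal example under stated assumptions: the `raw` table is hypothetical, the helpers `summarize_issues` and `iqr_outliers` are illustrative names, and the 1.5 × IQR outlier rule and the log transform are common choices rather than the only ones.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, a duplicated row, and a right-skewed "income" column.
raw = pd.DataFrame({
    "city": ["Seoul", "Busan", "Seoul", "Incheon", "Busan", "Busan"],
    "age": [23, 35, np.nan, 29, 35, 61],
    "income": [3_000, 4_200, 3_800, 5_100, 4_200, 250_000],
})


def summarize_issues(df: pd.DataFrame) -> dict:
    """Collect basic structure and data-quality signals: shape, dtypes, missing values, duplicates."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], a common rule of thumb for anomalies."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


issues = summarize_issues(raw)
print(issues)

# Drop the exact duplicate and the row with a missing age, then flag and transform.
clean = raw.drop_duplicates().dropna(subset=["age"]).copy()
clean["is_outlier"] = iqr_outliers(clean["income"])
clean["log_income"] = np.log10(clean["income"])  # compress the right-skewed income scale
print(clean)
```

Dropping rows is the simplest fix but can itself introduce bias if the missing values are not random, so each cleaning step is worth recording and revisiting.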
4. Understand the World (Model creation, prediction, inference)
- What does the data say about the world?
- Given our models, the data will lead us to certain conclusions about the real world.
- Does it answer our questions or accurately solve the problem?
- If our model and data cannot accomplish our goals, then we must revise our question, our model, or both.
- How robust are our conclusions and can we trust the predictions?
- Inaccurate models can lead to false conclusions.
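One way to probe how robust a conclusion is: refit the model on bootstrap resamples of the data and see how much the estimate moves. The sketch below assumes the "model" is a straight-line fit with `numpy.polyfit` and uses synthetic data; the helper `bootstrap_slope_interval` is a hypothetical name, and cross-validation or a held-out test set are equally valid alternatives to bootstrapping.

```python
import numpy as np


def bootstrap_slope_interval(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Refit a straight line on resampled data and return a percentile interval for the slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        slopes[i] = np.polyfit(x[idx], y[idx], deg=1)[0]  # coefficient of x
    lower, upper = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    full_slope = np.polyfit(x, y, deg=1)[0]  # estimate on the full dataset
    return full_slope, (lower, upper)


# Synthetic data (an assumption for illustration): y grows by about 2 per unit of x, plus noise.
data_rng = np.random.default_rng(42)
x = data_rng.uniform(0, 10, size=200)
y = 2.0 * x + data_rng.normal(0, 1, size=200)

slope, (lo, hi) = bootstrap_slope_interval(x, y)
print(f"slope={slope:.2f}, 95% bootstrap interval=({lo:.2f}, {hi:.2f})")
```

A narrow interval suggests the estimated relationship barely depends on which rows happened to be collected; a wide interval, or one that spans zero, means the conclusion should be treated with caution.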