As an Enthusiastic Data Scientist

Irvan sikajudin
4 min read · Apr 17, 2022

As enthusiastic data scientists, especially those of us without a formal educational background in data, we have to put in extra effort to become full-fledged data scientists. In my personal opinion, there are several steps that must be understood.

Image source: https://www.thetastatistik.com/mengenal-data-scientist-profesi-naik-daun-berprospek-cerah/

The steps are as follows:

  1. Learn Statistics
  2. Learn Python
  3. Data Collection
  4. Data Preprocessing (cleaning, manipulation)
  5. EDA (Exploratory Data Analysis)
  6. Machine Learning & Deep Learning
  7. Learn to Deploy ML Models
  8. Real-World Testing
  9. Exploring and Practicing Datasets on Kaggle
  10. Analytical Curiosity
  11. Non-Technical Skills

1. Learn Statistics

The first step is to understand statistics, which helps us grasp the meaning of the data we have. In the initial stage we can learn descriptive and inferential statistics, then move on to some common statistical methods such as simple linear regression and multiple linear regression.
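
To make these terms concrete, here is a minimal sketch using only Python's standard library. The experience and salary numbers are made up for illustration, and `simple_linear_regression` is a hypothetical helper written for this example, not a function from any library.

```python
import statistics

# Hypothetical sample: years of experience vs. monthly salary (arbitrary units).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]

# Descriptive statistics summarize the data we already have.
mean_salary = statistics.mean(salary)
median_salary = statistics.median(salary)
stdev_salary = statistics.stdev(salary)  # sample standard deviation


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_mean = statistics.mean(x)
    y_mean = statistics.mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = simple_linear_regression(experience, salary)
print(f"mean={mean_salary}, median={median_salary}, stdev={stdev_salary:.2f}")
print(f"fitted line: salary = {intercept:.2f} + {slope:.2f} * experience")
```

The mean, median, and standard deviation describe the sample itself, while the fitted line is a first step toward inference: it estimates how salary changes for each extra year of experience.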

2. Learn Python

The second step is to understand the tool we use to write commands so that the computer can understand our intentions. There are several such tools, but for broad, general-purpose use the most suitable one is the Python programming language, because it can be applied across many different platforms.

3. Data Collection

This is one of the key steps in the field of data science. This skill involves knowing the tools for importing data from local systems, such as CSV files, and for scraping data from websites.
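
Both kinds of collection can be sketched with the standard library alone. In the example below the CSV text and the HTML snippet are hypothetical stand-ins for a real file and a real web page, and `load_csv` and `TableCellParser` are names invented for this illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical CSV content; in practice this would come from a file on disk.
CSV_TEXT = """brand,year,price
Toyota,2018,15000
Honda,2020,18000
"""


def load_csv(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell in an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())


# Hypothetical HTML; a real scraper would first download the page,
# for example with urllib.request.urlopen(url).read().decode().
HTML_TEXT = "<table><tr><td>Suzuki</td><td>2019</td><td>12000</td></tr></table>"

rows = load_csv(CSV_TEXT)
parser = TableCellParser()
parser.feed(HTML_TEXT)
print(rows)
print(parser.cells)
```

Note that every value arrives as a string, which is one reason the next step, preprocessing, is needed before any analysis.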

4. Data Preprocessing

This is the step where a data scientist spends most of their time. Data cleaning is about getting raw data into a form suitable for work and analysis by removing unwanted values and incorrect records and by handling missing values, categorical values, and outliers. Data cleaning is very important because real-world data is messy, and being able to do it with the help of Python libraries such as pandas and NumPy is essential for aspiring data scientists.
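
A minimal sketch of these cleaning steps with pandas and NumPy might look like the following. The car data is invented, `preprocess` is a hypothetical helper, and the choices made here (median imputation, the 1.5 × IQR outlier rule, one-hot encoding) are only one reasonable option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, a duplicate row, an outlier,
# and a categorical column.
raw = pd.DataFrame({
    "brand": ["Toyota", "Honda", "Honda", "Suzuki", "Toyota", "Honda"],
    "mileage": [40000.0, 35000.0, 35000.0, np.nan, 42000.0, 38000.0],
    "price": [15000, 18000, 18000, 12000, 900000, 17000],
})


def preprocess(df):
    # Remove exact duplicate rows.
    df = df.drop_duplicates().copy()
    # Fill missing mileage with the column median.
    df["mileage"] = df["mileage"].fillna(df["mileage"].median())
    # Drop price outliers using the 1.5 * IQR rule.
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df = df[in_range]
    # One-hot encode the categorical column so models can use it.
    return pd.get_dummies(df, columns=["brand"])


clean = preprocess(raw)
print(clean)
```

The order matters in this sketch: duplicates are dropped before the median and quartiles are computed so that repeated rows do not distort them.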

5. EDA

EDA (exploratory data analysis) is one of the most important aspects of the broad field of data science. It includes analyzing the data, its variables, patterns, and trends, and extracting useful insights from them with the help of various graphs and statistical methods. EDA can surface patterns that machine learning algorithms may fail to identify. It covers data manipulation, analysis, and visualization.
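
As a small illustration of the statistical side of EDA, the sketch below uses pandas on an invented car dataset. The column names are assumptions made for this example, and plotting is left out to keep it short.

```python
import pandas as pd

# Hypothetical, already-cleaned car data.
cars = pd.DataFrame({
    "brand": ["Toyota", "Honda", "Suzuki", "Honda", "Toyota", "Suzuki"],
    "age_years": [4, 2, 3, 5, 1, 6],
    "price": [15000, 18000, 12000, 14000, 20000, 9000],
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = cars[["age_years", "price"]].describe()

# Compare groups: average price per brand.
mean_price_by_brand = cars.groupby("brand")["price"].mean()

# Look for a trend: correlation between car age and price.
age_price_corr = cars["age_years"].corr(cars["price"])

print(summary)
print(mean_price_by_brand)
print(f"correlation between age and price: {age_price_corr:.2f}")
```

In this made-up data the correlation is negative, which matches the intuition that older cars sell for less. Findings like this guide which features are worth feeding into a model later.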

6. Machine Learning & Deep Learning

Machine learning is the core skill needed to become a data scientist. Machine learning is used to build prediction models, classification models, and so on, and companies large and small use it to optimize their planning based on those predictions. One example is predicting car prices.
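
The car price example can be sketched as a multiple linear regression. This version fits the model with NumPy least squares instead of a dedicated machine learning library, and it trains on synthetic data generated inside the script, so the features, coefficients, and the `predict` helper are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: age in years and mileage in thousands of km.
rng = np.random.default_rng(0)
age = rng.uniform(1, 10, size=50)
mileage = rng.uniform(10, 150, size=50)
# Synthetic "true" relationship plus noise, used only to have something to fit.
price = 25000 - 1200 * age - 40 * mileage + rng.normal(0, 300, size=50)

# Design matrix with an intercept column first.
X = np.column_stack([np.ones_like(age), age, mileage])

# Hold out the last 10 rows to check how well the model generalizes.
X_train, X_test = X[:40], X[40:]
y_train, y_test = price[:40], price[40:]

# Fit multiple linear regression by least squares.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(age_years, mileage_thousand_km):
    """Predict a price from the fitted coefficients."""
    return coef[0] + coef[1] * age_years + coef[2] * mileage_thousand_km


# Mean absolute error on the held-out rows.
test_mae = np.mean(np.abs(X_test @ coef - y_test))
print(f"coefficients: {coef}")
print(f"test MAE: {test_mae:.0f}")
```

Evaluating on rows the model never saw during fitting is the important habit here; the same idea carries over to more complex models.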

7. Learn to Deploy ML Models

Deployment is basically the process of making your machine learning model available for end users. This is achieved by integrating the model with existing production environments, thus putting ML models to practical use in a wide range of business solutions.

There are many tools and services for deploying an ML model, such as Flask, PythonAnywhere, Microsoft Azure, Google Cloud, and Heroku, along with MLOps practices for managing the whole process.
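
To show the idea without depending on any particular service, here is a minimal sketch that wraps a model in an HTTP endpoint using only the standard library. The `MODEL` coefficients are hypothetical, and `predict_price`, `PredictHandler`, and `serve` are names invented for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical coefficients of an already-trained linear model:
# price = intercept + age * age_years + mileage * thousand_km.
MODEL = {"intercept": 25000.0, "age": -1200.0, "mileage": -40.0}


def predict_price(payload):
    """Turn a parsed JSON request into a JSON-serializable response."""
    price = (MODEL["intercept"]
             + MODEL["age"] * payload["age"]
             + MODEL["mileage"] * payload["mileage"])
    return {"predicted_price": price}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, run the model, and send the result back as JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict_price(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the prediction server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` and sending a POST request with a body such as `{"age": 5, "mileage": 100}` returns the predicted price as JSON. A web framework such as Flask plays the same role with less boilerplate, and a cloud platform then hosts the resulting service.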

8. Real-World Testing

Real-world testing and validation of machine learning models should be done after deployment to check their effectiveness and accuracy. Testing is an important step in data science for keeping the efficiency and effectiveness of ML models under control.
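
One simple way to do this is to compare the model's error on newly observed data with the error measured before deployment. The sketch below assumes mean absolute error as the metric; the prices, the baseline of 500, and the `tolerance` factor of 1.5 are all made-up values, and `check_model_health` is a hypothetical helper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def check_model_health(actual, predicted, baseline_mae, tolerance=1.5):
    """Flag the model when live error exceeds tolerance * baseline error."""
    live_mae = mean_absolute_error(actual, predicted)
    return {
        "live_mae": live_mae,
        "needs_review": live_mae > tolerance * baseline_mae,
    }


# Hypothetical sale prices observed after deployment vs. model predictions.
actual_prices = [15000, 18000, 12000, 17000]
predicted_prices = [14500, 18500, 13500, 16000]

report = check_model_health(actual_prices, predicted_prices, baseline_mae=500)
print(report)
```

When `needs_review` comes back true, that is a signal to investigate whether the real-world data has drifted from the training data and whether the model should be retrained.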

9. Exploring and Practicing Datasets on Kaggle

The world's largest data science communities, such as Kaggle, are helpful for finding a wide variety of datasets, which can be used to practice data analysis techniques and machine learning algorithms. Competitions held in these communities are also useful for honing data science skills, helping us reach our goal of becoming proficient in data science faster.

10. Analytical Curiosity

Data science is a field that develops at high speed, so it requires an ingrained curiosity to keep exploring the field, stay up to date, and learn a variety of skills and techniques.

This is the main skill that will always help us maintain and update our skills and concepts, preventing us from falling behind the advances in data science technology.

11. Non-Technical Skills

Non-technical skills include teamwork, communication skills, task management, business understanding, and so on.

Teamwork plays an important role when delivering results to the companies where we work as data scientists.

Communication skills allow us to express our technical ideas and concepts to the non-technical staff and management of the company.

Task management involves proper management and planning to deliver solutions.

Business understanding, or an understanding of the industry we are in, is essential for effective analysis and for solving problems in that industry.

Written by Irvan sikajudin

My degree is a Bachelor of Economics (S.E) with a concentration in business and economics, but I call myself an enthusiastic data scientist.