A Step-by-Step Procedure for Starting Your First Data Science Project

So you want to write “Data Scientist” on your LinkedIn profile?

Well, this article will give you a strategy for starting a data science project and a few links to get started with. But wait: before you start, you do know that data science is much more than just using cool ML algorithms, right?

I read a lot of articles online before actually starting my first data science project, and they all pretty much said that there are n steps to follow and then you are done.

But when I started, I chose a complicated dataset and was left confused, because those articles just told me to choose a dataset on a topic I liked. So before we start, I would love to suggest a few datasets to start with.

  1. For a classification problem: Titanic Challenge on Kaggle
  2. For a regression problem: Boston House Prices
  3. For natural language processing: Bag of Words Meets Bags of Popcorn
  4. For computer vision fundamentals: Digit Recognizer
  5. For sentiment analysis: IMDB Dataset
  6. For image processing: Identify the Digits

Now go through all the datasets and choose the one you want to start with. I would recommend starting with a classification or regression problem, as these will lay down the basics and, in the long term, help you understand what exactly is going on in a dataset.

Now, coming back to those “steps”:

The first step in this whole process is to choose your ideal IDE. There are many IDEs on the market, so start with any IDE of your choice.

You know what they say: “Picking a dataset is an art”. The dataset defines your project, so pick one that matches your ultimate goal. This can be a long process sometimes. Some of the websites for getting datasets are -

If you are still super confused, just start with one of the datasets recommended above. Otherwise, take your time to find the perfect one to start off with.

The next step is to understand your data. So what comes under this? Look at the data and try to find answers to these questions:

Q1- What kind of data do you have? Check whether it is structured or unstructured, numerical or some other type.

Q2- Does your data have any missing values? If so, what will you do with them?

Q3- What is your ultimate goal with the problem? For example, with the Boston dataset you should predict house prices based on various factors. So try to understand the factors such as size and tax (the feature variables) and how they relate to the house prices (the target variable).
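A minimal sketch of how these three questions could be answered with pandas. The table below is invented for illustration (the column names `size_sqft`, `tax`, `neighbourhood` and `price` are assumptions, not the real Boston columns), so swap in your own dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for a real house-price dataset;
# column names and values are made up for illustration.
df = pd.DataFrame({
    "size_sqft": [850.0, 1200.0, None, 1500.0, 950.0],
    "tax": [210, 330, 280, 400, 240],
    "neighbourhood": ["north", "south", "south", None, "east"],
    "price": [120000, 185000, 160000, 230000, 135000],
})

# Q1: what kind of data is it? dtypes separates numeric from text columns.
column_types = df.dtypes

# Q2: how many missing values does each column have?
missing_counts = df.isna().sum()

# Q3: separate the feature variables from the target variable.
target = df["price"]
features = df.drop(columns=["price"])

print(column_types)
print(missing_counts)
print(features.columns.tolist())
```

Printing these three things before doing anything else is usually enough to tell you how much cleaning lies ahead.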

If you are here now, you should have answers to all the questions above. Now that you have them, let’s jump straight into data preprocessing.

Data preprocessing, the 4th major step, is where you start to play with the data and gain insights from it. There are a few processes in this step, so let’s glance over some of them:

Data cleaning- Ask data scientists what the most boring part of the job is and they’ll say data cleaning: the process of removing unnecessary and duplicate data and handling missing values. However, it is important to get rid of any inconsistencies in the data in order to prevent wrong predictions.
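As a rough sketch, assuming a small made-up table with Titanic-like columns (`age`, `fare`, `survived` are illustrative names), cleaning with pandas might look like this. Filling gaps with the median is only one option; dropping the affected rows with `dropna()` is another.

```python
import pandas as pd

# Hypothetical toy data with one duplicate row and two missing values.
raw = pd.DataFrame({
    "age": [22.0, 38.0, 38.0, None, 35.0, 54.0],
    "fare": [7.25, 71.28, 71.28, 8.05, None, 51.86],
    "survived": [0, 1, 1, 0, 1, 0],
})

# Drop rows that are exact duplicates of an earlier row.
cleaned = raw.drop_duplicates()

# Fill the missing numeric values with the median of each column.
cleaned = cleaned.fillna({
    "age": cleaned["age"].median(),
    "fare": cleaned["fare"].median(),
})

print(cleaned)
```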

EDA- EDA stands for “Exploratory Data Analysis”, which in simple words means gaining insights from the data before modelling it. Perform univariate and multivariate analysis on the dataset to find hidden insights and patterns in it.
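Here is a small sketch of what univariate and multivariate analysis could look like in pandas, again on invented numbers (the columns are assumptions for illustration). Summary statistics look at one variable at a time, while the correlation matrix looks at pairs of variables.

```python
import pandas as pd

# Hypothetical toy data; in practice, load your own dataset here.
df = pd.DataFrame({
    "size_sqft": [850, 1200, 1100, 1500, 950, 1700],
    "tax": [210, 330, 280, 400, 240, 450],
    "price": [120000, 185000, 160000, 230000, 135000, 260000],
})

# Univariate analysis: summary statistics for each variable on its own.
summary = df.describe()

# Multivariate analysis: pairwise correlation between the variables.
correlations = df.corr()

print(summary.loc[["mean", "min", "max"]])
print(correlations["price"].sort_values(ascending=False))
```

Plots such as histograms and scatter plots are the natural next step, but even these two tables often reveal which features move together with the target.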

This step is all about building a model to solve the problem. There are a few steps involved here too, so let’s have a look at them:

Model Building- Try out several candidate models on the dataset before you choose the right one. Also use this step to understand how the different algorithms work. Train each model on one part of the data and test it on the rest.
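A minimal sketch of this with scikit-learn, assuming a synthetic classification dataset generated by `make_classification` as a stand-in for a real one. The three models are an illustrative choice, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real classification dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out 25% of the rows for testing; train on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A few candidate models to try out.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
}

# Train each model, then predict on the held-out rows.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Keeping the models in a dictionary makes it easy to add or remove a candidate without touching the training loop.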

Model Evaluation- This is an important step in the process: here you find out the accuracy of each model you built and, based on that, decide which algorithm best suits your problem.

After this step you should be clear about what you built and how you did it.
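One way to sketch this comparison, assuming the same kind of synthetic data as a placeholder and accuracy as the metric (other metrics may suit your problem better):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real classification dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative candidates; use whichever models you trained.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Accuracy = fraction of held-out rows predicted correctly.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Pick the model with the highest test accuracy.
best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```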

So after all these steps, what’s the last step? Fine-tune your model and make it presentable. Try to understand your results and check the outcomes. Then try to deploy your model, or do whatever you want with it.

So these were some steps you can follow to start your first machine learning project.

Signing off, Palash Chaturvedi.
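Fine-tuning can mean many things; one common reading is hyperparameter search. Below is a small sketch using scikit-learn's `GridSearchCV` on synthetic placeholder data. The decision tree and the parameter grid are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real classification dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical grid: 3 depths x 2 leaf sizes = 6 combinations to try.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}

# Score every combination with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` holds the model refitted with the best parameters found.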