A tutorial on the functions and code you will use in every project

Photo by CHUTTERSNAP on Unsplash

Whether you have a messy dataset to clean and analyze for insights, or you want to prepare data for a machine learning model, pandas is the library for you. It is simple to use, fast, and highly intuitive.

Pandas was built specifically for data science and bundles features from other libraries. This means you can often work in pandas alone, without importing other libraries, as we’ll see in the article.

A big disadvantage of pandas is that it struggles with datasets too large to fit in memory. This is because pandas stores its data structures in RAM, which can run out of…


Understand how to reshape a Pandas DataFrame using practical examples

Photo by Pixabay from Pexels

Reshaping a dataframe usually involves converting columns into rows or vice versa.

There are a few reasons to reshape a dataframe:

  • To tidy up a messy dataset so that each variable is in its own column and each observation in its own row.
  • To prepare part of the dataset for analysis or visualization.

I used to mostly Google these functions whenever I needed them and copy-paste the solution. Thanks, Stack Overflow!

In this article, I talk about the pandas functions melt(), stack(), and wide_to_long(). …
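
As a rough illustration of converting columns into rows, the sketch below reshapes a small wide table with melt() and stack(). The dataframe, its column names, and the values are hypothetical and are not taken from the article.

```python
import pandas as pd

# Hypothetical wide-format data: one row per student, one column per subject.
wide = pd.DataFrame({
    "student": ["Ann", "Ben"],
    "math": [90, 75],
    "physics": [85, 60],
})

# melt() keeps `student` as an identifier and turns the subject columns into rows.
long = wide.melt(id_vars="student", var_name="subject", value_name="score")

# stack() does a similar job through the index and returns a Series
# indexed by (student, subject).
stacked = wide.set_index("student").stack()

print(long)
print(stacked)
```

Both results hold one row per student and subject pair, which is usually the shape that grouping and plotting functions expect.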


How, when to use, and when not to use Lambda functions

Photo by Pixabay from Pexels

Introduction

When I first came across lambda functions in Python, I was intimidated and thought they were only for advanced Pythonistas. Beginner Python tutorials applaud the language for its readable syntax, but lambdas sure didn’t seem user-friendly.

However, once I understood the general syntax and examined some simple use cases, using them was less scary.

Syntax

Simply put, a lambda function is just like any normal Python function, except that it is defined without a name and its body is a single expression written on one line.

lambda argument(s): expression

A lambda function evaluates an expression for a given argument. You give…
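
To make the syntax above concrete, here is a minimal sketch comparing a named function with equivalent lambdas. The names `square`, `square_lambda`, `words`, and `add` are made up for illustration.

```python
# A named function and an equivalent lambda bound to a name (for comparison only).
def square(x):
    return x ** 2

square_lambda = lambda x: x ** 2

# Lambdas are most often passed inline to another function, here as a sort key.
words = ["pandas", "numpy", "matplotlib"]
by_length = sorted(words, key=lambda w: len(w))

# A lambda can take more than one argument.
add = lambda a, b: a + b

print(square(4), square_lambda(4))
print(by_length)
print(add(2, 3))
```

Binding a lambda to a name, as in `square_lambda`, is shown only to mirror the `def` version; in practice a `def` is usually clearer when the function needs a name.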


How to obtain different format datasets for your data science and machine learning portfolio

Image by Gerd Altmann from Pixabay

Data extraction involves pulling data from different sources and converting it into a useful format for further processing or analysis. It is the first step of the Extract-Transform-Load pipeline (ETL) in the data engineering process.

As a data scientist, you might need to combine data that is available in multiple file formats such as JSON, XML, CSV and SQL.

In this tutorial, we will use Python libraries such as pandas, json, and requests to read data from different sources and load it into a Jupyter notebook as a pandas dataframe.
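
A minimal sketch of the idea, assuming the data is already available as text: it loads a CSV string and a JSON string into dataframes and combines them. The sample content and column names are hypothetical, and in-memory strings stand in for files or for a response fetched with requests.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path or a URL.
csv_text = "name,age\nAnn,31\nBen,27\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON content: a list of records, as an API might return.
json_text = '[{"name": "Cy", "age": 40}, {"name": "Di", "age": 35}]'
records = json.loads(json_text)
json_df = pd.DataFrame(records)

# Combine the two sources into a single dataframe.
combined = pd.concat([csv_df, json_df], ignore_index=True)
print(combined)
```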

1. CSV files

This refers to a ‘comma-separated values’ file that is…


A practical guide to efficiently identify and clean messy data

Photo by The Creative Exchange on Unsplash

Data cleaning is the process of removing inconsistencies and errors from data that would undermine the performance of a machine learning model. Though time-intensive, the process is worth it because good-quality data is better than fancy algorithms.

That said, different data cleaning procedures apply to different situations. A good understanding of the data and its properties is essential.

Below, I outline a ‘template’ you can use to identify unclean data and different ways to efficiently clean it. As you proceed, regularly check in with this article for specific Python code for exploratory analysis of the data.
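
As one possible first pass at identifying unclean data, the sketch below counts missing values and duplicate rows. The dataframe is hypothetical, and these particular checks are an assumption for illustration, not necessarily the steps in the template.

```python
import pandas as pd

# Hypothetical dataset with one repeated row and one missing value.
df = pd.DataFrame({
    "city": ["Nairobi", "Nairobi", "Mombasa", None],
    "temp_c": [24.0, 24.0, 30.5, 28.0],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

# Drop the repeated rows, keeping the first occurrence of each.
deduplicated = df.drop_duplicates()

print(missing_per_column)
print(duplicate_rows)
```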

1. Remove unwanted observations

These are the…


Using developer survey results to tackle three questions related to age of learning to code

Photo by olia danilevich from Pexels

“…By the time I would finish school I’ll be fifty? He smiled.
“You’re going to be fifty anyhow”

― Edith Eva Eger, The Choice: Embrace the Possible

We all pass through different phases where we seek to reinvent ourselves or start something that might completely change our life’s direction. Naturally, a drastic change of such magnitude is not only mind-boggling but might require a lot of time and possibly HARD work.

As a woman in my thirties and currently transitioning into data science, this question was especially relevant.

The data

The Stack Overflow survey is an annual event that targets developers…


Learn how to upload and access large datasets on a Google Colab Jupyter notebook for training deep-learning models

Photo by Joshua Sortino on Unsplash

Google Colab is a free Jupyter notebook environment from Google whose runtime is hosted on virtual machines on the cloud.

With Colab, you need not worry about your computer’s memory capacity or Python package installations. You get free GPU and TPU runtimes, and the notebook comes pre-installed with machine learning and deep learning modules such as scikit-learn and TensorFlow. That said, all projects require a dataset, and if you are not using TensorFlow’s built-in datasets or Colab’s sample datasets, you will need to follow some simple steps to access your data.

Understanding Colab’s file system

The Colab notebooks you create are saved in…


A beginner’s introduction to deep learning, neural networks, TensorFlow, and Keras with hands-on implementation

Image by Gerd Altmann from Pixabay

Deep learning is a branch of machine learning in which you feed a machine data and answers, and the machine figures out the rules by which the answers are derived. The answers are the labels that the data describes. For example, for data about house prices, the label is the price and the data is the various aspects of a house that affect the price. Another example is image data of cats and dogs, where the label is whether an animal is a cat or a dog.
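
The "data and answers in, rules out" idea can be sketched without any deep learning library. The example below is a simplification, not a neural network from the article: it fits a single weight and bias with plain gradient descent, and the data, the hidden rule y = 2x + 1, and the learning rate are all assumptions chosen for illustration.

```python
# Data (inputs) and answers (labels) produced by a rule the model is never told: y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0        # the model starts with no knowledge of the rule
learning_rate = 0.05   # assumed step size, small enough for this data

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move both parameters a small step against their gradients.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))
```

After training, `w` and `b` end up close to 2 and 1, so the rule has been recovered from examples alone. A neural network applies the same principle with many more parameters.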

Defining key terminologies

Artificial neural networks, or ANNs, are the building blocks of deep learning. ANNs…


Explore any data set for a machine learning classification task with this quick guide

Photo by Jordan Whitt on Unsplash

Following my previous article on the 11 code blocks for EDA which covered a regression task (predicting a continuous variable), here are the 13 code blocks for performing EDA on a classification task (predicting a categorical or binary feature).

EDA or Exploratory Data Analysis is an important machine learning step that involves learning about the data without spending too much time or getting lost in it. Here, you get familiar with the structure and general characteristics of the dataset, and the independent and dependent features, and their interactions. …
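
As a hedged sketch of what getting familiar with a classification dataset can look like, the example below checks the class balance of the target and compares a feature across classes. The dataframe and the column names `age` and `churned` are hypothetical and do not come from the article's 13 code blocks.

```python
import pandas as pd

# Hypothetical dataset: one numeric feature and a binary target.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63],
    "churned": ["no", "no", "yes", "yes", "no", "no"],
})

# Basic structure: number of rows and columns.
print(df.shape)

# Share of each class in the target, to spot class imbalance early.
class_share = df["churned"].value_counts(normalize=True)

# How the feature differs between the classes.
mean_age_by_class = df.groupby("churned")["age"].mean()

print(class_share)
print(mean_age_by_class)
```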


How do you react to those little annoying unplanned interruptions?

Photo by Marcelo Chagas from Pexels

It was a bright Monday morning. Our wedding day was approaching and a meeting with the priest who would marry us was in two hours.

But when my fiancé opened the front door, he froze. He looked back at me with a bewildered look as I struggled with my shoes. “Did we park our car here?” he asked. “Well, yes…” I answered, thinking back to the previous night.

“The truck is not here.” “What?” I squeezed my way through the door. “Huh! What the hell!?”

Living in a third-world country, having your vehicle stolen is…

Susan Maina

Data scientist, Machine Learning Enthusiast. LinkedIn https://www.linkedin.com/in/suemnjeri
