Whether you have a messy dataset to clean and analyze for insights, or you want to prepare data for a machine learning model, pandas is the library for you. It is simple to use, fast, and highly intuitive.
Pandas was authored specifically for data science and is packaged with additional features from other libraries. This means that you can often work in pandas alone, without importing other libraries, as we’ll see in the article.
Reshaping a dataframe usually involves converting columns into rows or vice versa.
There are a few reasons to reshape a dataframe;
I used to mostly Google these functions whenever I needed them and copy-paste the solution. Thanks, Stack Overflow!
In this article, I talk about pandas’ .melt() and .stack() methods and the pd.wide_to_long() function. …
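To make the three reshaping tools concrete, here is a minimal sketch on a small made-up table. The dataframe, its column names (`student`, `score_1`, `score_2`) and the variable names are assumptions for illustration, not taken from the article.

```python
import pandas as pd

# Hypothetical wide-format data: one row per student, one column per term score.
wide = pd.DataFrame({
    "student": ["Ann", "Ben"],
    "score_1": [80, 65],
    "score_2": [90, 70],
})

# melt: turn the score columns into (term, score) rows.
melted = wide.melt(id_vars="student", var_name="term", value_name="score")

# stack: move the column labels into an inner level of the row index.
stacked = wide.set_index("student").stack()

# wide_to_long: split "score_1"/"score_2" into a stub name and a numeric suffix.
long_df = pd.wide_to_long(wide, stubnames="score", i="student", j="term", sep="_")

print(melted)
print(stacked)
print(long_df)
```

Note that `melt` and `stack` are dataframe methods, while `wide_to_long` is a top-level pandas function that takes the dataframe as its first argument.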
When I first came across lambda functions in Python, I was very much intimidated and thought they were for advanced Pythonistas. Beginner Python tutorials applaud the language for its readable syntax, but lambdas sure didn’t seem user-friendly.
However, once I understood the general syntax and examined some simple use cases, using them was less scary.
Simply put, a lambda function is just like any normal Python function, except that it is defined without a name and its body is a single expression.
lambda argument(s): expression
A lambda function evaluates an expression for a given argument. You give…
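As a small illustration of that syntax, the sketch below compares a named function with an equivalent lambda and then passes lambdas inline to `sorted` and `map`. The names `square`, `words` and so on are made up for the example.

```python
# A named function and the equivalent lambda.
def square(x):
    return x * x

# Assigned to a name here only so the two can be compared side by side.
square_lambda = lambda x: x * x

# Lambdas are more commonly passed inline to another function.
words = ["pandas", "is", "fun"]
by_length = sorted(words, key=lambda w: len(w))
doubled = list(map(lambda n: n * 2, [1, 2, 3]))

print(square(4), square_lambda(4))  # 16 16
print(by_length)                    # ['is', 'fun', 'pandas']
print(doubled)                      # [2, 4, 6]
```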
Data extraction involves pulling data from different sources and converting it into a useful format for further processing or analysis. It is the first step of the Extract-Transform-Load pipeline (ETL) in the data engineering process.
As a data scientist, you might need to combine data that is available in multiple file formats such as JSON, XML, CSV and SQL.
In this tutorial, we will use Python libraries such as pandas, json, and requests to read data from different sources and load it into a Jupyter notebook as a pandas dataframe.
This refers to a ‘comma-separated values’ file that is…
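A minimal sketch of the idea: read a CSV source and a JSON source and combine them into one dataframe. To keep it runnable offline, the two sources are hypothetical in-memory strings; with a real file you would pass a path to `pd.read_csv`, and with a real API you would typically fetch the JSON with `requests.get(url).json()` instead of `json.loads`.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory stand-ins for a CSV file and a JSON API response.
csv_text = "name,age\nAnn,34\nBen,29\n"
json_text = '[{"name": "Cat", "age": 41}, {"name": "Dan", "age": 25}]'

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
csv_df = pd.read_csv(io.StringIO(csv_text))

# Parse the JSON string into a list of dicts, then build a dataframe from it.
json_df = pd.DataFrame(json.loads(json_text))

# Combine both sources into a single dataframe with a fresh index.
combined = pd.concat([csv_df, json_df], ignore_index=True)
print(combined)
```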
Data cleaning is the process of removing inconsistencies and errors from data that would undermine the performance of a machine learning model. Though time-intensive, the process is worth it because good quality data is better than fancy algorithms.
That said, different data cleaning procedures apply to different situations. A good understanding of the data and its properties is essential.
Below, I outline a ‘template’ you can use to identify unclean data and different ways to efficiently clean it. As you proceed, regularly check in with this article for specific Python code for exploratory analysis of the data.
These are the…
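As a rough sketch of the identify-then-clean idea, the snippet below first counts missing values and duplicate rows, then applies a few common fixes. The dataframe and the choice of fixes (normalising text, dropping duplicates, dropping rows without a city) are assumptions for illustration; the right steps depend on your data.

```python
import pandas as pd

# Hypothetical messy data: a duplicate row, a missing value, inconsistent text.
raw = pd.DataFrame({
    "city": ["Nairobi", "nairobi ", "Mombasa", "Mombasa", None],
    "price": [100.0, 120.0, 90.0, 90.0, 110.0],
})

# Identify problems first.
print(raw.isna().sum())        # missing values per column
print(raw.duplicated().sum())  # number of fully duplicated rows

# Then clean: normalise text, drop duplicates, drop rows missing a city.
clean = (
    raw.assign(city=raw["city"].str.strip().str.title())
       .drop_duplicates()
       .dropna(subset=["city"])
       .reset_index(drop=True)
)
print(clean)
```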
“…By the time I would finish school I’ll be fifty?” He smiled.
“You’re going to be fifty anyhow”
― Edith Eva Eger, The Choice: Embrace the Possible
We all pass through different phases where we seek to reinvent ourselves or start something that might completely change our life’s direction. Naturally, a drastic change of such magnitude is not only mind-boggling but might require a lot of time and possibly HARD work.
As a woman in my thirties and currently transitioning into data science, this question was especially relevant.
The Stack Overflow survey is an annual event that targets developers…
Google Colab is a free Jupyter notebook environment from Google whose runtime is hosted on virtual machines in the cloud.
With Colab, you need not worry about your computer’s memory capacity or Python package installations. You get free GPU and TPU runtimes, and the notebook comes pre-installed with machine learning and deep learning libraries such as scikit-learn and TensorFlow. That being said, all projects require a dataset, and if you are not using TensorFlow’s inbuilt datasets or Colab’s sample datasets, you will need to follow some simple steps to access your data.
Deep learning is a branch of machine learning whereby you feed a machine with data and answers, and the machine figures out the rules by which the answers are derived. The answers are the labels that the data corresponds to. For example, for data about house prices, the label is the price and the data is the various aspects of a house that affect the price. Another example is image data about cats and dogs, where the label is whether an animal is a cat or a dog.
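To see “data plus answers in, rules out” in miniature, here is a hedged sketch in plain Python. It is a stand-in, not a deep network: a single linear unit trained with gradient descent. The hidden rule `y = 2x - 1`, the learning rate and the number of epochs are all assumptions chosen for the example.

```python
# Data (xs) and answers (ys). The hidden rule is y = 2x - 1,
# but the training loop below is never told that.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x - 1 for x in xs]

w, b = 0.0, 0.0        # the "rule" the machine will learn: y = w * x + b
learning_rate = 0.05   # assumed value, small enough to converge on this data
n = len(xs)

for epoch in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def predict(x):
    """Apply the learned rule to a new input."""
    return w * x + b

print(round(w, 3), round(b, 3))
print(round(predict(10.0), 3))
```

A real deep learning model stacks many such units with non-linear activations, but the training loop follows the same pattern: predict, measure the error against the labels, and nudge the parameters.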
Following my previous article on the 11 code blocks for EDA which covered a regression task (predicting a continuous variable), here are the 13 code blocks for performing EDA on a classification task (predicting a categorical or binary feature).
EDA, or Exploratory Data Analysis, is an important machine learning step that involves learning about the data without spending too much time or getting lost in it. Here, you get familiar with the structure and general characteristics of the dataset, the independent and dependent features, and their interactions. …
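A few first-pass EDA steps for a classification task might look like the sketch below. The dataset, including the binary target `churned` and the features `age` and `plan`, is made up for illustration and is not one of the article’s code blocks.

```python
import pandas as pd

# Hypothetical classification data: "churned" is the binary target.
df = pd.DataFrame({
    "age": [25, 40, 31, 58, 46, 22],
    "plan": ["basic", "pro", "basic", "pro", "basic", "basic"],
    "churned": [1, 0, 1, 0, 0, 1],
})

print(df.shape)       # number of rows and columns
print(df.dtypes)      # data type of each feature
print(df.describe())  # summary statistics for the numeric columns

# Class balance of the target.
class_balance = df["churned"].value_counts(normalize=True)

# How a numeric and a categorical feature relate to the target.
mean_age_by_class = df.groupby("churned")["age"].mean()
churn_rate_by_plan = df.groupby("plan")["churned"].mean()

print(class_balance, mean_age_by_class, churn_rate_by_plan, sep="\n\n")
```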
It was a bright Monday morning. Our wedding day was approaching, and a meeting with the priest who would marry us was in two hours.
But when my fiancé opened the front door, he froze. As I struggled with my shoes, he looked back at me with a bewildered, puzzled look. “Did we park our car here?” he asked. “Well, yes...” I answered, thinking back to the previous night.
“The truck is not here.” “What?” I squeezed my way through the door. “Huh! What the hell!?”