DATA SCIENCE. ANALYTICS. PYTHON

Basic and Advanced Techniques for the 21st-Century Data Scientist

Photo by Matt Walsh on Unsplash

As we mentioned in the first article in a series dedicated to the study of missing data, the knowledge of the mechanism or structure of “missingness” is crucial because our handling method would primarily depend on it.

In Handling “Missing Data” Like a Pro — Part 1 — Deletion Methods, we have discussed deletion methods.

In Handling “Missing Data” Like a Pro — Part 2: Imputation Methods, we discussed simple imputation methods. While some imputation methods are deemed appropriate for a specific type of data, e.g. normally distributed data, MCAR missingness, etc., these methods are criticized mostly for biasing our…


Data Science. Analytics. Statistics. Python.

Basic and Advanced Techniques for the 21st-century Data Scientist

Photo by Jon Tyson on Unsplash

As we mentioned in the first article in a series dedicated to missing data, knowledge of the mechanism or structure of “missingness” is crucial because our responses depend on it.

In Handling “Missing Data” Like a Pro — Part 1 — Deletion Methods, we have discussed deletion methods.

For this part of the article, we will be focusing on imputation methods. We will be comparing the effects on the dataset, as well as the advantages and disadvantages of each method.

LOAD THE DATASET AND SIMULATE MISSINGNESS

Load the Adult dataset and simulate an MCAR dataset found in this article.

IMPUTATION METHODS

Now that we have a…


Data Science. Analytics. Statistics. Python.

Basic and Advanced Techniques for the 21st Century Data Scientist

Photo by Emily Morter on Unsplash

As we mentioned in the first article in a series dedicated to missing data, knowledge of the mechanism or structure of “missingness” is crucial because our responses depend on it.

While the list of techniques for handling missing data keeps growing, we discuss some of the most basic to the most celebrated techniques below. These techniques include data deletion, constant single imputation, model-based imputation, and many more.

Before we begin discussing them, please note that the application of these techniques requires discernment from the data scientist. …


DATA SCIENCE. ANALYTICS. DATA ENGINEERING.

And Why It’s Important To Know Them

Photo by Matt Walsh on Unsplash

INTRODUCTION

If you ask data scientists what one problem in data they wish they could avoid but cannot, chances are they will all answer: missing data.

You know how they say that the only things certain in life are death and taxes? Well, for data scientists, missing data is probably third on that list.

Missing data, in general, restricts the effectiveness of our machine learning (ML) models, especially when applied to real-world use cases.

If we think about it, one of the most popular suggestions for making ML models better is getting more data and more observations. Therefore…


FINANCE. PORTFOLIO ANALYTICS. DATA SCIENCE. OPTIMIZATION

Using Investpy and the Monte Carlo Method to Determine the Optimal Portfolio Allocation

Photo by Clay Banks on Unsplash

If you have tried investing in the stock market, then you have most likely faced multiple investment decisions such as “which stock to choose”, “which industry to focus on” and “how much to allocate to each stock”.

Fortunately, Harry Markowitz provided an answer to the last question, which is also considered one of the most difficult problems in investing: portfolio security selection. His Modern Portfolio Theory (MPT) won him a Nobel Prize and introduced the ideas of portfolio investing and of how securities’ risks and correlations impact the portfolio as a whole.

So you might think that there…


Finance. Analytics. Data Science.

Analysis of Linear vs Compounded Returns

Photo by Luke Chesser on Unsplash

INTRODUCTION: PRICES VS. RETURNS

Financial returns are at the core of financial analytics. Returns allow variables to be represented comparably. This comparability, in turn, makes it possible to analyze the true relationship between securities and assets. But what about prices?

To see the difference between prices and returns more clearly, consider a 10-cent gain. For a stock that sells at $0.20 (price), it represents a 50% gain (return), while for a stock that sells at $20.00 (price), it is only a 0.5% gain (return). Because the starting prices of the two securities are far apart, the same absolute gain of 10 cents is far more impressive for one than for the other.
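The arithmetic behind this comparison can be sketched in a few lines of Python. The function name `simple_return` is a hypothetical helper introduced only for illustration, and the sketch assumes the first stock sells at $0.20 (20 cents).

```python
def simple_return(price, gain):
    """Express an absolute gain as a fraction of the starting price."""
    return gain / price


# The same 10-cent gain applied to two very different starting prices
low_priced = simple_return(price=0.20, gain=0.10)    # stock selling at $0.20
high_priced = simple_return(price=20.00, gain=0.10)  # stock selling at $20.00

print(f"Return on the $0.20 stock:  {low_priced:.1%}")
print(f"Return on the $20.00 stock: {high_priced:.1%}")
```

Dividing by the starting price is what puts the two securities on a common scale, which a comparison of the raw 10-cent price changes cannot do.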


PYTHON. DATA VISUALIZATION. ANALYTICS. MARKETING.

Generating a WordCloud Visualization Through Python

Image by Author

WordCloud displays the most frequent words used in a text, where the size of each word is proportional to how often it was used; the larger the font, the more times the word appeared in the document.

So for an exploratory data analysis, WordCloud may provide some interesting insights to follow up on or investigate.
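To make the frequency-to-size idea concrete, here is a minimal sketch that uses only the Python standard library. It does not use the WordCloud package and is not how that package computes sizes internally; the helper names `word_frequencies` and `font_sizes`, the `max_size` parameter, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(frequencies, max_size=60):
    """Scale each word's font size in proportion to its frequency."""
    if not frequencies:
        return {}
    top = max(frequencies.values())
    return {word: max_size * count / top for word, count in frequencies.items()}


# Hypothetical review text, not taken from the Kaggle dataset
sample = "nice hotel nice staff nice room clean room"
sizes = font_sizes(word_frequencies(sample))

for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word:<6} {size:.0f}")
```

The most frequent word gets the largest font, and every other word is sized relative to it, which is the visual rule described above.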

For this exercise, let us try the Tripadvisor_Hotel_Review dataset. This one can be downloaded or accessed from Kaggle.

PRELIMINARIES

If you haven’t installed the WordCloud package, you can do so by opening your terminal and typing:

pip install wordcloud

The following are our…


PYTHON. OPTIMIZATION. FINANCE.

Calculate the Yield of Corporate Bonds Using Market Information

Photo by Katie Harp on Unsplash

Knowing the yield of a bond is important for comparison with other investments. For example, the bond’s yield can be compared with the dividend yields of equity investments, since both measure the cash flow an investor would periodically receive.

In addition, the yield of a bond can be used to compare it with bonds of different maturities.

In this article, we’ll try to solve the bond yield using Newton’s method. We’ll create the most efficient code that is capable of calculating the yield, given varying cash flows and time periods.
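As a rough illustration of the idea, the sketch below applies Newton’s method to the bond pricing equation. It is not the article’s own code: the function names, the default starting guess, the tolerance, and the sample bond are assumptions, and it presumes discrete compounding with times measured in periods.

```python
def bond_price(rate, cash_flows, times):
    """Present value of the cash flows discounted at a per-period rate."""
    return sum(cf / (1 + rate) ** t for cf, t in zip(cash_flows, times))


def bond_price_derivative(rate, cash_flows, times):
    """Derivative of the present value with respect to the rate."""
    return sum(-t * cf / (1 + rate) ** (t + 1) for cf, t in zip(cash_flows, times))


def bond_yield(price, cash_flows, times, guess=0.05, tol=1e-10, max_iter=100):
    """Find the rate at which the discounted cash flows equal the market price."""
    rate = guess
    for _ in range(max_iter):
        diff = bond_price(rate, cash_flows, times) - price
        if abs(diff) < tol:
            return rate
        # Newton step: move along the tangent line to its zero crossing
        rate -= diff / bond_price_derivative(rate, cash_flows, times)
    raise ValueError("Newton's method did not converge")


# Hypothetical 3-year bond: 5 annual coupon, 100 face value, priced at par
y = bond_yield(price=100.0, cash_flows=[5, 5, 105], times=[1, 2, 3], guess=0.10)
print(f"Yield: {y:.4%}")
```

Because the cash flows and their times are passed as separate lists, the same function handles irregular payment schedules, as long as the times are expressed in the same period as the yield.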

For those who simply want the code…


PYTHON. DATA VISUALIZATION. CHARTS

Create Professional Looking Graphs Using Only the Matplotlib Package

Photo by Clay Banks on Unsplash

INTRODUCTION

FiveThirtyEight (sometimes written as 538) is a website that analyzes data on poll topics such as politics, economics, and sports. It takes its name from the number of electors in the United States Electoral College.

But one of the things that contribute to its popularity is how effectively its visualizations relay poll results.

We can learn a thing or two about improving our visualizations to make them more professional-looking and captivating.

We will refer to the graphs created by FiveThirtyEight as FTE graphs for the rest of the article.

In this article, we will transform this plain-looking graph:


Calculating Monthly IRR Using Python and the Binary Search Algorithm

Photo by Heriberto Arias on Unsplash

Introduction

Have you ever stumbled upon an error in the calculation of IRR using Excel solvers? Luckily, programming can help us calculate the IRR of complex cash flows and arrive at a solution. In this article, we will try to accomplish this through an algorithm called the Binary Search Algorithm.

The binary search algorithm is best used for finding a solution within a given array or range of possible solutions. Other terms for the binary search algorithm are the half-interval search, the logarithmic search, and the binary chop.
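A minimal sketch of the half-interval idea applied to IRR is shown below. It is an illustration rather than the article’s implementation: the function names, the search bounds `low` and `high`, the tolerance, and the sample cash flows are all assumptions. The rate returned is per period, so monthly cash flows give a monthly IRR.

```python
def npv(rate, cash_flows):
    """Net present value, with cash_flows[0] occurring at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr_bisection(cash_flows, low=-0.5, high=1.0, tol=1e-10, max_iter=200):
    """Halve the interval [low, high] until it brackets the IRR tightly."""
    npv_low = npv(low, cash_flows)
    if npv_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        npv_mid = npv(mid, cash_flows)
        if abs(npv_mid) < tol or (high - low) / 2 < tol:
            return mid
        if npv_low * npv_mid < 0:
            high = mid                     # sign change lies in the lower half
        else:
            low, npv_low = mid, npv_mid    # sign change lies in the upper half
    return (low + high) / 2


# Hypothetical cash flows: invest 100 now, receive 110 one period later
irr = irr_bisection([-100, 110])
print(f"IRR per period: {irr:.4%}")
```

Each iteration discards half of the remaining range, which is why the search is also called logarithmic. The method needs the NPV to change sign between the two bounds, so the sketch raises an error when it does not.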

We will discuss the basics of algorithms. For those who are only…

Francis Adrian Viernes

A passionate analytics leader interested in real estate, finance, and economics, contributing to the world, one cup of coffee and a story at a time.
