WordCloud displays the most frequent words in a text, sizing each word by how often it appears: the larger the font, the more times the word appeared in the document.
In exploratory data analysis, a word cloud can surface interesting insights to follow up on or investigate further.
For this exercise, let us use the Tripadvisor_Hotel_Review dataset, which can be downloaded from or accessed on Kaggle.
If you haven’t installed the WordCloud package, you can do so by opening your terminal and typing:
pip install wordcloud
The following are our…
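As a simplified illustration of the idea behind a word cloud, the sketch below counts words with the standard library and maps each count to a font size that scales linearly with frequency. The linear scaling, the `word_font_sizes` helper, and its parameters are assumptions for illustration only. The wordcloud package uses its own layout and scaling logic, which is controlled by parameters such as `relative_scaling`.

```python
import re
from collections import Counter


def word_font_sizes(text, max_font=100, min_font=10, top_n=10, stopwords=frozenset()):
    """Hypothetical helper: map the top_n most frequent words to font sizes.

    The most frequent word gets max_font; the others are scaled linearly
    by their count relative to it, with a floor of min_font.
    """
    # Lowercase, split into simple word tokens, and drop stopwords
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    top_count = counts[0][1]
    return {word: max(min_font, round(max_font * n / top_count)) for word, n in counts}


sizes = word_font_sizes("hotel room hotel staff hotel room")
print(sizes)  # {'hotel': 100, 'room': 67, 'staff': 33}
```

If you already have word frequencies like these, `WordCloud().generate_from_frequencies(...)` accepts a dictionary of word-to-frequency values. In that case the package handles font sizing and placement itself.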
Knowing the yield of a bond is important for comparing it with other investments. For example, a bond's yield can be compared with the dividend yields of equity investments, since both measure the cash flow an investor periodically receives.
In addition, a bond's yield can be used to compare it with bonds of different maturities.
In this article, we'll solve for the bond yield using Newton's method. We'll write code that can calculate the yield for varying cash flows and time periods.
For those who simply want the code…
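The sketch below shows one way Newton's method can solve for the yield. It finds the rate y at which the present value of the bond's cash flows equals its price, repeatedly applying y ← y − f(y)/f′(y) with f(y) = PV(y) − price. It assumes annual compounding with times measured in years, so fractional times such as 0.5 are allowed. The function names are hypothetical, and this is not the article's own code.

```python
def bond_price(y, cash_flows, times):
    """Present value of cash flows at yield y (annual compounding, times in years)."""
    return sum(cf / (1 + y) ** t for cf, t in zip(cash_flows, times))


def bond_price_derivative(y, cash_flows, times):
    """d(price)/dy, used as f'(y) in Newton's update."""
    return sum(-t * cf / (1 + y) ** (t + 1) for cf, t in zip(cash_flows, times))


def bond_yield_newton(price, cash_flows, times, guess=0.05, tol=1e-10, max_iter=100):
    """Solve bond_price(y) == price for y with Newton's method."""
    y = guess
    for _ in range(max_iter):
        diff = bond_price(y, cash_flows, times) - price  # f(y)
        if abs(diff) < tol:
            return y
        y -= diff / bond_price_derivative(y, cash_flows, times)
    raise RuntimeError("Newton's method did not converge; try another initial guess")


# A 3-year bond paying a 5 annual coupon and 100 at maturity, priced at par
ytm = bond_yield_newton(100, [5, 5, 105], [1, 2, 3], guess=0.10)
print(round(ytm, 6))  # 0.05
```

Because the price function is smooth and monotonic in y for ordinary coupon bonds, Newton's method usually converges in a handful of iterations from a reasonable starting guess.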
FiveThirtyEight (sometimes written as 538) is a website that analyzes polling and other data on topics such as politics, economics, and sports. It takes its name from the number of electors in the United States Electoral College.
One of the things that contributes to its popularity is how effectively its visualizations convey poll results.
We can learn a thing or two from it about making our own visualizations more professional-looking and captivating.
We will refer to the graphs created by FiveThirtyEight as FTE graphs for the rest of the article.
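As a quick starting point, matplotlib ships with a built-in style sheet named "fivethirtyeight" that imitates the general look of FTE graphs. The sketch below applies it temporarily with `plt.style.context`. The helper name, the data, and the left-aligned bold title are illustrative assumptions, not the article's step-by-step transformation.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def make_fte_style_figure(x, y, title, filename=None):
    """Hypothetical helper: plot one series using matplotlib's bundled FTE-like style."""
    # The style applies only to figures created inside this block
    with plt.style.context("fivethirtyeight"):
        fig, ax = plt.subplots(figsize=(8, 4.5))
        ax.plot(x, y)
        ax.set_title(title, loc="left", fontweight="bold")  # headline-style title
        ax.set_xlabel("Year")
        ax.set_ylabel("Value")
        if filename:
            fig.savefig(filename, bbox_inches="tight")
    return fig


# Hypothetical data, for illustration only
fig = make_fte_style_figure(range(2010, 2021), [3, 4, 4, 5, 7, 6, 8, 9, 9, 11, 12],
                            "Hypothetical series")
```

The style sheet alone will not reproduce every FTE detail, but it gives a polished baseline that you can refine with titles, annotations, and custom colors.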
In this article, we will transform this plain-looking graph:
Have you ever run into an error when calculating IRR with Excel's solvers? Luckily, programming can help us calculate the IRR of complex cash flows. In this article, we will do this using the binary search algorithm.
The binary search algorithm is best suited to finding a solution within a sorted array or a range of possible solutions. It is also known as the half-interval search, the logarithmic search, or the binary chop.
We will discuss the basics of algorithms. For those who are only…
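Applied to IRR, binary search over a continuous range of rates is usually called the bisection method. The minimal sketch below assumes the NPV changes sign between the two ends of the search range. At each step it halves the interval and keeps the half in which the sign change still occurs. The default bracket of −99% to 100% and the function names are assumptions for illustration.

```python
def npv(rate, cash_flows):
    """Net present value, with cash_flows[t] received at the end of period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr_bisection(cash_flows, low=-0.99, high=1.0, tol=1e-10, max_iter=200):
    """Find a rate where NPV == 0 by repeatedly halving [low, high]."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [low, high]; widen the range")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid              # the root lies in the lower half
        else:
            low, f_low = mid, f_mid  # the root lies in the upper half
    return (low + high) / 2


print(round(irr_bisection([-100, 110]), 6))  # 0.1
```

Each iteration halves the search interval, so reaching a given precision takes only logarithmically many steps. Bisection also never diverges the way derivative-based solvers can, provided the initial range brackets a root.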
Suppose you are an investment officer for your company, and you are tasked with finding the least amount of money needed now to cover a future stream of liabilities.
Take, for example, a real estate developer with an accurate forecast of cash outflows for a development project. The cash outflows extend over multiple years but are known from the start of the project. Or maybe your company simply wants to ensure it can cover pension payments arising in future years. …
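As a simplified baseline, assume that money set aside today can be invested at a single known annual rate. Under that assumption, the least amount needed now is the present value of the liabilities. The sketch below computes this amount and then simulates paying the liabilities to confirm the balance ends at zero. The single-rate assumption and the function names are illustrative only. Real problems often involve choosing among several bonds, which is a harder optimization.

```python
def pv_of_liabilities(liabilities, rate):
    """liabilities: list of (time_in_years, amount); assumes one known annual rate."""
    return sum(amount / (1 + rate) ** t for t, amount in liabilities)


def ending_balance(initial, liabilities, rate):
    """Grow the fund at `rate` and pay each liability when it falls due."""
    balance, last_t = initial, 0
    for t, amount in sorted(liabilities):
        balance *= (1 + rate) ** (t - last_t)  # interest earned since the last payment
        balance -= amount                      # pay the liability
        last_t = t
    return balance


liabilities = [(1, 110), (2, 121)]
needed = pv_of_liabilities(liabilities, 0.10)
print(round(needed, 6))                                         # 200.0
print(round(ending_balance(needed, liabilities, 0.10), 6) == 0)  # True
```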
Linear regression models play a huge role in the analytics and decision-making processes of many companies, owing in part to their ease of use and interpretability.
There are instances, however, in which the presence of certain data points affects the predictive power of such models. These data points are known as influential points.
As noted above, influential data points affect the predictive power of linear regression models. They do so by substantially changing the estimated regression coefficients.
It is easy to mistake these points for "outliers"; however, the two have different definitions. Not all outliers are…
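One standard way to quantify influence is Cook's distance. It combines a point's residual with its leverage, which measures how far its x value lies from the rest of the data. The sketch below computes Cook's distance for simple linear regression (one predictor plus an intercept) using only the standard library. The data are hypothetical, and the 4/n cutoff mentioned below is a common rule of thumb rather than a strict test.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression y = a + b*x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage (hat value) for simple regression
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [(e ** 2 / (p * mse)) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]


# Hypothetical data: y = 2x, except the last point, which is both far out in x and off the line
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 10]
d = cooks_distance(x, y)
flagged = [i for i, di in enumerate(d) if di > 4 / len(x)]
print(flagged)
```

A point with a large residual but low leverage may be an outlier without being very influential. Cook's distance captures this distinction by weighing both factors.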
Exploratory data analysis (EDA) is the process of exploring data and investigating its structure to discover patterns and spot anomalies relative to those patterns.
EDA therefore involves summarizing the data with statistics and using visualization methods to spot patterns that summary numbers alone may not reveal.
Ideally, EDA should bring out insights and realizations from data that cannot be obtained through formal modeling and hypothesis testing.
When done properly, EDA can dramatically simplify or advance your data science problem and may even solve it!
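The numerical side of EDA often starts with a quick summary of each variable. The sketch below uses Python's built-in `statistics` module (Python 3.8+ for `quantiles`) to report basic statistics. It also flags anomalies with the common 1.5 × IQR rule. The helper name, the sample values, and the choice of the IQR rule are assumptions for illustration. In practice, pandas' `DataFrame.describe()` covers similar ground for whole tables.

```python
import statistics


def summarize(values):
    """Hypothetical helper: basic summary statistics plus IQR-based anomaly flags."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # usual 1.5 * IQR fences
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }


summary = summarize([10, 12, 11, 13, 12, 11, 100])
print(summary["median"], summary["outliers"])  # 12 [100]
```

Notice how the single extreme value pulls the mean well above the median. This is the kind of anomaly that a histogram or box plot would make obvious at a glance.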
A proper EDA hopes to accomplish several goals:
Cash flow plays a key role in the success of a company's operations. While cash flow obligations may be fixed, there are multiple ways to meet them, such as borrowing from a line of credit or issuing short-term commercial paper.
Each action, however, has its own associated cost or return, and the number of possible combinations can make it difficult to choose the best one.
Luckily, linear programming and Python can help us solve this problem.
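To show what such a formulation can look like, here is a minimal sketch using `scipy.optimize.linprog`. It models a hypothetical company with only two options: borrowing from a credit line (repaid with interest the following period) and investing surplus cash for one period. The objective is to maximize cash on hand at the end of the horizon. The cash flows, rates, credit limit, and the omission of commercial paper are all assumptions for illustration, and this is not the article's model.

```python
from scipy.optimize import linprog


def plan_cash(cash_flows, borrow_rate, invest_rate, credit_limit):
    """Hypothetical model. Variables: x_t (borrowing, t < n-1) and s_t (cash invested).

    Balance each period: cf_t + x_t - (1+rb)*x_{t-1} + (1+ri)*s_{t-1} - s_t = 0
    """
    n = len(cash_flows)
    n_borrow = n - 1          # no new borrowing in the final period
    n_vars = n_borrow + n     # x_0..x_{n-2}, then s_0..s_{n-1}

    def s(t):
        return n_borrow + t   # column index of s_t

    a_eq, b_eq = [], []
    for t in range(n):
        row = [0.0] * n_vars
        if t < n_borrow:
            row[t] = 1.0                      # new borrowing brings cash in
        if t > 0:
            row[t - 1] = -(1 + borrow_rate)   # repay last period's loan with interest
            row[s(t - 1)] = 1 + invest_rate   # last period's investment matures
        row[s(t)] = -1.0                      # cash invested this period
        a_eq.append(row)
        b_eq.append(-cash_flows[t])           # move the known cash flow to the right side

    c = [0.0] * n_vars
    c[s(n - 1)] = -1.0  # maximize final cash by minimizing its negative
    bounds = [(0, credit_limit)] * n_borrow + [(0, None)] * n
    res = linprog(c, A_eq=a_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, [float(v) for v in res.x[:n_borrow]]


# Hypothetical 3-period cash flows, 1% borrowing cost, 0.3% return on surplus
final_cash, borrowing = plan_cash([-150, 100, 200], 0.01, 0.003, credit_limit=200)
print(round(final_cash, 3), [round(b, 3) for b in borrowing])  # 147.985 [150.0, 51.5]
```

Because borrowing here costs more than surplus cash earns, the solver borrows only what each period's shortfall requires. Adding more instruments, such as commercial paper with its own rate and term, means adding more variables and terms to the same balance equations.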
Suppose for example that your company has the following projected cash flow:
In my previous article, we saw that, considering the returns and riskiness of stocks, only one combination of them can be considered the optimal portfolio. For the proof and theoretical discussion, please refer to that article.
Using the fastquant package, we can generate this portfolio easily. I recommend trying it, as fastquant's process for generating the optimal portfolio is consistent with the theoretical and mathematical foundations of the optimal portfolio.
pip install fastquant
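To make the theory concrete without relying on fastquant's API, the sketch below computes the tangency (maximum Sharpe ratio) portfolio in closed form with NumPy. The weights are proportional to Σ⁻¹(μ − r_f), where μ is the vector of expected returns, Σ the covariance matrix, and r_f the risk-free rate. This closed form assumes short selling is allowed and there are no other constraints. The numbers and the function name are hypothetical.

```python
import numpy as np


def tangency_weights(expected_returns, cov_matrix, risk_free_rate):
    """Weights of the maximum-Sharpe (tangency) portfolio, allowing short sales."""
    excess = np.asarray(expected_returns, dtype=float) - risk_free_rate
    raw = np.linalg.solve(np.asarray(cov_matrix, dtype=float), excess)  # Σ^{-1}(μ - r_f)
    return raw / raw.sum()  # rescale so the weights sum to 1


mu = [0.10, 0.15]                    # hypothetical expected annual returns
cov = [[0.04, 0.0], [0.0, 0.09]]     # hypothetical covariance matrix
w = tangency_weights(mu, cov, 0.02)
print(np.round(w, 4))  # [0.5806 0.4194]
```

Libraries that build optimal portfolios from historical prices typically estimate μ and Σ from the data and may add constraints such as no short selling. Their results can therefore differ from this unconstrained formula.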
Let us verify two things in this article:
The first bar chart races can be traced to 2017, but the format became popular in 2018 with a bar chart race depicting the top 15 global brands from 2000 to 2018.
While these earlier bar chart races were built with JavaScript and D3.js, a new Python package makes creating one much easier.
For this exercise, let us use the GDP dataset, which we can download from the World Bank.
So let’s start making one!
pip install bar_chart_race
import bar_chart_race as bcr
import pandas as pd
import numpy as np

# Suppress warnings
import warnings
warnings.filterwarnings("ignore"…
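bar_chart_race expects data in "wide" format, where each row is a time period and each column is a category. World Bank downloads usually have the opposite layout, with one row per country and one column per year. The sketch below reshapes data of that kind with pandas. The "Country Name" column and the year-named columns are assumptions about the downloaded file, and the sample values are made up.

```python
import pandas as pd


def to_bar_chart_race_format(gdp_by_country):
    """Reshape one-row-per-country data into one row per year, one column per country."""
    # Keep only columns whose names are years, e.g. "2000", "2001", ...
    year_columns = [c for c in gdp_by_country.columns if str(c).isdigit()]
    df = gdp_by_country.set_index("Country Name")[year_columns].T  # years become rows
    df.index = df.index.astype(int)
    return df.sort_index()


# Hypothetical sample mimicking the World Bank layout
sample = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "2000": [1.0, 2.0],
    "2001": [1.5, 2.5],
})
gdp_by_year = to_bar_chart_race_format(sample)
print(gdp_by_year)
# Next step (not run here): bcr.bar_chart_race(df=gdp_by_year, filename="gdp.mp4")
```

Saving to a video file such as .mp4 generally requires ffmpeg to be installed. Check the bar_chart_race documentation for the full list of styling options.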
A passionate analytics leader interested in real estate, finance, and economics, contributing to the world, one cup of coffee and a story at a time.