Exploratory data analysis (EDA) is the process of exploring data and investigating its structure to discover patterns and spot anomalies from said patterns.
EDA would then involve summarizing the data with the use of statistics and visualization methods to spot non-numerical patterns.
Ideally, EDA should bring out insights and realizations from data that cannot be obtained through formal modeling and hypothesis testing.
When done properly, EDA can dramatically simplify or advance your data science problem and may even solve it!
A proper EDA hopes to accomplish several goals:
Cash flow plays a key role in the success of the company’s operations. While cash flow obligations may be fixed, there are multiple ways to meet these such as borrowing from a line of credit or raising short-term commercial paper.
Each action, however, has a corresponding cost and/or return associated with it and the combination of available actions may make it difficult to choose the best one.
Luckily, linear programming and Python can help us solve this problem.
Suppose for example that your company has the following projected cash flow:
In my previous article, we saw that the optimal portfolio that considering the returns and riskiness of stocks, there can only be one combination that can be considered optimal. For proof and theoretical discussion, please refer to my previous article.
Using the fastquant package, we can generate this easily. I recommend you try this as fastquant’s process of generating the optimal portfolio is in accordance with the theoretical and mathematical foundation of the optimal portfolio.
pip install fastquant
Let us verify two things in this article:
The first bar chart races can be traced to 2017 but it started to become popular, sometime in 2018 with a bar chart race depicting the top 15 global brands between 2000 and 2018.
While these earlier bar charts are done using JavaScript and D3.js, a new package in Python makes it easier to create one and it is so easy!
For this exercise, let us the GDP dataset we can download from World Bank.
So let’s start making one!
pip install bar_chart_race
import bar_chart_race as bcr
import pandas as pd
import numpy as np#Supress Warning
import warnings
warnings.filterwarnings("ignore"…
If you have tried investing in the stock market, then you are most likely faced with multiple investment decisions such as “which stock to choose”, “which industry to focus on” and “how much should you allocate to each stock”.
Fortunately, Harry Markowitz provided an answer to the last question which is also considered as one of the most difficult problems in investing: portfolio security selection. His Moden Portfolio Theory (MPT) won him a Nobel Prize and introduced the ideas of portfolio investing and how securities’ risks and correlations impact the portfolio as a whole.
So you might think that there…
One of the mistakes I made as a rookie data scientist was placing heightened importance on the accuracy metric. Now, this is not to dismiss the importance of accuracy as a measure of machine learning (ML) performance. In some models, we aim to have high accuracy. After all, this metric is the one most understood by executive and business leaders.
But let me give you a real-life scenario when such a metric may be misleading:
“You are working in a successful fintech company. Your boss tasks you with a model that identifies fraudulent financial transactions.
Armed with your knowledge of…
“Life is like a landscape. You live in the midst of it but can describe it only from the vantage point of distance” — Charles Lindbergh
Distance metrics are essential in understanding a lot of machine learning algorithms and therefore resolution of real-world problems. There are numerous distance metrics out there and data scientists should be able to understand most of them to make models more meaningful.
For a geospatial data scientist, there is an added benefit to this exercise: the feature creation from longitude and latitude.
Longitude and Latitude, while represented as floats, are more similar to categorical or…
The Fourth Industrial Revolution has, in no doubt, awakened the importance of having a data strategy for companies. Companies are either figuring out whether to build their own or to outsource data science (DS) capabilities.
In this article, I’ll try to combine the three different domains of Real Estate, Data Science, and Finance. It will be a daunting task and we need to ensure that we do so in a way that enhances all these domains in writing.
We will be using the fastquant package for this. Fastquant is a package developed by a team of Filipino developers. I am very proud of this and proves what I have been thinking all along: that we do not lack talent. So let’s all support our fellow Filipinos and use this package!
To begin, we need to…
For most of us, coffee has become a non-negotiable in our morning routine.
But in a lot of urban planning studies, the number of coffee shops has become indicative of the level of development of a place or a city. It is not really surprising as coffee shops are avenues for business and client meetings and are found in business districts where busy employees need quick access to a coffee fix.
As such, just by knowing the number of coffee shops in a location, may tell us something valuable. But how do we proceed?
So now that we know the…
A passionate analytics leader interested in real estate, finance, and economics, contributing to the world, one cup of coffee and a story at a time.