One of the mistakes I made as a rookie data scientist was placing heightened importance on the accuracy metric. Now, this is not to dismiss the importance of accuracy as a measure of machine learning (ML) performance. In some models, we aim to have high accuracy. After all, this metric is the one most understood by executive and business leaders.
But let me give you a real-life scenario when such a metric may be misleading:
“You are working in a successful fintech company. Your boss tasks you with a model that identifies fraudulent financial transactions.
Armed with your knowledge of classification algorithms, you design and implement an algorithm that returns a whopping 98% accuracy. You make your pitch successfully, and the management was impressed at how quickly you came up with this model. …
“Life is like a landscape. You live in the midst of it but can describe it only from the vantage point of distance” — Charles Lindbergh
Distance metrics are essential in understanding a lot of machine learning algorithms and therefore resolution of real-world problems. There are numerous distance metrics out there and data scientists should be able to understand most of them to make models more meaningful.
For a geospatial data scientist, there is an added benefit to this exercise: the feature creation from longitude and latitude.
Longitude and Latitude, while represented as floats, are more similar to categorical or nominal data. Increasing or decreasing them in magnitude may not give you or your model something meaningful. …
The Fourth Industrial Revolution has, in no doubt, awakened the importance of having a data strategy for companies. Companies are either figuring out whether to build their own or to outsource data science (DS) capabilities.
In this article, I’ll try to combine the three different domains of Real Estate, Data Science, and Finance. It will be a daunting task and we need to ensure that we do so in a way that enhances all these domains in writing.
We will be using the fastquant package for this. Fastquant is a package developed by a team of Filipino developers. I am very proud of this and proves what I have been thinking all along: that we do not lack talent. So let’s all support our fellow Filipinos and use this package!
To begin, we need to import fastquant. To install this, simply type in your powershell (I am using Windows). …
For most of us, coffee has become a non-negotiable in our morning routine.
But in a lot of urban planning studies, the number of coffee shops has become indicative of the level of development of a place or a city. It is not really surprising as coffee shops are avenues for business and client meetings and are found in business districts where busy employees need quick access to a coffee fix.
As such, just by knowing the number of coffee shops in a location, may tell us something valuable. But how do we proceed?
So now that we know the importance of this data, how then should we collect it? …
Big data is incredible! The way Big Data manages to bring sciences and business domains to new levels is almost sort of magical. It allows us to tap into a variety of avenues to access the information we normally do not access in order to gain fresh insights.
Satellite images are an amazing unconventional source to tap into for these fresh insights. Urban planners, environmentalists, and geospatial data scientists usually turn to satellite images for a big picture perspective and find insights from this exercise that brings their intended solutions to a whole new level.
Crash Course in Satellite Image Analysis a.k.a. …
Population density is a crucial concept in urban planning. Theories on how it affects economic growth are divided. Some claim, as Rappaport does, that an economy is a form of “spatial equilibrium”: that net flows of residents and employment gradually move to be balanced with one another.
The thought that density has some sort of relationship with economic growth has long been established by multiple studies. But whether the same theory holds for the Philippines and to what predates what (density follows urban development or urban development follows density) is a classic data science problem.
Before we can test out any models, however, let’s do a fun exercise and visualize our dataset. …
If you want to do some real estate analytics, the GeoPandas package offers an amazing way to manipulate geographic information. It extends the datatypes used by pandas to allow spatial operations on geometric types.
Examples of spatial operations are map overlay (combining two maps together), simple buffering but more popularly, GeoPandas can be used to do Geovisualization.
Installation
While GeoPandas is a powerful package for spatial manipulation, installation is a bit challenging for some. It requires dependencies that are difficult to harmonize.
I found one way to effectively install this on my laptop and reproducing the following steps may work for you as well. …
Ask anyone you know and chances are, real estate occupies a significant portion of their expenses or investments. Whether rented, bought, or mortgaged, real estate is a big chunk of our budget.
And because of the costs associated with it, we spend a lot of time researching our property of choice.
Real estate belongs to the class of investments known as “Alternative Investments”. Some of the characteristics of alternative investments are the following: low liquidity, lower transparency, and usually involves a high cash outlay.
Let’s discuss what those characteristics mean in the context of real estate. Low liquidity for real estate means that it takes a lot of time to sell it. …
About