Add Linear Regression Webinar

Commit 950c103a authored Apr 21, 2021 by Asim
6 changed files
--- a/Linear Regression from Scratch/Linear_Regression_From_Scratch.ipynb
+++ b/Linear Regression from Scratch/Linear_Regression_From_Scratch.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "patent-citizen",
+   "metadata": {},
+   "source": [
+    "# Coding up a Linear Regression Algorithm from scratch\n",
+    "This notebook walks you through the concepts required to code up a linear regression algorithm that works a lot like the scikit-learn implementation. Before coding up the algorithm, it covers two prerequisites:\n",
+    "- Vectorized Operations using Numpy\n",
+    "- Object Oriented Programming in Python"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "reserved-cache",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import seaborn as sns\n",
+    "%matplotlib inline\n",
+    "import matplotlib.pyplot as plt\n",
+    "plt.style.use('seaborn')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "lovely-proof",
+   "metadata": {},
+   "source": [
+    "## Loading the Data\n",
+    "We will be using a very simple dataset that can be used without any real feature engineering. Of course, some feature engineering will **always** help produce better results on this dataset, but since the goal here is to walk through a specific algorithm, we want to save time and pick a dataset that is already prepared for machine learning."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "polish-frank",
+   "metadata": {},
+   "outputs": [],
+   "source": [
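+    "# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older version\n",
+    "from sklearn.datasets import load_boston\n",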
+    "boston = load_boston()\n",
+    "data = pd.DataFrame(boston.data)\n",
+    "data.columns = boston.feature_names\n",
+    "data['PRICE'] = boston.target\n",
+    "data.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "military-laundry",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "demanding-coupon",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check if the data types are all good for an ML problem\n",
+    "data.dtypes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "treated-morning",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check that there are no null values\n",
+    "data.isnull().sum()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "commercial-victory",
+   "metadata": {},
+   "source": [
+    "In a real data science process, we would spend far more time exploring this dataset and working out what each of the columns means. For now, we will skip this step because the values are all floats and there are no null values. Thus, we know that the data is technically okay to use for linear regression. Again, in a real-world scenario, **feature engineering is essential**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "spoken-military",
+   "metadata": {},
+   "source": [
+    "### Convert Data to Numpy Arrays"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "restricted-executive",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Y = data[['PRICE']].to_numpy()\n",
+    "X = data[boston.feature_names].to_numpy()\n",
+    "print(X.shape)\n",
+    "print(Y.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fuzzy-cooperation",
+   "metadata": {},
+   "source": [
+    "## Vectorized Implementation\n",
+    "Most mathematical formulae that you learn about for Linear Regression are in scalar format. While it is easy to convert those into Python functions using simple for loops, we will take a minute to understand why vectorized implementations are so important and how to do them intuitively. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "antique-friday",
+   "metadata": {},
+   "source": [
+    "### Numpy Operations\n",
+    "Numpy arrays are different from lists, and we can do many matrix operations using Numpy arrays. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "digital-purchase",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# define two numpy arrays\n",
+    "a = np.array([[1,2],[3,4]])\n",
+    "b = np.array([[1,1],[1,1]])\n",
+    "\n",
+    "display(a)\n",
+    "display(b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "chief-scanning",
+   "metadata": {},
+   "source": [
+    "#### Basic Operations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "raising-liberty",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Numpy Array Addition \n",
+    "display(a + b)\n",
+    "\n",
+    "# Numpy Array Subtraction\n",
+    "display(a - b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "systematic-decimal",
+   "metadata": {},
+   "source": [
+    "#### Multiplication\n",
+    "In linear algebra, \"multiplication\" is rarely used as a general term because there are multiple ways to take the product of two matrices. The first is element-wise multiplication, called the Hadamard product. The second, more common method is the dot product (matrix product), which is what we study when we learn the basics of linear algebra. For matrices, it multiplies the rows of matrix A with the columns of matrix B. We'll have a look at both of these below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "decimal-compensation",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "b"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "streaming-dodge",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Element-wise Multiplication\n",
+    "a*b"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "horizontal-adoption",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Dot Product\n",
+    "a@b"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "caring-hungarian",
+   "metadata": {},
+   "source": [
+    "Note that there are several ways to compute the dot product with Numpy. Since the `@` operator was introduced in Python 3.5, the form above is the most common, but do not get confused if you see either of the examples below!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "experimental-passage",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a.dot(b)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "distinguished-birmingham",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "np.dot(a, b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "equal-aircraft",
+   "metadata": {},
+   "source": [
+    "### Linear Regression Example\n",
+    "Here we will be looking at the most basic linear regression, which uses mean squared error as the loss function. While we are not building a full class yet, it is important to know the difference between a for-loop-based implementation and a vectorized implementation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "alike-atmosphere",
+   "metadata": {},
+   "source": [
+    "#### The Hypothesis (Prediction)\n",
+    "A linear regression prediction is simply the sum of each weight multiplied by its corresponding variable. Formally, the hypothesis is as follows:\n",
+    "\n",
+    "![hypothesis](images/hypothesis.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "logical-suggestion",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the shape of the data\n",
+    "data.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "electoral-above",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize weights\n",
+    "weights = np.ones(shape=(X.shape[1], 1))\n",
+    "\n",
+    "weights.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "assisted-japan",
+   "metadata": {},
+   "source": [
+    "##### Hypothesis using For Loop"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "neither-mainstream",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%time\n",
+    "all_predictions = []\n",
+    "for data_point in X:\n",
+    "    hypothesis = 0\n",
+    "    for theta, x in zip(weights, data_point):\n",
+    "        hypothesis += theta*x\n",
+    "    \n",
+    "    all_predictions.append(hypothesis)\n",
+    "    \n",
+    "all_predictions[0:10]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "equal-punch",
+   "metadata": {},
+   "source": [
+    "##### Hypothesis using Vectorized Operations\n",
+    "If you look at the code closely, you will notice that all we are doing is going over each **row** of the data, multiplying each weight with the corresponding column value, and summing the results. This means that if the data is of shape (m, n) and the weights are of shape (n, 1), a single dot product gives the exact same result."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aware-demonstration",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%time\n",
+    "all_predictions = X @ weights\n",
+    "\n",
+    "all_predictions[0:10]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "intensive-observation",
+   "metadata": {},
+   "source": [
+    "## Object-Oriented Programming in Python\n",
+    "We will not be going into the details of OOP, since that is an entire topic that requires a lot of theory. We will just stick to the basics for now. The first step in Object Oriented Programming is creating a **class**. A class is like a blueprint for how something should be defined. Once a class is defined, we can create **objects** of that class, which have all the essential *things* that the class defines. What are those *things*?\n",
+    "\n",
+    "Firstly, every class has some **attributes**, which are simply variables that exist within the class. Beyond that, a class can have **methods**, which are functions that belong to the class.\n",
+    "\n",
+    "The essentials for creating a class are:\n",
+    "- An \\_\\_init\\_\\_ function that is run whenever an object is created. The attributes created here are called *instance attributes*\n",
+    "- Other attributes can be created outside the \\_\\_init\\_\\_ function, directly in the class body. These are called *class attributes* and are shared by all objects of the class"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ahead-saskatchewan",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class Dog:\n",
+    "    # The __init__ function is run automatically whenever an object is created\n",
+    "    def __init__(self, name, age):\n",
+    "        self.name = name\n",
+    "        self.age = age\n",
+    "        \n",
+    "    def print_attributes(self):\n",
+    "        print(self.name)\n",
+    "        print(self.age)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "secret-thriller",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "doge = Dog(\"Tony\", 10)\n",
+    "\n",
+    "doge.print_attributes()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ecological-coffee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "doge.name = \"Ezekiel\"\n",
+    "\n",
+    "doge.print_attributes()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "black-result",
+   "metadata": {},
+   "source": [
+    "If you ever want to know what attributes exist in a class (let's say you are using a scikit-learn class), you can use the built-in function called *dir()*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "defensive-future",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dir(doge)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "black-venture",
+   "metadata": {},
+   "source": [
+    "Whoa, hold up! Only the last three things are something we created ourselves. Where did the rest of the methods/attributes come from? We will not be getting into all of them, but Python creates them automatically for every object, and they let us do more things with the class. A particularly useful one is the \\_\\_dict\\_\\_ attribute. Let's see what it contains!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "wireless-beads",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "doge.__dict__"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "beginning-costa",
+   "metadata": {},
+   "source": [
+    "It simply returned a dictionary of the object's instance attributes and their values! This can be useful if we want to inspect or store the attributes of a particular object."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "charming-discrimination",
+   "metadata": {},
+   "source": [
+    "### Static Methods vs Instance Methods\n",
+    "Have you noticed how there are a lot of scikit-learn functions that we can use without necessarily instantiating an object? A class can offer the same convenience through static methods. To run these, we do not need to create an object; we can simply use the class name. See the example below!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "celtic-acquisition",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class random_functions:\n",
+    "    @staticmethod\n",
+    "    def print_hello_world():\n",
+    "        print(\"Hello World\")\n",
+    "        \n",
+    "    def instance_method(self):\n",
+    "        print(self)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "thrown-methodology",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "random_functions.print_hello_world()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "varying-coaching",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# This call raises a TypeError: there is no object to pass in as self\n",
+    "random_functions.instance_method()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "brutal-hardware",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "random_object = random_functions()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "interior-memorabilia",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "random_object.instance_method()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "recreational-secretariat",
+   "metadata": {},
+   "source": [
+    "This is similar to how we are able to use functions such as \"train_test_split\" without creating any sort of scikit-learn object. Strictly speaking, those are module-level functions rather than static methods, but the idea is the same: we import them and call them directly, with no object required.\n",
+    "\n",
+    "Did you notice how, when we called the method on an object, the *self* parameter was provided automatically? This is precisely the difference between a static method and an instance method."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "informational-sphere",
+   "metadata": {},
+   "source": [
+    "### Other Concepts\n",
+    "There are many other concepts in Object Oriented Programming, such as public, private and protected attributes, inheritance, polymorphism, etc. Since the goal here is to do a practical exercise rather than teach a webinar on OOP, you can use [this link](https://stackabuse.com/object-oriented-programming-in-python/) if you are interested in more details about object oriented programming in Python."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "based-remainder",
+   "metadata": {},
+   "source": [
+    "## The Linear Regression Class\n",
+    "This is something we can code up from the ground up! But before we get into this, let's figure out all the steps required to make Linear Regression work.\n",
+    "\n",
+    "The first is the .predict() method. Luckily, we already know how to make this.\n",
+    "\n",
+    "The second most important thing is the .fit() method. Let's do a small recap of how gradient descent works using the Mean Squared Error loss function."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "developing-wilson",
+   "metadata": {},
+   "source": [
+    "### Gradient Descent process\n",
+    "There are many resources out there to understand Gradient Descent intuitively. We, however, will be focusing on converting the mathematical equations of gradient descent to code. In its simplest form, gradient descent can be carried out in two steps.\n",
+    "1. Calculate the derivative of the loss function with respect to each weight\n",
+    "2. Update each weight\n",
+    "\n",
+    "Let's dive into each of these using the Mean Squared Error function."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "decent-prior",
+   "metadata": {},
+   "source": [
+    "####  Mean Squared Error\n",
+    "The equation is as follows:\n",
+    "\n",
+    "![root_mean_squared_loss](images/mse_loss_function.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "endangered-tournament",
+   "metadata": {},
+   "source": [
+    "As you can probably tell, all this does is find the difference between each prediction and the true value, square it, and then take the mean (with an extra factor of 2 in the denominator). How would this look in code?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "crazy-piano",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sample_mse_function(Y, Y_pred):\n",
+    "    # The constant factor of 2 in the denominator is left out here; it only rescales the loss\n",
+    "    return np.mean((Y - Y_pred) ** 2)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "referenced-tennis",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Y = np.array([1, 0, 0, 1])\n",
+    "Y_Pred = np.array([1, 0, 0, 0])\n",
+    "\n",
+    "error = sample_mse_function(Y, Y_Pred)\n",
+    "error"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "powerful-brain",
+   "metadata": {},
+   "source": [
+    "#### MSE Derivative\n",
+    "We will not go into how we get to this derivative. If you are interested you can have a look at [this link](https://towardsdatascience.com/gradient-descent-from-scratch-e8b75fa986cc). However, the formula is as follows:\n",
+    "\n",
+    "![mse_derivative](images/mse_derivative.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "potential-government",
+   "metadata": {},
+   "source": [
+    "Let's try to dissect what is happening here. For each weight, we are multiplying the feature values associated with that weight by the difference between the predicted values and the real values, and summing the results. This sounds an awful lot like a dot product.\n",
+    "\n",
+    "Consider this: if our X is a dataset of shape (n, m), where we have *n* data points and *m* features, then our Y values are of shape (n, 1) and our weights are of shape (m, 1). Since there is one partial derivative per weight, the result should also be of shape (m, 1). This gives us our first hint. The second thing we notice is that **for a particular weight**, only the values of the corresponding feature are multiplied by the difference. We'll look at this in more detail below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "three-latest",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X = np.array([[2, 3, 4], [1, 2, 3], [5, 6, 7], [10, 11, 12]])\n",
+    "\n",
+    "X.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dressed-hours",
+   "metadata": {},
+   "source": [
+    "Our toy data has 3 features, and 4 data points. Let's keep that in mind. Time to initialize the weights! Our weights should be of length **3** for our example here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "biological-design",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "weights = np.ones(shape=(X.shape[1], 1))\n",
+    "\n",
+    "weights"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "nutritional-criterion",
+   "metadata": {},
+   "source": [
+    "Now let's get our predictions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "informational-niger",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Y_pred = X@weights\n",
+    "\n",
+    "Y_pred"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "valued-attitude",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Arbitrarily chosen values for Y_true\n",
+    "Y_true = np.array([[8], [7], [19], [32]])\n",
+    "\n",
+    "Y_true.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bright-lloyd",
+   "metadata": {},
+   "source": [
+    "Let's calculate the loss here, so we can compare it later"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "blind-compound",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_mse_function(Y_true, Y_pred)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "individual-sequence",
+   "metadata": {},
+   "source": [
+    "Now let's actually implement the function! The most important thing when dealing with linear algebra in Python is to keep track of the *shapes* of the matrices, and to keep in mind what our goals are. We will see this in action."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "assured-effectiveness",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "difference = Y_true - Y_pred\n",
+    "\n",
+    "difference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "after-transsexual",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "manufactured-baghdad",
+   "metadata": {},
+   "source": [
+    "Let us intuitively look at what we want to do. According to the equation above, we want to multiply the entire first column element-wise with the difference, sum the results, and then divide by *n*. We want to repeat this for all the columns, which eventually gives us a matrix of shape (m, 1), or in this case, (3, 1).\n",
+    "\n",
+    "How do we do this? We simply take the transpose of X, and then we can do a simple dot product."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "understood-secretary",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X.T"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "egyptian-experience",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Note: the division by n described above is omitted here; it only rescales the gradient\n",
+    "partial_derivative = -X.T @ difference\n",
+    "partial_derivative"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "changed-folks",
+   "metadata": {},
+   "source": [
+    "#### Weight update equation\n",
+    "This is the simplest bit. All we have to do is multiply the partial derivative by the learning rate and subtract the result from the weights."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "abstract-cream",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "learning_rate = 0.001\n",
+    "weights -= learning_rate*partial_derivative\n",
+    "\n",
+    "weights"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "hydraulic-nightmare",
+   "metadata": {},
+   "source": [
+    "Let's see if our loss got any better."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "retained-night",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_mse_function(Y_true, X@weights)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "gross-acquisition",
+   "metadata": {},
+   "source": [
+    "In just one iteration, we reduce our loss by roughly 0.04 (from 1.0 to about 0.96)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "classical-oasis",
+   "metadata": {},
+   "source": [
+    "### Other Loss Functions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "pursuant-stuart",
+   "metadata": {},
+   "source": [
+    "#### Mean Absolute Error\n",
+    "Another popular loss function is the Mean Absolute Error function. We will go over this because the derivative of the MAE function is a step-wise (piecewise) function, which has to be programmed in a slightly different way. The equation for MAE is:\n",
+    "\n",
+    "![mean_absolute_error](images/mae_loss_function.png)\n",
+    "\n",
+    "The interesting thing here is that the MAE is not differentiable at y_true == y_pred. However, you can create a step-wise function as follows:\n",
+    "\n",
+    "![mean_absolute_error_derivative](images/mae_derivative.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "quantitative-congo",
+   "metadata": {},
+   "source": [
+    "Now this may look very complicated, but step-wise functions are much easier to program than other functions! If the prediction is greater than the true value, we put in a 1; if the prediction is smaller, we put in a -1. However, what do we do if the prediction is exactly the same as the true value? **Hint:** If the prediction and the true value are the same, do we want to change our weight? No. Thus, if they are the same, we should keep it as zero. Let's have a look at how to implement these functions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "continuous-relief",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sample_mean_absolute_error(Y, Y_pred):\n",
+    "    return np.mean(np.absolute(Y - Y_pred))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "interpreted-dance",
+   "metadata": {},
+   "source": [
+    "There are many ways to implement a step-wise function using numpy. The most basic would be to write small functions and apply them across the entire matrix. However, a much simpler way is to use the [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html) function. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "considerable-capitol",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.array([1, 2, 3, 4, 5])\n",
+    "\n",
+    "# First parameter is the condition\n",
+    "# Second parameter is the value if true\n",
+    "# Third parameter is the value if false\n",
+    "np.where(a < 3, 0, a)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "spread-forum",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sample_mae_derivative(X, Y, Y_pred):\n",
+    "    # Get the difference\n",
+    "    difference = Y_pred - Y\n",
+    "\n",
+    "    # If difference is 0, then substitute the values with 0, otherwise keep the same value\n",
+    "    abs_derivative = np.where(difference == 0, 0, difference)\n",
+    "    \n",
+    "    # If the difference is positive, that means Y_pred > Y\n",
+    "    abs_derivative = np.where(abs_derivative > 0, 1, abs_derivative)\n",
+    "    \n",
+    "    # If the difference is negative, that means Y_pred < Y\n",
+    "    abs_derivative = np.where(abs_derivative < 0, -1, abs_derivative)\n",
+    "    return X.T @ abs_derivative"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "duplicate-discussion",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X = np.array([[2, 3, 4], [1, 2, 3], [5, 6, 7], [10, 11, 12]])\n",
+    "weights = np.ones(shape=(X.shape[1], 1))\n",
+    "Y_pred = X@weights\n",
+    "Y_true = np.array([[7], [6], [20], [30]])\n",
+    "\n",
+    "display(Y_pred)\n",
+    "display(Y_true)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "minor-fitting",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_mean_absolute_error(Y_true, Y_pred)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cosmetic-walter",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "partial_derivative = sample_mae_derivative(X, Y_true, Y_pred)\n",
+    "\n",
+    "partial_derivative"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "greatest-press",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "weights -= 0.001 * partial_derivative\n",
+    "Y_pred = X@weights\n",
+    "sample_mean_absolute_error(Y_true, Y_pred)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "charged-worse",
+   "metadata": {},
+   "source": [
+    "### Putting it all together"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "chubby-stack",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class LinearRegression():\n",
+    "    def __init__(self, loss_function='mse', max_iter=1000, learning_rate=0.001,\n",
+    "                 fit_intercept=True, tolerance=0.001):\n",
+    "        # Before we know the shape of the data, we cannot initialize weights\n",
+    "        self._weights = None\n",
+    "        self._max_iter = max_iter\n",
+    "        self._learning_rate = learning_rate\n",
+    "        self._fit_intercept = fit_intercept\n",
+    "        self._tolerance = tolerance\n",
+    "        self._loss_function = loss_function\n",
+    "        self._loss_dict = {\n",
+    "            'mse': self.mean_squared_loss,\n",
+    "            'abs': self.mean_absolute_loss,\n",
+    "        }\n",
+    "        self._derivative_dict = {\n",
+    "            'mse': self.mean_squared_loss_derivative,\n",
+    "            'abs': self.mean_absolute_loss_derivative,\n",
+    "        }\n",
+    "        self._loss_history = None\n",
+    "        self._weights_history = None\n",
+    "        \n",
+    "    # Leading underscore indicates that a method is for internal use\n",
+    "    def _init_weights(self, num_features):\n",
+    "        # It is good practice to use a normal distribution with mean 0 for weight initialization\n",
+    "        # There are more complicated methods too! But this works\n",
+    "        self._weights = np.random.normal(size=(num_features,1))\n",
+    "        \n",
+    "    def fit(self, X, Y):\n",
+    "        assert len(X) == len(Y), \"X and Y should be the same length\"\n",
+    "        # If we want to add a \"bias\" term, or y_intercept\n",
+    "        if self._fit_intercept:\n",
+    "            X = np.concatenate((X, np.ones(shape=(len(X), 1))), axis=1)\n",
+    "        \n",
+    "        # Initialize the weights\n",
+    "        self._init_weights(X.shape[1])\n",
+    "        \n",
+    "        # Initialize two lists to store the history\n",
+    "        loss_history = []\n",
+    "        weights_history = []\n",
+    "        \n",
+    "        # Used for early stopping on model convergence\n",
+    "        previous_loss = np.inf\n",
+    "        converged = False\n",
+    "        for i in range(self._max_iter):\n",
+    "            # Get Prediction\n",
+    "            Y_pred = self.predict(X)\n",
+    "            \n",
+    "            # Calculate loss to monitor performance\n",
+    "            loss = self._loss_dict[self._loss_function](Y, Y_pred)\n",
+    "\n",
+    "            # Stop Gradient Descent if model has converged\n",
+    "            if np.abs(loss - previous_loss) < self._tolerance:\n",
+    "                converged = True\n",
+    "                break\n",
+    "            previous_loss = loss\n",
+    "            \n",
+    "            loss_history.append(loss)\n",
+    "            # Because the weights array is updated in place on each iteration,\n",
+    "            # we append a .copy() to record a snapshot of the current weights\n",
+    "            weights_history.append(self._weights.reshape(-1).copy())\n",
+    "            \n",
+    "            # Calculate Partial Derivative\n",
+    "            partial_derivative = self._derivative_dict[self._loss_function](X, Y, Y_pred)\n",
+    "            \n",
+    "            # Update the weights\n",
+    "            self._weights -= self._learning_rate * partial_derivative\n",
+    "    \n",
+    "        if converged:\n",
+    "            print(\"Model Converged\")\n",
+    "        else:\n",
+    "            print(\"Warning: Max iterations reached, model did not converge\")\n",
+    "        \n",
+    "        self._loss_history = np.array(loss_history)\n",
+    "        self._weights_history = np.array(weights_history)\n",
+    "        \n",
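+    "    def predict(self, X):\n",
+    "        # Append the bias column when raw features are passed in after fitting\n",
+    "        if self._fit_intercept and X.shape[1] == self._weights.shape[0] - 1:\n",
+    "            X = np.concatenate((X, np.ones(shape=(len(X), 1))), axis=1)\n",
+    "        Y_pred = X @ self._weights\n",
+    "        return Y_pred\n",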
+    "\n",
+    "    def get_coefficients(self):\n",
+    "        return self._weights\n",
+    "    \n",
+    "    def get_training_history(self):\n",
+    "        return self._loss_history, self._weights_history\n",
+    "    \n",
+    "    # MSE Losses\n",
+    "    @staticmethod\n",
+    "    def mean_squared_loss(Y, Y_pred):\n",
+    "        return np.mean((Y - Y_pred)**2)\n",
+    "    \n",
+    "    def mean_squared_loss_derivative(self, X, Y, Y_pred):\n",
+    "        return (-X.T @ (Y - Y_pred)) / len(X)\n",
+    "    \n",
+    "    # Absolute Losses\n",
+    "    @staticmethod\n",
+    "    def mean_absolute_loss(Y, Y_pred):\n",
+    "        return np.mean(np.absolute(Y-Y_pred))\n",
+    "    \n",
+    "    def mean_absolute_loss_derivative(self, X, Y, Y_pred):\n",
+    "        # Subgradient of |Y_pred - Y|: +1 where Y_pred > Y, -1 where Y_pred < Y, 0 where they are equal\n",
+    "        abs_derivative = np.sign(Y_pred - Y)\n",
+    "        # Divide by the number of samples so the gradient matches the mean taken in the loss\n",
+    "        return (X.T @ abs_derivative) / len(X)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "palestinian-insight",
+   "metadata": {},
+   "source": [
+    "#### Preprocess Data\n",
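+    "We standardize the features before training so that gradient descent behaves well with a single learning rate and the learned weights end up on comparable scales. In scikit-learn, scaling is typically applied as a separate preprocessing step, for example with `StandardScaler`."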
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "weekly-thomas",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.preprocessing import StandardScaler"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "improved-article",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Y = data[['PRICE']].to_numpy()\n",
+    "X = data[boston.feature_names].to_numpy()\n",
+    "scaler = StandardScaler()\n",
+    "X = scaler.fit_transform(X)\n",
+    "print(X.shape)\n",
+    "print(Y.shape)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "tested-circuit",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Instantiate the LinearRegression object\n",
+    "Regressor = LinearRegression(loss_function='mse', learning_rate=0.01)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "damaged-rover",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Regressor.fit(X, Y)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "chinese-speed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loss_history, weights_history = Regressor.get_training_history()\n",
+    "print(loss_history.shape)\n",
+    "print(weights_history.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "coral-bubble",
+   "metadata": {},
+   "source": [
+    "### Visualize Performance"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "naked-system",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.figure(dpi=150)\n",
+    "iterations = range(1, len(loss_history) + 1)\n",
+    "plt.title(\"Training Loss\")\n",
+    "plt.plot(iterations, loss_history)\n",
+    "plt.xlabel(\"Epoch\")\n",
+    "plt.ylabel(\"Loss\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "legendary-actress",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plt.figure(dpi=150)\n",
+    "plt.title(\"Weight Update\")\n",
+    "# for i in range(weights_history.shape[1]):\n",
+    "#     plt.plot(iterations, weights_history[:, i])\n",
+    "plt.plot(iterations, weights_history[:, 1])\n",
+    "plt.xlabel(\"Epoch\")\n",
+    "plt.ylabel(\"Weight\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "seventh-exposure",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "animal-process",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
--- a/Linear Regression from Scratch/images/hypothesis.png
+++ b/Linear Regression from Scratch/images/hypothesis.png
--- a/Linear Regression from Scratch/images/mae_derivative.png
+++ b/Linear Regression from Scratch/images/mae_derivative.png
--- a/Linear Regression from Scratch/images/mae_loss_function.png
+++ b/Linear Regression from Scratch/images/mae_loss_function.png
--- a/Linear Regression from Scratch/images/mse_derivative.png
+++ b/Linear Regression from Scratch/images/mse_derivative.png
--- a/Linear Regression from Scratch/images/mse_loss_function.png
+++ b/Linear Regression from Scratch/images/mse_loss_function.png
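
The loss derivatives in the notebook are easy to get subtly wrong, so a quick sanity check can help. The sketch below is not part of the notebook: it compares analytic gradients of the mean squared and mean absolute losses against central finite differences on random data. The helper names (`mse_gradient`, `mae_gradient`, `numerical_gradient`) and the random data are assumptions made for illustration. The MSE gradient here keeps the exact factor of 2, whereas the notebook's `mean_squared_loss_derivative` drops it, which only rescales the effective learning rate.

```python
import numpy as np

def mse_loss(X, Y, w):
    return np.mean((Y - X @ w) ** 2)

def mse_gradient(X, Y, w):
    # Exact gradient of the mean squared error, including the factor 2
    return (-2.0 / len(X)) * (X.T @ (Y - X @ w))

def mae_loss(X, Y, w):
    return np.mean(np.abs(Y - X @ w))

def mae_gradient(X, Y, w):
    # Subgradient of the mean absolute error, averaged over the samples
    return (X.T @ np.sign(X @ w - Y)) / len(X)

def numerical_gradient(loss, X, Y, w, eps=1e-6):
    # Central finite differences, perturbing one weight at a time
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        grad[j, 0] = (loss(X, Y, w + step) - loss(X, Y, w - step)) / (2 * eps)
    return grad

# Hypothetical random data, used only to exercise the gradients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = rng.normal(size=(50, 1))
w = rng.normal(size=(3, 1))

mse_diff = np.max(np.abs(mse_gradient(X, Y, w) - numerical_gradient(mse_loss, X, Y, w)))
mae_diff = np.max(np.abs(mae_gradient(X, Y, w) - numerical_gradient(mae_loss, X, Y, w)))
print("MSE max abs difference:", mse_diff)
print("MAE max abs difference:", mae_diff)
```

If either difference is not close to zero, the analytic derivative and the loss disagree, for example because of a missing division by the number of samples.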