NN-based Regression

1 Observe the data from the statistical department

# Load the data
import csv
import numpy as np

# Declare arrays for storing the dataset
operationCost = np.array([])
profit = np.array([])

# Load the dataset from the CSV file
reader = csv.reader(open('./dataForCEO-cost&profit.csv', 'r'))
for row in reader:
    # Each row holds two values; store both as floats
    operationCost = np.append(operationCost, float(row[0]))  # column 1
    profit = np.append(profit, float(row[1]))  # column 2
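Calling np.append inside a loop copies the whole array on every call, and the file opened above is never closed explicitly. A minimal sketch of an alternative, assuming the same two-column layout, collects the values in Python lists and lets a `with` block close the file. The helper name `load_two_columns` and the temporary demo file are hypothetical and not part of the original notebook.

```python
import csv
import os
import tempfile

def load_two_columns(path):
    """Read a two-column CSV file and return (first_column, second_column) as lists of floats."""
    first, second = [], []
    with open(path, newline='') as f:  # the file is closed automatically
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            first.append(float(row[0]))
            second.append(float(row[1]))
    return first, second

# Demo on a small temporary file with made-up numbers (not the real dataset)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("1.0,2.5\n3.0,4.5\n")
    demo_path = tmp.name

demo_cost, demo_profit = load_two_columns(demo_path)
os.remove(demo_path)
```

The returned lists can be converted once at the end, for example with np.array(demo_cost), if NumPy arrays are needed for the later steps.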

Plot the dataset

%matplotlib inline
import matplotlib.pyplot as plt

def plotData(x, y, color=None):
    # Scatter-plot y against x; tolist() converts the NumPy arrays to Python lists
    plt.scatter(x.tolist(), y.tolist(), marker='x', color=color)
    plt.xlabel('Operation Cost in $')  # x-axis label
    plt.ylabel('Profit in $')  # y-axis label

# Plot it
plt.figure(figsize=(6,4))  # create the figure
plotData(operationCost, profit)
plt.show()  # display the plot

2 Divide dataset and preprocess dataset

2.1 Divide dataset

First, you should divide the dataset into a trainset and a testset. As an example, we can use 80% of the raw dataset as the trainset and the other 20% as the testset.

import random

dataset_x = operationCost
dataset_y = profit
num = len(dataset_x)  # number of samples
splitLine = int(0.8 * num)

# Generate a shuffled list of indices to select data randomly
index = [i for i in range(0, num)]
random.shuffle(index)

# Divide into trainset and testset
train_x = dataset_x[index[0:splitLine]]  # indices from 0 up to splitLine
train_y = dataset_y[index[0:splitLine]]
test_x = dataset_x[index[splitLine:num]]
test_y = dataset_y[index[splitLine:num]]
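Because the shuffle above is unseeded, every run produces a different split. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable; the helper `split_indices` and its default seed are hypothetical and not part of the original notebook.

```python
import random

def split_indices(num, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx): a shuffled, non-overlapping split of range(num)."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    index = list(range(num))
    rng.shuffle(index)
    split_line = int(train_fraction * num)
    return index[:split_line], index[split_line:]

# Example with 10 samples: 8 indices for training, 2 for testing
train_idx, test_idx = split_indices(10)
```

The index lists can be used in the same way as above, for example dataset_x[train_idx] and dataset_x[test_idx].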

Observe the trainset and testset in different colors

plt.figure(figsize=(6,4))
plotData(train_x, train_y)  # plotData(x-axis, y-axis)
plotData(test_x, test_y)
plt.legend(["Trainset","Testset"])
plt.show()

2.2 Preprocess dataset

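Before training, we need to preprocess our trainset.
Because the values in the dataset are large, the loss and its gradients can also become very large, which makes training with gradient descent unstable. One common preprocessing method is therefore normalization.
In this case, we normalize by dividing each raw value by a normalization factor: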

Normalized Value = (Raw Value) / (Normalize Factor)

normFactor = np.mean(train_y)  # for example

def normalize(val):
    # Divide by the normalization factor
    return val / normFactor

normedTrain_x = normalize(train_x)
normedTrain_y = normalize(train_y)
normedTest_x = normalize(test_x)
normedTest_y = normalize(test_y)

plt.figure(figsize=(6,4))
plt.scatter(normedTrain_x, normedTrain_y, marker='x')
plt.scatter(normedTest_x, normedTest_y, marker='x')
plt.legend(["Normalized Trainset","Normalized Testset"])
plt.show()
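To make the effect of the normalization concrete, here is a small worked example in plain Python. The numbers are made up for illustration and are not taken from the real dataset; the names `normalize_demo` and `norm_factor_demo` are hypothetical.

```python
from statistics import mean

train_y_demo = [10000.0, 20000.0, 30000.0]  # made-up profits, not the real data
norm_factor_demo = mean(train_y_demo)       # 20000.0, the mean of the training targets

def normalize_demo(values, factor):
    """Divide every value by the normalization factor."""
    return [v / factor for v in values]

normed_demo = normalize_demo(train_y_demo, norm_factor_demo)
# normed_demo is [0.5, 1.0, 1.5]; its mean is exactly 1.0
```

Note that the factor is computed from the trainset only and then reused for the testset, so no information from the test data leaks into preprocessing.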

3 Describe your neural network

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Describe the model
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(1,)),  # first hidden layer, one input feature
    layers.Dense(16, activation='relu'),  # second hidden layer
    layers.Dense(1)  # output layer, one predicted value
])
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_3 (Dense)              (None, 16)                32
_________________________________________________________________
dense_4 (Dense)              (None, 16)                272
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 17
=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
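The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic, with a hypothetical helper `dense_params`:

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weights (n_inputs * n_units) plus biases (n_units)."""
    return n_inputs * n_units + n_units

layer_sizes = [(1, 16), (16, 16), (16, 1)]  # (inputs, units) for the three Dense layers
per_layer = [dense_params(i, u) for i, u in layer_sizes]
total = sum(per_layer)
print(per_layer, total)  # [32, 272, 17] 321
```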

We can use Mean Squared Error (MSE) as the loss function and Stochastic Gradient Descent (SGD) as the optimizer, i.e. the procedure used to train the model.

model.compile(loss='mse', optimizer='sgd')
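As a reminder of what the MSE loss measures, here is a minimal plain-Python sketch for scalar targets. The function name and the numbers are illustrative assumptions; this is not the Keras implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    assert len(y_true) == len(y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions
mse_demo = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
# squared errors are 0.0, 0.25 and 1.0, so the mean is 1.25 / 3
```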

4 Training

We declare a variable to record the number of iterations.

iters = 0

# Actual training
for i in range(1000):
    iters = iters + 1  # count the iterations
    cost = model.train_on_batch(normedTrain_x, normedTrain_y)
    if iters % 100 == 0:  # report every 100 iterations
        print("After %d iteration(s), train cost = %f " % (iters, cost))

# Plot the result; the line is drawn first so the legend labels match the drawing order
plt.figure(figsize=(6,4))
temp = np.arange(min(normedTrain_x), max(normedTrain_x), 0.01)
plt.plot(temp, model.predict(temp), color='cyan')
plt.scatter(normedTrain_x, normedTrain_y, marker='x')
plt.legend(["Model Prediction Now", "Normalized Trainset"])  # legend labels
plt.show()
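Each call to train_on_batch performs one gradient update on the batch it receives; since the whole trainset is passed in, every iteration is effectively a full-batch gradient descent step. The sketch below illustrates the same idea on a deliberately simplified model, a straight line y = w*x + b instead of the neural network, with made-up data and a hypothetical learning rate of 0.1. It is an illustration of the update rule, not what Keras does internally.

```python
def gradient_step(w, b, xs, ys, lr):
    """One full-batch gradient descent update for the model y = w*x + b under MSE."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
    grad_b = 2.0 / n * sum(errors)                             # d(MSE)/db
    return w - lr * grad_w, b - lr * grad_b

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data lying exactly on y = 2x + 1
xs = [0.0, 0.5, 1.0, 1.5]
ys = [1.0, 2.0, 3.0, 4.0]

w, b = 0.0, 0.0
initial_cost = mse(w, b, xs, ys)
for _ in range(1000):  # same number of iterations as in the training loop above
    w, b = gradient_step(w, b, xs, ys, lr=0.1)
final_cost = mse(w, b, xs, ys)
```

After the loop, w and b are close to 2 and 1, and the cost has dropped well below its initial value, which mirrors the decreasing train cost printed during training.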

5 Evaluation of the trained model

cost = model.evaluate(normalize(test_x), normalize(test_y))
print('test cost:', cost)

# Show the weights and biases of each layer
W, b = model.layers[0].get_weights()
print('Weights=', W, '\nbiases=', b)
W, b = model.layers[1].get_weights()
print('Weights=', W, '\nbiases=', b)
W, b = model.layers[2].get_weights()
print('Weights=', W, '\nbiases=', b)
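The test cost above is an MSE on normalized values, so it is not expressed in dollars. Because every value was divided by the same factor, the MSE in original units is the normalized MSE multiplied by the factor squared. A small sketch with made-up numbers; the helper name `mse_in_original_units` is hypothetical.

```python
import math

def mse_in_original_units(normalized_mse, norm_factor):
    """Scale an MSE computed on normalized values back to squared original units."""
    return normalized_mse * norm_factor ** 2

# Made-up numbers for illustration only
factor = 20000.0
y_true = [10000.0, 30000.0]
y_pred = [12000.0, 29000.0]

# MSE computed directly in dollars and on the normalized values
raw_mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
norm_mse = sum((t / factor - p / factor) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Root mean squared error in dollars, recovered from the normalized MSE
rmse_dollars = math.sqrt(mse_in_original_units(norm_mse, factor))
```

Taking the square root gives an error in dollars, which is usually easier to interpret than the normalized cost.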

6 Post-processing and final prediction

Before predicting, we need to post-process the output of the trained model.
Above, we normalized the data during preprocessing, so we now denormalize the output of the trained model to obtain the final predictions.

First, we define a function to denormalize.

Prediction = (Normalized Value) × (Normalize Factor)

def denormalize(val):
    # Multiply by the normalization factor to return to dollars
    return val * normFactor

temp = np.arange(min(normedTrain_x), max(normedTrain_x), 0.01)
pred_y = denormalize(model.predict(temp))
pred_xAxis = denormalize(temp)

# Plot the graph; the prediction line is drawn first so the legend labels match the drawing order
plt.figure(figsize=(6,4))
plt.plot(pred_xAxis, pred_y, color='cyan')
plotData(train_x, train_y)
plotData(test_x, test_y)
plt.legend(["Prediction","Trainset","Testset"])
plt.show()
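The whole prediction path for a single new value is: normalize the input, apply the model, denormalize the output. A minimal sketch of that pipeline follows. The helper `predict_profit` is hypothetical, and a simple stand-in function replaces the trained network so the example runs on its own.

```python
def predict_profit(cost_in_dollars, model_fn, norm_factor):
    """Normalize the input, apply the model, then denormalize the output."""
    normed_input = cost_in_dollars / norm_factor
    normed_output = model_fn(normed_input)
    return normed_output * norm_factor

# Stand-in for the trained network (an assumption for illustration):
# in normalized units, profit is 1.5 times the operation cost
def stand_in_model(x):
    return 1.5 * x

# Made-up operation cost and normalization factor
predicted = predict_profit(10000.0, stand_in_model, norm_factor=20000.0)
# 10000 / 20000 = 0.5, then 1.5 * 0.5 = 0.75, then 0.75 * 20000 = 15000.0
```

With the trained Keras model, model_fn could be a small wrapper around model.predict that passes the value as a one-row, one-column array and returns the single predicted number.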