Day 1: Introduction

We would like to predict whether a student should pass or fail a course based on four factors: attendance, citizenship, test scores and homework.

We have lots of historical data showing the four variables and which students passed or failed. We want to use that data to make predictions about future grades for students.

Our records include an Excel spreadsheet with this information.

We are going to use Python's prediction tools to analyze this data.

A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as attendance grade, citizenship grade, test grade and homework grade for each member of the data set.¹

We are going to look at a very familiar dataset to both students and teachers: grades. The first one we are going to look at is an Excel spreadsheet.

Each column represents a variable that goes into calculating grades: attendance, citizenship, tests and homework.
The last column represents whether the student passed or failed the class. A '0' means failure and a '1' represents a passing grade.
Each row represents one student's scores, on a scale from 0 to 4.0.
There are 108 student grades in the dataset.
Click on the link to open the spreadsheet.

Excel spreadsheet with grades

For example, the student with the ID of 5 got a 3.0 on attendance, a .30 for citizenship, a .25 grade on tests, a .55 grade for homework and passed the class.

I am not sure that I would have passed this student, but our job now is to follow the data.

Student #2 got a 3.0, .40, .75, .25, and failed.

We are going to use this dataset in a Python Random Forest application to predict which students in the future should pass or fail a class based on the four variables: attendance, citizenship, tests and homework.

Save this spreadsheet file in your working directory. Call it excelGradesNew.xlsx

Write down the folder where this file is located.

Before understanding Random Forest algorithms, we need to understand what a decision tree is.

Definition of decision tree : a tree diagram which is used for making decisions in business or computer programming and in which the branches represent choices with associated risks, costs, results, or probabilities.
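As a rough illustration of that definition, a decision tree can be written as a chain of yes/no questions, where each answer leads down a different branch. The sketch below is hypothetical: the function name simple_grade_tree and the cut-off values (2.0, 3.5, 3.0) are made up for this example and are not learned from the grades dataset.

```python
def simple_grade_tree(attendance, citizenship, tests, homework):
    """Return 1 (pass) or 0 (fail) by following one branch at a time.

    The thresholds are illustrative assumptions, not values learned from data.
    Citizenship is accepted but not used by this toy tree.
    """
    if tests >= 2.0:                            # first branch: test scores
        return 1
    if attendance >= 3.5 and homework >= 3.0:   # second branch: effort
        return 1
    return 0                                    # every other path ends in "fail"


print(simple_grade_tree(4.0, 1.0, 2.5, 0.0))    # follows the first branch
print(simple_grade_tree(3.5, 0.5, 0.25, 0.25))  # falls through to "fail"
```

A real decision tree chooses its own questions and cut-offs from the training data instead of having them typed in by hand.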

To see how a decision tree works, look at the lesson on decision trees on our website.

janetbelch.com

Random Forest is an algorithm, available in Python's sklearn library, that combines multiple decision trees.

There are a number of ways to work with data sets in Python. We will look at three of them in this tutorial.
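The idea of combining several trees can be sketched with a simple majority vote. Everything in this example is an assumption made for illustration: the three one-question "trees", their cut-off values and the helper name forest_predict are hypothetical. A real random forest learns many trees from random samples of the data.

```python
from collections import Counter


def tree_tests(scores):
    return 1 if scores["tests"] >= 2.0 else 0


def tree_citizenship(scores):
    return 1 if scores["citizenship"] >= 2.0 else 0


def tree_attendance(scores):
    return 1 if scores["attendance"] >= 3.5 else 0


def forest_predict(scores, trees):
    """Ask every tree for a vote and return the most common answer."""
    votes = [tree(scores) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


trees = [tree_tests, tree_citizenship, tree_attendance]
student = {"attendance": 4.0, "citizenship": 1.0, "tests": 2.5, "homework": 2.0}

# Two trees vote "pass" and one votes "fail", so the forest says "pass".
print(forest_predict(student, trees))
```

Using an odd number of trees avoids ties in this two-class vote.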

The Python code is essentially the same for spreadsheets, comma separated values and internal datasets.

Day 2: Python Random Forest Model using a spreadsheet as dataset.

You need to have already installed Python and Anaconda3 with Jupyter Notebook for this lesson.

We will look at the code a section at a time.

Frame 1

IN THIS PROJECT, WE ARE GOING TO CREATE SEPARATE CELLS FOR SECTIONS OF OUR CODE AND THEN RUN AND TEST IT OUT ONE CELL AT A TIME.

CLICK ON THE CELL YOU WANT TO RUN THEN CLICK RUN ON THE MENU BAR.

There are a number of ways to run your code.

You can experiment with these.

Copy library code to clipboard.
Click on Jupyter Notebook(anaconda3) program.
Click on New.
Select python3 as the file type.
Click in the first frame.
Press CTRL V to paste text into Python.
Click on file and save it as gradesExcel.ipynb

Pandas is a package commonly used to deal with data analysis.

It simplifies the loading of data from external sources such as text files and databases.

It also provides ways of analyzing and manipulating data once it is loaded into your computer.

It puts data into rows and columns like a spreadsheet for the purpose of analyzing the data.

Sklearn is needed to train the model and do the confusion matrix.

Training the machine learning model for grades also involves evaluating it.

We will set apart a test set for this.

The test set is a dataset that the trained model has never seen before.

Using it allows you to test whether the model has overfit, or adapted to the training data too well.

Overfitting is a common explanation for the poor performance of a predictive model.²

Testing on a random sample of the data helps you ensure that your model performs well on new data.

Numpy is a numerical programming package for Python.

Matplotlib is used for visualization or graphing the information.

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Frame 2

Copy code that reads the spreadsheet to the clipboard.
Click on the plus sign on the menu bar (insert cell below) to add a new frame to the project.
Click in the second frame.
Press CTRL V to paste text into python.
Click on file and save it as gradesExcel.ipynb

Frame 2 code creates a variable called data and reads the Excel file into it.

Notice the double forward slashes in the file location.

Next, the head (the first five rows) and the tail (the last five rows) of the spreadsheet are printed out.
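The head, tail and shape calls can be tried without the spreadsheet. This is only a sketch: the read_excel line is shown as a comment because the file path is an assumption, and the twelve rows typed in here are the first twelve rows of the in-program dataset used later in this tutorial, not the full 108-row spreadsheet.

```python
import pandas as pd

# In the lesson, the data would come from the spreadsheet, for example:
#   data = pd.read_excel("excelGradesNew.xlsx")   # path is an assumption
# A few rows are typed in here instead so the example runs on its own.
data = pd.DataFrame({
    "attendance":    [4.0, 4.0, 4.0, 3.5, 3.75, 4.0, 3.5, 4.0, 3.25, 4.0, 4.0, 3.75],
    "citizenship":   [1.0, 4.0, 3.75, 3.25, 0.5, 2.0, 1.75, 1.0, 0.75, 4.0, 0.0, 3.75],
    "tests":         [1.0, 4.0, 4.0, 3.5, 0.25, 2.75, 2.5, 1.0, 0.0, 4.0, 0.0, 3.75],
    "homework":      [2.0, 4.0, 4.0, 3.75, 0.25, 3.0, 2.0, 1.0, 0.0, 4.0, 0.0, 3.75],
    "grade_awarded": [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1],
})

print(data.head())   # first five rows
print(data.tail())   # last five rows
print(data.shape)    # (number of rows, number of columns)
```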

Frame 3

Create a new frame and key in print(data.shape)

The frame will show the size of the dataset which contains 108 rows and 6 columns.

Save and run frames 1,2,3.

Frame 4

This frame prints the data set. It assumes the data has been assigned to a variable named df, for example with df = data.

Create a new frame and key in:

print(df)

Frame 5

Create a new frame and key in df.describe()

This code summarizes key statistics of the data set, including the count, mean, standard deviation, minimum and maximum values, and the quartiles (the 50% row is the median).
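A small example shows what describe() returns. The five rows below are taken from the in-program dataset used later in this tutorial and stand in for the spreadsheet, so the numbers will differ from the full data set.

```python
import pandas as pd

df = pd.DataFrame({
    "tests":    [1.0, 4.0, 4.0, 3.5, 0.25],
    "homework": [2.0, 4.0, 4.0, 3.75, 0.25],
})

summary = df.describe()
print(summary)               # count, mean, std, min, 25%, 50%, 75%, max
print(summary.loc["50%"])    # the 50% row is the median of each column
```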

Save and run. Your screen should look like the image below.

Frame 6

Click the plus sign to create a new frame and type in the following information. X = df[['attendance','citizenship','tests', 'homework']]

This line assigns the independent variables in the data frame to the letter X. Make sure that it is an uppercase X.

Save and run to make sure the syntax is correct. This line does not print anything on the screen.

Frame 7

Click the plus sign to create a new frame and type in the following information. y = df['grade_awarded']

This line assigns the grade awarded to the letter y. The letter y is lowercase.

Save and run to make sure the syntax is correct. This line does not print anything on the screen.

Frame 8

Click the plus sign to create a new frame and type in the following information. from sklearn.model_selection import train_test_split

This imports the train_test_split function from sklearn, one of Python's libraries.

Sklearn contains tools for machine learning and statistical modeling.

Frame 9

Click the plus sign to create a new frame and type in the following information. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

This line divides the data into training and test sets. Eighty percent is used for training and 20 percent makes up the test set, which is used for testing the algorithm.
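To see how many records end up in each set, the split can be tried on placeholder data with the same number of rows as the spreadsheet. The arrays below are stand-ins made up for this sketch, not the real grades.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 108 rows with 4 feature columns, plus 108 labels.
X = np.arange(108 * 4).reshape(108, 4)
y = np.arange(108) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 20 percent of 108 is 21.6, which is rounded up to 22 test records.
print(len(X_train), len(X_test))
```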

Frame 10

Copy the code that creates a variable clf and fits the model to the training data.
Click on the Plus sign on the menu bar (insert cell below) to add a new frame to the project.
Click in that frame.
Press CTRL V to paste text into Python.
Click on file and save it as gradesExcel.ipynb

Now it is time to make some predictions from the test data.

Frame 11

Copy code that will make predictions about which students should pass or fail.
Click on the Plus sign on the menu bar (insert cell below) to add a new frame to the project.
Click in that frame.
Press CTRL V to paste text into Python.
Click on file and save it as gradesExcel.ipynb
Now go to frame one, click and run.
After each click you will advance one frame. Continue running each frame until you get prediction output that looks like the image below.
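The fitting and prediction steps in frames 10 and 11 could look roughly like the sketch below. It is not the exact code from the lesson: it uses twelve rows from the in-program dataset shown later in this tutorial instead of the spreadsheet, and random_state=0 on the classifier is an added assumption so that repeated runs give the same result.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "attendance":    [4.0, 4.0, 4.0, 3.5, 3.75, 4.0, 3.5, 4.0, 3.25, 4.0, 4.0, 3.75],
    "citizenship":   [1.0, 4.0, 3.75, 3.25, 0.5, 2.0, 1.75, 1.0, 0.75, 4.0, 0.0, 3.75],
    "tests":         [1.0, 4.0, 4.0, 3.5, 0.25, 2.75, 2.5, 1.0, 0.0, 4.0, 0.0, 3.75],
    "homework":      [2.0, 4.0, 4.0, 3.75, 0.25, 3.0, 2.0, 1.0, 0.0, 4.0, 0.0, 3.75],
    "grade_awarded": [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1],
})

X = df[["attendance", "citizenship", "tests", "homework"]]
y = df["grade_awarded"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Build a forest of 100 trees and fit it to the training rows only.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict pass (1) or fail (0) for the rows the model has not seen.
y_pred = clf.predict(X_test)
print(X_test)
print(y_pred)
```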

Let's look at our results.

Twenty-two records were selected at random from the dataset containing 108 records.
The grades for each independent variable are listed for each record.
At the bottom of the list are the predictions for each randomly selected one.
The passing and failing grades are enclosed in square brackets.
[1 1 1 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0]
A "1" means pass and a "0" means fail.
Five are predicted to fail.
Record 2's original data was .25, .50, .75, .25 with a failing grade.
Our model predicted that, with these marks, student 2 should fail.
Student 107 should also fail.
Student 86 should also fail based on our model.

If you do not specify the random state in the code, then every time you execute your program a new random value is generated, and the train and test data sets will have different values.
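The effect of random_state can be checked with a short experiment. The list of record numbers below is a stand-in for the rows of the spreadsheet.

```python
from sklearn.model_selection import train_test_split

records = list(range(108))   # stand-in for the 108 rows

# Same random_state: the same records land in the test set every time.
_, test_a = train_test_split(records, test_size=0.2, random_state=0)
_, test_b = train_test_split(records, test_size=0.2, random_state=0)
print(test_a == test_b)

# No random_state: the selection will usually change from run to run.
_, test_c = train_test_split(records, test_size=0.2)
print(len(test_c))
```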

Confusion Matrix

To evaluate the performance of a random forest classifier, we will use a confusion matrix.

A confusion matrix is a table that allows you to visualize the performance of a classification machine learning model. With this visualization, you can get a better idea of how your model is performing.

A matrix, in mathematics, is a rectangular array of quantities or expressions set out by rows and columns.

A confusion matrix shows true positives, true negatives, false positives and false negatives.

In our model, a true positive is a student who passed in the actual data and is also predicted to pass.

A true negative, in our example, is a student who failed in the actual data and is also predicted to fail.

A false positive is one in which the actual data showed the student failing and the prediction had them passing.

In our model, a false negative means that the actual data shows a student passing and the prediction shows them failing.

Here is the confusion matrix for our Random Forest Classifier model.

There are four True Negatives.
There are 15 True Positives.
There are 2 False Positives.
There is one False Negative.

The table below shows how these numbers were determined.

Record #	Prediction	Actual	Type
84	1	1	TP
10	1	1	TP
75	1	1	TP
2	0	0	TN
24	1	0	FP
100	1	1	TP
107	0	0	TN
7	1	1	TP
16	1	1	TP
86	0	0	TN
68	0	0	TN
22	1	0	FP
45	1	1	TP
60	1	1	TP
76	1	1	TP
52	1	1	TP
13	1	1	TP
73	1	1	TP
85	1	1	TP
54	1	1	TP
103	1	1	TP
8	0	1	FN
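The Type column in the table can be worked out with a few lines of plain Python. The rows below are copied from the table above; the helper name outcome is just a label chosen for this sketch.

```python
from collections import Counter

# (record number, prediction, actual) copied from the table above
rows = [
    (84, 1, 1), (10, 1, 1), (75, 1, 1), (2, 0, 0), (24, 1, 0), (100, 1, 1),
    (107, 0, 0), (7, 1, 1), (16, 1, 1), (86, 0, 0), (68, 0, 0), (22, 1, 0),
    (45, 1, 1), (60, 1, 1), (76, 1, 1), (52, 1, 1), (13, 1, 1), (73, 1, 1),
    (85, 1, 1), (54, 1, 1), (103, 1, 1), (8, 0, 1),
]


def outcome(prediction, actual):
    """Label one record as TP, FP, FN or TN."""
    if prediction == 1:
        return "TP" if actual == 1 else "FP"
    return "FN" if actual == 1 else "TN"


counts = Counter(outcome(p, a) for _, p, a in rows)
print(counts["TP"], counts["TN"], counts["FP"], counts["FN"])
```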

Here is the code that produces the confusion matrix.

Frame 12

The accuracy of the model is determined by adding the True Positives (15) and True Negatives (4) from the confusion matrix above and dividing by the total number of randomly selected records (22), which gives an accuracy of 19/22, or about 0.86.

Here is the code to find the accuracy of the model.
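One way to check the numbers is to feed the actual and predicted values from the confusion matrix table into sklearn's metrics functions. This is a sketch that types the 22 values in by hand; in the notebook, y_test and y_pred would come from the earlier frames.

```python
from sklearn import metrics

# Actual and predicted values, in the same order as the table of 22 records.
y_test = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Rows are actual values (0, 1); columns are predicted values (0, 1).
cm = metrics.confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy = (true positives + true negatives) / all records
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy, 4))
```

With these values the matrix holds 4 true negatives, 2 false positives, 1 false negative and 15 true positives, so the accuracy is 19 out of 22.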

Frame 13

Copy code that will determine the accuracy of the model
Click on the Plus sign on the menu bar (insert cell below) to add a new frame to the project.
Click in that frame.
Press CTRL V to paste text into python.
Click on file and save it as gradesExcel.ipynb
Now go to frame one, click and run.
After each click you will advance one frame. Continue running each frame until you get the accuracy.
You can also get a single predicted result by entering a student's scores in the prediction = clf.predict([[0,0,0,0]]) line of code.

Next we need to see which of the independent variables (the features) contribute the most to determining the grade.

The variables responsible for determining the students' grades are attendance, citizenship, tests and homework. Which of these is the most important?

Here is the code needed to make the prediction.

The first line of this code allows the user to enter one student's grades to see if they should pass or fail.

Frame 14

Copy code that will determine the features of importance
Click on the Plus sign on the menu bar (insert cell below) to add a new frame to the project.
Click in that frame.
Press CTRL V to paste text into Python.
Click on file and save it as gradesExcel.ipynb
Now go to frame one, click and run.
After each click you will advance one frame. Continue running each frame until you get feature importances output that looks like the list below.

Results will be similar to these.

Most important criteria for DETERMINING PASS OR FAIL

2 0.464878
1 0.405681
0 0.117716
3 0.011725

As you can see, they are ranked from most important to least important.

Now let's analyze what this model has predicted here.

Remember the array numbers for the independent variables.

Array #	Variable	Importance
0	Attendance	0.117716
1	Citizenship	0.405681
2	Tests	0.464878
3	Homework	0.011725

According to our results, the most important feature influencing grades awarded is tests.

The least important variable is homework.
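Matching the array numbers to variable names by hand is easy to get wrong, so it can help to label the importances before sorting them. This sketch types in the values from the table above; with a fitted model, clf.feature_importances_ would supply them instead.

```python
import pandas as pd

feature_names = ["attendance", "citizenship", "tests", "homework"]
importances = [0.117716, 0.405681, 0.464878, 0.011725]   # from the table above

# Label each importance with its variable name, then rank high to low.
ranked = pd.Series(importances, index=feature_names).sort_values(ascending=False)
print(ranked)
```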

The model also provides the capability of entering just one set of independent variables and when the model is run, it will predict if those values mean pass or fail.

The line is: prediction = clf.predict([[4.0,3.0,1,0]])

These numbers should produce a passing grade of 1.

Frame 15

Copy code that will determine the features of importance to make a graph
Click on the Plus sign on the menu bar (insert cell below) to add a new frame to the project.
Click in that frame.
Press CTRL V to paste text into Python.
Click on file and save it as gradesExcel.ipynb
Now go to frame one, click and run.
After each click you will advance one frame. Continue running each frame until you get the graph of the features of importance.

After our model has determined the most important variables influencing the grade awarded, a visualization of that information would be helpful.

The bar graph below does just that.

The red bar shows the most important feature: tests.
The green bar shows how citizenship affects our grade.
The yellow bar shows the influence of attendance on the grade.
The blue bar shows the influence of homework.
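A bar graph like the one described could be drawn with matplotlib along these lines. This is a sketch rather than the lesson's own code: the importance values are typed in from the earlier table, and the axis label and title text are assumptions.

```python
import matplotlib.pyplot as plt

names = ["tests", "citizenship", "attendance", "homework"]
values = [0.464878, 0.405681, 0.117716, 0.011725]
colors = ["red", "green", "yellow", "blue"]   # one color per bar, as described

fig, ax = plt.subplots()
bars = ax.bar(names, values, color=colors)
ax.set_ylabel("Importance")
ax.set_title("Feature importances")
plt.show()
```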

Day 3: Comma Separated Values

Another way to read data into a Python program is by using comma separated values.

You can use a simple text editor program to create the file or use a spreadsheet program and save your file as a .csv file.

Start out by keying in the independent variable names separated by commas and then the dependent, Y variable title.

The file appears below for you to copy.

Open Notepad and paste in the information. Save the file in your working folder. Name the file 'csFile.csv'.

Make the following changes to your program.

Use the same folder where you stored the Excel file.
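The main change is to read the data with read_csv instead of read_excel. The sketch below reads the same kind of text from a string so it can run without a file on disk; the three data rows are a small sample taken from the in-program dataset in this tutorial, and the file path in the comment is an assumption.

```python
import io
import pandas as pd

# The header line names the independent variables, then the dependent variable.
csv_text = """attendance,citizenship,tests,homework,grade_awarded
4.0,1.0,1.0,2.0,1
4.0,4.0,4.0,4.0,1
3.75,0.5,0.25,0.25,0
"""

# With a real file, the call would be something like pd.read_csv("csFile.csv").
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df)
```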

Save and run your program and answer the questions on the worksheet.

Random Forest worksheet

Day 4: JSON for Random Forest

JSON supports two data structures:

A collection of name/value pairs. Different programming languages support this data structure under different names, such as object, record, struct, dictionary, hash table, keyed list, or associative array.

An ordered list of values. In various programming languages, it is called an array, vector, list, or sequence.

Since the data structures supported by JSON are also supported by most modern programming languages, JSON is a very useful data-interchange format.
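In Python, these two structures correspond to a dictionary and a list. The short sketch below uses the standard json module; the record itself is a made-up example.

```python
import json

# A name/value collection (dict) that contains an ordered list of values.
record = {"attendance": 4.0, "tests": [1.0, 4.0, 3.5], "grade_awarded": 1}

text = json.dumps(record)       # Python objects -> JSON text
restored = json.loads(text)     # JSON text -> Python objects

print(text)
print(type(restored).__name__, type(restored["tests"]).__name__)
```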

Another way to use data in an application is to have the grading information in the body of the program.

Start a new Python project and paste the following information into the first cell.

We are going to put all the code in one cell.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier 
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt
import numpy as np
import sys
sys.__stdout__ = sys.stdout

grades = {'attendance': [4.0,4.0,4.0,3.5,3.75,4.0,3.5,4.0,3.25,4.0,4.0,3.75,4.0,4.0,4.0,3.5,3.75,4.0,3.5,4.0,3.25,4.0,4.0,3.75,3.25,3.5,3.0,4.0,3.50], 
         
        'citizenship': [1.0,4.0,3.75,3.25,.50,2.0,1.75,1.0,.75,4.0,0,3.75,1.0,4.0,3.75,3.25,.50,2.0,1.75,1.0,.75,4.0,0,3.75,.50,3.25,.75,4.0,1.0],
         
        'tests':[1.0,4.0,4.0,3.50,.25,2.75,2.50,1.0,0,4.0,0,3.75,1.0,4.0,4.0,3.50,.25,2.75,2.50,1.0,0,4.0,0,3.75,0,3.0,.50,4.0,0],
         
        'homework':[2.0,4.0,4.0,3.75,.25,3.0,2.0,1.0,0,4.0,0,3.75,2.0,4.0,4.0,3.75,.25,3.0,2.0,1.0,0,4.0,0,3.75,0,2.75,0,4.0,0],
         
        'grade_awarded':[1,1,1,1,0,1,1,1,0,1,0,1,1,1,1,1,0,1,1,1,0,1,0,1,0,1,0,1,1]}
         #'grade_awarded':['P','P','P','P','F','P','P','P','F','P','F','P','P','P','P','P','F','P','P','P','F','P','F','P','F','P','F','P','P']}

#29 should have failed with.375 as average .5, 1.0 ,0,0 scores

df=pd.DataFrame(grades, columns=['attendance', 'citizenship','tests', 'homework', 'grade_awarded'])
print('Original DataFrame')

print(df.shape)
print(df)
x = df[['attendance', 'citizenship', 'tests','homework']]
y = df['grade_awarded']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20,random_state=0)

clf= RandomForestClassifier(n_estimators=100)
clf.fit(x_train,y_train)
y_pred=clf.predict(x_test)
print('Predicted Values')
print (x_test) # test dataset without the actual outcome
print (y_pred) # predicted values

#predictions = mlp_classifieer.predic(X_test)
#print(confusion_matrix(y_test,predictions)
     
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)

print('Accuracy: ',metrics.accuracy_score(y_test,y_pred))
plt.show()

prediction = clf.predict([[0,0,0,0]])
print()
print()
print('Predicted Single Result: ', prediction)
print()
print()
print('Most important criteria for DETERMINING PASS OR FAIL')
print()
featureImportances = pd.Series(clf.feature_importances_).sort_values(ascending=False)
#featureImportances = pd.Series(clf.feature_importances_)
print(featureImportances)

After pressing the copy button, open your Python random forest application and paste the information into cell 1.

Save your application and run it to make sure that it runs without any errors.

Answer the questions in the worksheet.

Day 5: Working with Pandas Data Frames and file formats

JSON stands for JavaScript Object Notation. It was and is still used by JavaScript and other programming languages.

It supports two widely used data structures:

A collection of name/value pairs

It can go by a number of names: object, record, dictionary, hash table, keyed list or associative array.

An ordered list of values

It can be called an array, vector, list or sequence.

Pandas Dictionary

A dictionary is a structure that maps arbitrary keys to a set of arbitrary values.

Pandas Series is a one-dimensional array of indexed data.

It can be created using a list or an array.

Pandas Series can be thought of as a special case of a Python dictionary.
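A Series can be built either from a list or from a dictionary. The numbers in this sketch are the Dos Pueblos booth scores given later in this lesson.

```python
import pandas as pd

# From a list: pandas numbers the entries 0, 1, 2, ...
from_list = pd.Series([17, 15, 20, 10])

# From a dictionary: the keys become the index labels.
from_dict = pd.Series(
    {"Content": 17, "Presentation": 15, "Materials": 20, "Professionalism": 10}
)

print(from_list[0])             # look up by position label
print(from_dict["Materials"])   # look up by name, like a dictionary
print(from_dict.sum())          # total of the four scores
```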

Above is an example of a dictionary of data built into the application.

The dictionary contains data for a booth judging competition.

There are 12 schools.

Each school is judged on 5 different areas: Content, Presentation, Materials, Professionalism and Final Grade.

The name of the object is grades.

The object contains six arrays: Name, Content, Presentation, Materials, Professionalism and Final.

Let's look at the first array.

The school is Dos Pueblos, my old school.

They got the following scores: Content 17, Presentation 15, Materials 20, Professionalism 10 and Final score of 62.

San Marcos got the following scores: Content 7, Presentation 12, Materials 11, Professionalism 10, Final score 47.
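The grades object could be turned into a data frame roughly as follows. Only the two schools quoted above are included here, so this is a shortened sketch of the 12-school dictionary.

```python
import pandas as pd

# Each key is a column name; each list holds one value per school.
grades = {
    "Name": ["Dos Pueblos", "San Marcos"],
    "Content": [17, 7],
    "Presentation": [15, 12],
    "Materials": [20, 11],
    "Professionalism": [10, 10],
    "Final": [62, 47],
}

df = pd.DataFrame(grades)
print(df)

# Look up one school's final score by name.
print(df.loc[df["Name"] == "San Marcos", "Final"].iloc[0])
```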

Ways to get the data into a program

There are a number of ways to obtain the data necessary for the application: as a part of the program, as a comma separated values file, or as a spreadsheet file.

Below is a comma separated values file with the same information as above.

The file is read into the application.

Highlight the lines of the file and copy the information.

Open Notepad.

Paste the file contents into Notepad.

Save the file as 'BoothScores.csv' in your working folder.

Spreadsheet Method

Another way to read data into an application is by using an Excel spreadsheet containing the data.

Click on the link below to see how the data is presented in the spreadsheet.

booth data

After opening the file in Excel, save it in your working folder. Call the file 'boothData.xlsx'.

The next line is how you would read in the spreadsheet file.

Day 6: creating three applications for booth evaluations

Here is the information for the first application with the data contained in the application.

After pressing the copy button, start a new Python application and paste the information into the first frame.

Save and run.

Application using csv file

After pressing the copy button, start a new Python application and paste the information into the first frame.

Save and run.

Application using an Excel file

After pressing the copy button, start a new Python application and paste the information into the first frame.

Save and run.

All programs do the following:

Print out the whole data frame
Show the mean, median and mode for all: content, presentation, materials, professionalism and final score
Show the standard deviation for all
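The statistics listed above are all available as data frame methods. The sketch below uses only the two schools whose scores are quoted in Day 5, so the results are for illustration and will not match the full 12-school data.

```python
import pandas as pd

df = pd.DataFrame({
    "Content": [17, 7],
    "Presentation": [15, 12],
    "Materials": [20, 11],
    "Professionalism": [10, 10],
    "Final": [62, 47],
})

print(df)                  # whole data frame

means = df.mean()          # average of each column
medians = df.median()      # middle value of each column
modes = df.mode()          # most frequent value(s) in each column
stds = df.std()            # sample standard deviation of each column

print(means)
print(medians)
print(modes)
print(stds)
```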

Day 7 : Booth Sales

Machine learning can be used to help us evaluate the effectiveness of our marketing strategy for trade show days.

We have compiled sales data and survey data during a trade booth sales day. We are going to evaluate the following independent variables and use the Random Forest algorithm to determine which variable has the most influence on our sales.

The data is organized in an Excel spreadsheet

Product Excel worksheet

At our last trade show, we asked our customers to give each category a "1" or a "0" based on its influence on them buying the item.

Our survey contained 135 respondents.

A "1" means that it is important and "0" means that it is not important.

They also indicated if they bought our product.

At this trade show, we are considering featuring a 75-inch ultra-high-definition smart television.

We want to know what price to charge.

We have two models of the 75-inch high-definition TV.

Are we going to have a special deal for the trade show?

Should we feature the higher quality TV at a higher cost, or the less expensive one that is not quite as good?

We already have a very competitive price for the products in question.

Our regular price is $1,395 for the less expensive model and $1,595 for the more expensive one.

We are considering a trade show discount of approximately 25%, or $350.00, on the cheaper TV, making the trade show price $1,045.00.

On the higher quality TV, the discount would be about $400, making it sell for $1,195.
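The discount arithmetic can be checked in a few lines. The labels "standard" and "premium" are names chosen for this sketch; the prices and discounts are the ones quoted above.

```python
regular_prices = {"standard": 1395, "premium": 1595}
discounts = {"standard": 350, "premium": 400}

show_prices = {}
for model, price in regular_prices.items():
    show_prices[model] = price - discounts[model]      # trade show price
    percent_off = 100 * discounts[model] / price       # discount as a percent
    print(model, show_prices[model], round(percent_off, 1))
```

Both discounts work out to just over 25 percent of the regular price.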

We are going to analyze the results of our survey using a Python machine learning classification algorithm, Random Forest.

Criteria on the survey

Quality
Price
Promotional Deal
Effect of salesperson on the sale

Goal: predict which combinations of the above factors will result in a sale.

Getting the file

Click on the link for the spreadsheet and save it in your working directory as 'productPurchased.xlsx'.

Cell 1: Getting the Python Libraries

After pressing the copy button, start a new Python application and paste the information into the first frame.

Save and run.

Cell 2: Reading in the spreadsheet file

After pressing the copy button, add a new cell and paste the information into the second frame.

Save and run.

Results/Expected Output

Cell 3: print shape of our file (rows and columns)

After pressing the copy button, add a new cell and paste the information into the third frame.

Save and run.

Results/Expected Output

(135, 5) These numbers show us the number of rows and cols contained in the file.

Cell 4 : print out dataframe

After pressing the copy button, add a new cell and paste the information into the fourth frame.

Save and run.

Results/Expected Output

This printout is identical to cell 2 results. The difference is that it displays the df that was set equal to the data.

Cell 5: print description of dataframe

After pressing the copy button, add a new cell and paste the information into the fifth frame.

Save and run.

Results/Expected Output

Here the dataset is described: the mean and standard deviation for each of the variables, as well as the max, min and count numbers, are displayed.

Cell 6: Assigning column names to independent variables

After pressing the copy button, add a new cell and paste the information into the sixth frame.

Save and run.

Cell 7: Assigning column name to dependent variable

After pressing the copy button, add a new cell and paste the information into the seventh frame.

Save and run.

Cell 8: Importing train test split from sklearn

After pressing the copy button, add a new cell and paste the information into the eighth frame.

Save and run.

Cell 9: Training the model to solve a classification problem

After pressing the copy button, add a new cell and paste the information into the ninth frame.

Save and run.

Cell 10: Number of trees

After pressing the copy button, add a new cell and paste the information into the tenth frame.

Save and run.

Cell 11: Making Predictions

After pressing the copy button, add a new cell and paste the information into the eleventh frame.

Save and run.

Results/Expected Output

Here are the predicted values.

A random forest is a data construct applied to machine learning that develops large numbers of random decision trees analyzing sets of variables. This type of algorithm helps to enhance the ways that technologies analyze complex data.

Every tree predicts the category to which the new record belongs (yes or no).

Cell 12: Confusion Matrix

After pressing the copy button, add a new cell and paste the information into frame 12.

Save and run.

Results/Expected Output

Here is the confusion matrix.

It shows True Positive, True Negative, False Positives and False Negatives.

True Positives and True Negatives are the responses we want.

False Positives and False Negatives show errors where the predictions did not match actual data.

A true positive occurs when the predicted test data and actual data are both yes answers.

There are 19 of these: 120, 45, 126, 52, 104, 22, 97, 24, 76, 90, 54, 131, 7, 26, 95, 10, 62, 100, 16

True negatives happen when the predicted test data and actual data are both no responses.

There are 6 of these: 83, 8, 113, 48, 43, 68

A false positive occurs when the actual data shows that no sale was made and the test prediction indicates that a purchase was made.

There is one of these: #33.

A false negative is when the actual data indicates that a purchase was made and the prediction is no.

There is one of these: #92.

Cell 13: Accuracy of the Model

After pressing the copy button, add a new cell and paste the information into frame 13.

Save and run.

Results/Expected Output

The accuracy determined is 92%, which is an excellent result.
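That figure follows from the counts listed under the confusion matrix in Cell 12. A quick check of the arithmetic:

```python
# Counts taken from the Cell 12 results
tp, tn, fp, fn = 19, 6, 1, 1

total = tp + tn + fp + fn          # number of test records
accuracy = (tp + tn) / total       # share of predictions that were correct

print(total, round(accuracy, 4))
```

The 27 test records are 20 percent of the 135 survey responses, and 25 correct predictions out of 27 is a little over 92 percent.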

Cell 14: Most important reasons for purchase

After pressing the copy button, add a new cell and paste the information into frame 14.

If you want to see just one prediction, you can enter the numbers in the line prediction = clf.predict([[0,0,0,0]]), then save it and run it.

Try entering a 0 for price, a 1 for quality, a 0 for deal and a 1 for salesperson. Does this combination result in a sale?

Save and run.

Results/Expected Output

Results are similar to those listed below.

Item number	Item Name	% Influence
0	Price	0.335179
1	Quality	0.158100
2	Deal	0.385833
3	Salesperson	0.120888

This data shows us which variables had the most influence on the customer purchasing our product.

The data are arranged 0-3.

If you add up all the numbers you will get 100%.

The higher percentage indicates a greater influence on purchasing the product.

The 0 score of 0.335179 is the influence the customer placed on the price of the product.
The 1 score of 0.158100 is the influence that quality had on the purchases.
The 2 score of 0.385833 is the influence of the deal.
The 3 score of 0.120888 is the influence of the salesperson on the decision to buy the product.

Analyzing the results of the Influence data

As you can see, the most important factor influencing the purchase of the product is the deal offered at the trade show (.385833).

The second most influential factor is price (.335179).

The third most important factor is the quality of the product, .158100.

The least important factor is the influence of the salesperson, .120888.
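The ranking, and the fact that the four values add up to 100%, can be confirmed with a short script. The numbers are the ones from the table above.

```python
influence = {
    "Price": 0.335179,
    "Quality": 0.158100,
    "Deal": 0.385833,
    "Salesperson": 0.120888,
}

# Sort the factors from most to least influential.
ranked = sorted(influence.items(), key=lambda item: item[1], reverse=True)
for name, share in ranked:
    print(f"{name}: {share:.1%}")

# The four shares should add up to 1.0, which is 100 percent.
print(round(sum(influence.values()), 6))
```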

Cell 15: Visualization of the model

After pressing the copy button, add a new cell and paste the information into frame 15.

Save and run.

Results/Expected Output

Conclusions and Action Plan for Upcoming Trade Show

Price and deal offered were the two most important influencing factors.
Quality was not that important compared to the price paid for the item.
Feature the cheaper TV (regular price $1,395), since quality was not a very large influencing factor.
Offer the cheaper TV discounted by about 25 percent ($350.00), for a sales price of $1,045.
The sales force at the last trade show did not have much influence on the sale.
The sales force either needs to be reduced or retrained to become more effective.

Projected revenue

The survey at the last trade show had 135 respondents. We are assuming that the number of sales will be similar.

135 times 1,045 equals $141,075

Our projected revenue for the show is $141,075.
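The projection is a single multiplication, shown here so that the assumptions (135 sales at the discounted price of $1,045) are easy to change.

```python
expected_sales = 135     # assumed to match the number from the last show
show_price = 1045        # discounted price of the cheaper TV, in dollars

projected_revenue = expected_sales * show_price
print(f"${projected_revenue:,}")
```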
