Data Analysis

Day 5: Importing Excel Data and working with Pandas

With Pandas you can work with files in various formats: text files, comma separated value (csv) files and MS Excel files. The elements in a Pandas series are assigned indexes. The first element is assigned an index of 0, the second one 1 the third one 2 and so on. The last element is assigned an index of N-1 with N denoting the total number of items.

Most of you have probably worked with Microsoft's Excel spreadsheet. We are able to import an Excel spreadsheet into a Pandas dataframe. Many business records are stored in Excel sheets: Customer information, Accounting and financial records, etc.

We are going to work with some customer information for a sporting goods store.

First you need to download the Excel spreadsheet and save it into your working folder.

This spreadsheet is quite small. Normally customer records would have thousands of records. I am using this small sample, just to give you an idea what it would be like to use Python's Pandas library to do data analysis.

Customer Orders

Now open Spyder text editor and key in the following lines. Save the code using a file name with a .py extension. Run the file using F5 or F9

Answer the worksheet questions relating to this section.

import pandas as pd

Import sys

sys.__stdout__= sys.stdout

data = pd.read_excel('CustomerOrdersDept.xlsx')


Add the following line to the end of your code. data.describe() Save and run.

This line describes the data base and gives very important information: Number of records, the mean or average sale, tax and total, the lowest and highest sales, tax and total.

Suppose you were looking for best sellers?

Add this line. data[data.ProductDescription == 'Nike Shorts']# search for individual items purchased

Save and run. What did you find out?

Suppose you wanted to see buying habits of a particular customer, add this line at the end of your code.

data[data.CustomerLastName == 'Hall']# find all purchases by Hall

When you run the code, you will find two orders by a customer named Hall. Check the record numbers and you can then look up the entire transaction on the spreadsheet.

To see all records by Last name, description and department, key in the following line.

print(data.loc[:,['CustomerLastName','ProductDescription','Department']])# printout all data customer last name, product description and department of the item

To see customer last names, items purchased and departments for records 0,3,6, add the following line.

print(data.loc[[0,3,5],['CustomerLastName','Department']])# select 0,3 5 array elements and display customer last name and department purchased item belongs to

Knowing department sales is an important part of data analysis. Python's Pandas library has a functioon to give us that information.

Add these lines of code, one for each department. Save and run.

data[data.Department == 'Apparel'] # show all apparel sales

data[data.Department == 'RollerSports'] # show all roller sports sales

data[data.Department == 'Baseball'] # show all baseball department sales

data[data.Department == 'SportingArms'] # show all gun sales

data[data.Department == 'Football'] # show all footballl sales

data[data.Department == 'Basketball'] # show all basketballl sales

data[data.Department == 'Soccer'] # show all soccer sales

data[data.Department == 'Outdoors'] # show alloutdoors sales

data[data.Department == 'Fitness'] # show all fitness sales

data[data.Department == 'WaterSports'] # show all water sports sales

data[data.Department == 'Golf'] # show all golf sales

data[data.Department == 'Knives'] # show all knives sales

data[data.Department == 'Tennis'] # show all tennis sales

The rows and columns information at the end of the list gives the total number of records or rows in that given department.

Answer the questions on the worksheet regarding department sales.

You can select very specific parts of each record using the following code examples.

print(data.Price[15]) print(data.Total[50])

Add these lines of code and try then out. Answer the worksheet questions pertaining to these two lines of code.

Add this line of code next.

print(data.Department)# prints all invoices and the departments associated with the sale.

To get customer last name, product description and price add this line of code.

print(data.loc[:,['CustomerLastName','ProductDescription','Price']])# printout all data customer last name, product description and amount spent on the item

You can also use the Pandas library to sort things. Let's say you want to see a sorted list form low to high of all prices of your products. Try this code.


Suppose you are looking for regular and repeat customers, try sorting customners' last names with the following two lines of code.

If their name appears more than once they are repeat customers and you should reward tem for their repeat business.

CustomerList=sorted(data.CustomerLastName) print(CustomerList)

The full-listing of the code appears in the box above.

Day 6: Data Visualization

Many times it is easier to analyze data if you can see it in a graphic form.

Pandas has some tools that make it quite easy to turn an Excel spreadsheet and csv files into graphs.

Let's take an example. One of the lessons on our web page.

Starting Your Own Business

In starting your own business, examining price is an important thing that needs to be addressed in the business plan.

When determining what price to put on an item, you need to know the costs involved. In this particular case, the product is a beef slider sold from a food truck.

The costs involved are the cost of the meat, the bun, the other ingredeients.

The graphs below were generated from the data in the Food Costs Spreadsheet.

Food Costs Excel spreadsheet

Save this file using the same name in your working folder.

This line graph shows the increase in the cost of ingredients to make a beef slider over the past year.

This graph is an unstacked bar graph showing the same infomation as the graph above.

This graph also show the same data in a slightly different form of bar graph.

Here is the code that creates these graphs.

Key the above information into Spyder, save it and execute the code.

What can you tell about the price of beef?, buns?, other ingredients?, total overall cost?

Experiment with changing the data in the spreadsheet to see how the visualization changes.

Now its your turn to create a spreadsheet and graph the information.

Copy and paste the above code into Spyder and save it with a .py extension in your working folder.

Now create your own Excel Spreadsheet. Open the Excel template.xls file and enter the following data into Excel and Save it as insideSlider.xlsx

The numbers represent sales of food truck items by the day of the week.

300.00 200.00 100.00 75.00 150.00 150.00 166.00 136.00 78.00 136.00 175.00 165.00 88.00 67.00 136.00 110.00 167.00 90.00 50.00 100.00 169.00 150.00 99.00 65.00 110.00 190.00 175.00 125.00 71.00 135.00 325.00 210.00 146.00 80.00 200.00

Each row represents a day of the week. There are five food items represented by the colums:

Run your Python code and answer the following questions after examining the graph.

By now you can see the benefits of using graphs for analyzing data.

It is a useful tool in presenting the information to others in the organization.