What is the offer in Starbucks?

Farah hamad
5 min read · Aug 13, 2019

This project is part of Udacity’s Data Scientist Nanodegree program (Capstone Project).

Photo by Adriannaca on Pexels

Project Overview:

Keeping customers satisfied is one of the most important parts of running a successful business, and there is no doubt that Starbucks works to increase customer loyalty. Analyzing data is one way to follow customers’ behaviour and to keep working toward their satisfaction.

Problem Statement:

The goal of this project is to understand how customers behave and interact with the different types of offers. First of all, we will answer questions about customer demographics, transactions, and offers:

  • Types of customers and their age groups.
  • Customers’ income levels and their transcript records.
  • Types of offers and how customers interact with them.

Metrics:

In this project, we use two metrics to evaluate the models. The mean squared error (MSE) is by far the most widely used metric for optimization in regression problems, and R2 is another common metric for evaluating regression predictions.
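To make the two metrics concrete, here is a minimal sketch that computes both by hand in plain Python. The function names and the toy labels are ours, chosen only for illustration; the labels use the 1/2/3 offer type encoding described later in this post.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy labels (1 = BOGO, 2 = discount, 3 = informational); not real results.
y_true = [1, 2, 3, 2, 1]
y_pred = [1, 2, 2, 2, 1]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE = {mse:.3f}, R2 = {r2:.3f}")
```

A lower MSE means the predictions sit closer to the true values, while an R2 closer to 1 means the predictions explain more of the variation in the target.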

Data Exploration and Preprocessing:

This project encompasses three Data Sets:

  • portfolio.json — containing offer ids and metadata about each offer (duration, type, etc.)
  • profile.json — demographic data for each customer
  • transcript.json — records for transactions, offers received, offers viewed, and offers completed.

After exploring the files, we did some data wrangling:

  • In the Portfolio table: we split the channels column into a separate attribute for each channel.
  • In the Transcript table: we split the Value column into offer_id, amount, and reward attributes.
  • In the Profile table: we filled NaN values in the gender attribute with N/A, and NaN values in the income attribute with the mode. We also noticed outliers in the age attribute, such as 101 and 118, which are implausible as ages, so we dropped age values greater than 95.
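The wrangling steps above could look roughly like the following pandas sketch. It runs on tiny made-up tables rather than the real files, and the column names (channels, value, gender, age, income), the dictionary keys inside value, and the choice to drop whole rows for out-of-range ages are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the three files; every value here is invented.
portfolio = pd.DataFrame({
    "id": ["offer_a", "offer_b"],
    "offer_type": ["bogo", "discount"],
    "channels": [["email", "web"], ["email", "mobile", "social"]],
})
transcript = pd.DataFrame({
    "person": ["p1", "p1", "p2"],
    "event": ["offer received", "transaction", "offer completed"],
    "value": [{"offer_id": "offer_a"}, {"amount": 9.5},
              {"offer_id": "offer_b", "reward": 2}],
})
profile = pd.DataFrame({
    "id": ["p1", "p2", "p3", "p4", "p5"],
    "gender": ["F", "M", None, "M", "F"],
    "age": [34, 118, 58, 101, 45],
    "income": [50000.0, np.nan, np.nan, 72000.0, 50000.0],
})

# Portfolio: one 0/1 column per channel, then drop the original list column.
for channel in ["email", "mobile", "social", "web"]:
    portfolio[channel] = portfolio["channels"].apply(lambda chans: int(channel in chans))
portfolio = portfolio.drop(columns="channels")

# Transcript: expand the value dictionaries into offer_id, amount, and reward.
values = pd.DataFrame(transcript["value"].tolist(), index=transcript.index)
values = values.reindex(columns=["offer_id", "amount", "reward"])
transcript = pd.concat([transcript.drop(columns="value"), values], axis=1)

# Profile: fill missing gender with "N/A" and missing income with the mode,
# then drop rows whose age is greater than 95.
profile["gender"] = profile["gender"].fillna("N/A")
profile["income"] = profile["income"].fillna(profile["income"].mode()[0])
profile = profile[profile["age"] <= 95].reset_index(drop=True)
```

Filling income with the mode keeps every profile in the table, at the cost of concentrating more customers on the most common income value.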

Before analyzing and visualizing the data, we merged all the data sets into one table.
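As a rough illustration of the merge, the sketch below joins three toy tables with pandas. The join keys (person, offer_id, id) and the use of left joins are assumptions; the project's actual merge may differ.

```python
import pandas as pd

# Toy tables with invented values; column names are assumptions.
transcript = pd.DataFrame({
    "person": ["p1", "p1", "p2"],
    "event": ["offer received", "transaction", "offer completed"],
    "offer_id": ["offer_a", None, "offer_b"],
})
profile = pd.DataFrame({"id": ["p1", "p2"], "gender": ["F", "M"], "age": [34, 58]})
portfolio = pd.DataFrame({"id": ["offer_a", "offer_b"],
                          "offer_type": ["bogo", "discount"]})

# Left joins keep every transcript row, including plain transactions
# that are not tied to any offer.
merged = (
    transcript
    .merge(profile.rename(columns={"id": "person"}), on="person", how="left")
    .merge(portfolio.rename(columns={"id": "offer_id"}), on="offer_id", how="left")
)
print(merged)
```

With left joins, a transaction without an offer simply gets an empty offer_type instead of being dropped from the merged table.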

In this part, we will answer business questions by conducting univariate and multivariate analysis:

How many Starbucks profiles are there per age group?

In the previous chart, we want to find the age group with the highest number of profiles. The elderly age group has the lowest number of profiles, followed by the youth group. The adult age group, on the other hand, recorded the highest number, at about 4000 profiles.

What is the age distribution per gender in the Starbucks profiles?

As the chart shows, most ages in the profile data frame are between 35 and 70. The median age is approximately 58 for females and 55 for males and other.

What are the income levels per age in the Starbucks profiles?

From the previous chart, we can say that ages 49, 51, 52, and 64, in the adult and elderly groups, recorded the highest incomes, at approximately 70000. On the other hand, ages 23 and 34 recorded the lowest incomes, at approximately 40500.

Does the number of profiles increase every month, depending on members’ income?

From the above plot, we can say that the number of profiles per month, weighted by members’ income, rose and fell sharply from month to month. The highest value occurred in Aug (approximately 9562730.0 on the plot), and the lowest occurred in Feb.

What are the Starbucks members’ rewards each year?

As we can see from the previous plot, rewards increased over the years from 2013 to 2017, with 2017 recording the highest level of rewards for members. However, they dropped suddenly in 2018.

How often does each event occur in the transcripts?

The bar chart shows that the most frequent event in the transcripts is offer received, at approximately 70000, followed by offer viewed and then offer completed.

Which offer types are chosen most by each gender?

As the previous plot shows, males recorded the highest use of BOGO and discount promotions, at approximately 35000. In comparison, females recorded the lowest use, at approximately 27000 for BOGO and discount promotions.

How many promotions were completed for each offer type?

How many completed offers were BOGO promotions?

From the above plot of completed offers for BOGO promotions, we can say that offer ID 9b98b8c7a33c4b65b9aebfe6a799e6d9 has approximately 4200 completions.

How many completed offers were discount promotions?

From the above plot of completed offers for discount promotions, we can say that offer ID fafdcd668e3743c1bb461111dcafc2a4 has approximately 4900 completions.

Data Modeling:

In this part, we use several models, namely GaussianNB, DecisionTreeClassifier, LinearRegression, and KNeighborsClassifier, to find which one most accurately determines the best type of offer.

Before modelling, we took some preprocessing steps:

  • One Hot Encoding for Event and Gender columns.
  • Replace offer types with 1 for BOGO, 2 for discount, and 3 for informational.
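These two preprocessing steps can be sketched with pandas as follows. The toy rows and the lowercase column names and offer type labels are assumptions for illustration only.

```python
import pandas as pd

# Toy rows with invented values; column names are assumptions.
df = pd.DataFrame({
    "event": ["offer received", "offer viewed", "offer completed"],
    "gender": ["F", "M", "O"],
    "offer_type": ["bogo", "discount", "informational"],
})

# One-hot encode the event and gender columns into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["event", "gender"], dtype=int)

# Replace offer types with the integer codes used in this post.
offer_type_codes = {"bogo": 1, "discount": 2, "informational": 3}
encoded["offer_type"] = encoded["offer_type"].map(offer_type_codes)

print(encoded)
```

Each category becomes its own indicator column (for example gender_F), while offer_type stays a single numeric column that can serve as the target.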

We then found that GaussianNB was the best model, with an R2 of 61% and a mean squared error of 19%.
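A comparison like this can be sketched with scikit-learn as shown below. This is not the project's actual pipeline: the data is synthetic (generated with make_classification), and the train/test split, the default hyperparameters, and picking the winner by lowest MSE are all assumptions, so the scores it prints will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the merged table: three classes, like the three offer types.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=42)
y = y + 1  # shift labels to 1, 2, 3 to mirror the offer type encoding
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "LinearRegression": LinearRegression(),  # predicts continuous values
    "KNeighborsClassifier": KNeighborsClassifier(),
}

# Fit each model and score its test predictions with MSE and R2.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

# Assumption: the "best" model is the one with the lowest MSE.
best = min(results, key=lambda name: results[name]["mse"])
for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.3f}, R2={scores['r2']:.3f}")
print("Lowest MSE:", best)
```

Scoring every model on the same held-out split keeps the comparison fair, since each one is judged on rows it did not see during training.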

Conclusion:

In this project, we analyzed Starbucks customers. We started by assessing and cleaning the data, and then visualized it to get results from our analysis. We found that males make up the largest group of Starbucks customers. The adult age group, which has the highest incomes, also has the most Starbucks member profiles. Males also recorded the highest use of promotions, especially the BOGO and discount types. Finally, we found that the best model for predicting the best offer for customers is the GaussianNB classifier, with an R2 of 61% and a mean squared error of 19%. We used MSE and R2 to evaluate the models; a better model has a lower MSE and a higher R2.

To find out more about this analysis, take a look at this GitHub link.
