Approach

First, we collected spend data from the stakeholders responsible for each budget. We obtained spend data for the past 4 years; however, the finest granularity available for Brand TV spend was monthly. With this in mind, we built two models: one without Brand TV spend, built at weekly granularity and focused on digital campaigns, and one with Brand TV spend, built at monthly granularity. I was responsible for building the Brand TV spend model.

Because my model would be based on TV spend, the next step was to create the Ad Stock feature. The effects of TV spend accumulate over time, which is why it is common practice to represent TV spend with Ad Stock, a measure of that accumulation. First, I multiplied the Brand TV spend by the GRP (gross rating points), which represents how much of our target audience was reached with our TV ads. Next, I fit a linear regression of applications on Brand TV spend. Finally, I calculated the Ad Stock rate (the rate at which Brand TV spend accumulates) by varying the rate, a parameter in the regression equation, until the model's sum of squared errors was minimized. This rate is then used to create the final Brand TV spend feature representing Ad Stock.
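
The rate search described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a geometric carry-over form for Ad Stock (each month keeps a fixed fraction of the previous month's stock), a simple grid search over candidate rates, and a one-variable least-squares fit. The function names `adstock`, `ols_sse`, and `fit_adstock_rate` and all of the numbers are made up for the example. In practice the input series would be the GRP-weighted Brand TV spend.

```python
def adstock(spend, rate):
    """Geometric ad stock (assumed form): stock_t = spend_t + rate * stock_(t-1)."""
    carried = 0.0
    out = []
    for s in spend:
        carried = s + rate * carried
        out.append(carried)
    return out


def ols_sse(x, y):
    """Fit y = a + b*x by ordinary least squares and return the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # assumes x is not constant
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def fit_adstock_rate(spend, applications, steps=100):
    """Try rates 0.00, 0.01, ..., 0.99 and keep the one with the lowest SSE."""
    candidates = [i / steps for i in range(steps)]
    return min(candidates, key=lambda r: ols_sse(adstock(spend, r), applications))


# Made-up monthly spend; applications are generated with a known rate of 0.6
# so the search has a clear answer to recover.
monthly_spend = [100, 0, 0, 50, 0, 80, 0, 0, 0, 120, 0, 0]
applications = [200 + 3 * a for a in adstock(monthly_spend, 0.6)]

best_rate = fit_adstock_rate(monthly_spend, applications)
print(best_rate)
tv_adstock_feature = adstock(monthly_spend, best_rate)
```

A grid search is used here only because it is easy to read; a numerical optimizer would do the same job with finer resolution.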

The next step was to check for collinearity between features. With nearly 50 candidate features, there were several instances of collinearity. Furthermore, with only 4 years of monthly data available (roughly 48 observations), it was necessary to reduce the number of features fed into the model.
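
One common way to act on a correlation check like this is to drop one feature from each highly correlated pair. The sketch below is an illustration only: the helper `drop_correlated`, the 0.9 threshold, and the column names and values are all assumptions made for the example, not the rule or data used in this project.

```python
import numpy as np
import pandas as pd


def drop_correlated(features, threshold=0.9):
    """Drop the later column of every pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop


# Made-up data: tv_spend_scaled is almost a multiple of tv_spend.
example = pd.DataFrame({
    "tv_spend": [1, 2, 3, 4, 5, 6],
    "tv_spend_scaled": [2, 4, 6, 8, 10, 12.5],
    "inflation": [3, 1, 4, 1, 5, 9],
})

reduced, dropped = drop_correlated(example)
print(dropped)
print(list(reduced.columns))
```

Which member of a pair to keep is a modeling decision; this helper simply keeps whichever column appears first.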

To select the most relevant features, I ran a random forest regression model, which narrowed the feature set to 11. The Ad Stock Spend variable was the most important, followed by Inflation, Google Search Volumes, and High Spend Digital Campaigns. This aligned well with the focus of my model, which was on macro-level factors.

I then trained a multiple linear regression model on the selected features. For the final model, I also used fbprophet to forecast Google Search Volumes, as they were an important feature. Additionally, I made predictions about GRP and inflation to be used in the final model.

My final model was combined with the micro-focused model built by my teammate, and the combined model is currently predicting future applications more accurately than the department's original ARIMA model.

Code Demo

#importing the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
#regression metrics (the target is continuous, so classification metrics do not apply)
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

In [ ]:

#importing the data
df = pd.read_csv("SpendData.csv")

In [ ]:

df.dropna(inplace=True)

In [ ]:

df.columns

In [ ]:

# checking for collinearity
df_corr = df.corr()

In [ ]:

sns.heatmap(df_corr, cmap='seismic_r')
plt.show()
df_corr

In [ ]:

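#Random Forest feature selection

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestRegressor

In [ ]:

#feature names (every column except the first, which is the target)
labels = df.iloc[:,1:].columns

#empty DataFrame whose columns are the feature names
labels = pd.DataFrame(columns=labels)

In [ ]:

X = df.iloc[:,1:].values
y = df.iloc[:,0].values

In [ ]:

#the target is continuous, so a regressor is used rather than a classifier
sel = SelectFromModel(RandomForestRegressor(n_estimators=100))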

In [ ]:

sel.fit(X,y)

In [ ]:

#number of features selected
selected_feat = labels.columns[(sel.get_support())]
len(selected_feat)

In [ ]:

#names of the selected features
print(selected_feat)

In [ ]:

#Create the final df with the selected features (placeholder column names)
testdf = df[['Conversions','Selected Feature 1','Selected Feature 2',
             'Selected Feature 3','Selected Feature 4',
             'Selected Feature 5','Selected Feature 6']]

In [ ]:

#splitting the final df into train and test sets (20% held out for testing)
X = testdf.iloc[:,1:].values
y = testdf.iloc[:,0].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [ ]:

#linear Regression Model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

In [ ]:

regressor.fit(X_train, y_train)

In [ ]:

y_pred = regressor.predict(X_test)
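
Once predictions are available, a natural next step is to score them against the held-out values. The sketch below is an assumption about how that could be done, not the evaluation used in this project. It computes RMSE and R² by hand so the arithmetic is visible; these are the same quantities that `mean_squared_error` and `r2_score` in `sklearn.metrics` are based on. The helper names and the numbers are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in target units."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up held-out targets and predictions standing in for y_test and y_pred.
y_test_example = [120.0, 150.0, 180.0, 210.0]
y_pred_example = [110.0, 160.0, 170.0, 220.0]

print(rmse(y_test_example, y_pred_example))       # 10.0
print(r_squared(y_test_example, y_pred_example))  # about 0.911
```

With only a few dozen monthly observations, scores from a single random split can vary noticeably from run to run, so they are best read as a rough indication.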