The applications of ANOVA (One-Way ANOVA, Two-Way ANOVA) and regression techniques in the context of e-commerce firms

E-commerce is a data-centric industry that takes advantage of advances in technology and big data. E-commerce firms analyze data to increase profit and sales and to lower risk, and thereby grow their market share and value.

ANOVA and regression techniques are part of predictive analytics. Predicting an outcome and acting on that prediction can help a company grow rapidly.

Companies like Amazon and Flipkart are capable of building predictive algorithms that run in real time in a big data environment. These models can be based on various statistical calculations.

Now let's talk about the methods, their applications and their assumptions:

Analysis of variance, known as ANOVA, is a tool for data analysis. It is a statistical method that compares the population means of two or more groups by analyzing variance. If the variation between the group means is large relative to the variation within the groups, the means are likely to differ significantly.

One-Way ANOVA:-

  • It is a hypothesis test in which only one categorical variable, or single factor, is taken into consideration.
  • With the help of the F-distribution, it enables us to compare the means of three or more samples.
  • The null hypothesis is “All population means are equal”, whereas the alternative hypothesis is “At least one mean differs from the others”.
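
The F statistic behind this test can be written out as a short formula. The symbols here are assumptions introduced for illustration and do not come from the text above: k is the number of groups, n_i the size of group i, N the total number of observations, \bar{y}_i the mean of group i and \bar{y} the overall mean.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2}
         {\dfrac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}
```

Under the null hypothesis, F follows an F-distribution with k - 1 and N - k degrees of freedom, so a large value of F is evidence against equal means.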

Assumptions:-

  • Populations from which the samples are drawn are approximately normally distributed.
  • The populations from which the samples are drawn have the same variance.
  • The samples drawn from different populations are random and independent.
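
As a rough illustration of the equal-variance assumption, the sketch below compares the sample variances of three groups. The data values and the helper name `variance_ratio` are hypothetical, and the cut-off of about 4 is only a commonly quoted rule of thumb, not a formal test. Formal tests such as Levene's test (variance) or the Shapiro-Wilk test (normality) are available in statistical packages.

```python
from statistics import variance

# Hypothetical samples for three groups (illustrative values only).
samples = {
    "group_a": [2.0, 3.0, 4.0, 3.0, 3.0],
    "group_b": [4.0, 6.0, 5.0, 5.0, 5.0],
    "group_c": [5.0, 8.0, 7.0, 7.0, 8.0],
}


def variance_ratio(groups):
    """Return the ratio of the largest to the smallest sample variance."""
    variances = [variance(values) for values in groups]
    return max(variances) / min(variances)


ratio = variance_ratio(list(samples.values()))

# Rule of thumb (an assumption, not a formal test): a ratio below about 4
# suggests the variances are similar enough for ANOVA.
verdict = "roughly equal" if ratio < 4 else "clearly unequal"
print(f"largest/smallest variance ratio = {ratio:.2f} ({verdict})")
```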

Applications:-

  • Gender as a categorical variable impacting state-wise sales of e-commerce sites, e.g. Flipkart or Amazon.
  • Comparing blood pressure levels across 3 population groups.
  • Measuring glycogen content in multiple samples of heart, liver, kidney, lungs etc.

Two-Way ANOVA:-

  • It examines the effect of two independent factors on a dependent variable.
  • It also studies the interaction, if any, between the independent variables in influencing the values of the dependent variable.

Assumptions:-

  • Populations from which the samples are drawn are approximately normally distributed.
  • The categorical independent groups should have the same size.
  • Each of the two factors should consist of two or more categorical independent groups.
  • The dependent variable should be measured at a continuous level.

Applications:-

  • Analyzing the test scores of a class based on gender and age. Here the test score is the dependent variable, and gender and age are the independent variables.
  • Measuring the response to three different drugs in both men and women. Drug treatment is one factor and gender is the other.

Let's look at some examples:

  1. An e-commerce site can compare the various sellers available on the site by delivery time and recommend the best seller to the customer.

E.g.

We can use one-way ANOVA to select the best dealer based on the lowest estimated delivery time.

H0: Every dealer takes the same time as the others (the mean delivery time is the same for each dealer).

H1: At least one dealer's mean delivery time differs from the others.

Based on the analysis, the e-commerce site can suggest the best dealer to deliver a given product.
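
The dealer comparison can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the dealer names, the delivery times and the helper `one_way_anova_f` are all hypothetical.

```python
from statistics import mean

# Hypothetical delivery times in days for three dealers (illustrative only).
delivery_days = {
    "dealer_a": [2.0, 3.0, 4.0, 3.0, 3.0],
    "dealer_b": [4.0, 6.0, 5.0, 5.0, 5.0],
    "dealer_c": [5.0, 8.0, 7.0, 7.0, 8.0],
}


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for values in groups for x in values]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f(list(delivery_days.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")  # prints F(2, 12) = 24.00

# The dealer with the lowest mean delivery time is the candidate to recommend.
fastest = min(delivery_days, key=lambda name: mean(delivery_days[name]))
print("lowest mean delivery time:", fastest)
```

An F value this far above the 5% critical value for 2 and 12 degrees of freedom (about 3.89 in standard F tables) would lead us to reject H0. ANOVA only tells us that at least one mean differs. A follow-up pairwise comparison is still needed to confirm which dealer is actually faster.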

We can also use two-way ANOVA to identify whether the state, along with the dealer, impacts the delivery time. This can be drilled down further into cities to identify the relationship more precisely.

  2. An e-commerce site can categorize sales by product segment, mode of payment and sales end result for sales analysis.

Here we can do a two-way ANOVA with mode of payment and product segment as the two factors and the sales end result as the dependent variable.

This tells us which mode of payment is more effective for which segment, and whether it affects overall sales.

It also shows which product segment is most affected in terms of the end result, as mentioned in the table above.
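
The payment-mode and product-segment analysis can be sketched in the same way. The code below assumes a balanced design (the same number of observations in every cell, in line with the equal group size assumption listed earlier). The factor levels, the sales values and the helper `two_way_anova_f` are hypothetical.

```python
from statistics import mean

# Hypothetical sales values keyed by (payment_mode, product_segment).
sales = {
    ("card", "electronics"): [10.0, 12.0, 14.0],
    ("card", "fashion"): [6.0, 8.0, 10.0],
    ("cod", "electronics"): [8.0, 10.0, 12.0],
    ("cod", "fashion"): [2.0, 4.0, 6.0],
}


def two_way_anova_f(cells):
    """Return F statistics for factor A, factor B and their interaction."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    if any(len(values) != n for values in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean([x for values in cells.values() for x in values])
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}
    mean_cell = {key: mean(values) for key, values in cells.items()}

    # Sums of squares for the main effects, the interaction and the error.
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_within = sum((x - mean_cell[key]) ** 2 for key, v in cells.items() for x in v)

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    df_within = len(levels_a) * len(levels_b) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "factor_a": (ss_a / df_a) / ms_within,
        "factor_b": (ss_b / df_b) / ms_within,
        "interaction": (ss_ab / (df_a * df_b)) / ms_within,
    }


result = two_way_anova_f(sales)
for effect, f_value in result.items():
    print(f"{effect}: F = {f_value:.2f}")
```

Here factor A is the mode of payment and factor B is the product segment. Each F value is compared against the F-distribution with the matching degrees of freedom (1 and 8 in this example) to decide whether that effect is significant.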

Conclusion:

In a large e-commerce setup, including more relevant independent variables generally gives better predictions. Analyzing by state alone may already reveal differences, depending on the data. It is possible, for example, that one dealer performs better in one or a few states than it does across all states overall.

The deeper we dive, the more insight we get into specific conditions, situations and areas.

I am a Senior Consultant with 9+ years of experience in analytics and model development, with domain expertise in BFSI.