Note: This project is part of the RevoU Fullstack Data Analytics Program's assignments.
This project is one of the top advanced assignments in the RevoU FSDA Program's SQL weeks.
In this project, I conducted an analysis on the sales of an e-commerce business to identify products with the lowest revenue growth. Based on this analysis, I decided which product to deprioritize. After analyzing the sales data, I calculated the retention rate using cohort analysis.
For this analysis, I used Google BigQuery. Additionally, I utilized Google Colab to automate the queries for the cohort analysis.
The dataset used in this project is "TheLook E-Commerce," which is a public dataset provided by Google.
To identify products with the lowest growth, I am using this query.
with step1 as ( SELECT p.category , COALESCE(SUM(oi.sale_price), 0) as revenue FROM `sql-project-376612.thelook_ecommerce.order_items` oi JOIN `sql-project-376612.thelook_ecommerce.products` p ON oi.product_id = p.id JOIN `sql-project-376612.thelook_ecommerce.orders` o ON oi.order_id = o.order_id WHERE DATE(o.created_at) BETWEEN '2021-01-01' AND '2021-12-31' AND lower(o.status) = 'complete' GROUP BY 1 ) , step2 as ( SELECT p.category , COALESCE(SUM(oi.sale_price), 0) as revenue FROM `sql-project-376612.thelook_ecommerce.order_items` oi JOIN `sql-project-376612.thelook_ecommerce.products` p ON oi.product_id = p.id JOIN `sql-project-376612.thelook_ecommerce.orders` o ON oi.order_id = o.order_id WHERE DATE(o.created_at) BETWEEN '2022-01-01' AND '2022-12-31' AND lower(o.status) = 'complete' GROUP BY 1 ) SELECT step1.category , step1.revenue as revenue_2021 , step2.revenue as revenue_2022 , CAST((step2.revenue - step1.revenue) as decimal) / step1.revenue * 100 as Growth FROM step1 FULL OUTER JOIN step2 ON step1.category = step2.category order by 4
I am using this SQL query as the base query for cohort analysis.
WITH aggregated_view AS ( SELECT oi.user_id , oi.order_id , oi.created_at FROM `sql-project-376612.thelook_ecommerce.order_items` oi JOIN `sql-project-376612.thelook_ecommerce.products` p ON oi.product_id = p.id WHERE p.category = DESIRED CATEGORY AND lower(oi.status) = 'complete' AND EXTRACT(YEAR FROM oi.created_at) IN (2022) ) , cohort_items AS ( SELECT user_id , MIN(DATE(DATE_TRUNC(created_at,MONTH))) AS cohort_month FROM aggregated_view GROUP BY 1 ) , user_activities AS ( SELECT av.user_id , DATE_DIFF( DATE(DATE_TRUNC(av.created_at,MONTH)), cohort.cohort_month, MONTH ) AS month_number FROM aggregated_view av LEFT JOIN cohort_items cohort ON av.user_id = cohort.user_id WHERE EXTRACT(YEAR FROM cohort.cohort_month) IN (2022) GROUP BY 1, 2 ) , cohort_size AS ( SELECT cohort_month , count(1) AS num_users FROM cohort_items GROUP BY 1 ORDER BY 1 ) , retention_table AS ( SELECT c.cohort_month , a.month_number , COUNT(1) AS num_users FROM user_activities a LEFT JOIN cohort_items c ON a.user_id = c.user_id GROUP BY 1, 2 ) -- our final value: (cohort_month, size, month_number, percentage) SELECT retention_table.cohort_month , cohort_size.num_users AS cohort_size , retention_table.month_number , retention_table.num_users AS total_users , CAST(retention_table.num_users AS decimal)/ cohort_size.num_users * 100 AS percentage FROM retention_table LEFT JOIN cohort_size ON retention_table.cohort_month = cohort_size.cohort_month WHERE retention_table.cohort_month IS NOT NULL ORDER BY 1, 3
Because the query needs to be run for each product category for cohort analysis, I'm using Google Colab to automate the query. The notebook can be found at this link:
Thank you for reviewing my project on E-commerce Inventory Optimization and Retention Rate Analysis. I hope you found the information presented to be informative and helpful. If you have any questions or feedback, please don't hesitate to reach out to me.