Ecommerce Sales Analysis
Overview
This project is a Data Analysis Project performed using SQL. This project will answer several questions regarding the Brazilian Ecommerce Public Dataset by Olist.
The dataset contains information on 100,000 orders made at multiple marketplaces in Brazil from 2016 to 2018. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes, and customer reviews.
The below image is the Data Schema provided by Olist on Kaggle.

Data Schema
Table name | Rows | Datatype changes |
---|---|---|
olist_order_dataset → olist_order | 99,441 | VARCHAR(100) to TIMESTAMP: ‘order_purchase_timestamp’ ‘order_approved_at’ ‘order_delivered_carrier_date’ ‘order_delivered_customer_date’ ‘order_estimated_delivery_date’ |
olist_order_items_dataset → olist_items | 112,650 | VARCHAR(100) to TIMESTAMP: ‘shipping_limit_date’ |
olist_customers_dataset → olist_customers | 99,441 | INTEGER to VARCHAR(50): ‘customer_zip_code_prefix’ |
olist_order_payment_dataset → olist_payment | 103,886 | |
olist_products_dataset → olist_products | 32,951 | |
olist_sells_dataset → olist_sells | 3,095 | INTEGER to VARCHAR(50): ‘seller_zip_code_prefix’ |
olist_geolocation_dataset → olist_geolocation | 1,000,163 | INTEGER to VARCHAR(50): ‘geolocation_zip_code_prefix’ |
olist_order_reviews_dataset → olist_reviews | 100,000 | See note |
product_category_name | 71 |
To change the datatypes I tried using several ALTER commands but as I’m using DB Browser it does not support the ALTER Commands.
So I created new tables with the specified datatype changes and dropped the existing tables.


To get a better understanding of the datasets we can look at a ER Diagram.

The olist_products_dataset has a column olist_product_names that are in brazillian.
So I updated the olist_products_dataset to add a new column product_category_name_english from product_category_name_translation, while the ALTER Command would be ideal as I am using DB Browser, which does not support the ALTER command.

I noticed that there are missing product_category_name’s in olist_products_dataset.
So I updated the missing values to NA


Similarly for olist_product_category_name_english I changed the missing values to NA

Furthermore in the olist_products_dataset I noticed that there are few ‘N/A’ in product_category_name_english which are not ‘N/A’ in product_category_name
I updated two categories, ‘portable kitchens and food preparers’ and ‘pc-game’, in product_category_name_english’.
In the end, there are 74 distinct product categories and 610 rows of ‘N/A’.

Data Analysis
In this section there are 14 questions that I will focus on answering.
- What is the total revenue generated by Olist, and how has it changed over time?
- How many orders were placed on Olist, and how does this vary by month or season?
- What are the most popular product categories on Olist, and how do their sales volumes compare to each other?
- What is the average order value (AOV) on Olist, and how does this vary by product category or payment method?
- How many sellers are active on Olist, and how does this number change over time?
- What is the distribution of seller ratings on Olist, and how does this impact sales performance?
- How many customers have made repeat purchases on Olist, and what percentage of total sales do they account for?
- What is the average customer rating for products sold on Olist, and how does this impact sales performance?
- What is the average order cancellation rate on Olist, and how does this impact seller performance?
- What are the top-selling products on Olist, and how have their sales trends changed over time?
- Which payment methods are most commonly used by Olist customers, and how does this vary by product category or geographic region?
- How do customer reviews and ratings affect sales and product performance on Olist?
- Which product categories have the highest profit margins on Olist, and how can the company increase profitability across different categories?
- Geolocation has high customer density. Calculate customer retention rate according to geolocations.
What is the total revenue generated by Olist, and how has it changed over time?
First lets find the timeframe of the dataset. It dates from 2016-09-04 to 2018-10-17

I then checked order_status, there are 8 types of order status

To calculate the revenue, the payment would have gone through only if the order is delivered to the customer.
I notice that there are missing values in order_delivered_customer_date even when the order_status is approved this can be because of various reasons like the customer cancelling the order after delivery date, or stock unavailability.
We use order_delivered_customer_date to calculate revenue because it confirms that the order was successfully completed. Orders without a delivery date may not have resulted in actual revenue, making them unreliable for financial analysis.
From the below image we can notice the invalid orders with their order_status

Some key takeaways:
- High Number of Unfulfilled Orders – Many orders in shipped (1,107), processing (301), and invoiced (314) statuses were never delivered, possibly due to logistics issues or missing updates.
- Cancellations & Stock Issues – Canceled (619) and unavailable (609) orders indicate a significant number of failed transactions.
- Data Inconsistency in Delivered Orders – 8 “delivered” orders have NULL delivery dates, suggesting data entry errors.
- Most Approved Orders Progressed – Only 2 “approved” orders remained stagnant, meaning most were either processed or canceled.
To calculate the revenue my first thought was to excluded the rows in ‘order_status’ that are ‘canceled’ and ‘order_delivered_customer_date’ that are null. But after looking at the dataset several time I noticed some orders in “shipped,” “processing,” and “invoiced” statuses also have NULL order_delivered_customer_date. These orders may not be canceled, but they also haven’t been delivered, meaning they shouldn’t count as revenue yet.
So instead of just excluding canceled, I’m including orders where order_status = ‘delivered’ and order_delivered_customer_date IS NOT NULL.
This will ensure I’m only counting truly completed transactions where the customer actually received the order.

From the above image we can clearly notice that the total revenue is 15.4 Million USD.
Now lets analyze the revenue per year and quarter


Q2: How many orders were placed on Olist, and how does this vary by month or season?
There were 96, 470 orders places on the olist website.



Over here we can see how the order count and revenue has changed over the 2 years. The most orders were in the 2018 Q1 but the highest revenue was during 2018 Q2.
Overall it looks like the ecommerce site was gradually making more sales each year.
Q3: What is the average order value (AOV) on Olist, and how does this vary by product category or payment method?
There are 74 product categories, below I’ve attached the image with some of the product categories and their name in English.

We can JOIN the order items dataset with the products dataset on the product_id to get the top 10 product category names with the most sales volume.
From the image below we can see that health and beauty products was the most selling product after which came gifts, bath tables, sports items, computer accessories and more.


Q3: What are the most popular product categories on Olist, and how do their sales volumes compare to each other?
By finding the total sales and total order per product category we can calculate the most popular product categories and their sales volumes.
From the below image we can see that health_beauty products have the highest sales followed by watches, and bed bath tables.
After changing the ORDER BY with the total_order instead of the total_sales I noticed that the most ordered products are – bed bath tables, health and beauty and sport leisure products




Q4: What is the average order value (AOV) on Olist, and how does this vary by product category or payment method?
We can JOIN the order items dataset with the products dataset on the product_id to get the average order value similar to how we found the top 10 best selling products.
From the image below we can see that most orders were computers.


AOV = total revenue / total orders
CPO = total cost of product / total orders
Profit Per Order = AOV – CPO
Average Order Value (AOV) is a key performance indicator (KPI) that measures the average amount of money spent by each customer on a website or ecommerce store, calculated by dividing total revenue by the total number of order
The Cost Per Product (CPO) is a key performance indicator (KPI) that measures the total cost of production (direct materials, direct labor, and manufacturing overhead) and then divide that total by the number of units produced
And the Profit Per Order is the difference between the AOV and CPO.
To calculate these using SQL, I used 2 Joins between the olist_orders_dataset, olist_order_items_dataset, and olist-order_payment_dataset.
From the result we can notice:
- Average Order Value is 204.99
- Cost Per Product is 167.0
- Profit Per Order 86.0

Now to get the AOV, CPO and Profit Per Order with the Payment Type

From here we can see the 4 Payment types and the Average Order Value, Cost Per Order and Profit Per Order by each of the Payment Types
I notice that the profit per order for Voucher is -31.0 this makes sense as vouches are usually promotions, and marketing strategiest
Now to get the AOV, CPO and Profit Per Order with the Product Category Name


Q5: How many sellers are active on Olist, and how does this number change over time?


Q6: What is the distribution of seller ratings on Olist, and how does this impact sales performance?
Sellers with a “Review Score 5” receive the highest number of orders (63,312) and generate the most revenue (8,910,279.16). Whereas, sellers with lower scores (1 & 2) receive significantly fewer orders and generate much lower revenue.
This indicates that sellers with better ratings are trusted more by customers, leading to higher sales.


Q7: How many customers have made repeat purchases on Olist, and what percentage of total sales do they account for?
As we can see there are 9635 customers who have made more than one purchase, this has resulted in 2390783.41 in Revenue.
The revenue resulted accounts for 15.09% of the total revenue

Q8: What is the average customer rating for products sold on Olist, and how does this impact sales performance?
The average customer rating is 4.09

To get the average customer rating per product

Now lets see the number of orders and revenue generated with their rating

We can see that for review scores 4, and 5 the number of orders and total revenue is lot higher.
Showing that products with higher ratings are purchased more often and generate a higher revenue
Q9:What is the average order cancellation rate on Olist, and how does this impact seller performance?
The average order cancelation rate is 0.63

The seller with the highest orders have a cancelation rate of 0.39.
There are sellers with a seller cancellation rate as 100 these sellers have only completed one order which was cancelled.

Q10: What are the top-selling products on Olist, and how have their sales trends changed over time?

The best selling products are from the categories bed bath table, health and beauty, and sports leisure. From the result of the above SQL code we can see their total_orders and total_revenue.