Merging Restaurant Order Datasets and Mastering Data Visualization with Python Charts
Dataset Link
Table of Contents
ToggleThis dataset have two table
menu_items Table:
- menu_item_id: Unique identifier for each menu item. This is often used as the primary key.
- item_name: The name of the menu item, providing a human-readable description.
- category: Indicates the category or type of cuisine to which the menu item belongs.
- price: The cost of the menu item, usually in US Dollars.
order_details Table:
- order_details_id: A unique identifier for each item within an order. This could be the primary key for this table.
- order_id: Identifies the order to which the item belongs. This likely corresponds to a unique order placed by a customer.
- order_date: The date when the order was placed, usually in the format MM/DD/YY.
- order_time: The time when the order was placed, typically in the format HH:MM:SS AM/PM.
- item_id: This field links to the “menu_item_id” in the “menu_items” table, establishing a relationship between the two tables. It indicates which menu item was included in the order.
Read CSV files
import pandas as pd order_details = pd.read_csv(r'C:\Users\yogesh\Downloads\Restaurant+Orders+CSV\order_details.csv') # Read menu details from CSV file Menu_detail = pd.read_csv(r'C:\Users\yogesh\Downloads\Restaurant+Orders+CSV\menu_items.csv') # Display order details DataFrame print("Order Details:") print(order_details) # Display menu details DataFrame print("\nMenu Details:") print(Menu_detail)
Merge DataFrames
# Merge DataFrames based on 'item_id' and 'menu_item_id' merged_df = pd.merge(order_details, Menu_detail, left_on='item_id', right_on='menu_item_id', how='inner') # Display the resulting merged DataFrame print("Merged DataFrame:") print(merged_df)
pd.merge(order_details, Menu_detail, left_on=’item_id’, right_on=’menu_item_id’, how=’inner’):
- This line uses the pd.merge function from the pandas library to merge two DataFrames (order_details and Menu_detail) based on specific columns.
order_details is the left DataFrame, and Menu_detail is the right DataFrame. - The left_on parameter specifies the column in the left DataFrame to use for the merge operation (‘item_id’ in this case).
- The right_on parameter specifies the column in the right DataFrame to use for the merge operation (‘menu_item_id’ in this case).
- The how parameter specifies the type of merge to be performed. In this case, it’s an inner join (how=’inner’), which means only the common rows between the two DataFrames will be included in the result.
print(“Merged DataFrame:”): - This line simply prints a descriptive message indicating that the following output is the merged DataFrame.
print(merged_df): - This line prints the resulting merged DataFrame (merged_df) that was obtained from the pd.merge operation.
In summary, this code is merging two DataFrames (order_details and Menu_detail) based on the ‘item_id’ and ‘menu_item_id’ columns using an inner join and then printing the resulting merged DataFrame. The inner join ensures that only the rows with matching ‘item_id’ and ‘menu_item_id’ values in both DataFrames are included in the final result.
merged_df.info()
merged_df.info()
It seems like you’ve mentioned merged_df.info() in Python, which is likely a command to display information about a DataFrame using the Pandas library. The info() method provides a concise summary of a DataFrame, including information about the data types, non-null values, and memory usage.
Here’s a breakdown of what each part of the output typically includes:
- Index: Information about the index, including its type and the number of non-null values.
- Columns: A list of all columns in the DataFrame, along with the count of non-null values and the data type of each column.
- dtypes: Data types of each column.
- Memory Usage: Total memory usage by the DataFrame.
- Non-Null Count: Number of non-null (non-missing) values in each column.
Price for Analysis
Total_Sales=merged_df['price'].sum() print(Total_Sales)
The variable Total_Price will contain the sum of all the values in the ‘price’ column.
Answer is 159217.89999999997
Average Price
Average_Price=merged_df['price'].mean() print(Average_Price)
13.16176738034223
total number of Orders
Total_order=merged_df['order_details_id'].count() print(Total_order)
12097
Number of orders by Items
item_counts=merged_df.groupby('item_name')['order_details_id'].count().reset_index() print("No of order by Item") print(item_counts)
This code appears to be written in Python and is likely part of a data analysis or manipulation script, particularly using the pandas library. Let me break down what this code is doing:
- merged_df: This is presumably a pandas DataFrame that has been previously defined or loaded. The code is referring to a column named ‘item_name’ and ‘order_details_id’ within this DataFrame.
- groupby(‘item_name’): This is a pandas operation that groups the DataFrame by unique values in the ‘item_name’ column. Essentially, it’s creating groups where each group contains rows with the same ‘item_name’.
- [‘order_details_id’].count(): For each group, it’s counting the number of occurrences of ‘order_details_id’. This essentially gives the number of orders for each unique item.
- .reset_index(): After the grouping and counting, the result is a new DataFrame with ‘item_name’ as an index and the count of orders as a column. The reset_index() method is used to move ‘item_name’ back to a regular column, and a new default integer index is assigned.
- print(“No of order by Item”): This line is simply printing a header or title for the output to indicate that what follows is the count of orders by item.
- print(item_counts): Finally, this line prints the DataFrame item_counts, which now contains two columns: ‘item_name’ and the count of orders for each item.
In summary, the code is grouping the DataFrame by ‘item_name’, counting the number of orders for each item, and then printing the resulting DataFrame that shows the count of orders for each unique item.
Top 5 items by orders
top_5_items =item_counts.sort_values(by='order_details_id',ascending=False).head(5) print(top_5_items)
Let me explain each line:
- sort_values(by=’order_details_id’, ascending=False): This line is sorting the DataFrame item_counts based on the ‘order_details_id’ column in descending order (ascending=False). This means that items with the highest number of orders will appear at the top of the DataFrame.
- .head(5): After sorting, this line selects the first 5 rows of the DataFrame, effectively giving you the top 5 items with the highest order counts.
- print(top_5_items): Finally, this line prints the resulting DataFrame top_5_items, which now contains the top 5 items with the highest order counts.
So, when you run this code after the previous one, it will display the top 5 items based on the number of orders, with the highest count appearing first.
BAR CHART FOR TOP 5 ITEMS
import matplotlib.pyplot as plt plt.bar(top_5_items.item_name,top_5_items.order_details_id) plt.xlabel("name of items") plt.ylabel("Numbers of orders") plt.title("Top 5 items order ") plt.show()
- Importing Matplotlib:import matplotlib.pyplot as pltImports the Matplotlib library and aliases it as plt.
Creating the Bar Chart:
- plt.bar(top_5_items.item_name, top_5_items.order_details_id)
Uses the bar function to create a bar chart.
Takes ‘item_name’ as x-axis values and ‘order_details_id’ as y-axis values from the top_5_items DataFrame.
Setting X and Y Labels: - plt.xlabel(“Name of Items”)
Sets the label for the x-axis as “Name of Items”.
plt.ylabel(“Number of Orders”)
Sets the label for the y-axis as “Number of Orders”.
Setting the Title: - plt.title(“Top 5 Items Orders”)
Sets the title of the plot as “Top 5 Items Orders”.
Displaying the Plot: - plt.show()
Displays the plot.
Bottom 5 Items by orders
print("bottom 5 Items by orders") bottom_5_items =item_counts.sort_values(by='order_details_id',ascending=True).head(5) print(bottom_5_items) plt.bar(bottom_5_items.item_name,bottom_5_items.order_details_id) plt.xlabel("name of items") plt.ylabel("Numbers of orders") plt.title("bottom_5_items ") plt.show()
- Print Statement: print(“bottom 5 Items by orders”)Prints a message indicating that the following information is about the bottom 5 items by orders.
- Sorting and Selecting Bottom 5 Items: bottom_5_items = item_counts.sort_values(by=’order_details_id’, ascending=True).head(5)item_counts is assumed to be a DataFrame with information about items and order details.
- Sorts the DataFrame based on the ‘order_details_id’ column in ascending order.Selects the first 5 rows (bottom 5 after sorting) using head(5).
Stores the result in the bottom_5_items variable.
Printing the Bottom 5 Items: - print(bottom_5_items) Prints the DataFrame containing information about the bottom 5 items by orders.
Creating and Displaying the Bar Chart: - plt.bar(bottom_5_items.item_name, bottom_5_items.order_details_id)
Creates a bar chart using Matplotlib. - Uses ‘item_name’ column as x-axis values and ‘order_details_id’ column as y-axis values from the bottom_5_items DataFrame.
plt.xlabel(“name of items”) - Sets the label for the x-axis as “name of items”.
- plt.ylabel(“Numbers of orders”)
Sets the label for the y-axis as “Numbers of orders”. - plt.title(“bottom_5_items “)
Sets the title of the plot as “bottom_5_items “.
plt.show()
Displays the created bar chart
Number of Order by Category
print("Number of Order by Category") Category_order_count=merged_df.groupby('category')['order_details_id'].count().reset_index() print(Category_order_count) plt.barh(Category_order_count.category,Category_order_count.order_details_id) plt.xlabel("orders") plt.ylabel("Name of category ") plt.title("Order numbers ") plt.show()
- Data Aggregation: Category_order_count = merged_df.groupby(‘category’)[‘order_details_id’].count().reset_index(): This line groups the merged dataset (merged_df) by the ‘category’ column and counts the number of occurrences of ‘order_details_id’ in each category. The result is stored in a new DataFrame called Category_order_count.
- Printing the DataFrame: print(Category_order_count): This line prints the Category_order_count DataFrame, displaying the number of orders for each category.
- Plotting a Horizontal Bar Chart: plt.barh(Category_order_count.category, Category_order_count.order_details_id): This line creates a horizontal bar chart using Matplotlib (plt). It takes the category names on the y-axis (Category_order_count.category) and the corresponding order counts on the x-axis (Category_order_count.order_details_id).
- Customizing the Chart: plt.xlabel(“orders”): Sets the label for the x-axis as “orders.”
- plt.ylabel(“Name of category”): Sets the label for the y-axis as “Name of category.” plt.title(“Order numbers”): Sets the title of the chart as “Order numbers.”
- Displaying the Chart: plt.show(): Finally, this line displays the generated horizontal bar chart with the specified labels and title.
In summary, the code calculates and visualizes the number of orders for each category in a horizontal bar chart, providing a quick and visually accessible way to understand the distribution of orders across different categories.
Remember to ensure that you have the necessary libraries imported at the beginning of your script, such as:
Category wise chart on matplotlib
plt.barh(Category_order_count.category, Category_order_count.order_details_id):
This line creates a horizontal bar chart using the barh function from Matplotlib.
The first argument, Category_order_count.category, represents the categories and is used for the y-axis.
The second argument, Category_order_count.order_details_id, represents the number of orders and is used for the x-axis.
plt.xlabel(“orders”):
This line sets the label for the x-axis to “orders.”
plt.ylabel(“Name of category “):
This line sets the label for the y-axis to “Name of category.”
plt.title(“Order numbers “):
This line sets the title of the chart to “Order numbers.”
plt.show():
Finally, this line displays the generated horizontal bar chart with the specified labels and title.
In summary, this code segment is a concise way to visualize the number of orders for each category in a horizontal bar chart, with clear labels and a title for better interpretation.
Convert hours group from hours minutes seconds
merged_df['order_time']=pd.to_datetime(merged_df['order_time']) merged_df['Hours']=merged_df['order_time'].dt.hour print(merged_df['Hours'])
Convert ‘order_time’ to Datetime Format:
- merged_df[‘order_time’] = pd.to_datetime(merged_df[‘order_time’])
Converts the ‘order_time’ column in the DataFrame merged_df to a datetime format using the pd.to_datetime() function.
Extract Hour Component: - merged_df[‘Hours’] = merged_df[‘order_time’].dt.hour
Creates a new column ‘Hours’ in the DataFrame merged_df.
Utilizes the dt.hour accessor to extract the hour component from the ‘order_time’ column.
Print the Extracted Hour Values: - print(merged_df[‘Hours’])
Prints the values in the newly created ‘Hours’ column, displaying the hour component of each order’s timestamp.
In essence, these steps are preparing and enhancing the dataset by converting the order timestamps to datetime format and extracting the hour component for further analysis or insights related to the timing of the restaurant orders.
Hourse data with items orders
Hours_data=merged_df.pivot_table(index='item_name',columns='Hours',values='order_details_id',aggfunc='count') print(Hours_data)
- Create Pivot Table: Hours_data = merged_df.pivot_table(index=’item_name’, columns=’Hours’, values=’order_details_id’, aggfunc=’count’)
- Uses the pivot_table function to organize data. ‘item_name’ becomes the index, ‘Hours’ becomes the columns, and counts of ‘order_details_id’ are the values.
- Print the Pivot Table: print(Hours_data)
- Displays the resulting pivot table.
Shows counts of orders (‘order_details_id’) for each menu item (‘item_name’) at different hours (‘Hours’).
CREATE HEATMAP ON ITEM AND ORDERS
import seaborn as sns
plt.figure(figsize=(12,8))
sns.heatmap(heatmap_data, cmap='YlGnBu', annot=True, fmt='g')
plt.title("order detail heatmap by hours")
plt.show()
- Import Libraries: import seaborn as sns: Imports the Seaborn library for statistical data visualization.
- Set Figure Size: plt.figure(figsize=(12, 8)): Sets the size of the Matplotlib figure to be 12 units in width and 8 units in height.
- Create Heatmap: sns.heatmap(heatmap_data, cmap=’YlGnBu’, annot=True, fmt=’g’): Generates a heatmap using Seaborn.
- heatmap_data: The data to be visualized in the heatmap.
cmap=’YlGnBu’: Specifies the color map for the heatmap (Yellow-Green-Blue). annot=True: Displays the numerical values in each cell of the heatmap.
fmt=’g’: Specifies the format for the displayed values (general numeric format).
Set Title: - plt.title(“order detail heatmap by hours”): Sets the title of the heatmap to “order detail heatmap by hours”.
Show the Plot: - plt.show(): Displays the created heatmap.
In summary, this code creates a heatmap using Seaborn to visualize the ‘heatmap_data’. The figure size is set, the color map is chosen, numerical values are annotated, and a title is added to enhance the clarity of the visualization. The resulting heatmap provides insights into order details at dImport Libraries: - import seaborn as sns: Imports the Seaborn library for statistical data visualization.
Set Figure Size: - plt.figure(figsize=(12, 8)): Sets the size of the Matplotlib figure to be 12 units in width and 8 units in height.
Create Heatmap:
sns.heatmap(heatmap_data, cmap=’YlGnBu’, annot=True, fmt=’g’): Generates a heatmap using Seaborn.
heatmap_data: The data to be visualized in the heatmap.
cmap=’YlGnBu’: Specifies the color map for the heatmap (Yellow-Green-Blue).
annot=True: Displays the numerical values in each cell of the heatmap.
fmt=’g’: Specifies the format for the displayed values (general numeric format).
Set Title:
plt.title(“order detail heatmap by hours”): Sets the title of the heatmap to “order detail heatmap by hours”.
Show the Plot:
plt.show(): Displays the created heatmap.
In summary, this code creates a heatmap using Seaborn to visualize the ‘heatmap_data’. The figure size is set, the color map is chosen, numerical values are annotated, and a title is added to enhance the clarity of the visualization. The resulting heatmap provides insights into order details at d
Vista Academy Master Program in Data Analytics
Vista Academy’s Master Program in Data Analytics equips you with advanced skills in data analysis, machine learning, and visualization. With practical experience in tools like Python, SQL, Tableau, and Power BI, this program prepares you for high-demand roles in data science and analytics.