Correlation measures how strongly two variables move together, while causation means that a change in one variable directly produces a change in another. A high correlation does not by itself imply causation, and a low correlation does not mean that there is no effect.
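The second point, that a low correlation does not rule out an effect, can be illustrated with a small sketch. The values are made up for illustration, and `pearson` is a hypothetical helper written here with only the standard library. In this example y is completely determined by x, yet the linear (Pearson) correlation is zero because the relationship is not a straight line.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative data: y depends entirely on x, but symmetrically around zero.
x_values = list(range(-5, 6))
y_values = [x ** 2 for x in x_values]

nonlinear_corr = pearson(x_values, y_values)
print(f"Correlation between x and x squared: {nonlinear_corr:.2f}")
```

A coefficient near zero here only says that no straight line describes the relationship, not that x has no influence on y.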
During summer, ice cream sales increase, and so do drowning accidents. Does this mean ice cream sales directly cause drowning? No. The common factor is summer weather, which influences both variables independently.
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data: both columns hold identical values,
# so the correlation coefficient comes out as exactly 1.
data = {
    "Drowning_Accident": [20, 40, 60, 80, 100, 120, 140, 160, 180, 200],
    "Ice_Cream_Sale": [20, 40, 60, 80, 100, 120, 140, 160, 180, 200],
}
Drowning = pd.DataFrame(data)

# Scatter plot of ice cream sales against drowning accidents
Drowning.plot(
    x="Ice_Cream_Sale",
    y="Drowning_Accident",
    kind="scatter",
    color="blue",
    title="Ice Cream Sales vs. Drowning Accidents",
)
plt.xlabel("Ice Cream Sales")
plt.ylabel("Drowning Accidents")
plt.show()

# Pairwise Pearson correlation matrix of the two columns
correlation_beach = Drowning.corr()
print(correlation_beach)
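In this illustrative data set, the points on the scatter plot fall on a straight line and the correlation matrix reports a coefficient of 1.0, a perfect positive correlation between ice cream sales and drowning accidents. However, this does not mean that one causes the other. Instead, both are influenced by summer weather.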
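One way to check this kind of reasoning is to simulate the hidden factor and then control for it. The sketch below is a minimal illustration, not a real data set. It assumes a hypothetical model in which temperature drives both variables and neither affects the other; the coefficients, noise levels, and the helper names `pearson` and `residuals` are all assumptions made for this example. It uses only the standard library so it runs without extra packages.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n_days = 5000

# Assumed model: temperature influences both outcomes independently.
temperature = [rng.uniform(10, 35) for _ in range(n_days)]
ice_cream_sales = [50 + 5.0 * t + rng.gauss(0, 10) for t in temperature]
drowning_accidents = [2 + 0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_corr = pearson(ice_cream_sales, drowning_accidents)

# Remove the part of each variable explained by temperature,
# then correlate what is left over.
partial_corr = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(drowning_accidents, temperature),
)

print(f"Raw correlation: {raw_corr:.2f}")
print(f"Correlation after controlling for temperature: {partial_corr:.2f}")
```

Under these assumptions the raw correlation is strong, while the correlation between the residuals is close to zero, which is what one would expect if the association comes entirely from the shared factor.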
Always critically analyze the context of your data. Correlation does not confirm causation, and hidden variables or external factors often drive observed relationships. Use domain knowledge and logical reasoning to reach accurate conclusions.
