What do data analysts do?
A data analyst reviews data to identify key insights into a business’s customers and ways the data can be used to solve problems. They also communicate this information to company leadership and other stakeholders.
What is data wrangling?
Data wrangling is the process of cleaning raw data and converting it into a structured, usable format so that it can support decision-making.
It is typically applied to large amounts of data gathered from several sources. The data is reshaped using operations such as merging, grouping, concatenating, joining, and sorting, after which it is ready to be analyzed or combined with another dataset.
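As a rough illustration of the merging, grouping, and sorting operations mentioned above, here is a minimal sketch in plain Python. The `orders` and `customers` data, and all field names, are made up for the example.

```python
from collections import defaultdict

# Hypothetical raw data from two sources (names and values are made up).
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 14.5},
]
customers = {1: "Asha", 2: "Ben"}  # customer_id -> name

# Merge/join: attach the customer name to each order.
merged = [{**o, "name": customers[o["customer_id"]]} for o in orders]

# Group: total amount per customer.
totals = defaultdict(float)
for row in merged:
    totals[row["name"]] += row["amount"]

# Sort: customers by total spend, highest first.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('Ben', 35.5), ('Asha', 34.5)]
```

In practice a library such as pandas would usually handle these steps, but the operations are the same.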
What is your process for cleaning data?
While the techniques used for data cleaning vary with the types of data your company stores, you can follow these basic steps:
Step 1: Remove duplicate or irrelevant observations
Remove unwanted observations from your dataset, including duplicate and irrelevant observations.
Step 2: Fix structural errors
Structural errors are the odd naming conventions, typos, and inconsistent capitalization you find when you measure or transfer data. These inconsistencies can produce mislabeled categories or classes. For instance, you might see both “N/A” and “Not Applicable” appear, but they should be treated as the same category.
Step 3: Filter unwanted outliers
Often, there will be one-off observations that, at a glance, do not appear to fit within the data you are analyzing. If you have a legitimate reason to remove an outlier, such as a data-entry error, doing so will improve the quality of the analysis. An outlier should not be removed simply because it is unusual.
Step 4: Handle missing data
As a first option, you can drop observations that have missing values, but doing this loses information, so be mindful of this before you remove them.
As a second option, you can impute missing values based on other observations; again, the integrity of the data may suffer because you are working from assumptions rather than actual observations.
As a third option, you might alter the way the data is used to effectively navigate null values.
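The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the records are made up, the valid age range of 0 to 120 is an assumed business rule, and mean imputation is just one of several possible choices for Step 4.

```python
# Hypothetical survey records; None marks a missing value.
raw = [
    {"id": 1, "status": "N/A", "age": 34},
    {"id": 1, "status": "N/A", "age": 34},             # exact duplicate
    {"id": 2, "status": "Not Applicable", "age": 29},
    {"id": 3, "status": "active", "age": 420},         # data-entry error
    {"id": 4, "status": "Active", "age": None},        # missing age
]

# Step 1: remove duplicates while preserving order.
seen, rows = set(), []
for r in raw:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        rows.append(dict(r))

# Step 2: fix structural errors (inconsistent labels and capitalization).
LABELS = {"n/a": "N/A", "not applicable": "N/A"}
for r in rows:
    status = r["status"].strip().lower()
    r["status"] = LABELS.get(status, status)

# Step 3: filter outliers that have a legitimate reason for removal.
# Assumption: ages outside 0-120 are data-entry errors.
rows = [r for r in rows if r["age"] is None or 0 <= r["age"] <= 120]

# Step 4: handle missing data by imputing the mean of the observed ages.
observed = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(observed) / len(observed)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

print(rows)
```

Three records survive: the duplicate and the implausible age are gone, both spellings of “N/A” share one label, and the missing age is filled with the mean of the remaining observed ages.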
What is the difference between Data Mining and Data Profiling?
Data profiling is the process of examining data from an existing source and summarizing useful information about it, such as its structure, completeness, and quality.
Data mining is the process of analyzing the gathered data to discover patterns, relationships, and insights that were not previously known. It is also called data archaeology.
What is the data analysis process?
Data analysis is the process of collecting, cleaning, sorting, and processing raw data to extract relevant and valuable information that helps a business. An in-depth understanding of data can improve customer experience, retention, and targeting, reduce operational costs, and strengthen problem-solving methods.
What do you mean by data visualization?
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
What are some of the common problems data analysts face?
- Inability to define user requirements properly.
- Carrying out system changes without considering the impact on data of other departments.
- Lack of a unified corporate picture.
- Difficulty collecting meaningful data to the agreed standard.
What is the significance of Exploratory Data Analysis (EDA)?
The main purpose of EDA is to help you look at data before making any assumptions. It can help you identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relationships among the variables.
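A first EDA pass often starts with summary statistics and a simple outlier check. The sketch below uses Python's `statistics` module on made-up numbers; the 1.5 × IQR rule is one common convention for flagging outliers, not the only one.

```python
import statistics

# Hypothetical daily order counts; 95 is a deliberately planted anomaly.
values = [12, 15, 14, 13, 16, 15, 14, 95, 13, 15]

mean = statistics.mean(values)
median = statistics.median(values)

# Quartiles and interquartile range (IQR).
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag values more than 1.5 * IQR outside the quartiles.
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(mean, median)  # 22.2 14.5
print(outliers)      # [95]
```

The gap between the mean and the median is itself a hint that something is pulling the distribution to one side, which the outlier check then confirms.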
What is the difference between descriptive, predictive, and prescriptive analytics?
- Descriptive analytics answers the question “what has happened” by examining historical data. It uses data aggregation and data mining techniques.
- Predictive analytics estimates what might happen in the future. It uses forecasting methods and statistical models.
- Prescriptive analytics suggests courses of action to answer “what should you do”. It advises on potential outcomes using modeling techniques and optimization strategies.
For instance, you could utilize descriptive analytics to identify the apparel categories that are selling well and those that are not, or to determine the locations with the highest sales. Descriptive analytics can also be used to understand consumer behavior, such as which client categories are purchasing particular goods.
Predictive analytics can be used, for instance, to forecast sales for the upcoming quarter or to identify the clients who are most likely to purchase a specific product. By revealing potential future outcomes, predictive analytics can assist you in making smarter business decisions.
Prescriptive analytics can be used, for instance, by a grocery store to suggest the best inventory levels for each product at each location, based on data on sales, weather trends, and other variables. This approach supports data-driven decisions about inventory management, cutting waste, and boosting earnings.
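The three types can be contrasted on one tiny dataset. In this sketch the sales figures and the stock level are made up, the forecast is a plain least-squares trend line, and the reorder rule is a deliberately simple assumption; real predictive and prescriptive models are far richer.

```python
# Hypothetical weekly unit sales for one product (made-up numbers).
sales = [100, 110, 120, 130]

# Descriptive: what has happened?
average = sum(sales) / len(sales)  # 115.0

# Predictive: what might happen? Fit a least-squares line and extrapolate.
n = len(sales)
xs = range(n)
x_mean = sum(xs) / n
slope = (
    sum((x - x_mean) * (y - average) for x, y in zip(xs, sales))
    / sum((x - x_mean) ** 2 for x in xs)
)
intercept = average - slope * x_mean
forecast = intercept + slope * n  # expected sales for the next week

# Prescriptive: what should you do? Order enough to cover the forecast.
stock_on_hand = 90  # assumed current inventory
order_quantity = max(0, round(forecast) - stock_on_hand)

print(average, forecast, order_quantity)  # 115.0 140.0 50
```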
What is a normal distribution?
A normal distribution is a probability distribution in which the values of a random variable are distributed symmetrically around the mean. Values are equally likely to fall the same distance to the left or to the right of the center, and values near the mean are more frequent than values far from it. The result is a bell-shaped curve.
What is Time Series Analysis?
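The symmetry can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The standard normal parameters used here (mean 0, standard deviation 1) are chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the area below -1.5 equals the area above +1.5.
left_tail = dist.cdf(-1.5)
right_tail = 1 - dist.cdf(1.5)

# Roughly 68% of values fall within one standard deviation of the mean.
within_one_sd = dist.cdf(1) - dist.cdf(-1)

print(round(within_one_sd, 4))  # 0.6827
```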
Time series analysis is a specific way of analyzing a sequence of data points collected over an interval of time. In time series analysis, analysts record data points at consistent intervals over a set period of time rather than just recording the data points intermittently or randomly.
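One of the simplest time series techniques is a moving average, which smooths short-term noise so the trend is easier to see. The readings below are made up and assumed to be recorded at equal intervals; the window size of 3 is an arbitrary choice.

```python
# Hypothetical readings taken at consistent intervals (e.g. one per day).
readings = [10, 12, 11, 15, 14, 18, 17]
window = 3

# Average each run of `window` consecutive readings.
moving_avg = [
    sum(readings[i - window + 1 : i + 1]) / window
    for i in range(window - 1, len(readings))
]

print([round(v, 2) for v in moving_avg])
```

The smoothed series has `len(readings) - window + 1` points, and here it rises steadily even though the raw readings move up and down.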
Data Analyst Interview Questions: Excel
What is the difference between COUNT, COUNTA, COUNTBLANK, and COUNTIF in Excel?
The COUNT function returns the count of numeric cells in a range.
The COUNTA function counts the non-blank cells in a range.
The COUNTBLANK function gives the count of blank cells in a range.
The COUNTIF function returns the count of cells that meet a given condition.
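The distinction between the four functions can be mirrored in Python. This is only an analogy: the `cells` list stands in for a worksheet range, `None` stands in for an empty cell, and Excel's own handling of edge cases (such as formulas that return empty text) is not reproduced.

```python
# A hypothetical range of cells; None represents an empty cell.
cells = [10, "apple", None, 3.5, None, 7, "pear"]

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

count = sum(1 for c in cells if is_number(c))            # like COUNT
counta = sum(1 for c in cells if c is not None)          # like COUNTA
countblank = sum(1 for c in cells if c is None)          # like COUNTBLANK
countif = sum(1 for c in cells if is_number(c) and c > 5)  # like COUNTIF(range, ">5")

print(count, counta, countblank, countif)  # 3 5 2 2
```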
How does the AND function work in Excel?
The AND function is used to check more than one logical condition at the same time.
The AND function returns TRUE if all its arguments evaluate to TRUE, and returns FALSE if one or more arguments evaluate to FALSE. One common use for the AND function is to expand the usefulness of other functions that perform logical tests.
What is a PivotTable on Excel used for?
A PivotTable is a powerful tool to calculate, summarize, and analyze data that lets you see comparisons, patterns, and trends in your data. PivotTables work a little bit differently depending on what platform you are using to run Excel.
Which functions in Excel return the current date and time?
The NOW function returns the current date and time, while the TODAY function returns only the current date. Both are useful when you need to display the current date or time on a worksheet, or calculate a value based on it, and have that value updated each time you open the worksheet.
Can you tell what is a waterfall chart and when do we use it?
A waterfall chart shows a running total as values are added or subtracted. It’s useful for understanding how an initial value (for example, net income) is affected by a series of positive and negative values. The columns are color coded so you can quickly tell positive from negative numbers.
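The running total behind a waterfall chart is easy to compute. In this sketch the labels and amounts are made up; each bar in the chart would start where the previous running total ended.

```python
from itertools import accumulate

# Hypothetical starting value followed by positive and negative changes.
steps = [
    ("Revenue", 500),
    ("Cost of goods", -200),
    ("Operating expenses", -150),
    ("Other income", 30),
]

# Running total after each step.
running = list(accumulate(value for _, value in steps))

for (label, value), total in zip(steps, running):
    direction = "increase" if value >= 0 else "decrease"
    print(f"{label:20} {value:+5} ({direction}) -> {total}")
```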
What is a Pivot Table, and what are the different sections of a Pivot Table?
PivotTable areas are part of the PivotTable Fields task pane. By arranging the selected fields in the areas, you can arrive at different PivotTable layouts. Because you can simply drag fields between areas, you can quickly switch between layouts and summarize the data in the way you want.
A PivotTable is made up of four different sections:
- Values Area: The summarized values are reported in this area.
- Rows Area: The headings to the left of the values make up the rows area.
- Columns Area: The headings at the top of the values make up the columns area.
- Filter Area: An optional filter used to drill down into the data set.
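The four areas map naturally onto a small grouping routine. The sketch below is an analogy in plain Python rather than anything Excel does internally; the `sales` records are made up, with `region` playing the Rows area, `product` the Columns area, `amount` the Values area, and `year` the Filter area.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "East", "product": "Tea", "year": 2023, "amount": 100},
    {"region": "East", "product": "Coffee", "year": 2023, "amount": 150},
    {"region": "West", "product": "Tea", "year": 2023, "amount": 80},
    {"region": "East", "product": "Tea", "year": 2023, "amount": 40},
    {"region": "West", "product": "Coffee", "year": 2022, "amount": 999},
]

totals = defaultdict(lambda: defaultdict(int))
for s in sales:
    if s["year"] != 2023:  # Filter area: keep only 2023
        continue
    # Rows area = region, Columns area = product, Values area = sum of amount.
    totals[s["region"]][s["product"]] += s["amount"]

pivot = {region: dict(cols) for region, cols in totals.items()}
print(pivot)  # {'East': {'Tea': 140, 'Coffee': 150}, 'West': {'Tea': 80}}
```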
Data Analyst Interview Questions: SQL
What is the difference between a primary key and a foreign key in a database?
A primary key uniquely identifies a row in a table, while a foreign key is used to link two tables together by referencing the primary key of the related table. The most important difference that you should note here is that a primary key cannot have a NULL value, whereas a foreign key can accept NULL values.
- Purpose: A primary key uniquely identifies each row in its table. A foreign key maintains the relationship between tables.
- NULL values: A primary key can’t be NULL. A foreign key can accept NULL values.
- Duplicates: Two or more rows can’t have the same primary key. A foreign key column can carry duplicate values.
- Indexing: In SQL Server, a primary key gets a clustered index by default. A foreign key does not get a clustered index by default.
- Temporary tables: In SQL Server, a primary key constraint can be defined on a temporary table. A foreign key constraint can’t be enforced on temporary tables.
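The duplicate and NULL rules can be tried out with Python's built-in `sqlite3` module. This is a sketch using SQLite, whose behavior differs from other databases in places: it enforces foreign keys only after `PRAGMA foreign_keys = ON`, and the indexing and temporary-table points above do not apply to it. The `departments` and `employees` tables are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE employees (
           emp_id  INTEGER PRIMARY KEY,
           name    TEXT,
           dept_id INTEGER REFERENCES departments(dept_id)
       )"""
)
conn.execute("INSERT INTO departments VALUES (1, 'Analytics')")

# Foreign key values may repeat (two employees in dept 1) and may be NULL.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", 1), (2, "Ben", 1), (3, "Chen", None)],
)

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A second row with primary key 1 is refused.
duplicate_pk_rejected = rejected("INSERT INTO departments VALUES (1, 'Finance')")
# A foreign key pointing at a non-existent department is refused.
dangling_fk_rejected = rejected("INSERT INTO employees VALUES (4, 'Dee', 99)")

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(duplicate_pk_rejected, dangling_fk_rejected, employee_count)  # True True 3
```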
How many subsets are there in SQL?
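There are three main subsets of the SQL language, each with a special purpose:
- Data Definition Language (DDL): defines and changes the structure of database objects, with statements such as CREATE, ALTER, and DROP.
- Data Manipulation Language (DML): works with the data itself, with statements such as INSERT, UPDATE, and DELETE.
- Data Control Language (DCL): controls access to the data, with statements such as GRANT and REVOKE.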
What are aggregate functions?
Aggregate functions perform calculations on a set of values and return a single value. The common aggregate functions are:
- COUNT (counts the number of rows that match the query)
- SUM (returns the sum of all values of a numeric column)
- AVG (returns the average of all values of a numeric column)
- MIN (returns the lowest value of a column)
- MAX (returns the highest value of a column)
What is the GROUP BY statement?
The GROUP BY statement groups rows that have the same values into summary rows, like “find the number of customers in each country”.
The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns.
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
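A GROUP BY query of this kind can be run end to end with Python's built-in `sqlite3` module. The `Customers` table, its `Spend` column, and all rows below are made up for the example; the `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT, Spend INTEGER)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Germany", 10),
        (2, "Mexico", 20),
        (3, "Mexico", 30),
        (4, "UK", 40),
        (5, "Germany", 50),
        (6, "Mexico", 60),
    ],
)

# Aggregate functions over the whole table return a single row.
overall = conn.execute(
    "SELECT COUNT(*), SUM(Spend), AVG(Spend), MIN(Spend), MAX(Spend) FROM Customers"
).fetchone()
print(overall)  # (6, 210, 35.0, 10, 60)

# GROUP BY returns one summary row per country.
per_country = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country ORDER BY Country"
).fetchall()
print(per_country)  # [(2, 'Germany'), (3, 'Mexico'), (1, 'UK')]
```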
Explain the different types of joins in SQL.
In SQL, the JOIN clause is used to combine rows from two or more tables based on a shared column. The most common types of JOIN are:
INNER JOIN –
This is the most common type of JOIN. It returns only the rows that have a match in both tables according to the JOIN condition.
LEFT JOIN –
This type of JOIN returns all rows from the left table, together with the matching rows from the right table. Where there is no match, the columns from the right table are NULL.
RIGHT JOIN –
This JOIN is the mirror image of a LEFT JOIN: it returns all rows from the right table, together with the matching rows from the left table. Where there is no match, the columns from the left table are NULL.
FULL JOIN –
FULL JOIN returns all rows from both the left and the right table. Rows are combined where the JOIN condition is met, and the columns of the other table are NULL where it is not.
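The difference between the join types is easiest to see on a small example. This sketch uses Python's built-in `sqlite3` module with made-up `customers` and `orders` tables. Because older SQLite versions lack RIGHT JOIN and FULL JOIN, the right-join result is obtained here by swapping the table order in a LEFT JOIN, which is equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    -- Order 12 refers to a customer that does not exist in the customers table.
    INSERT INTO orders VALUES (10, 1, 'Tea'), (11, 1, 'Mug'), (12, 4, 'Jam');
    """
)

# INNER JOIN: only rows with a match on both sides.
inner = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
print(inner)  # [('Asha', 'Tea'), ('Asha', 'Mug')]

# LEFT JOIN: every customer, with NULL (None) where there is no order.
left = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id"
).fetchall()
print(left)  # [('Asha', 'Tea'), ('Asha', 'Mug'), ('Ben', None), ('Chen', None)]

# RIGHT JOIN equivalent: every order, with NULL where there is no customer.
right = conn.execute(
    "SELECT c.name, o.item FROM orders o "
    "LEFT JOIN customers c ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
print(right)  # [('Asha', 'Tea'), ('Asha', 'Mug'), (None, 'Jam')]
```

A FULL JOIN would return the union of the last two results: all matched pairs, plus Ben and Chen without orders, plus the order without a customer.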
What are constraints?
Constraints in SQL are rules imposed on the data in a table. They are used to restrict the kind of data that can be stored in a specific table column. Some typical constraints include:
NOT NULL –
This constraint prevents null values from being stored in a column.
UNIQUE –
This constraint says that values in a column must be unique.
PRIMARY KEY –
This constraint is used to specify which field is the primary key. It combines the NOT NULL and UNIQUE constraints.
FOREIGN KEY –
This constraint links a column to the primary key of another table, so that each value refers to an existing row in that table.
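To see constraints being enforced, the sketch below creates a made-up `users` table with Python's built-in `sqlite3` module and attempts two invalid inserts. The exact error text is SQLite's; other databases report violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users (
           user_id INTEGER PRIMARY KEY,
           email   TEXT NOT NULL UNIQUE
       )"""
)
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

def violation(sql):
    """Run a statement and return the constraint error message, if any."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

# NOT NULL: a missing email is refused.
null_error = violation("INSERT INTO users VALUES (2, NULL)")
# UNIQUE: a repeated email is refused.
dup_error = violation("INSERT INTO users VALUES (3, 'a@example.com')")

print(null_error)
print(dup_error)
```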