Essential Data Analytics Interview Questions: Ace Your Next Interview
What is the difference between Data Mining and Data Analysis?
Data Mining
- Used to identify patterns in stored data.
- Mining is performed on clean, well-documented data.
- Results extracted from data mining are not easy to interpret.
Data Analysis
- Used to order & organize raw data in a meaningful manner.
- Data analysis often starts from raw data that is not well documented, so data cleaning is part of the analysis process.
- Results extracted from data analysis are easy to interpret.
To recap, Data Mining is a technique for identifying patterns in recorded data, usually by applying algorithms of the kind commonly used in Machine Learning. Data Analysis involves cleaning and organizing raw data to extract insights.
What is the process of Data Analysis?
Data analysis involves collecting, cleaning, analyzing, transforming, and modeling data to produce insights and reports that support business decisions and profitability.
Refer to the graphic below to see the various phases in the procedure.
- Collect Data: Data is gathered from various sources and stored so that it can be cleaned and prepared. During this phase, missing values and outliers are handled.
- Analyze Data: Once the data is prepared, the next step is to analyze it. A model is run repeatedly and refined, and it is then validated to ensure that it meets the business requirements.
- Create Reports: Finally, the model is deployed, and the resulting reports are distributed to stakeholders.
What is the difference between Data Mining and Data Profiling?
Data Mining is the process of analyzing data to identify previously unknown relationships. It primarily focuses on detecting anomalous records, discovering dependencies, and cluster analysis.
Data Profiling is the process of examining the individual attributes of data. It primarily focuses on providing useful information about data properties such as data type, frequency, and so on.
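As a rough illustration of data profiling, the sketch below reports a few properties of each column (the Python types seen, the number of missing values, the number of distinct values, and the most frequent value) for a small list of records. The records and the `profile` helper are hypothetical, and only the standard library is used; dedicated profiling tools report far more.

```python
from collections import Counter


def profile(rows):
    """Return simple per-column statistics for a list of dict records."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # Collect the Python type names seen in the non-missing values
        types = sorted({type(v).__name__ for v in present})
        counts = Counter(present)
        report[col] = {
            "types": types,
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0] if counts else None,
        }
    return report


rows = [
    {"city": "Pune", "sales": 120},
    {"city": "Delhi", "sales": None},
    {"city": "Pune", "sales": 95},
]
for column, stats in profile(rows).items():
    print(column, stats)
```

Note that profiling only describes the data; deciding what to do about the missing sales value is a cleaning step.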
What is data cleansing and what are the best ways to practice data cleansing?
Data cleansing, also called data cleaning or data wrangling (the terms are often used interchangeably), is the process of detecting and correcting errors to improve data quality. Refer to the figure below to learn about the various approaches to dealing with missing data.
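A minimal sketch of a few common cleaning steps: trimming and standardizing text, removing duplicate records, and filling missing numeric values with the column median. The field names are hypothetical, and median imputation is just one assumed choice; dropping incomplete rows or the KNN imputation discussed later are alternatives.

```python
from statistics import median


def clean(records):
    """Normalize names, drop duplicates, and impute missing ages with the median."""
    seen = set()
    deduped = []
    for rec in records:
        # Trim whitespace and standardize capitalization so "  alice" == "Alice"
        name = rec["name"].strip().title()
        key = (name, rec["age"])
        if key in seen:
            continue  # skip records that are duplicates after normalization
        seen.add(key)
        deduped.append({"name": name, "age": rec["age"]})

    known_ages = [r["age"] for r in deduped if r["age"] is not None]
    fill_value = median(known_ages)
    for r in deduped:
        if r["age"] is None:
            r["age"] = fill_value
    return deduped


raw = [
    {"name": "  alice", "age": 30},
    {"name": "Alice", "age": 30},
    {"name": "bob ", "age": None},
    {"name": "Carol", "age": 40},
]
print(clean(raw))
```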
What are the important steps in the data validation process?
As the name suggests, Data Validation is the process of verifying data. It involves two main steps: data screening and data verification.
- Data Screening: Various algorithms are used to screen the entire dataset and identify any inaccurate or suspicious values.
- Data Verification: Each suspicious value is reviewed against several use-cases before a final judgment is made on whether it should be included in the data or not.
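Screening can be as simple as flagging values that fall outside an expected range. The sketch below uses the interquartile-range (IQR) rule, an assumed choice for illustration; the flagged values are handed to the verification step for review rather than being deleted automatically. The order amounts are hypothetical.

```python
from statistics import quantiles


def screen(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] as suspicious."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


order_amounts = [52, 48, 50, 47, 51, 49, 53, 480]
suspicious = screen(order_amounts)
print(suspicious)  # candidates for the data verification step
```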
What do you think are the criteria to say whether a developed data model is good or not?
The answer to this question may differ from person to person. However, the following are a few factors that I believe must be evaluated to determine whether a data model is good:
- The model should have predictable performance, which is essential for forecasting future outcomes.
- A model is regarded as good if it adapts easily to changing business requirements.
- If the data changes, the model should be able to scale with it.
- The model should also be easy for clients to use, so that it produces actionable and profitable results.
When do you think you should retrain a model? Is it dependent on the data?
Business data changes daily, even though its format usually stays the same. When a business enters a new market, faces more competition, or sees its position shift, it is essential to retrain the model. In short, whenever business dynamics change, the model should be retrained to reflect changing customer behavior.
Can you mention a few problems that data analysts usually encounter while performing analysis?
The following are some of the most common issues faced during data analysis.
- Duplicate entries and spelling errors diminish data quality.
- Extracting data from a bad source may require significant cleaning effort.
- Data extracted from different sources may be represented in different ways. When you combine data from these sources, reconciling the different representations can delay the analysis.
- Finally, missing data can pose a problem during data analysis.
What is the KNN imputation method?
This approach imputes a missing attribute value using the values of that attribute in the records most similar to the record with the missing value (its k nearest neighbors).
A distance function, such as Euclidean distance, is used to measure how similar two records are.
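A minimal sketch of KNN imputation, assuming numeric features, no missing values outside the column being imputed, Euclidean distance, and k = 2. The data and the `knn_impute` helper are hypothetical; scikit-learn's `KNNImputer` (in `sklearn.impute`) is a production-ready alternative.

```python
import math


def knn_impute(rows, target_col, k=2):
    """Fill None values in target_col with the mean of the k nearest complete rows."""
    complete = [r for r in rows if r[target_col] is not None]
    filled = []
    for row in rows:
        if row[target_col] is not None:
            filled.append(list(row))
            continue
        # Compare on every column except the one being imputed
        features = [i for i in range(len(row)) if i != target_col]

        def distance(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in features))

        neighbors = sorted(complete, key=distance)[:k]
        estimate = sum(n[target_col] for n in neighbors) / len(neighbors)
        new_row = list(row)
        new_row[target_col] = estimate
        filled.append(new_row)
    return filled


# Columns: height_cm, weight_kg, age (age is missing in the last row)
data = [
    [170, 65, 30],
    [172, 68, 32],
    [150, 45, 60],
    [171, 66, None],
]
print(knn_impute(data, target_col=2))
```

The last record is closest to the first two, so its age is estimated from theirs rather than from the overall mean, which the distant third record would pull upward.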
What is the name of the framework developed by Apache for processing large datasets in a distributed computing environment?
Apache Hadoop, together with its wider ecosystem, was developed for processing large datasets in a distributed computing environment. The Hadoop Ecosystem consists of the following components:
- HDFS -> Hadoop Distributed File System
- YARN -> Yet Another Resource Negotiator
- MapReduce -> Data processing using programming
- Spark -> In-memory Data Processing
- PIG, HIVE-> Data Processing Services using Query (SQL-like)
- HBase -> NoSQL Database
- Mahout, Spark MLlib -> Machine Learning
- Apache Drill -> SQL on Hadoop
- Zookeeper -> Managing Cluster
- Oozie -> Job Scheduling
- Flume, Sqoop -> Data Ingesting Services
- Solr & Lucene -> Searching & Indexing
- Ambari -> Provision, Monitor and Maintain cluster
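To illustrate the MapReduce programming model listed above, here is a single-process Python simulation of the classic word-count job: a map phase emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce phase sums each group. This is not Hadoop's API; on a real cluster, the framework runs these phases in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in the input line
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine the values for one key into a single result
    return key, sum(values)


lines = ["big data needs big tools", "Data tools"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```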
Now we go on to the next set of questions, the Excel Interview Questions.
Data Analyst Interview Questions: Excel
Microsoft Excel is one of the simplest yet most powerful software tools available today. It lets users perform quantitative and statistical analysis through an easy interface for data manipulation, so its applications span many disciplines and professional needs. Excel is an essential skill that provides a foundation for becoming a Data Analyst. Let’s quickly explore the questions asked on this topic.
Can you explain what a waterfall chart is and when we use it?
A waterfall chart shows how a series of positive and negative values contributes to a final total. For instance, when examining a company’s net income, you can include all of the cost figures in the chart, so that it visualizes how revenue turns into net income once costs are deducted.
How can you highlight cells with negative values in Excel?
Excel’s conditional formatting allows you to highlight cells with negative values. Here are the steps you can take:
- Select the cells you want to check for negative values.
- Go to the Home tab and choose Conditional Formatting.
- Navigate to Highlight Cell Rules and select the Less Than option.
- In the Less Than dialog box, set the value to 0, choose a format, and click OK.
How can you clear all the formatting without actually removing the cell contents?
You may want to remove all formatting and keep only the cell contents. To do this, use the ‘Clear Formats’ option on the Home tab, which appears in the ‘Clear’ drop-down menu.
What is a Pivot Table, and what are the different sections of a Pivot Table?
A Pivot Table is a basic tool in Microsoft Excel that lets you easily summarize large datasets. It is quite simple to use, since it only requires dragging and dropping row/column headings to build reports.
A pivot table consists of four distinct sections:
Values Area: The area where values are reported.
Rows Area: The headings located to the left of the values.
Column Area: The titles at the top of the values area define the columns area.
Filter Area: This is an optional filter for drilling down in the data collection.
Can you make a Pivot Table from multiple tables?
Yes, you can create a Pivot Table from multiple tables, as long as there is a relationship between the tables.
How can we select all blank cells in Excel?
If you want to select all blank cells in Excel, utilize the Go To Special Dialog Box. Here are the steps you may take to select all of the blank cells in Excel.
- Select the complete dataset and press F5. This will open the Go To dialog box.
- Click the ‘Special’ button to open the Go To Special dialog box.
- After that, select Blanks and click OK.
Excel then selects all of the blank cells in your dataset.
What are the most common questions you should ask a client before creating a dashboard?
The answer to this question varies case by case. However, here are some common questions to ask before building a dashboard in Excel.
- Purpose of the Dashboards
- Multiple data sources
- Use of the Excel Dashboard
- The frequency at which the dashboard should be refreshed.
- The version of Office that the client uses.
What is a Print Area and how can you set it in Excel?
In Excel, a Print Area is a collection of cells that are set to print whenever the worksheet is printed. For example, if you just want to print the first 20 rows from the full worksheet, you may choose the first 20 rows as the Print Area.
Now, to configure the Print Area in Excel, follow the steps below:
- Select the cells to set the Print Area.
- Then, select the Page Layout tab.
- Select Print Area.
- Select Set Print Area.
What steps can you take to handle slow Excel workbooks?
There are several strategies for dealing with slow Excel workbooks, including the following:
- Try manual calculation mode.
- Keep all referenced data in a single sheet.
- Use Excel tables and named ranges.
- Use helper columns instead of array formulas.
- Avoid referencing entire rows or columns.
- Convert formulas that are no longer needed into static values.
Can you sort multiple columns at one time?
Multi-level sorting means sorting by one column and then by another column, while preserving the order established by the first column. In Excel, you can sort by several columns at the same time.
To do this, use the Sort dialog box: select the data you want to sort, go to the Data tab, and click the Sort icon.
In this dialog box, specify the first column to sort by, then click the Add Level button to add another column.
Next, let’s move on to questions related to statistics.
Data Analyst Interview Questions: Statistics
Statistics is a branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data. Statistics falls into two categories: Descriptive and Inferential. It provides a strong foundation for a career in data analysis.
What do you understand by the term Normal Distribution?
The normal distribution is one of the most important and widely used distributions in statistics. Also known as the bell curve or Gaussian curve, it is completely described by its mean and standard deviation. Refer to the image below.
The figure above shows that data is often spread evenly around a central value, with no bias to either side. The random variables are distributed in a symmetrical bell-shaped curve.
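Because a normal distribution is fully determined by its mean and standard deviation, you can compute how much of the data falls within a given number of standard deviations of the mean. The sketch below uses the standard library’s `statistics.NormalDist` to reproduce the familiar 68-95-99.7 rule for a standard normal distribution (mean 0, standard deviation 1).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Probability mass between (mean - k*sigma) and (mean + k*sigma)
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {share:.1%}")
```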
What is A/B Testing?
A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. Also known as split testing, it uses sample statistics to draw conclusions about population parameters. For example, it can compare two versions of a web page by showing variants A and B to similar numbers of visitors; the variant with the higher conversion rate wins.
The purpose of A/B testing is to determine whether a change to the website improves an outcome of interest. For example, suppose you have spent a significant amount of money on a banner ad. You can estimate the return on investment (ROI) by comparing the click-through rates of the banner variants.
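One common way to decide whether variant B’s conversion rate is genuinely higher than A’s, rather than different by chance, is a two-proportion z-test. The sketch below assumes hypothetical visitor and conversion counts and a two-sided test at the 5% significance level; it is one reasonable choice among several (chi-square and Bayesian approaches are also used).

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 2,000 visitors saw each banner variant
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("B wins" if p < 0.05 and z > 0 else "No significant difference")
```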
What is the statistical power of sensitivity?
Sensitivity (also called recall or the true positive rate) is commonly used to assess a classifier’s performance.
The classifier might be Logistic Regression, Support Vector Machine, or Random Forest.
Sensitivity is the ratio of correctly predicted positive events (true positives) to the total number of actual positive events. True positives are events that are actually positive and were also predicted as positive by the model.
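In confusion-matrix terms, sensitivity is TP / (TP + FN), where FN counts actual positives the model missed. A minimal sketch with hypothetical labels (1 = positive event, 0 = negative):

```python
def sensitivity(actual, predicted):
    """True positive rate: TP / (TP + FN)."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


actual = [1, 1, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 1, 1, 0]
print(sensitivity(actual, predicted))  # 4 of the 5 actual positives were caught
```

The false positive at index 5 does not affect sensitivity; it would show up in precision or specificity instead.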
What is the Alternative Hypothesis?
To understand the Alternative Hypothesis, first understand the null hypothesis. The null hypothesis is a statement, assumed to be true for the purposes of the test, that any observed effect is due to chance alone; the test determines whether there is enough evidence to reject it.
The alternative hypothesis is the statement that contradicts the null hypothesis. Typically, it states that the observations are the result of a real effect, combined with some degree of random variability.
What is the difference between univariate, bivariate and multivariate analysis?
The distinctions among univariate, bivariate, and multivariate analyses are as follows:
Univariate: Analysis of a single variable at a time, typically using descriptive statistics that summarize that variable.
Bivariate: Analysis of two variables at a time, used to study the relationship or difference between them.
Multivariate: Analysis of more than two variables at once, used to investigate the effect of several factors on a response.
Can you tell me what are Eigenvectors and Eigenvalues?
Eigenvectors: They are mainly used to understand linear transformations. In data analysis, they are typically calculated for a correlation or covariance matrix.
Eigenvectors are the directions along which a particular linear transformation acts, for example by flipping, compressing, or stretching.
Eigenvalues: An eigenvalue is the strength of the transformation in the direction of its eigenvector, that is, the factor by which the transformation stretches or compresses vectors along that direction.
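For a symmetric 2x2 matrix, such as the covariance matrix of two variables, the eigenvalues can be found directly from the characteristic equation det(A - λI) = 0. The sketch below solves it in closed form and prints A·v next to each eigenvector v so you can check that A·v = λ·v. The matrix is hypothetical; for larger matrices you would normally use a library routine such as NumPy’s `numpy.linalg.eigh`.

```python
import math


def eigen_2x2_symmetric(a, b, d):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, d]]."""
    if b == 0:
        # Diagonal matrix: the coordinate axes are already eigenvectors
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    results = []
    for lam in (mean + radius, mean - radius):
        # Any (x, y) with (a - lam) * x + b * y = 0 works; (b, lam - a) is one such vector
        x, y = b, lam - a
        norm = math.hypot(x, y)
        results.append((lam, (x / norm, y / norm)))
    return results


A = [[2, 1], [1, 2]]  # hypothetical covariance-like matrix
for lam, (x, y) in eigen_2x2_symmetric(2, 1, 2):
    # A*v should equal lam*v: the transformation only stretches v by lam
    av = (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)
    print(f"lambda = {lam:.1f}, v = ({x:.3f}, {y:.3f}), A*v = ({av[0]:.3f}, {av[1]:.3f})")
```

Here the direction (1, 1) is stretched by a factor of 3 and the direction (1, -1) is left unchanged in length (eigenvalue 1).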
What is the difference between 1-Sample T-test, and 2-Sample T-test?
To answer this question, first explain what T-tests are. See below for an explanation of the T-test.
T-tests are hypothesis tests that compare means. Each test on sample data results in a single value, known as the T-value. Please see below for the formula.
Because this formula is in ratio format, you may describe it using the signal-to-noise ratio analogy.
The numerator would be a signal, whereas the denominator would be noise.
So, to compute the 1-Sample t-value, subtract the null hypothesis value from the sample mean. If your sample mean is 7 and the null hypothesis value is 2, the signal is 5.
The larger the difference between the sample mean and the null hypothesis value, the stronger the signal.
Now, look at the denominator, which is the noise in our example. It is a measure of variability known as the standard error of the mean, and it indicates how precisely your sample estimates the population mean.
Noise has an inverse relationship with the precision of the sample: the more noise, the less precisely the sample estimates the mean.
The 1-Sample t-value is then the signal-to-noise ratio, which shows how clearly your signal stands out from the noise.
To calculate the 2-Sample t-value, divide the difference between the two sample means (minus the difference stated by the null hypothesis, usually 0) by the standard error of that difference.
To recap, the 1-Sample t-test compares a sample mean to a hypothesized value, whereas the 2-Sample t-test assesses whether the difference between the means of two samples is statistically significant for the population or due to chance.
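The signal-to-noise description can be written out directly. The sketch below computes the 1-sample t-value as (sample mean - hypothesized mean) / (s / √n) and, for two samples, Welch’s form of the 2-sample t-value, which divides the difference in means by the standard error of that difference without assuming equal variances. The samples are hypothetical, and converting t-values to p-values (for example with SciPy’s `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind`) is left out.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Signal (mean - mu0) divided by noise (standard error of the mean)."""
    signal = mean(sample) - mu0
    noise = stdev(sample) / math.sqrt(len(sample))
    return signal / noise


def two_sample_t(sample1, sample2):
    """Welch's t: difference in means divided by the standard error of that difference."""
    signal = mean(sample1) - mean(sample2)
    noise = math.sqrt(stdev(sample1) ** 2 / len(sample1) + stdev(sample2) ** 2 / len(sample2))
    return signal / noise


sample = [6, 8, 7, 9, 5]  # mean 7, matching the example in the text
print(one_sample_t(sample, mu0=2))  # signal = 7 - 2 = 5
print(two_sample_t([6, 8, 7, 9, 5], [4, 5, 3, 6, 2]))
```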
What are different types of Hypothesis Testing?
The many forms of hypothesis testing are listed below:
T-test: Used for unknown standard deviations and small sample sizes.
The Chi-Square Test for Independence assesses the connection between categorical variables in a population sample.
Analysis of Variance (ANOVA) is a hypothesis testing method that compares mean values across groups. This test is commonly used in the same way as a T-test, but for more than two groups.
Welch’s t-test is used to determine whether two populations have equal means, without assuming that their variances are equal.
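As an example of the Chi-Square Test for Independence, the sketch below computes the test statistic for a contingency table by comparing each observed count with the count expected if the two variables were independent (row total × column total / grand total). The table is hypothetical; turning the statistic into a p-value uses the chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom, for example via SciPy’s `scipy.stats.chi2_contingency`.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: customer segment; columns: bought / did not buy (hypothetical counts)
observed = [[10, 20], [20, 10]]
stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {stat:.3f} with {dof} degree(s) of freedom")
```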
How to represent a Bayesian Network in the form of Markov Random Fields (MRF)?
Consider the following examples for representing a Bayesian Network as Markov Random Fields:
Consider two variables, A and B, connected by an edge A -> B in a Bayesian network. The probability distribution factorizes into the probability of A times the conditional probability of B given A. In contrast, if we represent the same network as a Markov Random Field, it becomes a single potential function over A and B.
That was a basic example to begin with. Now, consider a more complex situation where one variable is the parent of two others: A is the parent variable, pointing down to B and C. In this scenario, the probability distribution is the product of the probability of A and the conditional probabilities of B and C given A. To transform this into a Markov Random Field, factorize the similarly structured network using potential functions for the A-B and A-C edges. Refer to the image below.
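The two cases can be written out as factorizations. The potential functions φ below are one valid choice (each conditional probability table is folded into a potential), not the only one; with this choice the potentials multiply back to the original distribution, so the MRF normalizing constant is 1.

```latex
% Two variables, A -> B
P(A, B) = P(A)\,P(B \mid A) = \phi(A, B), \qquad \phi(A, B) = P(A)\,P(B \mid A)

% A is the parent of B and C
P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A) = \phi_1(A, B)\,\phi_2(A, C),
\qquad \phi_1(A, B) = P(A)\,P(B \mid A), \quad \phi_2(A, C) = P(C \mid A)
```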
What is the difference between variance and covariance?
Statisticians regularly use the terms variance and covariance. Variance measures how far a set of numbers is spread out from its mean (the average squared deviation from the mean). Covariance describes how two random variables change together, and it is mainly used to determine the correlation between variables.
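To make the definitions concrete, the sketch below computes the sample variance (the average squared deviation from the mean, using n - 1 in the denominator) and the sample covariance of two hypothetical variables using only the standard library. Python 3.10 and later also provide `statistics.covariance` and `statistics.correlation` directly.

```python
from statistics import mean


def sample_variance(xs):
    """Average squared deviation from the mean (n - 1 denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_covariance(xs, ys):
    """How two variables vary together: positive if they tend to rise together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 6]
print(sample_variance(ad_spend))  # spread of one variable
print(sample_covariance(ad_spend, sales))  # joint movement of two variables
```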
Interview Questions: SAS
SAS Institute’s Statistical Analysis System (SAS) is one of the most widely used data analytics tools. SAS can analyze complex data and provide insights that improve decision-making and help anticipate future events. It supports data mining, manipulation, retrieval, and analysis across multiple sources.
What is interleaving in SAS?
Interleaving in SAS refers to the combination of individual sorted SAS data sets into a single sorted data set. You may interleave data sets by utilizing a SET statement in conjunction with a BY statement.
In the example below, the data sets are sorted according to the variable Age.
Once both data sets are sorted by Age, the following DATA step interleaves them:
data combined;
set Data1 Data2;
by Age;
run;
What is the basic syntax style of writing code in SAS?
The fundamental syntax for developing code in SAS is as follows:
1. Create the DATA statement, which effectively names the dataset.
2. Use the INPUT statement to name the variables in the data set.
3. Each statement must end with a semicolon.
4. There should be an appropriate spacing between words and statements.
What is the difference between the Do Index, Do While and the Do Until loop? Give examples.
To answer this question, first define what a Do loop is. A Do loop executes a block of code repeatedly based on a specified condition. The graphic below illustrates the process of the Do loop.
Do Index loop: This loop uses an index variable that runs from a start value to a stop value. SAS executes the statements in the loop repeatedly until the index variable exceeds its stop value.
The Do While loop uses a WHILE condition. It executes the block of code as long as the condition is true; the condition is checked at the top of the loop, and the loop ends as soon as the condition becomes false.
The Do Until loop uses an UNTIL condition. It executes the block of code as long as the condition is false; the condition is checked at the bottom of the loop, so the block always runs at least once, and the loop ends when the condition becomes true.
To illustrate with code, suppose we want to compute the sum of a sequence of numbers and see the final value of the loop variable.
For the loops, write the code as follows:
Do Index
DATA ExampleLoop;
SUM = 0;
Do VAR = 1 TO 10;
SUM = SUM + VAR;
END;
PROC PRINT DATA = ExampleLoop;
Run;
The output would be:
Obs SUM VAR
1 55 11
Do While
DATA ExampleLoop;
SUM = 0;
VAR = 1;
Do While(VAR<15);
SUM = SUM + VAR;
VAR+1;
END;
PROC PRINT DATA = ExampleLoop;
Run;
output:
Obs SUM VAR
1 105 15
Do Until
DATA ExampleLoop;
SUM = 0;
VAR = 1;
Do Until(VAR>15);
SUM=SUM+VAR;
VAR+1;
END;
PROC PRINT;
Run;
output:
Obs SUM VAR
1 120 16
What is the ANYDIGIT function in SAS?
The ANYDIGIT function searches a character string for the first occurrence of any digit. If a digit is found, it returns the position of that digit; otherwise, it returns 0.
Can you tell the difference between VAR X1-X3 and VAR X1--X3?
A single dash between variable names (X1-X3) specifies a numbered range list: the variables with the same prefix numbered from 1 to 3. A double dash (X1--X3) specifies a name range list: all variables positioned between X1 and X3 in the data set, in the order they appear.
For Example:
Consider the following data set:
Data Set: ID NAME X1 X2 Y1 X3
Then, X1-X3 would return X1 X2 X3
and X1--X3 would return X1 X2 Y1 X3
What is the purpose of trailing @ and @@? How do you use them?
A trailing @ is an @ placed at the end of an INPUT statement. It holds the current raw data record so that it can be read further, which lets you read part of a record, test it, and then decide how to read the rest of the same record.
- The single trailing @ tells SAS to “hold the line” for the next INPUT statement within the same iteration of the DATA step.
- The double trailing @@ tells SAS to “hold the line more strongly”, keeping it even across iterations of the DATA step.
When an INPUT statement ends with @@, SAS releases the current raw data line only when no more values can be read from it. Because @@ retains the record over several iterations of the DATA step, you can read several observations from a single line.
What would be the result of the following SAS function calls (given that 31 Dec 2017 is a Saturday)?
Weeks = intck('week', '31dec2017'd, '01jan2018'd);
Years = intck('year', '31dec2017'd, '01jan2018'd);
Months = intck('month', '31dec2017'd, '01jan2018'd);
INTCK counts how many interval boundaries are crossed between the two dates, and SAS weeks begin on Sunday. Under the question’s premise that December 31, 2017 is a Saturday, January 1, 2018 is a Sunday and therefore starts a new week. (On the actual calendar, December 31, 2017 was a Sunday, so both dates fall in the same SAS week and the Weeks result would be 0.)
- Weeks = 1, as the two days fall in different weeks.
- Years = 1, since the two days fall in different calendar years.
- Months = 1, since the two days fall in different calendar months.
How does PROC SQL work?
PROC SQL processes all observations together as a set, rather than one observation at a time as the DATA step does. The following stages occur when a PROC SQL step is executed:
- SAS examines each statement in the SQL procedure and checks it for syntax errors.
- The SQL optimizer analyzes the query in the statement and determines how it should be processed to minimize run time.
- Any tables in the FROM clause are loaded into the data engine, where they can be accessed in memory.
- Code and calculations are executed.
- The Final Table is built in memory.
- The final table is delivered to the output table specified in the SQL query.
If you are given an unsorted data set, how will you read the last observation to a new dataset?
The END= option of the SET statement lets us read the last observation into a new dataset.
For example:
data example.newdataset;
set example.olddataset end=last;
If last;
run;
Here, “newdataset” and “olddataset” are the new and existing data sets, respectively. The temporary variable last is initialized to 0 and set to 1 when the SET statement reads the last observation, so the subsetting IF statement keeps only that observation.
What are the differences between the sum function and using “+” operator?
The SUM function returns the sum of its non-missing arguments, whereas the “+” operator returns a missing value if any operand is missing. Consider the following example.
Example:
data exampledata1;
input a b c;
cards;
44 4 4
34 3 4
34 3 4
. 1 2
24 . 4
44 4 .
25 3 1
;
run;
data exampledata2;
set exampledata1;
x = sum(a,b,c);
y=a+b+c;
run;
The fourth, fifth, and sixth observations each contain a missing value, so the “+” operator returns a missing value for y, while SUM still returns the total of the non-missing values in x.
x y
52 52
41 41
41 41
3 .
28 .
48 .
29 29
SQL Interview Questions:
Relational database management systems (RDBMS) are among the most widely used databases today, so SQL skills are required in most job categories, including Data Analyst. Knowing Structured Query Language puts you on the path to becoming a data analyst, because it shows interviewers that you understand databases.
What is the default port for SQL?
The default TCP port assigned by the Internet Assigned Numbers Authority (IANA) for Microsoft SQL Server is 1433.
What do you mean by DBMS? What are its different types?
A Database Management System (DBMS) is software that interacts with users, applications, and the database itself to capture and analyze data. The data held in a database can be of many types, such as strings, numbers, and images, and it can be modified, retrieved, and deleted.
There are four types of DBMS: hierarchical, relational, network, and object-oriented.
- Hierarchical DBMS: As the name implies, this type of DBMS uses a predecessor-successor relationship. It has a tree-like structure, with nodes representing records and branches representing fields.
- Relational DBMS (RDBMS): This type uses a structure that allows users to identify and access data in relation to other data in the database.
- Network DBMS: This type of DBMS supports many-to-many relationships, allowing multiple member records to be linked.
- Object-oriented DBMS: This type uses small units of software called objects. Each object contains a piece of data along with instructions for the actions to be performed on that data.
What is ACID property in a database?
ACID stands for Atomicity, Consistency, Isolation, and Durability. Together, these properties ensure reliable transaction processing in a database. Each is defined below.
- Atomicity: A transaction is treated as a single unit that either succeeds completely or fails completely. If any part of the transaction fails, the entire transaction fails, and the database state is left unchanged.
- Consistency: Ensures that data meets all validation rules, so a transaction always takes the database from one valid state to another and never leaves it in a half-finished state.
- Isolation: Keeps concurrent transactions separate until they are completed, so each transaction behaves as if it were running on its own.
- Durability: Ensures that a committed transaction is never lost. The database keeps track of pending changes in such a way that, even after a power outage, crash, or other failure, the server can recover from the abnormal termination.
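A small demonstration of atomicity using Python’s built-in `sqlite3` module: a funds transfer is written as one transaction, a CHECK constraint (an illustrative assumption) makes the second update fail, and the rollback leaves both balances unchanged. The table and account names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as a single all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; the credit to dst was rolled back too


print(transfer(conn, "bob", "alice", 80))  # bob only has 50, so this fails
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

Even though the credit to alice ran first, it is undone along with the failed debit, which is exactly the all-or-nothing behavior atomicity describes.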
What is Normalization? Explain different types of Normalization with advantages.
Normalization is the process of organizing data to reduce duplication and redundancy. It proceeds in stages, and each stage is called a normal form. Each successive normal form builds on the one before it.
The first three normal forms are usually sufficient.
- First Normal Form (1NF): No repeating groups within rows; each column holds a single (atomic) value.
- Second Normal Form (2NF): The table is in 1NF, and every non-key (supporting) column depends on the whole primary key.
- Third Normal Form (3NF): The table is in 2NF, and every non-key (supporting) column depends only on the primary key, not on other non-key columns.
- Boyce-Codd Normal Form (BCNF): A stricter version of 3NF. A table is in BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a superkey of the table.
Some of the advantages are:
- Better Database organization
- More Tables with smaller rows
- Efficient data access
- Greater Flexibility for Queries
- Quickly find the information
- Easier to implement Security
- Allows easy modification
- Reduction of redundant and duplicate data
- More Compact Database
- Ensure Consistent data after modification
What are the different types of Joins?
There are four types of joins used to extract data between tables: inner join, left join, right join, and full outer join. Please refer to the image on the right side.
- Inner join: The most common type of join in MySQL. It returns only the rows that satisfy the join condition in both tables.
- Left join: In MySQL, a left join returns all rows from the left table plus the matching rows from the right table; where there is no match, the right table’s columns are NULL.
- Right join: In MySQL, a right join returns all rows from the right table plus the matching rows from the left table; where there is no match, the left table’s columns are NULL.
- Full join: Returns all rows from both the left and right tables, matching rows where the join condition is met and filling in NULLs where it is not. (MySQL does not support FULL OUTER JOIN directly; it is commonly emulated with a UNION of a left join and a right join.)
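The difference between an inner join and a left join is easiest to see on a tiny example. The sketch below uses Python’s built-in `sqlite3` module with two hypothetical tables. It sticks to INNER and LEFT joins because SQLite only added RIGHT and FULL OUTER joins in version 3.39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 250), (11, 1, 100), (12, 3, 75);
    """
)

inner = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

left = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id"
).fetchall()

print(inner)  # only customers with at least one matching order
print(left)  # every customer; Ben has no orders, so his amount is None (SQL NULL)
```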
Suppose you have a table of employee details consisting of columns names (employeeId, employeeName), and you want to fetch alternate records from a table. How do you think you can perform this task?
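You can retrieve alternate rows by numbering each row and then filtering on whether the row number is even or odd. To display the employeeId of even-numbered rows, use the MOD function with the following query:
SELECT employeeId
FROM (SELECT employeeId, ROW_NUMBER() OVER (ORDER BY employeeId) AS rn
FROM employee) e
WHERE MOD(rn, 2) = 0;
The table name is 'employee'. ROW_NUMBER() and MOD() are available in databases such as Oracle, MySQL 8.0, and PostgreSQL; SQL Server uses the % operator instead of MOD.
To display the employeeId of odd-numbered rows, execute the following query:
SELECT employeeId
FROM (SELECT employeeId, ROW_NUMBER() OVER (ORDER BY employeeId) AS rn
FROM employee) e
WHERE MOD(rn, 2) = 1;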
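To try the row-numbering approach end to end, the sketch below runs an even-row query based on `ROW_NUMBER()` from Python against an in-memory SQLite database with hypothetical employee rows. It uses the `%` operator instead of `MOD`, since `MOD` is not available in every SQLite build, and window functions such as `ROW_NUMBER()` need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeId INTEGER, employeeName TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(101, "Asha"), (102, "Ben"), (103, "Chen"), (104, "Dana"), (105, "Eli")],
)

# Number the rows by employeeId, then keep only the even row numbers
even_rows = conn.execute(
    """
    SELECT employeeId
    FROM (SELECT employeeId,
                 ROW_NUMBER() OVER (ORDER BY employeeId) AS rn
          FROM employee) e
    WHERE rn % 2 = 0
    ORDER BY employeeId
    """
).fetchall()
print([row[0] for row in even_rows])
```

Changing the filter to `rn % 2 = 1` returns the odd-numbered rows instead.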
Consider the following two tables.
Now, create a query to return a list of customers who took the same course more than once on the same day. The results should be grouped by customer and sorted by the most recent date.
SELECT
c.Customer_Id,
CustomerName,
Course_Id,
Course_Date,
count(Customer_Course_Id) AS count
FROM customers c JOIN course_details d ON d.Customer_Id = c.Customer_Id
GROUP BY c.Customer_Id,
CustomerName,
Course_Id,
Course_Date
HAVING count( Customer_Course_Id ) > 1
ORDER BY Course_Date DESC;
Consider the Employee_Details table below, which has columns such as Employee_Id, EmployeeName, Age, Gender, and Shift. In the Shift column, m = Morning Shift and e = Evening Shift. Swap the ‘m’ and ‘e’ values with a single UPDATE query.
You can write the following query; the ELSE branch leaves any other value unchanged:
UPDATE Employee_Details SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END;
Write a SQL query to get the third-highest salary from the Employee_Details table illustrated below. The following query uses SQL Server’s TOP syntax; DISTINCT ensures that duplicate salaries are counted only once.
SELECT TOP 1 Salary
FROM (
SELECT DISTINCT TOP 3 Salary
FROM Employee_Details
ORDER BY Salary DESC) AS emp
ORDER BY Salary ASC;
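On databases that use `LIMIT ... OFFSET` instead of `TOP` (such as MySQL, PostgreSQL, and SQLite), a common alternative sorts the distinct salaries from highest to lowest and skips the first two. The sketch below runs it against an in-memory SQLite table with hypothetical salaries, including a tie, so you can see that `DISTINCT` keeps the duplicate 85000 from being counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Details (Employee_Id INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee_Details VALUES (?, ?)",
    [(1, 90000), (2, 85000), (3, 85000), (4, 70000), (5, 60000)],
)

# Sort distinct salaries from highest to lowest, skip two, and take the next one
third_highest = conn.execute(
    "SELECT DISTINCT Salary FROM Employee_Details "
    "ORDER BY Salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]
print(third_highest)
```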
What is the difference between NVL and NVL2 functions in SQL?
NVL(exp1, exp2) and NVL2(exp1, exp2, exp3) are Oracle functions that check whether exp1 is null.
NVL(exp1, exp2) returns the value of exp1 if it is not null, and the value of exp2 otherwise. exp2 must have the same data type as exp1 (or be implicitly convertible to it).
Similarly, NVL2(exp1, exp2, exp3) returns exp2 if exp1 is not null, and exp3 otherwise.
Tableau Interview Questions:
Tableau is a business intelligence tool that connects users to their data and lets them build interactive, shareable dashboards and visualizations.
Learning Tableau will improve your understanding of data analysis and visualization.
What are the differences between Tableau and Power BI?
What is a dual axis?
Tableau provides a feature called Dual Axis, which lets users view two measures, each with its own scale, in the same chart. Indeed.com, for example, uses a dual axis to compare two measures and show their growth over time. Dual axes let you compare multiple measures at once by layering two independent axes on top of one another.
Refer to the image below to see how it appears.
What is the difference between joining and blending in Tableau?
Joining combines data from the same source, such as worksheets in one Excel file or tables in one Oracle database.
Blending combines data from two separately defined data sources in the same view.
How to create a calculated field in Tableau?
To create a calculated field in Tableau, follow these steps:
- To open the calculation editor, choose “Create > Calculated Field” from the drop-down menu next to Dimensions in the Data pane.
- Name the new field and build a formula.
Look at the photo below:
How to view underlying SQL Queries in Tableau?
To inspect the underlying SQL queries in Tableau, there are two main options:
- Use the Performance Recording feature: Start a performance recording to capture key interactions with the workbook; Tableau then produces a workbook of performance metrics that you can review.
Help -> Settings & Performance -> Start Performance Recording
Help -> Settings & Performance -> Stop Performance Recording
- Review the Tableau Desktop logs: The logs are located under C:\Users\My Documents\My Tableau Repository. For a live connection to the data source, check the log.txt and tabprotosrv.txt files; for an extract, check the tdeserver.txt file.
Design a map view such that, when a user selects a country, the states in that country show their profit and sales.
For this, your dataset must contain Country, State, Profit, and Sales fields.
- Double-click the Country field.
- Drag State onto the Marks card.
- Drag Sales onto Size.
- Drag Profit onto Color.
- Click the size legend and increase the size.
- Right-click the Country field and select Show Filter (Show Quick Filter in older versions).
- Now select a country in the filter and check the view.
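Behind this view, each state mark shows the sum of Sales (size) and the sum of Profit (color) for the selected country. The Python sketch below reproduces that computation on hypothetical rows; the field names, sample values, and the `state_marks` helper are illustrative assumptions, not part of Tableau.

```python
from collections import defaultdict

# Hypothetical rows containing the four fields the question requires.
rows = [
    {"country": "United States", "state": "Texas", "sales": 500.0, "profit": 80.0},
    {"country": "United States", "state": "Texas", "sales": 300.0, "profit": -20.0},
    {"country": "United States", "state": "Ohio", "sales": 250.0, "profit": 40.0},
    {"country": "Canada", "state": "Ontario", "sales": 400.0, "profit": 60.0},
]


def state_marks(data, selected_country):
    """One entry per state: SUM(Sales) drives mark size, SUM(Profit) drives mark color."""
    marks = defaultdict(lambda: {"sales": 0.0, "profit": 0.0})
    for row in data:
        if row["country"] == selected_country:  # plays the role of the Country filter
            marks[row["state"]]["sales"] += row["sales"]
            marks[row["state"]]["profit"] += row["profit"]
    return dict(marks)


print(state_marks(rows, "United States"))
```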
What is the difference between a heat map and a treemap?
A heat map compares categories using color and size, which lets you compare two measures at once. A treemap uses color and size in a similar way, but it arranges the data as nested rectangles, so it can also show hierarchical data and part-to-whole relationships.
What are aggregation and disaggregation of data?
Aggregation of data is the process of viewing numerical values (measures) at a higher, more summarized level. When you place a measure on a shelf, Tableau aggregates it automatically. You can tell whether an aggregation has been applied by looking at the field: on a shelf, the aggregation function always appears in front of the field name.
Example: After aggregation, the Sales field appears as SUM(Sales).
You can aggregate measures only from relational data sources. Multidimensional data sources contain data that is already aggregated, and Tableau supports multidimensional data sources only on Windows.
Disaggregation of data displays every row of the data source individually instead of a summary, which is useful when you need to analyze individual records.
Example: Suppose you are analyzing the results of a product satisfaction survey, with the participants' ages plotted along one axis. Aggregating the Age field gives the average age of the participants. Disaggregating the data instead lets you see each response and identify the ages at which participants were most satisfied with the product.
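The survey example can be sketched in Python with hypothetical data: aggregation collapses the Age values into one average, while keeping the rows disaggregated lets you see which ages gave the highest satisfaction score. The sample responses and the 1 to 5 satisfaction scale are assumptions for illustration.

```python
from statistics import mean

# Hypothetical survey responses: (participant age, satisfaction score on a 1-5 scale).
responses = [(22, 3), (35, 5), (35, 4), (41, 2), (58, 5)]

# Aggregated: one summary value for the whole field, like AVG(Age) in Tableau.
average_age = mean(age for age, _ in responses)

# Disaggregated: every response stays a separate row, so individual ages remain visible.
best_score = max(score for _, score in responses)
happiest_ages = sorted({age for age, score in responses if score == best_score})

print(average_age)    # 38.2
print(happiest_ages)  # [35, 58]
```

The aggregated value answers "how old is the typical participant?", but it hides the fact that the two most satisfied participants are 35 and 58; only the row-level view reveals that.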
Can you explain how to create stories in Tableau?
A story is a sequence of visualizations used to describe a series of events or to make a business case. Tableau offers several ways to build a narrative: each story point can be based on a different view or dashboard, or the entire story can use the same visualization at different stages, with different marks filtered and annotations added.
To build a story in Tableau, follow these steps:
- Click the New Story tab.
- In the lower-left corner of the screen, choose a size for your story. Choose from one of the predefined sizes, or set a custom size, in pixels.
- By default, your story gets its title from its sheet name. To edit it, double-click the title. You can also change your title’s font, color, and alignment. Click Apply to view your changes.
- To start building your story, drag a sheet from the Story tab on the left and drop it into the center of the view.
- Click Add a caption to summarize the story point.
- To highlight a key takeaway for your viewers, drag a text object over to the story worksheet and type your comment.
- To further highlight the main idea of this story point, you can change a filter or sort on a field in the view, then save your changes by clicking Update above the navigator box.
Can you explain how to embed views in web pages?
You can embed interactive Tableau views and dashboards in web pages, blogs, wikis, web applications, and intranet portals. Embedded views update as the underlying data changes or when their workbooks are updated on Tableau Server.
Embedded views follow the same licensing and permission restrictions as views on Tableau Server: to see a Tableau view embedded in a web page, the user must have a Tableau Server account.
If your organization uses Tableau Server with a core-based license, a Guest account is also available. It lets people in your organization view and interact with Tableau views embedded in web pages without signing in to the server. Contact your server or site administrator to find out whether the Guest user is enabled for the site you're publishing to.
You can embed views and customize their default appearance in the following ways:
- Get the embed code provided with the view: The Share button at the top of each view provides embed code that you can copy and paste into your web page. (Setting the showShareOptions parameter to false in the code hides the Share button in the embedded view.)
- Customize the embed code: Use parameters to control the toolbar, tabs, and more. For details, see Embed Code Parameters.
- Use Tableau's JavaScript API: Web developers can use Tableau JavaScript objects in their web applications. The Tableau Developer Portal provides access to the API, documentation, code samples, and the Tableau developer community.
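Embed options can also be passed as URL parameters on a view's URL. The Python sketch below builds such a URL. It assumes the view lives on the default site at `https://<server>/views/<workbook>/<sheet>` and uses the parameter names `:embed`, `:toolbar`, `:tabs`, and `:showShareOptions`, which you should verify against the Embed Code Parameters reference for your Tableau Server version. The server, workbook, and sheet names, and the `embed_url` helper itself, are hypothetical.

```python
from urllib.parse import quote, urlencode


def embed_url(server, workbook, sheet, show_toolbar=True, show_tabs=True, show_share=True):
    """Build an embeddable view URL (hypothetical helper; parameter names are assumptions)."""
    params = {":embed": "y"}
    # Add a parameter only when hiding something, so Tableau's defaults apply otherwise.
    if not show_toolbar:
        params[":toolbar"] = "no"
    if not show_tabs:
        params[":tabs"] = "no"
    if not show_share:
        params[":showShareOptions"] = "false"
    # Percent-encode workbook and sheet names (for example, spaces become %20).
    path = f"/views/{quote(workbook)}/{quote(sheet)}"
    # safe=":" keeps the leading colons of the parameter names unescaped.
    return f"https://{server}{path}?" + urlencode(params, safe=":")


print(embed_url("tableau.example.com", "Superstore", "Overview", show_toolbar=False, show_share=False))
```

With these arguments, the call returns `https://tableau.example.com/views/Superstore/Overview?:embed=y&:toolbar=no&:showShareOptions=false`.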