Data Analyst Quiz

31. What does the SUM function in SQL do?

Counts the number of records
Returns the total of the values in a numeric column
Returns the highest value in a column
Sorts the records

32. In a box plot, what does the box represent?

The mean value of the dataset
The interquartile range (IQR)
The maximum value
The outliers

33. What is the main use of the GROUP BY clause in SQL?

To filter rows
To join tables
To group rows that share column values so aggregates can be computed per group
To sort records
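
Illustration for question 33. A minimal sketch of GROUP BY combined with SUM, run against an in-memory SQLite database from Python. The table name sales and its columns region and amount are made up for this example.

```python
import sqlite3

# Hypothetical table: one row per sale, with a region and an amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 20), ("south", 5)],
)

# GROUP BY collapses rows sharing a region; SUM is computed per group.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)  # [('north', 30), ('south', 5)]
```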

34. Which of the following is a Python library used for machine learning?

NumPy
Pandas
Scikit-learn
Seaborn

35. What does the WHERE clause in SQL do?

Joins two tables
Filters records based on a condition
Groups data
Aggregates data

36. In Python, what does the DataFrame object represent in the Pandas library?

A 2D labeled data structure
A 3D data array
A list of data
A tool for machine learning

37. What is the purpose of the HAVING clause in SQL?

To filter rows before aggregation
To filter groups after aggregation
To join multiple tables
To delete records
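
Illustration for questions 35 and 37. A minimal sketch contrasting WHERE and HAVING on the same data, using an in-memory SQLite database. The orders table, its columns, and the thresholds are assumptions chosen so that the two queries give different results.

```python
import sqlite3

# Hypothetical table: one row per order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 70), ("bob", 60), ("bob", 20), ("cy", 200)],
)

# WHERE drops individual rows before grouping (bob's 20 is excluded).
where_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount >= 50 GROUP BY customer ORDER BY customer"
).fetchall()

# HAVING drops whole groups after aggregation (bob's total of 80 is excluded).
having_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer HAVING SUM(amount) >= 100 ORDER BY customer"
).fetchall()
conn.close()

print(where_rows)   # [('ann', 120), ('bob', 60), ('cy', 200)]
print(having_rows)  # [('ann', 120), ('cy', 200)]
```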

38. What is the primary purpose of the Python library Pandas?

Visualization
Numerical computations
Data manipulation and analysis
Machine learning

39. Which SQL command is used to retrieve data from a database?

INSERT
SELECT
DELETE
UPDATE

40. What is the difference between a LEFT JOIN and an INNER JOIN in SQL?

LEFT JOIN returns only matching rows
LEFT JOIN returns all rows from the left table (with NULLs where the right table has no match), while INNER JOIN returns only matching rows
INNER JOIN returns unmatched rows
LEFT JOIN removes duplicates
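
Illustration for question 40. A minimal sketch comparing the two joins on an in-memory SQLite database. The customers and orders tables and their contents are made up for this example; one customer has no order, which is where the joins differ.

```python
import sqlite3

# Hypothetical tables: bob (id 2) has no matching order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO orders VALUES (1, 'book')")

# INNER JOIN keeps only customers that have a matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
).fetchall()

# LEFT JOIN keeps every customer; missing order columns come back as NULL (None).
left_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
).fetchall()
conn.close()

print(inner_rows)  # [('ann', 'book')]
print(left_rows)   # [('ann', 'book'), ('bob', None)]
```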

41. In Python, which function from the standard statistics module calculates the mean of a list of numbers?

mean()
sum()
avg()
max()
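
Illustration for question 41. Python has no built-in mean function. A minimal sketch using statistics.mean from the standard library, checked against a manual calculation. The list scores is made-up sample data.

```python
from statistics import mean

scores = [2, 4, 4, 10]  # made-up sample data

# statistics.mean returns the arithmetic mean of the values.
average = mean(scores)

# The same result computed by hand: total divided by count.
manual_average = sum(scores) / len(scores)

print(average, manual_average)
```

Both values equal 5 here, since the total is 20 over 4 items.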

42. What does the term "data lake" refer to?

A tool for data visualization
A centralized repository for raw data in its native format, whether structured, semi-structured, or unstructured
A method for data cleansing
A type of relational database

43. What does the SQL DISTINCT keyword do?

Returns all rows
Removes duplicates from the result set
Sorts the data
Joins multiple tables

44. What is the median in a dataset?

The sum of all values divided by the number of values
The most frequent value in a dataset
The middle value when the dataset is ordered
The highest value in the dataset
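
Illustration for question 44. A minimal sketch using statistics.median from the standard library. The two lists are made-up sample data; the second shows the even-count case, where the median is the mean of the two middle values.

```python
from statistics import median

odd_count = [7, 1, 3]      # sorted: 1, 3, 7
even_count = [7, 1, 3, 5]  # sorted: 1, 3, 5, 7

# Odd number of values: the single middle value after sorting.
median_odd = median(odd_count)

# Even number of values: the mean of the two middle values (3 and 5).
median_even = median(even_count)

print(median_odd, median_even)  # 3 4.0
```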

45. What is the function of a heatmap in data visualization?

To display data using color to represent values
To create 3D charts

46. What does a scatter plot represent in data visualization?

The distribution of categories
The relationship between two continuous variables
A comparison of averages
The distribution of time

47. What is a histogram used for in data analysis?

To represent the frequency distribution of a dataset
To compare categories
To show relationships between two variables
To display time series data

48. What is the Pearson correlation coefficient used for?

To measure the distribution of data
To measure the variance
To measure the linear correlation between two variables
To group similar data
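
Illustration for question 48. A minimal sketch that computes the Pearson correlation coefficient directly from its definition, the covariance divided by the product of the standard deviations. The helper name pearson_r and the sample lists are made up for this example; it assumes equal-length inputs with non-zero variance.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper).

    Assumes neither sequence is constant, otherwise the denominator is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [2, 4, 6, 8])    # perfectly increasing linear relation
r_down = pearson_r(xs, [8, 6, 4, 2])  # perfectly decreasing linear relation

print(r_up, r_down)  # 1.0 -1.0
```

The coefficient ranges from -1 to 1, with values near 0 indicating little linear relationship.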

49. In machine learning, what is overfitting?

A model that performs well on new data
A model that fits the training data too closely, so it performs well on training data but poorly on unseen data
A model with too few features
A model that underestimates the variance

50. What is the purpose of feature scaling in machine learning?

To reduce the number of features
To normalize the range of independent variables
To increase model complexity
To split data into training and testing sets
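
Illustration for question 50. A minimal sketch of one common scaling method, min-max scaling, which maps a feature onto the range 0 to 1. The helper name min_max_scale and the heights_cm data are made up for this example; standardization (subtracting the mean and dividing by the standard deviation) is another common choice.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (hypothetical helper).

    Assumes the values are not all identical, otherwise hi - lo is zero.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150, 160, 170, 180, 190]  # made-up feature values
scaled = min_max_scale(heights_cm)

print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```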