Top BigQuery Interview Questions and Answers.


1. What kinds of SQL does BigQuery support?

BigQuery supports two SQL dialects:

  1. Standard SQL (now called GoogleSQL)
  2. Legacy SQL
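As an illustration, the dialect can be selected per query with a prefix comment; note that the two dialects also quote table names differently (the project, dataset, and table names below are hypothetical):

```sql
#standardSQL
SELECT COUNT(*) AS row_count
FROM `my_project.my_dataset.my_table`;
```

```sql
#legacySQL
SELECT COUNT(*) AS row_count
FROM [my_project:my_dataset.my_table];
```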

2. What problem does Google BigQuery solve?

Assume we have huge datasets; without the proper infrastructure and hardware, it is complicated to store and query all of that data. This is the problem BigQuery solves: BigQuery is an enterprise data warehouse that runs robust SQL queries on Google's infrastructure.

You simply move your data into BigQuery, and the rest is handled for you. You can control access to both the data and your project according to your business requirements, for example by giving others the ability to query or view your data.

3. Is BigQuery free?

BigQuery is not free, but new customers get $300 of credit to spend on Google Cloud, valid for 90 days. In addition, every customer gets 10 GB of storage and up to 1 TB of queries per month entirely free of charge.

4. What is the use of BigQuery ML?

BigQuery ML (machine learning) is used to create and run machine learning models within BigQuery with the help of SQL. One of its vital benefits is that it accelerates model development because you never have to move your data out of the warehouse.

5. Why do we use the CREATE MODEL statement?

The CREATE MODEL statement is used to create a machine learning model within BigQuery. A query job is generated whenever you execute a standard SQL query that includes a CREATE MODEL statement.
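As a sketch, a minimal linear-regression model could be created like this (the dataset, table, and column names are hypothetical):

```sql
CREATE OR REPLACE MODEL `my_dataset.sales_model`
OPTIONS (
  model_type = 'linear_reg',        -- built-in linear regression
  input_label_cols = ['revenue']    -- the column to predict
) AS
SELECT units_sold, unit_price, revenue
FROM `my_dataset.sales`;
```

Once trained, the model can be queried with the ML.PREDICT function.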


6. What is clustering?

Clustering is the process of dividing a massive amount of data into smaller groups. The aim is to collect rows with similar characteristics and store them together in clusters; in BigQuery, a clustered table physically sorts its storage by the values of the clustering columns, so queries that filter on those columns scan less data.

7. Does BigQuery support indexes?

BigQuery does not use or support indexes, because BigQuery performance and query costs depend on the quantity of data scanned during a query rather than on index lookups.

8. What are the different ways of creating clustered tables?

We can create clustered tables in the following ways:

  • From query results.
  • With a DDL CREATE TABLE statement that includes a CLUSTER BY clause.
  • By using the client libraries.
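For the DDL route, a clustered (and partitioned) table might be created like this (the table and column names are hypothetical):

```sql
CREATE TABLE `my_dataset.orders_clustered`
PARTITION BY DATE(order_ts)           -- one partition per day
CLUSTER BY customer_id, product_id    -- sort storage within each partition
AS
SELECT * FROM `my_dataset.orders`;
```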

9. What are the permissions that are required to create a table?

Below are the necessary permissions required to create a table:

  • bigquery.tables.create: this permission is required to create the table.
  • bigquery.tables.updateData: this permission is required to write data into the table using a copy job, query job, or load job.
  • bigquery.jobs.create: this permission is required to run the query jobs, load jobs, or copy jobs that write data into the table.

10. What are the different data types supported by BigQuery?

BigQuery supports the following data types, among others:

  1. Numeric types (INT64, FLOAT64, NUMERIC)
  2. STRING and BYTES
  3. BOOL
  4. Date and time types (DATE, TIME, DATETIME, TIMESTAMP)
  5. ARRAY
  6. STRUCT

11. What is the syntax to declare an ARRAY type?

Angle brackets are used to declare ARRAY types, like this:

ARRAY<x>

where x is the element type, for example ARRAY<INT64>.
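A small sketch of building and inspecting an array value (the column aliases are hypothetical):

```sql
SELECT
  ARRAY<INT64>[1, 2, 3] AS scores,                 -- an explicit ARRAY<INT64> literal
  ARRAY_LENGTH(ARRAY<INT64>[1, 2, 3]) AS n_scores  -- number of elements in the array
```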

12. What are the rules for naming a table?

When creating a table in BigQuery, its name must be unique per dataset. Two essential rules are to be considered while naming a table:

  1. The table name can be up to 1,024 characters long.
  2. It may contain Unicode characters in the categories L (letter), M (mark), N (number), Pd (dash), Pc (connector, including underscore), and Zs (space).

13. What does a BigQuery table contain?

A table in BigQuery contains individual records organized in rows. Each row is further divided into fields (also called columns). Every table in BigQuery is defined by a schema that specifies the column names, data types, and other related information.

14. Is the BigQuery table name case sensitive?

Yes, BigQuery dataset and table names are case-sensitive by default, so mytable and MyTable refer to different tables. (Datasets can optionally be created as case-insensitive.)

15. Is there any way to deduplicate rows in a table?

There are many ways to deduplicate the rows in a table, but one of the easiest is to use the DISTINCT keyword.

Example:

Assume there is a table named course.testdata; this is how you can deduplicate its rows:

CREATE OR REPLACE TABLE `course.testdata`
AS SELECT DISTINCT * FROM `course.testdata`;

Note that SELECT DISTINCT * only works when the table has no ARRAY or STRUCT columns, and that a PARTITION BY clause in CREATE TABLE must use a date, timestamp, or integer-range column.
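When DISTINCT is not applicable (for example, with ARRAY or STRUCT columns), a common alternative is to keep one row per key with ROW_NUMBER; a sketch with hypothetical id and updated_at columns:

```sql
CREATE OR REPLACE TABLE `course.testdata` AS
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    -- number the duplicates per id, newest first
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM `course.testdata`
)
WHERE rn = 1;  -- keep only the newest row for each id
```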

16. Is there any way in BigQuery to select all columns except a few?

Yes, with the EXCEPT modifier we can select all the columns except a few in BigQuery, like this:

SELECT * EXCEPT (title, comment) FROM `course.samples.Wikipedia` LIMIT 15

17. What is Dremel?

Dremel is the query engine behind BigQuery, in which data is organized in a columnar format; it is used to execute queries over huge, structured datasets very quickly. Dremel is read-only, which means we cannot create or update records in place, and it does not feature any table indexes.

18. How to undelete a BigQuery Table?

You can simply do it in three steps by using BigQuery CLI:

  1. Specify the old table name.
  2. Specify the wanted date in EPOCH time
  3. Finally, specify the new table name

Example:

bq cp my_project.old_dataset.old_table_name@1251990188213 my_project.new_dataset.new_table_name

19. What is "OMIT RECORD IF" in Google BigQuery?

OMIT RECORD IF is a legacy SQL clause that filters out entire records, including all of their repeated values, whenever the given condition is true; this differs from WHERE, which filters individual repeated values. To keep the surviving records, you can write the query results to a new table.

Example:

SELECT * FROM
  publicdata:example.coursedrill_nested
OMIT RECORD IF
  COUNT(payload.pages.page_name) <= 50;

20. Why do we use CAST() in BigQuery?

CAST() is a function used to convert a value from one data type to another in BigQuery. For example, if you want to convert a string into a timestamp, you would use the following syntax:

SELECT
  CAST('2021-12-16 03:23:01-6:00' AS TIMESTAMP) AS str_to_timestamp

21. How to TRUNCATE a table in Google BigQuery?

You can truncate a table by using the syntax below:

TRUNCATE TABLE [[project_name.]dataset_name.]table_name

However, this will not work if a partition filter is required by your table definition. In that case, you can use this alternative method:

CREATE OR REPLACE TABLE <dataset>.<table>
AS SELECT * FROM <dataset>.<table> LIMIT 0;

22. How to get a unique id for each row in a table?

The following query is used to get a unique id for each row in a table:

#standardSQL
SELECT GENERATE_UUID() AS uuid,
  *
FROM table_name

23. List the various ways to access the BigQuery cloud data warehouse.

There are five different ways to access the BigQuery cloud data warehouse:

  1. ODBC Drivers
  2. JDBC Drivers
  3. Web User Interface
  4. Python Libraries
  5. BQ Command Line Client

24. What are the steps to follow to avoid/control query computation?

Follow the steps below to avoid/control query computation:

  • Avoid JavaScript user-defined functions.
  • Do not transform the data repeatedly through SQL queries.
  • Make use of approximate aggregation functions.
  • Optimize your join patterns.
  • Prune partitioned queries with partition filters.

25. How many rows and columns can a BigQuery table have?

A BigQuery table is limited to 10,000 columns; there is no fixed maximum number of rows, and tables routinely scale to billions of rows.

26. How much data can BigQuery handle?

BigQuery works exceptionally well at all data sizes, from a few rows in a spreadsheet to many petabytes of data. The exciting part is that it can run complicated queries over such data within seconds.

27. Is there any way to reduce the amount of data a BigQuery query scans?

Yes, there are two simple ways to reduce the amount of data a query scans in BigQuery:

  1. Clustering
  2. Partitioning
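For instance, filtering on the partitioning column lets BigQuery prune whole partitions instead of scanning the full table (the table and column names below are hypothetical):

```sql
-- Only the partitions for the requested week are scanned
SELECT order_id, total
FROM `my_dataset.orders_partitioned`
WHERE order_date BETWEEN '2021-12-01' AND '2021-12-07';
```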

28. What is the use of standard SQL in BigQuery?

Standard SQL allows us to query data stored in Google Sheets, Google Drive, Google Bigtable, and Google Cloud Storage directly from BigQuery through external (federated) tables. With this approach, there is no need to load the complete table into BigQuery.
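As a sketch, an external table over CSV files in Cloud Storage can be defined with DDL like this (the bucket, dataset, and table names are hypothetical):

```sql
CREATE EXTERNAL TABLE `my_dataset.ext_sales`
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/sales/*.csv'],  -- files stay in Cloud Storage
  skip_leading_rows = 1                   -- skip the CSV header row
);
```

Queries against `my_dataset.ext_sales` then read the files in place rather than BigQuery-managed storage.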

29. Is it possible to store unstructured data in BigQuery?

Yes, we can keep raw/unstructured data in a data lake (for example, Cloud Storage), query it from BigQuery, and then process the data as needed.

30. What is time travel?

Time travel allows us to access past versions of a table's data. For example, if you accidentally delete or overwrite the data in a table, you can use time travel to query the table as it existed at any point within the last seven days and recover the lost data.
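A sketch of a time-travel query using the FOR SYSTEM_TIME AS OF clause (the table name is hypothetical):

```sql
-- Read the table as it looked one hour ago
SELECT *
FROM `my_dataset.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```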