Top AWS Redshift Interview Questions and Answers

Ques. 1): What is Redshift in AWS?

Answer: Redshift in AWS is a fast, powerful, fully managed data warehouse service in the cloud. It is a petabyte-scale data repository service that makes it simple and cost-effective to analyze all of your data efficiently. It works with existing business intelligence tools, performs at high speed, and is fully managed.

Ques. 2): What is Amazon Redshift?

Answer: Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It enables you to use your data to acquire new insights for your business and customers. It is a data warehouse product that forms part of the larger cloud-computing platform Amazon Web Services. The name means to shift away from Oracle, "red" being an allusion to Oracle, whose corporate color is red and which is informally referred to as "Big Red." It is built on top of technology from the massively parallel processing (MPP) data warehouse company ParAccel (later acquired by Actian) to handle large-scale data sets and database migrations.

Ques. 3): How will you load data to Amazon Redshift from different data sources such as Amazon EC2, DynamoDB, and Amazon RDS?

Answer: There are two ways of loading data to Amazon Redshift from different data sources, namely:

1.    Using AWS Data Pipeline: Offers a high-performance, fault-tolerant, and reliable way of loading data from a range of AWS data sources. You specify the data source and the required data transformations, and then execute a pre-written import script to load the data.

2.    Using the COPY command: Loads data in parallel directly from Amazon S3, Amazon DynamoDB, Amazon EMR, or any SSH-enabled host, such as an Amazon EC2 instance.
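As a rough illustration of the COPY option above, the following Python sketch only assembles COPY statements as text; it does not connect to a cluster. The helper name build_copy_statement, the table name, the bucket, the DynamoDB table, and the IAM role ARN are all hypothetical placeholders. The clauses used (FROM, IAM_ROLE, FORMAT AS CSV, READRATIO) are standard COPY parameters, but check the COPY reference for the options your source needs.

```python
def build_copy_statement(table, source, iam_role, options=()):
    """Return a Redshift COPY statement as a string (nothing is executed)."""

    def quote(value):
        # Double any single quotes so the SQL string literal stays well-formed.
        return "'" + value.replace("'", "''") + "'"

    # The table name is inserted as-is, so it must come from trusted code.
    parts = [
        f"COPY {table}",
        f"FROM {quote(source)}",
        f"IAM_ROLE {quote(iam_role)}",
    ]
    parts.extend(options)
    return "\n".join(parts) + ";"


# Hypothetical role ARN, bucket, and table names, for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

# Load CSV files stored under an S3 prefix.
s3_copy = build_copy_statement(
    "sales", "s3://example-bucket/sales/", ROLE, ["FORMAT AS CSV"]
)

# Load from a DynamoDB table, using at most 50% of its provisioned read capacity.
dynamodb_copy = build_copy_statement(
    "sales", "dynamodb://ExampleSales", ROLE, ["READRATIO 50"]
)

print(s3_copy)
print(dynamodb_copy)
```

The resulting text would then be run through any SQL client connected to the cluster.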

Ques. 4): What are the benefits of using AWS Redshift?

Answer:

•      We can run multiple queries on multiple nodes.

•      We can use PostgreSQL-compatible tools, ODBC, and JDBC.

•      Automated backups.

•      Built-in security.

•      Well suited to applications that require analytical functions.

•      Cost-effective compared to traditional data warehousing techniques.

Ques. 5): What are the important Features of Redshift?

Answer:

•        Operations: Similar to RDS.

•        Security: IAM, KMS, VPC, SSL (similar to RDS).

•        Performance: AWS has claimed up to 10 times the performance of traditional data warehouses.

•        Redshift is highly available and has auto-healing features.

•        Pricing: You pay per node provisioned, which AWS has claimed comes to about 1/10th of the cost of traditional data warehouse services.

Ques. 6): Is Redshift similar to RDS?

Answer: Redshift is based on a heavily modified version of PostgreSQL, but it is not used for OLTP (online transaction processing), so Redshift is not a replacement for RDS. Redshift is OLAP, which stands for online analytical processing. That means Redshift is used for analytics and data warehousing.

Ques. 7): How to show tables in Amazon Redshift?

Answer: Below is the command to list tables in a public schema:

SELECT DISTINCT tablename
FROM pg_table_def
WHERE schemaname = 'public'
ORDER BY tablename;

Below is the command to describe the columns of a table called table_data:

SELECT *
FROM pg_table_def
WHERE tablename = 'table_data'
AND schemaname = 'public';

Ques. 8): What is MPP? Does Redshift support MPP?

Answer: MPP stands for massively parallel processing, and Redshift supports it. The database is highly distributed: when you run a query, it runs in parallel across many nodes and many cores. This massively parallel query execution is what lets Redshift process large data sets quickly.

Ques. 9): How many types of nodes does Redshift support, and what are their functions?

Answer: Redshift supports two types of nodes: the leader node and compute nodes. The leader node plans the queries and aggregates the results from all compute nodes. The compute nodes actually execute the queries and send their results back to the leader. If the cluster has only one node, that node acts as both the leader node and a compute node.
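The leader/compute split described above can be pictured with a small Python sketch. This is only a toy model, not how Redshift is implemented: threads stand in for compute nodes, the function names leader_node and compute_node are made up, and the round-robin distribution of rows is an assumption chosen for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """A compute node runs the query on its own slice and returns a partial result."""
    return sum(rows), len(rows)


def leader_node(rows, node_count):
    """The leader plans the work, fans it out, and aggregates the partial results."""
    # Distribute rows round-robin across the compute nodes (an assumption).
    slices = [rows[i::node_count] for i in range(node_count)]

    # Each slice is processed in parallel; threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node, slices))

    # Final aggregation step: combine partial sums and counts into one average.
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


amounts = list(range(1, 101))  # made-up data: the numbers 1 to 100
average = leader_node(amounts, node_count=4)
print(average)
```

Calling leader_node with node_count=1 mirrors the single-node case, where one node does both the planning and the computing.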

Ques. 10): What is Redshift Enhanced VPC Routing?

Answer: If you enable the Redshift Enhanced VPC Routing feature, all COPY traffic from your data storage into Redshift and all UNLOAD traffic from Redshift back to S3 goes through your VPC. This gives you enhanced security, and possibly better performance, because your data does not travel over the public internet.

Ques. 11): How will I be charged and billed if I use Amazon Redshift?

Answer: You pay only for what you use, and there are no minimum or setup fees. Billing commences for a data warehouse cluster as soon as the data warehouse cluster is available. Billing continues until the data warehouse cluster terminates, which would occur upon deletion or in the event of instance failure. You are billed based on:

•            Compute Node Hours

•            Backup Storage

•            Data Transfer

•            Data Scanned
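The four billing dimensions above add up as in the following Python sketch. The function name estimate_bill and every rate and quantity passed to it are hypothetical placeholders, not AWS prices; substitute the figures from the current pricing page for your region and node type.

```python
def estimate_bill(node_count, hours, node_hour_rate,
                  backup_gb=0, backup_gb_rate=0,
                  transfer_gb=0, transfer_gb_rate=0,
                  scanned_tb=0, scanned_tb_rate=0):
    """Sum the four billing dimensions; all rates are supplied by the caller."""
    charges = {
        "compute_node_hours": node_count * hours * node_hour_rate,
        "backup_storage": backup_gb * backup_gb_rate,
        "data_transfer": transfer_gb * transfer_gb_rate,
        "data_scanned": scanned_tb * scanned_tb_rate,
    }
    charges["total"] = sum(charges.values())
    return charges


# Placeholder numbers only: 2 nodes running for 720 hours, plus some
# backup storage, data transfer, and scanned data.
bill = estimate_bill(
    node_count=2, hours=720, node_hour_rate=0.25,
    backup_gb=100, backup_gb_rate=0.125,
    transfer_gb=40, transfer_gb_rate=0.125,
    scanned_tb=2, scanned_tb_rate=5,
)
print(bill)
```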

Ques. 12): How much better is Redshift's performance compared to other data warehouse technologies?

Answer: AWS has claimed that Redshift provides up to ten times better performance than other data warehouse technologies, and Redshift is meant to scale to petabytes of data. One petabyte is 1,000 terabytes, which is a lot of data.

Ques. 13): Which block size does Amazon Redshift use for its columnar storage?

Answer: Redshift uses a block size of 1024 KB (1 MB) for its columnar storage. Typical database block sizes, by comparison, range from 2 KB to 32 KB.

Ques. 14): Is Redshift row-based or columnar storage?

Answer: Redshift uses columnar data storage (instead of row-based storage), which makes it well suited to analytical processing rather than transactional processing.

PostgreSQL and MySQL, including when they run on RDS, use row-based storage.
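To see why a columnar layout helps analytical queries, consider this toy Python comparison. It is a simplified model with made-up sample data and hypothetical function names, not Redshift's actual storage format: it just counts how many stored values each layout has to read to sum a single column.

```python
# The same three records, stored two ways (made-up sample data).
row_store = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 200},
]

# Columnar layout: one contiguous list per column.
column_store = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120, 80, 200],
}


def total_amount_row_store(rows):
    """Sum 'amount' in a row store: every whole record is read."""
    values_touched = 0
    total = 0
    for record in rows:
        values_touched += len(record)  # the full row is read to reach one field
        total += record["amount"]
    return total, values_touched


def total_amount_column_store(columns):
    """Sum 'amount' in a column store: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(total_amount_row_store(row_store))        # (400, 9)
print(total_amount_column_store(column_store))  # (400, 3)
```

Both layouts give the same total, but the columnar one reads a third of the values here, and the gap widens as tables gain more columns.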

Ques. 15): How do we load data into Redshift?

Answer: Data is loaded from S3, DynamoDB, DMS, and read replicas in RDS. For example, when you have an RDS database but want to run analytics on it, you can create a read replica, pull the data from the read replica into Redshift, and run the analytics in Redshift.

Ques. 16): What are the ideal cases for using the Classic Load Balancer and the Application Load Balancer?

Answer: The Classic Load Balancer is the suitable option for simple load balancing of traffic across several EC2 instances.

In contrast, the Application Load Balancer is suitable for container-based or microservices architectures, where traffic must be routed to different services or load balanced across multiple ports on the same EC2 instance.

Ques. 17): How is Amazon RDS, DynamoDB and Redshift different?

Answer: Amazon RDS is a database management service for relational databases. It manages patching, upgrading, and backing up of your databases without your intervention. RDS is a database management service for structured data only.

DynamoDB, on the other hand, is a NoSQL database service; NoSQL deals with unstructured data.

Redshift is an entirely different service: it is a data warehouse product used for data analysis.

Ques. 18): What data formats does Redshift Spectrum support?

Answer: Redshift Spectrum currently supports Avro, CSV, Grok, Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, and Text.

Ques. 19): How can we monitor the performance of a Redshift data warehouse cluster?

Answer: Performance metrics such as compute and storage utilization and read/write traffic can be monitored via the AWS Management Console or using CloudWatch.

Ques. 20): Is Amazon Redshift based on the concept of clusters?

Answer: Yes. Amazon Redshift uses nodes, and a group of nodes is called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.