
20775 Performing Data Engineering on Microsoft HDInsight

Course Overview

The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.

Who Should Attend

The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

Course Objectives

After completing this course, students will be able to:

  • Deploy HDInsight clusters
  • Authorize users to access resources
  • Load data into HDInsight
  • Troubleshoot HDInsight
  • Implement batch solutions
  • Design batch ETL solutions for big data with Spark
  • Analyze data with Spark SQL
  • Analyze data with Hive and Phoenix
  • Describe Stream Analytics
  • Implement Spark Streaming using the DStream API
  • Develop big data real-time processing solutions with Apache Storm
  • Build solutions that use Kafka and HBase

Course Outline

1 - GETTING STARTED WITH HDINSIGHT

  • What is Big Data?
  • Introduction to Hadoop
  • Working with MapReduce functions
  • Introducing HDInsight
  • Lab: Working with HDInsight

2 - DEPLOYING HDINSIGHT CLUSTERS

  • Identifying HDInsight cluster types
  • Managing HDInsight clusters by using the Azure portal
  • Managing HDInsight clusters by using Azure PowerShell
  • Lab: Managing HDInsight clusters with the Azure portal

3 - AUTHORIZING USERS TO ACCESS RESOURCES

  • Non-domain-joined clusters
  • Configuring domain-joined HDInsight clusters
  • Managing domain-joined HDInsight clusters
  • Lab: Authorizing Users to Access Resources

4 - LOADING DATA INTO HDINSIGHT

  • Storing data for HDInsight processing
  • Using data loading tools
  • Maximizing value from stored data
  • Lab: Loading Data into your Azure account

5 - TROUBLESHOOTING HDINSIGHT

  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite
  • Lab: Troubleshooting HDInsight

6 - IMPLEMENTING BATCH SOLUTIONS

  • Apache Hive storage
  • HDInsight data queries using Hive and Pig
  • Operationalize HDInsight
  • Lab: Implement Batch Solutions

7 - DESIGN BATCH ETL SOLUTIONS FOR BIG DATA WITH SPARK

  • What is Spark?
  • ETL with Spark
  • Spark performance
  • Lab: Design Batch ETL solutions for big data with Spark

8 - ANALYZE DATA WITH SPARK SQL

  • Implementing iterative and interactive queries
  • Perform exploratory data analysis
  • Lab: Performing exploratory data analysis by using iterative and interactive queries

9 - ANALYZE DATA WITH HIVE AND PHOENIX

  • Implement interactive queries for big data with Interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
  • Lab: Analyze data with Hive and Phoenix

10 - STREAM ANALYTICS

  • Stream Analytics
  • Process streaming data with Stream Analytics
  • Managing Stream Analytics jobs
  • Lab: Implement Stream Analytics

11 - IMPLEMENTING STREAMING SOLUTIONS WITH KAFKA AND HBASE

  • Building and deploying a Kafka cluster
  • Publishing, consuming, and processing data using the Kafka cluster
  • Using HBase to store and query data
  • Lab: Implementing Streaming Solutions with Kafka and HBase

12 - DEVELOP BIG DATA REAL-TIME PROCESSING SOLUTIONS WITH APACHE STORM

  • Persist long-term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm
  • Lab: Developing big data real-time processing solutions with Apache Storm

13 - CREATE SPARK STREAMING APPLICATIONS

  • Working with Spark Streaming
  • Creating Spark Structured Streaming Applications
  • Persistence and Visualization
  • Lab: Building a Spark Streaming Application

Enroll Today

This is a 5-day class.

Price: $2,975.00

ILT: Instructor-Led Training

OLL: Online LIVE

GTR: Guaranteed to Run

Class times are listed in Eastern Time. This class is also available for private group training.


Class dates are not listed. Please contact us for available dates and times.