MOC On-Demand: 20775-Perform Data Engineering on Microsoft HDInsight - On-Demand Course

Learn SQL Server at your own pace with our On-Demand training.

Available 24/7

Professional Instruction

Free Training Materials

Course Details

Section 1: Getting Started with HDInsight
This section introduces Hadoop, the MapReduce paradigm, and HDInsight.

Topics :
• What is Big Data?
• Introduction to Hadoop
• Working with the MapReduce function
• Introducing HDInsight

Lab :
• Working with HDInsight
• Provision an HDInsight cluster and run MapReduce jobs

Section 2: Deploying HDInsight Clusters
This section provides an overview of the Microsoft Azure HDInsight cluster types and covers how to create and maintain HDInsight clusters. It also demonstrates how to customize clusters by using script actions through the Azure Portal, Azure PowerShell, and the Azure command-line interface (CLI). The labs walk through the steps to deploy and manage clusters.

Topics :
• Identifying HDInsight cluster types
• Managing HDInsight clusters by using the Azure portal
• Managing HDInsight Clusters by using Azure PowerShell

Lab :
• Managing HDInsight clusters with the Azure Portal
• Creating an HDInsight cluster that uses Data Lake Store storage
• Customize HDInsight by using script actions
• Delete an HDInsight cluster

Section 3: Authorizing Users to Access Resources
This section provides an overview of non-domain-joined and domain-joined Microsoft HDInsight clusters and covers how to create and configure domain-joined HDInsight clusters. It also demonstrates how to manage domain-joined clusters by using the Ambari management UI and the Ranger Admin UI. The labs walk through the steps to create and manage domain-joined clusters.

Topics :
• Non-domain Joined clusters
• Configuring domain-joined HDInsight clusters
• Manage domain-joined HDInsight clusters

Lab :
• Authorizing Users to Access Resources
• Prepare the Lab Environment
• Manage a non-domain joined cluster

Section 4: Loading data into HDInsight
This section provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake storage. At the end of this lesson, you will know how to use multiple tools to transfer data to an HDInsight cluster. You will also learn how to load and transform data to decrease your query run time.

Topics :
• Storing data for HDInsight processing
• Using data loading tools
• Maximizing value from stored data

Lab :
• Loading Data into your Azure account
• Load data for use with HDInsight

Section 5: Troubleshooting HDInsight
In this module, you will learn how to interpret the logs associated with the various services of a Microsoft Azure HDInsight cluster so that you can troubleshoot issues with those services. You will also learn about Operations Management Suite (OMS) and its capabilities.

Topics :
• Analyze HDInsight logs
• YARN logs
• Heap dumps
• Operations Management Suite

Lab :
• Troubleshooting HDInsight
• Analyze HDInsight logs
• Analyze YARN logs
• Monitor resources with Operations Management Suite

Section 6: Implementing Batch Solutions
In this module, you will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig. You will also discuss the approaches for data pipeline operationalization that are available for big data workloads on an HDInsight stack.

Topics :
• Apache Hive storage
• HDInsight data queries using Hive and Pig
• Operationalize HDInsight

Lab :
• Implement Batch Solutions
• Deploy HDInsight cluster and data storage
• Use data transfers with HDInsight clusters
• Query HDInsight cluster data

Section 7: Design Batch ETL solutions for big data with Spark
This section provides an overview of Apache Spark, describing its main characteristics and key features. Before you start, it’s helpful to understand the basic architecture of Apache Spark and the different components that are available. The module also explains how to design batch Extract, Transform, Load (ETL) solutions for big data with Spark on HDInsight. The final lesson includes some guidelines to improve Spark performance.

Topics :
• What is Spark?
• ETL with Spark
• Spark performance

Lab :
• Design Batch ETL solutions for big data with Spark
• Creating an HDInsight Cluster with access to Data Lake Store
• Use HDInsight Spark cluster to analyze data in Data Lake Store
• Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
• Managing resources for Apache Spark cluster on Azure HDInsight

Section 8: Analyze Data with Spark SQL
This section describes how to analyze data by using Spark SQL. By the end of it, you will be able to explain the differences between RDDs, Datasets, and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning, and persistence. You will also look at how to use Apache Zeppelin and Jupyter notebooks, carry out exploratory data analysis, and submit Spark jobs remotely to a Spark cluster.

Topics :
• Implementing iterative and interactive queries
• Perform exploratory data analysis

Lab :
• Performing exploratory data analysis by using iterative and interactive queries
• Build a machine learning application
• Use Zeppelin for interactive data analysis
• View and manage Spark sessions by using Livy

Section 9: Analyze Data with Hive and Phoenix
In this module, you will learn about running interactive queries using Interactive Hive (also known as Hive LLAP or Live Long and Process) and Apache Phoenix. You will also learn about the various aspects of running interactive queries using Apache Phoenix with HBase as the underlying query engine.

Topics :
• Implement interactive queries for big data with Interactive Hive
• Perform exploratory data analysis by using Hive
• Perform interactive processing by using Apache Phoenix

Lab :
• Analyze data with Hive and Phoenix
• Implement interactive queries for big data with interactive Hive
• Perform exploratory data analysis by using Hive
• Perform interactive processing by using Apache Phoenix

Section 10: Stream Analytics
The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it an easy-to-use, flexible stream processing service in the cloud. This section discusses the advantages of using Stream Analytics for your streaming solutions and compares its features with other services available within the Microsoft Azure HDInsight stack, such as Apache Storm. You will learn how to deploy a Stream Analytics job, connect it to Microsoft Azure Event Hubs to ingest real-time data, and execute a Stream Analytics query to gain low-latency insights. After that, you will learn how to monitor Stream Analytics jobs once they are deployed and used in production settings.

Topics :
• Stream Analytics
• Process streaming data from Stream Analytics
• Managing Stream Analytics jobs

Lab :
• Implement Stream Analytics
• Process streaming data with Stream Analytics
• Managing Stream Analytics jobs

Section 11: Implementing Streaming Solutions with Kafka and HBase
In this module, you will learn how to use Kafka to build streaming solutions. You will also see how to use Kafka to persist data to HDFS by using Apache HBase, and then query this data.

Topics :
• Building and Deploying a Kafka Cluster
• Publishing, Consuming, and Processing data using the Kafka Cluster
• Using HBase to store and Query Data

Lab :
• Implementing Streaming Solutions with Kafka and HBase
• Creating a virtual network and gateway
• Creating a Storm cluster for Kafka
• Creating a Kafka producer
• Creating a streaming processor client topology
• Creating a Power BI dashboard and streaming dataset
• Creating an HBase cluster
• Creating a streaming processor to write to HBase

Section 12: Develop big data real-time processing solutions with Apache Storm
This section explains how to develop big data real-time processing solutions with Apache Storm.

Topics :
• Persist long term data
• Stream data with Storm
• Creating Storm topologies
• Configure Apache Storm

Lab :
• Developing big data real-time processing solutions with Apache Storm
• Stream data with Storm
• Creating Storm Topologies

Section 13: Create Spark Streaming Applications
This section describes Spark Streaming; explains how to use discretized streams (DStreams); and explains how to apply the concepts to develop Spark Streaming applications.

Topics :
• Working with Spark Streaming
• Creating Spark Structured Streaming Applications
• Persistence and Visualization

Lab :
• Building a Spark Streaming Application
• Installing Required Software
• Building the Azure Infrastructure
• Building a Spark Streaming Pipeline

Please check the course description to find prerequisite information.

-10%

MOC On-Demand: 20775-Perform Data Engineering on Microsoft HDInsight

On-Demand Training Course

$ 995
90/month licence

24/7 Access
Hands-On Practice Exercises
Free Repeats
Professional Instruction

Enroll Today

Testimonials

This was the class I needed.

The instructor Jeff took his time and made sure we understood each topic before moving to the next. He answered all of our questions, and I don't know about the rest of the students, but was very pleased with this experience.

I finally understand how to use Excel.

-Amanda T (Yale New Haven Hospital).

Great class!

We were able to cover a lot of information in one day without getting overwhelmed.

-Maria R (Microsoft).

Free Repeats

Learn At Your Pace

No Travel

Professional Instruction

Affordable Pricing

Group Discounts

Why Choose Us?

Business Computer Skills has provided professional IT training services for individual students and organizations for almost 20 years.

Our combination of expert instructors, hands-on learning, convenient class schedules and affordable prices will help you achieve your learning goals.
