In this lesson we’ll take the skills we’ve learned developing Hadoop MapReduce jobs using C# and Visual Studio, and apply them to the cloud.

To run jobs on Azure HDInsight, you first need to provision a cloud Hadoop infrastructure, which is covered in Deploying an HDInsight Cluster in Windows Azure. To learn how to create a C# MapReduce job from scratch, make sure you’ve seen the lesson Intro to Hadoop MapReduce with C#.

Microsoft’s distribution of Hadoop, HDInsight, can be used on a local workstation as a development environment, as we have done in prior videos. However, Big Data jobs running against large data sets require a production-scale Hadoop cluster. Windows Azure HDInsight is Microsoft’s elastic Hadoop infrastructure delivered on its cloud platform, Windows Azure.

The process for developing MapReduce jobs that run in Azure HDInsight is the same as in the previous lessons. In this lesson we’ll make a few changes so that our MapReduce job reads its input from, and writes its output to, the Azure Storage Vault (ASV), which provides an HDFS-compatible file system on top of the economical Windows Azure Blob Storage infrastructure.

At the conclusion of this lesson, your Visual Studio-based MapReduce job will automatically upload itself to Azure HDInsight and run very much as it did on your local workstation, except that it will now have access to a highly scalable cloud compute and storage environment!
