Data Processing using Spark

Duration in Days
5 Days

Training Scope, Objectives, Outlines, and Expected Outcomes

  • Learn Spark.
  • Learn ETL over Spark.
  • Understand how the Spark engine works and how to optimize results.
  • Tune Spark jobs.
  • Labs and hands-on exercises for all components.
  • Certification for the components.

Suggested Course Outline:
  • Spark Processing
    • Introduction to Distributed Systems
    • Spark overview
      • Spark Architecture.
      • Spark Context.
      • Spark RDDs.
      • RDD Operations.
      • Spark DataFrames.
      • Spark SQL.
      • Cluster Architectures for Spark (Running on YARN).
    • Spark Engine in Detail
      • How Spark handles parallel execution.
      • How Spark works with network flows.
      • How Spark works with the JVM.
    • Tuning Spark Jobs
    • Spark Streaming (Continuous Applications)
    • Developing Spark Jobs with Redis
    • Distributed Messaging Systems (Kafka)
    • Oozie Overview
    • Scheduling Spark Workflows Using Oozie
    • Spark MLlib (Distributed Machine Learning), including unstructured data analytics use cases such as text mining and URL analysis
    • Spark Graphs
Expected Accomplishments:
How the training can possibly add value
  • How to perform ETL over Spark.
  • Spark Streaming and how to create streaming jobs.
  • Spark graphs and functions over graphs.
  • Spark engine details.
  • How to tune and optimize your jobs.
©2011 SitesPower Training Institute. All Rights Reserved.