Data Processing using Spark
Duration in Days
Training Scope, Objectives, Outlines, and Expected Outcomes
- Learning Spark
- Learning ETL on Spark.
- How the Spark engine works and how to optimize results.
- Tuning jobs.
- Labs and hands-on exercises for all components.
- Certification for the components.
Suggested Course Outline:
- Spark Processing
- Introduction to Distributed Systems
- Spark overview
- Spark Architecture
- Spark Context
- Spark RDDs
- RDDs Operations
- Spark Dataframe.
- Spark SQL
- Cluster Architectures for Spark (Running on YARN)
- Spark Engine in Detail
- How Spark handles parallel execution
- How Spark works with network flows
- How Spark works with the JVM
- Tuning Spark Jobs
- Spark Streaming (Continuous Applications)
- Developing Spark Jobs with Redis
- Distributed Messaging Systems (Kafka)
- Oozie Overview
- Scheduling Spark Workflows using Oozie
- Spark MLlib (Distributed Machine Learning), including unstructured data analytics use cases, e.g. text mining and URL analysis
- Spark Graphs
How the training can possibly add value
- How to perform ETL on Spark.
- Spark Streaming and how to create streaming jobs.
- Spark graph processing and functions over graphs.
- Understanding the Spark engine in detail.
- How to tune and optimize your jobs.