This course provides a detailed overview of the Apache Spark runtime and application architecture as well as the fundamental concepts of the RDD and DataFrame APIs in Spark.

Basic primers on the map reduce processing pattern and functional programming using Python are provided as well.
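
As a rough taste of what the map reduce primer covers, the sketch below walks a word count through map, shuffle and reduce phases in plain Python. It is an illustrative assumption, not course material: it does not use Spark, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample data are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as a shuffle would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases together (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, for illustration only.
sample_lines = ["spark makes big data simple", "big data big ideas"]
counts = word_count(sample_lines)
```

In Spark the same pattern is distributed across a cluster rather than run in a single Python process; this sketch only shows the shape of the computation.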

The course will teach Apache Spark programming using the transformations and actions available in the RDD and DataFrame APIs and within Spark SQL. Hands-on exercises are provided throughout the course to reinforce concepts.

Beyond basic programming skills, the course provides deep dives into further programming and runtime constructs such as broadcast variables, accumulators, and RDD and DataFrame storage and lineage options.

Topics covered include:

  • Apache Spark introduction and background
  • Map reduce processing pattern
  • Spark deployment modes
  • Spark runtime and application architecture
  • Understanding Spark RDDs
  • Using Spark with distributed file systems and object stores
  • Functional programming with Python
  • Using Spark RDD transformations and actions
  • RDD storage levels
  • Caching, persistence and checkpointing of Spark RDDs
  • Broadcast variables and accumulators
  • Partitioning in Spark
  • Processing RDDs with external programs
  • Improving Spark application performance
  • Apache Hive metastore overview
  • DataFrame API and Spark SQL architecture
  • Using the DataFrameReader and DataFrameWriter APIs
  • Using DataFrame API transformations and actions
  • Using Apache Spark SQL
  • Choosing between the RDD and DataFrame APIs

Developed by Jeffrey Aven, author of SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python, this course will provide the core knowledge and skills needed to develop applications using Apache Spark.

The “Data Transformation and Analysis Using Apache Spark” module is the first of three modules in the “Big Data Development Using Apache Spark” series, and lays the foundations for subsequent modules including “Stream and Event Processing using Apache Spark” and “Advanced Analytics using Apache Spark”.

See what former trainees are saying about AlphaZetta courses.

Additional Information

Audience Expert
This course is suitable for developers and analysts who will be working with Spark. It is ideally suited for users transitioning to a Spark runtime environment from a relational database programming or analysis background (e.g., data warehouse/ETL developers or BI analysts).
Prerequisites
  • General programming skills
  • Basic Python programming skills
  • Some data warehouse, BI or transactional database experience is preferable but not required
  • Some prior exposure to Spark or Hadoop is preferable but not required
Objective Attendees should, by the end of the course:

  • Understand the Spark distributed processing framework and runtime architecture
  • Understand the fundamentals of Spark programming using both the RDD and DataFrame APIs
  • Have mastery over the basic transformations and actions in the Spark RDD API
  • Have mastery over basic Spark DataFrame operations
  • Be prepared for more advanced topics in Spark including Spark streaming and machine learning
Format Class
Duration 2 days
Trainer Courses are taught by Jeffrey Aven.

Jeffrey Aven is a big data, open source software, and cloud computing consultant, author and instructor based in Melbourne, Australia.

Jeffrey has extensive experience as a technical instructor, having taught courses on Hadoop and HBase for Cloudera (awarded Cloudera Hadoop Instructor of the Year for APAC in 2013) and courses on Apache Kafka for Confluent in addition to delivering his own courses.

Jeffrey is also the author of several Big Data related books including SAMS Teach Yourself Hadoop in 24 Hours, SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python.

In addition to his credentials as an instructor and author, Jeff has over thirty years of industry experience and has been involved in key roles with several major big data and cloud implementations over the last several years.

Delivery Method In-person at AlphaZetta Academy locations or on-premises for corporate groups

Our online courses run as live online meetings using Zoom for the video meeting part and Microsoft virtual computers for the practical components. The benefit of having a live trainer for online training is you can ask questions, obtain mentoring from the trainer and interact with classmates.

Course participants will require the following technologies and online accounts. Please check that your setup satisfies these requirements:

  • Reliable computer (Windows, Mac or Linux)
  • Webcam (to help facilitate the mentoring aspect of our training)
  • Reliable internet access
  • A quiet space
  • Zoom video conferencing software and Zoom account (register and pre-install the software at
  • Microsoft account to access the virtual lab PCs (an existing or new account; nothing needs to be installed, you just need an account to sign in with)

Meals and refreshments

Face-to-face courses: Catered morning tea and lunch are provided on both days of the course. Please notify us at least a week ahead if you have any special dietary requirements.


Email us with any questions about the course, including requests for more detail, specific content you would like to see covered, or queries regarding prerequisites and suitability.
If you would like to attend but for any reason cannot, please also let us know.


Course material may vary from that advertised, depending on the needs and learning pace of attendees. Additional material may be presented alongside, or in place of, the advertised material.

Cancellations and refunds

You can get a full refund if you cancel 14 days or more before the course starts. No refunds will be issued for cancellations made less than 14 days before the course starts.

Frequently asked questions (FAQ)

Do I need to bring my own computer?
This is dependent on the venue. Please check the course event page.

Why do I need to provide a shipping address?
For online courses, we need an address to send you the course notes that you need for the course.

Private and Corporate Training

In addition to our public seminars, workshops and courses, AlphaZetta Academy can provide this training for your organisation in a private setting at your location or ours, or online. Please enquire to discuss your needs.






Eugene’s courses are not your standard technical courses where you learn how to put data into a model and get a result. The real life experiences – warts and all – he brings to the instruction mean that attendees walk away with a better understanding of the real life challenges of analytics as well as the technical know-how. We routinely send our team members on these courses to help them get the capabilities that really help our clients get better insights from their data.

James Beresford, Director, Agile BI

Eugene’s introductory course to data science was outstanding. I found the subject matter and delivery fascinating, accessible and informative. I found Eugene approachable, interesting to listen to and excellent at simplifying complex concepts. I highly recommend this course for anyone who wants to know what data science—and all the buzz surrounding it!—are all about.

C.T. Johnson, Director, Statecraft