Data Engineering Curriculum

This is the curriculum for IT professionals, data engineers, data analysts and those supporting data science. The levels build on one another, and each block represents a two-day class.

Fundamentals of AI, Machine Learning, Data Science and Predictive Analytics

2019-10-25 | Categories: Predictive Analytics & AI, Level 1, Data Culture Electives, Data Science Curriculum, Data Governance Curriculum, Executive Curriculum, Data Engineering Curriculum, Dr Eugene Dubossarsky, Innovation & Tech (CTO) Curriculum, AI Engineering Curriculum, All Academy Courses

Our leading course has transformed the artificial intelligence (AI), machine learning (ML) and data science practice of the many managers, sponsors, key stakeholders, entrepreneurs and beginning data analytics and data science practitioners who have attended it. This course is an intuitive, hands-on introduction to AI, data science and machine learning: it's your artificial intelligence 101. The training focuses on central concepts and key skills, leaving you with a deep understanding of the foundations of AI and data science, and even of some of the more advanced tools used in the field. The course does not involve coding or require any coding knowledge or experience.

Intro to R (+ data visualisation)

2019-11-05 | Categories: Level 1, Data Culture Electives, Impact, Data Science Curriculum, R, Data Visualisation, Data Engineering Curriculum, Dr Eugene Dubossarsky, AI Engineering Curriculum, All Academy Courses

R is one of the world’s most popular tools for data mining and statistics. It is also free and easy to use, with a range of intuitive graphical interfaces. This two-day course will introduce you to the R programming language, teaching you to create functions and customise code so you can manipulate data and begin to use R self-sufficiently in your work.

Intro to Python for Data Analysis

2019-11-25 | Categories: Level 1, Data Culture Electives, Data Science Curriculum, Python, Data Engineering Curriculum, Dr Eugene Dubossarsky, AI Engineering Curriculum, All Academy Courses

Python is a high-level, general-purpose language used by a thriving community of millions. Data science teams often use it in their production environments and analysis pipelines, and it is the tool of choice for elite data-mining competition winners and for much of the innovation in deep learning. This course provides a foundation for using Python in exploratory data analysis and visualisation, and a stepping stone to machine learning.

Data Science and Big Data Analytics: Leveraging Best Practices and Avoiding Pitfalls

2019-10-17 | Categories: Data Governance Level 2, Data Engineering Curriculum Electives, Data Science Curriculum, Data Science Level 2, Data Governance Curriculum Electives, Stephen Brobst, Executive Curriculum, Data Visualisation, Data Engineering Level 2, Data Management, Executive Level 2, Big Data, All Academy Courses

Data science is the key to business success in the information economy. This workshop will teach you best practices for deploying a data science capability in your organisation. Technology is the easy part; the hard part is creating the right organisational and delivery framework in which data science can succeed. We will discuss the skill sets a successful data scientist needs and the environment that will allow them to thrive. We will draw a strong distinction between “Data R&D” and “Data Product” capabilities within an enterprise and speak to the different skill sets, governance and technologies each requires. We will also explore the use of open data sets and open-source software tools to get the best results from data science in large organisations. Advanced data visualisation will be described as a critical component of a big data analytics deployment strategy. Finally, we will cover the many pitfalls and how to avoid them.

Data Governance 1

2019-10-25 | Categories: Data Culture Level 1, Data Culture Electives, Government, Data Science Curriculum, Data Governance Curriculum, Data Science Level 1, Executive Curriculum, Mark Burnard, Data Engineering Curriculum, Innovation & Tech (CTO) Curriculum, Data Governance Level 1, AI Engineering Curriculum, Financial Risk, Data Engineering Level 1, AI Engineering Level 1, Executive Level 1, All Academy Courses, Innovation & Tech (CTO) Level 1

This two-day course provides an informed, realistic and comprehensive foundation for establishing best-practice Data Governance in your organisation. Suitable for every level from CDO to executive to data steward, this highly practical course will equip you with the tools and strategies needed to create and implement a Data Governance strategy and roadmap successfully.

Leadership and Resilience Skills for Data Professionals

2019-10-25 | Categories: Data Science Curriculum, Data Science Level 1, Data Engineering Curriculum, Leadership & Management, Katrina Loukas, Data Engineering Level 1, All Academy Courses

Many people today were shaped, emotionally and mentally, for an era that no longer exists. This has created a critical soft-skills gap between the abilities of today’s workforce and current business requirements. In this course, participants learn to ‘readapt’ their soft skills so that they are aligned with a thriving 21st-century business. They are also given a simple framework for continuing their self-development, so that the training produces sustainable change.

Data Transformation and Analysis Using Apache Spark

2019-11-25 | Categories: Jeffrey Aven, Level 1, Data Science Curriculum Electives, Data Governance Curriculum Electives, Apache Spark, Data Engineering Curriculum, All Academy Courses, Apache Spark Training with Jeffrey Aven, Experienced Analytics Instructor + Big Data Author

With big data expert and author Jeffrey Aven. The first module in the “Big Data Development Using Apache Spark” series, this course provides a detailed overview of the Spark runtime and application architecture, processing patterns, functional programming using Python, fundamental API concepts and basic programming skills, with deep dives into additional constructs including broadcast variables, accumulators, and storage and lineage options. Attendees will come to understand the Apache Spark framework, its runtime architecture and the fundamentals of programming for Spark; gain mastery of basic transformations, actions and operations; and be prepared for advanced Spark topics, including streaming and machine learning.

Stream and Event Processing using Apache Spark

2019-11-25 | Categories: Jeffrey Aven, Level 2, Data Science Curriculum Electives, Apache Spark, Data Engineering Curriculum, All Academy Courses, Apache Spark Training with Jeffrey Aven, Experienced Analytics Instructor + Big Data Author

The second module in the “Big Data Development Using Apache Spark” series, this course provides the Spark Streaming knowledge needed to develop real-time, event-driven or event-oriented processing applications using Apache Spark. It covers the use of Spark with NoSQL systems and popular messaging platforms such as Apache Kafka and Amazon Kinesis, examines the Spark Streaming architecture in depth, and uses practical hands-on exercises to reinforce the use of transformations and output operations, as well as more advanced stream-processing patterns. With big data expert and author Jeffrey Aven.

Stars, Flakes, Vaults and the Sins of Denormalisation

2019-10-18 | Categories: Data Governance Level 2, Innovation & Tech (CTO) Curriculum Electives, Data Governance Curriculum Electives, Executive Curriculum Electives, Innovation & Tech (CTO) Level 2, Stephen Brobst, Data Engineering Curriculum, Data Management, AI Engineering Curriculum, Executive Level 2, Data Engineering Level 1, AI Engineering Level 1, All Academy Courses

Performance and flexibility are often seen as contradictory goals in designing large-scale data implementations. In this talk we will discuss techniques for denormalisation and provide a framework for understanding the performance and flexibility implications of various design options. We will examine a variety of logical and physical design approaches and evaluate the trade-offs between them. Specific recommendations are made for guiding the translation from a normalised logical data model to an engineered-for-performance physical data model. The role of dimensional modelling and of various physical design approaches is discussed in detail, as are best practices in the use of surrogate keys. The focus is on understanding the benefit (or not) of the denormalisation approaches commonly taken in analytic database designs.

Best Practices in Enterprise Information Management

2019-10-24 | Categories: Data Culture Level 1, Data Culture Curriculum, Innovation & Tech (CTO) Curriculum Electives, Data Governance Curriculum, Stephen Brobst, Fraud and Security, Executive Curriculum, Data Engineering Curriculum, Data Governance Level 1, Data Management, Executive Level 2, Big Data, Data Engineering Level 1, All Academy Courses, Innovation & Tech (CTO) Level 3

The effective management of enterprise information for analytics deployment requires best practices in the areas of people, processes, and technology. In this talk we will share both successful and unsuccessful practices in these areas. The workshop covers five key areas of enterprise information management: (1) metadata management, (2) data quality management, (3) data security and privacy, (4) master data management, and (5) data integration.