With big data expert and author Jeffrey Aven. The first module in the “Big Data Development Using Apache Spark” series, this course provides a detailed overview of the spark runtime and application architecture, processing patterns, functional programming using Python, fundamental API concepts, basic programming skills and deep dives into additional constructs including broadcast variables, accumulators, and storage and lineage options. Attendees will learn to understand the Apache Spark framework and runtime architecture, fundamentals of programming for Spark, gain mastery of basic transformations, actions, and operations, and be prepared for advanced topics in Spark including streaming and machine learning.
The second module in the “Big Data Development Using Apache Spark” series, this course provides the Spark streaming knowledge needed to develop real-time, event-driven or event-oriented processing applications using Apache Spark. It covers using Spark with NoSQL systems and popular messaging platforms like Apache Kafka and Amazon Kinesis. It covers the Spark streaming architecture in depth, and uses practical hands-on exercises to reinforce the use of transformations and output operations, as well as more advanced stream-processing patterns. With big data expert and author Jeffrey Aven.
With big data expert and author Jeffrey Aven. The third module in the “Big Data Development Using Apache Spark” series, this course provides the practical knowledge needed to perform statistical, machine learning and graph analysis operations at scale using Apache Spark. It enables data scientists and statisticians with experience in other frameworks to extend their knowledge to the Spark runtime environment with its specific APIs and libraries designed to implement machine learning and statistical analysis in a distributed and scalable processing environment.
This course presents statistical, computational and machine-learning techniques for predictive detection of fraud and security breaches. These methods are shown in the context of use cases for their application, and include the extraction of business rules and a framework for the interoperation of human, rule-based, predictive and outlier-detection methods. Methods presented include predictive tools that do not rely on explicit fraud labels, as well as a range of outlier-detection techniques including unsupervised learning methods, notably the powerful random-forest algorithm, which can be used for all supervised and unsupervised applications, as well as cluster analysis, visualisation and fraud detection based on Benford’s law. The course will also cover the analysis and visualisation of social-network data. A basic knowledge of R and predictive analytics is advantageous.
Text analytics is a crucial skill set in nearly all contexts where data science has an impact, whether that be customer analytics, fraud detection, automation or fintech. In this course, you will learn a toolbox of skills and techniques, starting from effective data preparation and stretching right through to advanced modelling with deep-learning and neural-network approaches such as word2vec.
This course is an intuitive introduction to forecasting and analysis of time-series data. We will review a range of standard forecasting methods, including ARIMA and exponential smoothing, along with standard means of measuring forecast error and benchmarking with naive forecasts, and standard pre-processing/de-trending methods such as differencing and missing value imputation. Other topics will include trend/seasonality/noise decomposition, autocorrelation, visualisation of time series, and forecasting with uncertainty.
This class builds on the introductory Python class. Jupyter Notebook advanced use and customisation is covered as well as configuring multiple environments and kernels. The Numpy package is introduced for working with arrays and matrices and a deeper coverage of Pandas data analysis and manipulation methods is provided including working with time series data. Data exploration and advanced visualisations are taught using the Plotly and Seaborne libraries.
This course goes deeper into the tidyverse family of packages, with a focus on advanced data handling, as well as advanced data structures such as list columns in tibbles, and their application to model management. Another key topic is advanced functional programming with the purrr package, and advanced use of the pipe operator. Optional topics may include dplyr on databases, and use of rmarkdown and Rstudio notebooks.
Blockchain is one of the most disruptive and least understood technologies to emerge over the previous decade. This course gives participants an intuitive understanding of blockchain in both public and private contexts, allowing them to distinguish genuine use cases from hype. We explore public crypto-currencies, smart contracts and consortium chains, interspersing theory with case studies from areas such as financial markets, health care, trade finance, and supply chain. The course does not require a technical background.
This is an introduction to the exciting new field of quantum computing, including programming actual quantum computers in the cloud. Quantum computing promises to revolutionise cryptography, machine learning, cyber security, weather forecasting and a host of other mathematical and high-performance computing fields. A practical component will include writing quantum programs and executing them on simulators as well as on actual quantum computers in the cloud.