Data Science and Big Data Analytics: Leveraging Best Practices and Avoiding Pitfalls

2021-07-23T01:00:38+00:00May 13th, 2019|Tags: , , , , , , |

Data science is the key to business success in the information economy. This workshop will teach you about best practices in deploying a data science capability for your organisation. Technology is the easy part; the hard part is creating the right organisational and delivery framework in which data science can be successful in your organisation. We will discuss the necessary skill sets for a successful data scientist and the environment that will allow them to thrive. We will draw a strong distinction between “Data R&D” and “Data Product” capabilities within an enterprise and speak to the different skill sets, governance, and technologies needed across these areas. We will also explore the use of open data sets and open source software tools to enable best results from data science in large organisations. Advanced data visualisation will be described as a critical component of a big data analytics deployment strategy. We will also talk about the many pitfalls and how to avoid them.

Data Governance I

2022-09-16T07:27:02+00:00May 8th, 2019|Tags: , , , |

This two day course provides an informed, realistic and comprehensive foundation for establishing best practice data governance in your organisation. Suitable for every level from CDO to executive to data steward, this highly practical course will equip you with the tools and strategies needed to successfully create and implement a data governance strategy and roadmap.

Leadership and Resilience Skills for Data Professionals

2021-03-12T03:59:37+00:00July 23rd, 2019|Tags: , |

Many people today have been developed emotionally and mentally for an era that no longer really exists. This has created a critical soft-skills gap between current workforce ability and business requirements today. In this course participants learn to ‘readapt’ their soft skills so that they are aligned with a thriving 21st century business. They are also given a simple framework from which to continue the self-development so that the training instigates sustainable change.

Launching a New Digital Investment

2021-04-16T04:11:58+00:00January 31st, 2021|Tags: , , |

Launching a new digital investment requires the know-how to select technology, justify the investment and obtain stakeholders’ approval. This course gives you the knowledge and tools to justify your digital investment, compare real costs, understand the required resources and finally launch your project, program or product.

Data Governance II

2022-09-16T07:41:39+00:00December 2nd, 2020|Tags: , , , |

This one day course builds on the foundation of Data Governance I, and dives deeper into selected areas that are designed to provide the most practical and real-world applications of data governance. It includes the change management journey to the “data-driven” organisation, and implications of the necessity of model governance in the context of data science, AI/ML initiatives and RPA/IPA .

Advanced Analytics Using Apache Spark

2021-03-12T03:57:56+00:00February 28th, 2019|Tags: , |

With big data expert and author Jeffrey Aven. The third module in the “Big Data Development Using Apache Spark” series, this course provides the practical knowledge needed to perform statistical, machine learning and graph analysis operations at scale using Apache Spark. It enables data scientists and statisticians with experience in other frameworks to extend their knowledge to the Spark runtime environment with its specific APIs and libraries designed to implement machine learning and statistical analysis in a distributed and scalable processing environment.

Fraud and Anomaly Detection

2021-03-12T03:57:53+00:00February 11th, 2019|Tags: , |

This course presents statistical, computational and machine-learning techniques for predictive detection of fraud and security breaches. These methods are shown in the context of use cases for their application, and include the extraction of business rules and a framework for the inter-operation of human, rule-based, predictive and outlier-detection methods. Methods presented include predictive tools that do not rely on explicit fraud labels, as well as a range of outlier-detection techniques including unsupervised learning methods, notably the powerful random-forest algorithm, which can be used for all supervised and unsupervised applications, as well as cluster analysis, visualisation and fraud detection based on Benford’s law. The course will also cover the analysis and visualisation of social-network data. A basic knowledge of R and predictive analytics is advantageous.

Stars, Flakes, Vaults and the Sins of Denormalisation

2021-07-23T01:03:47+00:00May 13th, 2019|Tags: , , , |

Providing both performance and flexibility are often seen as contradictory goals in designing large scale data implementations. In this talk we will discuss techniques for denormalisation and provide a framework for understanding the performance and flexibility implications of various design options. We will examine a variety of logical and physical design approaches and evaluate the trade offs between them. Specific recommendations are made for guiding the translation from a normalised logical data model to an engineered-for-performance physical data model. The role of dimensional modeling and various physical design approaches are discussed in detail. Best practices in the use of surrogate keys is also discussed. The focus is on understanding the benefit (or not) of various denormalisation approaches commonly taken in analytic database designs.

Data Transformation and Analysis Using Apache Spark

2022-02-16T04:17:23+00:00February 21st, 2019|Tags: , |

With big data expert and author Jeffrey Aven. Learn how to develop applications using Apache Spark. The first module in the “Big Data Development Using Apache Spark” series, this course provides a detailed overview of the spark runtime and application architecture, processing patterns, functional programming using Python, fundamental API concepts, basic programming skills and deep dives into additional constructs including broadcast variables, accumulators, and storage and lineage options. Attendees will learn to understand the Apache Spark framework and runtime architecture, fundamentals of programming for Spark, gain mastery of basic transformations, actions, and operations, and be prepared for advanced topics in Spark including streaming and machine learning.

Best Practices in Enterprise Information Management

2021-07-23T00:58:54+00:00May 17th, 2019|Tags: , , , , , |

The effective management of enterprise information for analytics deployment requires best practices in the areas of people, processes, and technology. In this talk we will share both successful and unsuccessful practices in these areas. The scope of this workshop will involve five key areas of enterprise information management: (1) metadata management, (2) data quality management, (3) data security and privacy, (4) master data management, and (5) data integration.

Go to Top