All training courses are provided in-house, and the courses listed below can be combined to meet your needs.
Please contact us to discuss which courses would best match your requirements, or if you have any questions about specific courses or course categories.
Data science tasks break down when the datasets become too large. Throw enough time and money at the problem and it will eventually evaporate. But what can one achieve on a budget? In this course, participants will learn to tackle simple Big Data problems using Spark and H2O.
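As a taste of the material, here is a minimal PySpark sketch of the kind of task covered: a distributed aggregation over a dataset too large to fit comfortably in one machine's memory. A local Spark installation is assumed, and the file path and column names are hypothetical.

```python
# A minimal PySpark sketch: aggregating a large dataset in parallel.
# The file path and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("budget-big-data").getOrCreate()

# Spark reads the file lazily and splits it across partitions.
df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

# The aggregation is computed in parallel across local cores or a cluster.
summary = df.groupBy("region").agg(
    F.count("*").alias("n"),
    F.mean("amount").alias("mean_amount"),
)
summary.show()

spark.stop()
```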
Data mining is the collection of processes by which we can extract useful insights from data. Inherent in this definition is the idea of data reduction: useful insights (whether in the form of summaries, sentiment analyses, etc.) ought to be “smaller” and “more organized” than the original raw data. The challenges presented by high data dimensionality (the so-called curse of dimensionality) must be addressed in order to achieve insightful and interpretable analytical results. In this course, we introduce the basic principles of dimensionality reduction and a number of feature selection methods (filter, wrapper, regularization), and discuss some advanced topics (SVD, spectral feature selection, UMAP and other topological reduction methods), with examples throughout.
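To make two of these ideas concrete, the sketch below contrasts a filter feature selection method with an SVD projection, using scikit-learn on synthetic data; the dataset and all parameter choices are illustrative only.

```python
# A minimal sketch of two reduction strategies: a filter feature
# selection method and a truncated SVD projection (illustrative only).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import TruncatedSVD

# Synthetic data: 500 observations, 100 features, only 10 informative.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# Filter method: keep the 10 features with the strongest univariate
# association with the target (scored here by the ANOVA F-statistic).
X_filtered = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# SVD: project the data onto the 10 leading singular directions.
X_svd = TruncatedSVD(n_components=10, random_state=0).fit_transform(X)

print(X_filtered.shape, X_svd.shape)  # (500, 10) (500, 10)
```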
Bayesian analysis is sometimes maligned by data analysts, due in part to the perceived arbitrariness of selecting a meaningful prior distribution for a specific problem and the (former) difficulty of producing posterior distributions for all but the simplest situations. On the other hand, we have heard it said that “while classical data analysts need a large bag of clever tricks to unleash on their data, Bayesians only ever really need one.” With the advent of efficient numerical samplers, modern data analysts can no longer afford to shy away from adding the Bayesian arrow to their quiver. In this course, we will introduce the basic concepts underpinning Bayesian analysis and present a small number of examples that illustrate the strengths of the approach.
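As a concrete illustration, the sketch below works through the simplest possible Bayesian update, a conjugate Beta-Binomial model of a coin flip, where the posterior is available in closed form; the prior parameters and data are illustrative. (Non-conjugate models, where no closed form exists, are where the numerical samplers mentioned above come in.)

```python
# A minimal sketch of a Bayesian update with a Beta-Binomial model.
# Prior parameters and data are illustrative.
from scipy import stats

# Prior: Beta(2, 2) encodes a mild belief that the coin is fair.
a_prior, b_prior = 2, 2

# Data: 17 heads in 25 flips.
heads, flips = 17, 25

# Conjugacy: the posterior is Beta(a + heads, b + tails), so no
# sampler is needed in this special case.
posterior = stats.beta(a_prior + heads, b_prior + (flips - heads))

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```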
With the advent of automatic data collection, it is now possible to store and process large troves of data. There are technical issues associated with massive data sets, such as the speed and efficiency of analytical methods, but there are also problems related to the detection of anomalous observations and the analysis of outliers. Extreme and irregular values behave very differently from the majority of observations. For instance, they can represent criminal attacks, fraud attempts, targeted attacks, or data collection errors. As a result, anomaly detection and outlier analysis play a crucial role in cybersecurity, quality control, etc. The (potentially) heavy human price and technical consequences related to the presence of such observations go a long way towards explaining why the topic has attracted attention in recent years. In this course, we will review various detection methods and provide a comparative analysis of algorithms (their performance and limitations), illustrated with practical examples and with particular attention paid to supervised and unsupervised methods.
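For a flavour of the unsupervised side, the sketch below flags outliers in synthetic data with an isolation forest, one standard unsupervised method; the data and the contamination rate are illustrative only.

```python
# A minimal sketch of unsupervised anomaly detection with an isolation
# forest on synthetic data (data and parameters illustrative).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 300 "normal" observations plus 10 planted outliers.
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
outliers = rng.uniform(low=-8, high=8, size=(10, 2))
X = np.vstack([normal, outliers])

# contamination is the expected fraction of anomalies in the data.
model = IsolationForest(contamination=0.03, random_state=0).fit(X)
labels = model.predict(X)  # +1 for inliers, -1 for flagged anomalies

print(f"Flagged {np.sum(labels == -1)} observations as anomalous")
```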