Sayan Pathak, PhD, is a Principal Engineer and Machine Learning Scientist on the Cognitive Toolkit (CNTK) team at Microsoft. He has published and commercialized cutting-edge computer vision and machine learning technology for big-data problems in the medical imaging, neuroscience, computational advertising, and social network domains. He has developed ML-based technologies that have received FDA 510(k) clearance for clinical use in the US. Prior to joining Microsoft, he worked at the Allen Institute for Brain Science, where he published in top journals such as Nature, Nature Neuroscience, and IEEE journals. He has over 25 peer-reviewed journal papers and has presented at conferences across the globe.
He has also been on the faculty at the University of Washington for 15 years and has actively collaborated with faculty at the Indian Institute of Technology for over 4 years. He has been a consultant to several startups and a principal investigator on several US National Institutes of Health (NIH) grants. He received his BS from the Indian Institute of Technology, Kharagpur, India, in 1994, and earned his MS and PhD in Bioengineering (with a computer vision specialization) in 1996 and 2000, respectively.
Frank Seide, a native of
Hamburg, Germany, is a Principal Researcher at Microsoft Research and an architect of Microsoft's Cognitive Toolkit for deep learning. His current research focus is deep neural networks for conversational speech recognition. Together with co-author Dong Yu, he was the first to show the effectiveness of deep neural networks for recognition of conversational speech, and he was part of the effort that reached human parity on this task in 2016. Throughout his career, he has been interested in and worked on a broad range of topics and components of automatic speech recognition, including spoken-dialogue systems, recognition of Mandarin Chinese, and, particularly, large-vocabulary recognition of conversational speech with applications to audio indexing, transcription, speech-to-speech translation, and distributed neural-network model training.
In 1993, Frank received a Master's degree in electrical engineering from the University of Technology of Hamburg-Harburg, Germany, and joined the speech research group of Philips Research in Aachen, Germany, to work on spoken-dialogue systems. He then transferred to Taiwan as one of the founding members of Philips Research East-Asia, Taipei, to lead a research project on Mandarin speech recognition. In June 2001, he joined the speech group at Microsoft Research Asia, Beijing, initially as a Researcher, from 2003 as Project Leader for offline speech applications, and from October 2006 as Research Manager. In 2014, Frank joined the Speech & Dialogue group at MSR Redmond as a Principal Researcher.
Train neural networks like the Microsoft product groups do! This hands-on tutorial introduces the Microsoft Cognitive Toolkit (formerly known as CNTK), a scalable open-source deep-learning toolkit for Windows and Linux that has been used for Skype, Cortana, Bing, Xbox, and much more. For example, Microsoft product groups achieved a major breakthrough when they used CNTK to build a system that recognizes conversational speech as well as a human can. CNTK generally trains models as fast as or faster than its competitors and scales well on modern GPUs (see the performance comparison from HKBU and recent demos from NVIDIA at SC’16 and Cray at NIPS’16). CNTK also supports the popular network types (feed-forward, convolutional, and recurrent) and offers APIs for Python and C++.
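To give a feel for the Python API, here is a minimal sketch, not taken from the tutorial notebooks, of how a small feed-forward classifier can be defined and hooked up to a trainer; the dimensions and learning rate are arbitrary placeholders, and exact function names can vary slightly across CNTK 2.x releases.

    import cntk as C

    # Illustrative sketch only: a small feed-forward classifier in CNTK's Python API.
    # Input/output dimensions and the learning rate are arbitrary placeholders.
    input_dim, num_classes = 784, 10
    features = C.input_variable(input_dim)
    labels = C.input_variable(num_classes)

    # Compose two dense layers with the layers library.
    model = C.layers.Sequential([
        C.layers.Dense(400, activation=C.relu),
        C.layers.Dense(num_classes)
    ])(features)

    loss = C.cross_entropy_with_softmax(model, labels)
    metric = C.classification_error(model, labels)

    learner = C.sgd(model.parameters,
                    C.learning_rate_schedule(0.02, C.UnitType.minibatch))
    trainer = C.Trainer(model, (loss, metric), [learner])
    # Training then proceeds by calling trainer.train_minibatch(...) once per minibatch.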
This tutorial is split into three parts. In Part 1, we introduce how the toolkit has been used in different domains within and outside Microsoft, and how customers use it both on-premises and on the Azure cloud. In Part 2, we dive into the structure of deep-learning models and elaborate on the optimizations that let CNTK scale well across multiple servers and GPUs. Part 3 is a hands-on session in which you will use Jupyter notebooks to train several types of deep neural networks, drawn from use cases in the vision, text, time-series, and speech domains. In particular, you will work with deep convolutional networks with residual connections (ResNet), which achieved record-breaking accuracy in recognizing objects in the ImageNet challenge; RNNs/LSTMs for sequence-to-sequence modeling and reading comprehension of text documents (ReasoNet); autoencoders for semi-supervised learning; Generative Adversarial Networks (GANs) for unsupervised learning; and reinforcement learning for enabling machines to optimally perform tasks such as playing games. Network bandwidth permitting, we will also explore scalable learning on Azure, from simple introductory examples to very advanced end-to-end use cases. Time permitting, we will discuss integration of CNTK with Spark for scaling out.
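As a further illustration of the compositional style used in the Part 3 notebooks, a recurrent LSTM classifier over word sequences can be sketched as follows; the vocabulary size, embedding and hidden dimensions, and class count are made-up placeholders rather than values from the tutorial.

    import cntk as C

    # Illustrative sketch of a recurrent sequence classifier; all sizes are placeholders.
    vocab_size, num_classes = 2000, 5
    words = C.sequence.input_variable(vocab_size)    # one-hot encoded word sequence

    model = C.layers.Sequential([
        C.layers.Embedding(300),                     # learn word embeddings
        C.layers.Recurrence(C.layers.LSTM(128)),     # run an LSTM over the sequence
        C.sequence.last,                             # keep only the final hidden state
        C.layers.Dense(num_classes)                  # classify the whole sequence
    ])(words)

The other model families in the hands-on session are built up in a similar compositional style from different building blocks.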
Keywords. Deep learning, large-scale distributed machine learning, online learning, text (sequence-to-sequence, reasoning networks), computer vision (ResNet, GANs, video), reinforcement learning, cloud-based distributed computing, convolutional neural networks, LSTMs, recurrence, autoencoders, Azure, Spark
Materials. The tutorial’s code and data will be made available as Jupyter (IPython) notebooks; CNTK itself can be downloaded from GitHub. The hands-on sessions require a laptop running Windows 7 or later, or Linux; a CUDA-capable GPU is recommended.
Current and future deep-learning practitioners and researchers looking for a tool that is easy to use yet efficient and scalable across multi-machine GPU clusters on real-world workloads. In Part 1, those new to deep learning will gain an understanding of what deep learning can achieve and how it is applied in the field. Part 2 requires familiarity with basic concepts from programming, linear algebra, and probability. Part 3 requires basic programming skills: if you are not familiar with Python, you will still be able to run the code and gain hands-on experience; if you are, you will be able to explore further on your own.