Sayan Pathak, PhD, is a Principal Engineer and Machine Learning Scientist on the Cognitive Toolkit (CNTK) team at Microsoft. He has published and commercialized cutting-edge computer vision and machine learning technology for big-data problems in the medical imaging, neuroscience, computational advertising, and social network domains. He has developed ML-based technologies that have received FDA 510(k) clearance for clinical use in the US. Prior to joining Microsoft, he worked at the Allen Institute for Brain Science, where he published in top journals such as Nature, Nature Neuroscience, and IEEE journals. He has over 25 peer-reviewed journal papers and has presented at conferences across the globe.
He has also been on the faculty at the University of Washington for 15 years and has actively collaborated with faculty at the Indian Institute of Technology for over 4 years. He has been a consultant to several startups and a principal investigator on several US National Institutes of Health (NIH) grants. He received his BS from the Indian Institute of Technology, Kharagpur, India, in 1994, and earned his MS and PhD in Bioengineering (with a computer vision specialization) in 1996 and 2000, respectively.
Frank Seide, a native of
Hamburg, Germany, is a Principal Researcher at Microsoft Research and an architect of Microsoft's Cognitive Toolkit for deep learning. His current research focus is deep neural networks for conversational speech recognition. Together with co-author Dong Yu, he was the first to show the effectiveness of deep neural networks for recognition of conversational speech, and he was part of the effort that reached human parity on this task in 2016. Throughout his career, he has been interested in and worked on a broad range of topics and components of automatic speech recognition, including spoken-dialogue systems, recognition of Mandarin Chinese, and, particularly, large-vocabulary recognition of conversational speech with applications to audio indexing, transcription, speech-to-speech translation, and distributed neural-network model training.
In 1993, Frank received a Master's degree in electrical engineering from the University of Technology of Hamburg-Harburg, Germany, and joined the speech research group of Philips Research in Aachen, Germany, to work on spoken-dialogue systems. He then transferred to Taiwan as one of the founding members of Philips Research East-Asia, Taipei, to lead a research project on Mandarin speech recognition. In June 2001, he joined the speech group at Microsoft Research Asia, Beijing, initially as a Researcher, from 2003 as Project Leader for offline speech applications, and from October 2006 as Research Manager. In 2014, Frank joined the Speech & Dialogue group at MSR Redmond as a Principal Researcher.
Train neural networks like the Microsoft product groups do! This hands-on tutorial introduces the Microsoft Cognitive Toolkit (formerly known as CNTK), a scalable open-source deep-learning toolkit for Windows and Linux that has been used for Skype, Cortana, Bing, Xbox, and much more. For example, Microsoft product groups achieved a major breakthrough when they used CNTK to build a system that recognizes conversational speech as well as a human can. CNTK generally trains models as fast as or faster than its competitors and scales well on modern GPUs (see the performance comparison from HKBU and recent demos from NVIDIA at SC’16 and Cray at NIPS’16). CNTK also supports the popular network types (feed-forward, convolutional, and recurrent) and offers APIs for Python and C++.
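To give a feel for the Python API, here is a minimal sketch, not taken from the tutorial notebooks, of how a small feed-forward classifier can be defined and hooked up to a trainer; the dimensions and learning rate are arbitrary placeholders, and exact function names can vary slightly across CNTK 2.x releases.

    import cntk as C

    # Illustrative sketch only: a small feed-forward classifier in CNTK's Python API.
    # Input/output dimensions and the learning rate are arbitrary placeholders.
    input_dim, num_classes = 784, 10
    features = C.input_variable(input_dim)
    labels = C.input_variable(num_classes)

    # Compose two dense layers with the layers library.
    model = C.layers.Sequential([
        C.layers.Dense(400, activation=C.relu),
        C.layers.Dense(num_classes)
    ])(features)

    loss = C.cross_entropy_with_softmax(model, labels)
    metric = C.classification_error(model, labels)

    learner = C.sgd(model.parameters,
                    C.learning_rate_schedule(0.02, C.UnitType.minibatch))
    trainer = C.Trainer(model, (loss, metric), [learner])
    # Training then proceeds by calling trainer.train_minibatch(...) once per minibatch.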
This tutorial is split into three parts. In Part 1, we introduce how the toolkit has been used in different domains within and outside Microsoft, and how customers use it both on-premises and on the Azure cloud. In Part 2, we dive into the structure of deep-learning models and elaborate on the optimizations that let CNTK scale well across multiple servers and GPUs. Part 3 is a hands-on session in which you will use Jupyter notebooks to train several types of deep neural networks, drawn from use cases in the vision, text, time-series, and speech domains. In particular, you will work with deep convolutional networks with residual connections (ResNet), which achieved record-breaking accuracy in recognizing objects in the ImageNet challenge; RNNs/LSTMs for sequence-to-sequence modeling and reading comprehension of text documents (ReasoNet); autoencoders for semi-supervised learning; Generative Adversarial Networks (GANs) for unsupervised learning; and reinforcement learning for enabling machines to optimally perform tasks such as playing games. Network bandwidth permitting, we will also explore scalable learning on Azure, from simple introductory examples to very advanced end-to-end use cases. Time permitting, we will discuss integration of CNTK with Spark for scaling out.
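As a further illustration of the compositional style used in the Part 3 notebooks, a recurrent LSTM classifier over word sequences can be sketched as follows; the vocabulary size, embedding and hidden dimensions, and class count are made-up placeholders rather than values from the tutorial.

    import cntk as C

    # Illustrative sketch of a recurrent sequence classifier; all sizes are placeholders.
    vocab_size, num_classes = 2000, 5
    words = C.sequence.input_variable(vocab_size)    # one-hot encoded word sequence

    model = C.layers.Sequential([
        C.layers.Embedding(300),                     # learn word embeddings
        C.layers.Recurrence(C.layers.LSTM(128)),     # run an LSTM over the sequence
        C.sequence.last,                             # keep only the final hidden state
        C.layers.Dense(num_classes)                  # classify the whole sequence
    ])(words)

The other model families in the hands-on session are built up in a similar compositional style from different building blocks.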
Keywords. Deep learning, large-scale distributed machine learning, online learning, text (sequence-to-sequence, reasoning networks), computer vision (ResNet, GANs, video), reinforcement learning, cloud-based distributed computing, convolutional neural networks, LSTMs, recurrence, autoencoders, Azure, Spark
Materials. The tutorial’s code and data will be made available as Jupyter (IPython) notebooks; CNTK itself can be downloaded from GitHub. The hands-on sessions require a laptop running Windows 7 or later, or Linux; a CUDA-capable GPU is recommended.
Current and future deep-learning practitioners and researchers looking for a tool that is easy to use yet efficient and scalable across multi-machine GPU clusters on real-world workloads. In Part 1, those new to deep learning will gain an understanding of what deep learning can achieve and how it is applied in the field. Part 2 requires familiarity with basic concepts from programming, linear algebra, and probability. Part 3 requires basic programming skills: if you are not familiar with Python, you will still be able to run the code and gain hands-on experience; if you are, you will be able to explore further on your own.