2016 MATLAB & Simulink Tech Summit - TeraSoft (鈦思科技)

C1 / Demystifying Deep Learning: Image/Video Focus

Abhijit Bhattacharjee / MathWorks Inc.

Are you new to deep learning and want to learn how to use it in your work? Deep learning can achieve state-of-the-art accuracy in many humanlike tasks such as naming objects in a scene or recognizing optimal paths in an environment.

The main tasks are to assemble large data sets, create a neural network, and train, visualize, and evaluate different models, often on specialized hardware that requires unique programming knowledge. These tasks are frequently even more challenging because of the complex theory behind them.

In this session, we’ll demonstrate new MATLAB features that simplify these tasks and eliminate low-level programming. Along the way, we’ll distill practical knowledge of the deep learning domain. We’ll build and train neural networks that recognize handwriting, classify food in a scene, classify signals, and identify the drivable area in a city environment.

Manage extremely large sets of images
Visualize networks and gain insight into the black-box nature of deep networks
Build networks from scratch with a drag-and-drop interface
Perform classification tasks on images and signals, and pixel-level semantic segmentation on images
Import pretrained networks such as GoogLeNet and ResNet
Import models from TensorFlow Keras, Caffe, and the ONNX Model format
Speed up network training with parallel computing on a cluster
Automate the manual effort required to label ground truth
Automatically generate source code for embedded targets