AI IoT Workshop: Tools & Products

Agenda

https://software.intel.com/en-us/articles/distributed-training-of-deep-networks-on-amazon-web-services-aws

Models

Platforms

 

Tools

Products

 

Algorithms

IDEs

Languages

Exercise:

Distributed Training of Deep Networks on Amazon Web Services* (AWS)

The AI Market

7% of server sales in 2016 were for AI, but it is the "fastest-growing data center workload."

Of that 7%, about 60% was "classical machine learning" and about 40% was "deep learning."

FYI: 97% of that classical machine learning used Intel Xeon processors to handle the computations

The IoT Market

Cortex-M Processors

M0 (basic): low cost, power & area

M3, M4, M33: middle tier apps

M7: embedded applications

M23, M33: security

*M4, M7, M33 process DSP algorithms such as sensor fusion, motor control and power management


Not quite ML ready

R to merge with M series

Why Bother?

https://devblogs.nvidia.com/parallelforall/digits-deep-learning-gpu-training-system/

Do it for the


Use Cases

image and video classification
computer vision
speech recognition
natural language processing
audio recognition
and others

Tools

Frameworks

incorporate GPU acceleration

Torch

  • a scientific computing framework that puts GPUs first
  • interface to C, via LuaJIT  
  • embeddable, with ports to iOS, Android and FPGA backends

Caffe

  • define and optimise your models without hard-coding
  • switch between CPU and GPU by setting a single flag
  • train on a GPU machine then deploy to mobile devices.
  • process over 60M images per day  with a single NVIDIA K40 GPU*

Theano

  • a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
  • can surpass C on a CPU by many orders of magnitude by taking advantage of recent GPUs

http://deeplearning.net/software/theano/tutorial/

Products: Chips

  1. TLM (Transaction-Level Modeling) - separates communication among modules from the functional units/architecture
  2. PCI (Peripheral Component Interconnect) - a bus for expansion cards such as modems and sound cards
  3. UART (Universal Asynchronous Receiver-Transmitter) - asynchronous serial communication with a configurable data format and transmission speed
  4. I2C (Inter-Integrated Circuit) - a serial bus that attaches low-speed peripheral ICs to processors and microcontrollers
  5. CAN (Controller Area Network) - a peer-to-peer network; no master controls when individual nodes have access to read and write data
  6. GPIO (General-Purpose Input/Output) - pins that require you to write settings into a register to define how each pin is used
  7. SDIO (Secure Digital Input Output) - an extension of the SD card interface for input/output devices
  8. SPI (Serial Peripheral Interface) - a synchronous serial bus for short-distance communication in embedded systems
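
The GPIO item above says that pin behaviour is set by writing to a register. The following is a minimal sketch of that idea in plain Python. It simulates a hypothetical 8-bit direction register, so the register layout, the function names and the "1 means output" convention are assumptions for illustration and not taken from any particular chip.

```python
# Hypothetical 8-bit direction register: bit n set to 1 makes pin n an output,
# 0 makes it an input. Real devices define their own register layout.
REGISTER_WIDTH_MASK = 0xFF


def set_pin_output(direction_reg: int, pin: int) -> int:
    """Return the register value with `pin` configured as an output."""
    return (direction_reg | (1 << pin)) & REGISTER_WIDTH_MASK


def set_pin_input(direction_reg: int, pin: int) -> int:
    """Return the register value with `pin` configured as an input."""
    return direction_reg & ~(1 << pin) & REGISTER_WIDTH_MASK


def is_output(direction_reg: int, pin: int) -> bool:
    """Check whether `pin` is currently configured as an output."""
    return bool((direction_reg >> pin) & 1)


reg = 0b00000000              # all pins start as inputs
reg = set_pin_output(reg, 3)  # pin 3 becomes an output
reg = set_pin_output(reg, 5)  # pin 5 becomes an output
reg = set_pin_input(reg, 3)   # pin 3 goes back to being an input
print(f"{reg:08b}")           # prints 00100000
```

On real hardware the same bit operations would be applied to a memory-mapped register whose address comes from the device datasheet.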

ARM

Cortex A8(2009) to Cortex A7(2013)

big.LITTLE by ARM (2011)

pairs 1 low-powered (LITTLE) with 1 high-powered (big) chip

extends battery life

no performance reduction

powers down the big cores to save battery life

but cores are limited and fixed to a specific frequency and power consumption!

ARM DynamIQ (2017)


 each of the 8 cores can have different performance and power characteristics

the CPUs share the memory sub-system

DynamIQ Benefits

ARM claims up to 50x faster responsiveness for ML and AI applications than big.LITTLE

ARM stressed that most of the chips ...shipped... will be simpler ARM chips (low-power Cortex-R and Cortex-M designs, like those used in Fitbits)

HOWEVER

 “I would have thought within a couple of years pretty much all smartphones will be using Dynamiq,” says Ronco.

https://www.theverge.com/2017/3/21/14998100/arm-new-dynamiq-microarchitecture-ai-chip-design

Products: Accelerators

can be used to programme the FPGA

 

e.g., convert hardware description language (HDL) files into a configuration bitstream

 

http://web.mit.edu/6.111/www/s2004/NEWKIT/ise.shtml

Intel FPGA

(Altera)

For IoT gateways

  • flexible protocol switching 
  • secure remote in-field upgrades
  • the industry’s "highest performance-per-watt and performance-per-$"

 

Products: Programming Languages

VHDL

  • VHSIC (Very High Speed Integrated Circuit) Hardware Description Language
  • created by the US Department of Defense
  • many programming constructs, but verbose
  • strongly typed language (predefined data types such as integer, character)
  • less suited than Verilog to low-level hardware modeling
  • popular with FPGA designers, where low-level modeling is not required
  • better than Verilog for the non-synthesizable portion of your code
  • separates the interface from the implementation (i.e. "VHDL entities")
  • supports custom types
  • readability is good

 

Verilog

  • good at hardware modeling
  • lacks higher level (programming) constructs
  • preferred by ASIC designers
  • not as verbose as VHDL
  • similar to C
  • better performance than VHDL
  • prevalent in the Americas and Asia
  • good for open-source projects that rely on CAD
  • loosely typed language
  • risk of warnings and errors in the linting/synthesis process

 

SystemVerilog

  • adds some higher level constructs to Verilog
  • doesn't extend the hardware modeling capabilities
  • borrowed a lot from VHDL
  • predominant language for writing testbenches

 

EDA Playground

edit, save, simulate, synthesise SystemVerilog, Verilog, VHDL and other HDLs from your web browser

Models

Classification v Regression

  1. Binary Classification Model
  2. Multiclass Classification Model
  3. Regression Model

Binary Classification Model

  • predict a binary outcome

  • one of two possible classes, true or false

  • algorithm such as logistic regression

Logistic Regression

Examples

  • "Is this email spam or not spam?"

  • "Will the customer buy this product?"

  • "Is this product a book or a farm animal?"

  • "Is this review written by a customer or a robot?"
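
As a rough illustration of the binary case, here is a minimal logistic regression sketch trained with stochastic gradient descent in plain Python. The two features (link count and exclamation-mark count per email), the toy data, the learning rate and the number of passes are all made-up assumptions, chosen only to make the "spam or not spam" example concrete.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features) -> float:
    """Probability that the example belongs to the positive class."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def train_logistic(examples, labels, learning_rate=0.1, passes=2000):
    """Fit weights and bias by per-example gradient descent on the log loss."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(passes):
        for features, label in zip(examples, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: [number of links, number of exclamation marks]
emails = [[0, 0], [1, 0], [0, 1], [4, 3], [5, 2], [3, 4]]
is_spam = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(emails, is_spam)
for email in ([0, 0], [5, 5]):
    label = "spam" if predict_proba(weights, bias, email) >= 0.5 else "not spam"
    print(email, label)
```

The 0.5 cut-off used to turn the probability into a class is itself a choice, which is what a score threshold controls.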

Multiclass Classification Model

  • allows you to generate predictions for multiple classes

  • predict one of more than two outcomes

  • algorithm such as multinomial logistic regression

Multinomial Logistic Regression

Examples

"Is this product a book, movie, or clothing?"

"Is this movie a romantic comedy, documentary, or thriller?"

"Which category of products is most interesting to this customer?"

Regression Model

  • predict a numeric value

  • algorithm such as linear regression

Linear Regression

Examples

  • "What will the temperature be in Seattle tomorrow?"

  • "For this product, how many units will sell?"

  • "What price will this house sell for?"
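
A regression model returns a number rather than a class. As a minimal sketch, the code below fits a straight line with the closed-form least squares solution for one input variable. The house areas and prices are invented for illustration and lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area in square metres, sale price in thousands
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept
print(slope, intercept, predicted_price)
```

With several input variables the same idea applies, but the fit is usually computed with matrix methods or gradient descent.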

More Algorithms

K-Nearest Neighbor (KNN)


  1. Find your input data
  2. Determine your output data
  3. Find neighbouring Xs that are similar to your current X
  4. Introduce a new datapoint that close neighbors liked
  5. Choose an algorithm; e.g., k-nearest neighbors
  6. Define proximity; e.g., via  feature vectors   [1,1,-1,0,0]. Note: feature vectors don't care which algorithm you're applying
  7. Calculate the distance to all other Xs
  8. Introduce another new data point
  9. Calculate the score for those data points among the k nearest peers. The highest score is the most likely answer.
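
The steps above can be sketched roughly as follows, assuming Euclidean distance as the proximity measure and a simple majority vote as the score. The feature vectors follow the [1,1,-1,0,0] style mentioned in step 6, and the labels and data points are made up for illustration.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training_data, query, k=3):
    """Label the query by majority vote among its k nearest neighbours."""
    nearest = sorted(training_data,
                     key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: each vector holds 1 (liked), -1 (disliked), 0 (no opinion)
training_data = [
    ([1, 1, -1, 0, 0], "likes"),
    ([1, 1, 0, 0, 0], "likes"),
    ([1, 0, -1, 0, 0], "likes"),
    ([-1, -1, 1, 0, 0], "dislikes"),
    ([-1, 0, 1, 1, 0], "dislikes"),
    ([0, -1, 1, 0, 1], "dislikes"),
]

print(knn_predict(training_data, [1, 1, -1, 0, 1]))
print(knn_predict(training_data, [-1, -1, 1, 1, 0]))
```

Choosing an odd k for a two-class problem avoids tied votes.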

Radial Basis Function Network (RBFN)

  • classification by measuring the input’s similarity to examples from the training set
  • the N stands for Network: an RBFN is a particular type of neural network
  • “prototypes” represent examples from your training set
  • each neuron:
    • stores a “prototype” vector
    • computes the Euclidean distance between the input and its prototype
    • so, if your input is more like your class A prototypes than your class B prototypes, it classifies your input as class A

http://mccormickml.com/2013/08/15/radial-basis-function-network-rbfn-tutorial/
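
The prototype idea can be sketched as below. Each "neuron" holds one prototype and responds with a Gaussian of the Euclidean distance to the input, and the class whose neurons respond most strongly wins. The prototypes, the `beta` width parameter and the equal output weights are assumptions for illustration. A full RBFN would also learn a weight for each neuron's contribution.

```python
import math

# Hypothetical prototypes, two per class, in a 2-D feature space
PROTOTYPES = {
    "A": [[0.0, 0.0], [0.5, 0.5]],
    "B": [[3.0, 3.0], [3.5, 2.5]],
}


def rbf_activation(x, prototype, beta=1.0):
    """Gaussian response: 1.0 at the prototype, falling towards 0 with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-beta * squared_distance)


def classify(x):
    """Sum the neuron responses per class and return the strongest class."""
    scores = {
        label: sum(rbf_activation(x, p) for p in prototypes)
        for label, prototypes in PROTOTYPES.items()
    }
    return max(scores, key=scores.get), scores


print(classify([0.2, 0.1])[0])
print(classify([3.2, 2.9])[0])
```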

Questions for Reflection

a.) What makes some tools and products better, faster or cheaper than others?

b.) Are regression and classification the most important problems for tools and products to solve?

c.) Which other tools and products are good for machine learning engineering and/or data science?

Exercise

https://www.tensorflow.org/get_started/tflearn


Train the Model

  1. Input your training datasources

    1. UCI Machine Learning Repository

    2. historical data (csv)

    3. prediction file (csv)

  2. Upload the datasource to your own AWS S3 bucket
  3. Open the file and look at the attributes in the header row.
  4. Name the data attribute that contains the target to be predicted
  5. Enter instructions for data transformations

  6. Training parameters to control the learning algorithm

 


Create DataSources

  1. create two datasources,

  2. one for training the model and

  3. one for evaluating the model
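
One simple way to get the two datasources is to shuffle the rows once and cut the list in two. The sketch below assumes a 70/30 split and a fixed random seed. Both values, and the function name, are illustrative choices and not requirements of any particular service.

```python
import random


def split_datasource(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and evaluation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))   # stand-in for the rows of a CSV file
train_rows, eval_rows = split_datasource(rows)
print(len(train_rows), len(eval_rows))   # prints 7 3
```

Keeping the evaluation rows out of training is what makes the later quality check meaningful.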

 


Parameters for Training

  1. Maximum model size

  2. Maximum number of passes over training data

  3. Shuffle type

  4. Regularization type

  5. Regularization amount
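
To show where some of these parameters enter a learning algorithm, here is a minimal stochastic gradient descent loop for a one-variable linear model. The parameter names (`max_passes`, `shuffle`, `l2_amount`) mirror the list above but are hypothetical, as are the learning rate and the toy data. L2 is assumed as the regularization type, and maximum model size is not modelled.

```python
import random


def train_sgd(rows, max_passes=500, shuffle=True, l2_amount=0.0,
              learning_rate=0.02, seed=0):
    """Fit y = w * x + b by SGD on squared error with optional L2 on w."""
    rng = random.Random(seed)
    rows = list(rows)
    w, b = 0.0, 0.0
    for _ in range(max_passes):           # number of passes over the data
        if shuffle:
            rng.shuffle(rows)             # new row order on every pass
        for x, y in rows:
            error = (w * x + b) - y
            # the L2 term pulls the weight towards zero
            w -= learning_rate * (error * x + l2_amount * w)
            b -= learning_rate * error
    return w, b


# Hypothetical data following y = 2x
rows = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]

w_plain, b_plain = train_sgd(rows)
w_reg, b_reg = train_sgd(rows, l2_amount=1.0)
print(w_plain, w_reg)
```

The regularized run ends with a smaller weight than the unregularized one, which is the intended effect of the regularization amount.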


Evaluation settings

  1. evaluate the predictive quality of the ML model


Recipes

  1. evaluate the predictive quality of the ML model


Review

  1. Review the ML Model's Predictive Performance

  2. Set a Score Threshold

 


Set a Score Threshold

  1. Review the ML Model's Predictive Performance
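
A score threshold turns the model's numeric score into a yes/no answer, and moving it trades precision against recall. The sketch below uses made-up scores and labels to show that trade-off. The function names are illustrative.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall(scores, labels, threshold):
    """Precision and recall of the positive class at a given threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and true labels
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 1, 1]

print(precision_recall(scores, labels, 0.5))   # prints (1.0, 0.75)
print(precision_recall(scores, labels, 0.3))   # prints (0.8, 1.0)
```

Lowering the threshold catches every positive here but lets one false positive through.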

 


Generate Predictions

  1. Input training datasource

  2. Name of the data attribute that contains the target to be predicted

  3. Required data transformation instructions

  4. Training parameters to control the learning algorithm

 


Clean Up

  1. Input training datasource

  2. Name of the data attribute that contains the target to be predicted

  3. Required data transformation instructions

  4. Training parameters to control the learning algorithm

 



REFERENCES

https://www.ibm.com/ms-en/marketplace/engineering-solutions-on-cloud/details#product-header-top

http://electronicdesign.com/fpgas/fpgas-alternative-cloud-computing

https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-sequence-learning/

https://web.stanford.edu/class/cs224n/reports/2761183.pdf

https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/

https://www.nextplatform.com/2017/03/21/can-fpgas-beat-gpus-accelerating-next-generation-deep-learning/

https://chatbotsmagazine.com/unsupervised-deep-learning-for-vertical-conversational-chatbots-c66f21b1e0f

End of Part II

Extras

Platforms

Neon, CaffeOnSpark

Wise.io

Neon

CaffeOnSpark

 


Nervana

Caffe

CaffeOnSpark

 


Berkeley Vision & Learning Center

Aliyun

CNTK

 


Alibaba

Watson Analytics

CNTK



BigML

FPGA

Clock Frequency??

  • maximum clock speed (fmax) depends on the device
  • also depends on how your code is written
  • division and loops with loop dependencies slow the speed
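
As a rough rule, the maximum clock frequency is the reciprocal of the longest combinational delay between two registers (the critical path). The sketch below ignores setup time and clock skew, and the two delays are hypothetical numbers chosen to show why a slow operation such as division lowers fmax.

```python
def fmax_mhz(critical_path_ns: float) -> float:
    """Approximate maximum clock frequency in MHz for a critical path in ns."""
    return 1000.0 / critical_path_ns


# Hypothetical critical path delays
adder_path_ns = 4.0
divider_path_ns = 25.0

print(fmax_mhz(adder_path_ns))     # prints 250.0
print(fmax_mhz(divider_path_ns))   # prints 40.0
```

Synthesis tools report the actual critical path for a given device and design.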

 

Synthesizable??

  • means that your code can actually be compiled into hardware (gates and flipflops)
  • unlike software languages, where the entire language is compilable in all situations, HDLs like Verilog and VHDL only have a subset of the language that is synthesizable
  • other code is for simulation and testing only

 

Hardware Logic??

  • designs can become so large they are not practical
  • expensive operations prevent all code from fitting on your FPGA
  • division can take more than 25% of a small FPGA
  • beginner's rule of thumb: code should not use more than 5% of the FPGA's resources

 

Steps FPGA

  1. plan your design
  2. draw a rough diagram of your hardware
  3. write your code
  4. synthesize your code
  5. check for functional correctness
  6. use a simulator with a testbench

 

About TensorFlow

TensorFlow’s high-level machine learning API (tf.contrib.learn) makes it easy to configure, train, and evaluate a variety of machine learning models. In this tutorial, you’ll use tf.contrib.learn to construct a neural network classifier and train it on the Iris data set to predict flower species based on sepal/petal geometry. 

https://www.tensorflow.org/get_started/tflearn

Steps: tf.contrib.learn

You'll write code to perform the following five steps:

  1. Load CSVs containing Iris training/test data into a TensorFlow Dataset
  2. Construct a neural network classifier
  3. Fit the model using the training data
  4. Evaluate the accuracy of the model
  5. Classify new samples

 

NOTE: TensorFlow should be installed on your machine before getting started with this tutorial.

A typical "neuron" takes in a set of inputs, sums them together, takes some function of them, and passes the output through a weighted connection to another neuron.

https://people.orie.cornell.edu/davidr/or474/deveaux.pdf
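
The description of a neuron can be sketched in a few lines. The sigmoid is assumed here as the "some function" step, and the weights, bias and inputs are arbitrary illustrative values.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical inputs arriving over two weighted connections
output = neuron([1.0, 2.0], [0.5, -0.25])
print(output)   # the weighted sum is 0.0, so the sigmoid gives 0.5
```

In a network, this output would in turn become one of the inputs of neurons in the next layer.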

Regression is "multivariable nonlinear function approximation"

 

see multilayer feedforward network

https://people.orie.cornell.edu/davidr/or474/deveaux.pdf

estimating the weights in a neural network is a nonlinear least squares problem

https://people.orie.cornell.edu/davidr/or474/deveaux.pdf
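
Written out, the least squares view of weight estimation looks roughly like this. The symbols are assumptions introduced for illustration: $w$ is the vector of all network weights, $f(x_i; w)$ is the network's output for input $x_i$, and $y_i$ is the observed target for the $i$-th of $n$ training examples.

```latex
\hat{w} = \arg\min_{w} \sum_{i=1}^{n} \bigl( y_i - f(x_i; w) \bigr)^2
```

The problem is nonlinear because $f$ depends on $w$ through the activation functions, so it is usually solved iteratively, for example by gradient descent, and not in closed form.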