The Rise of Intelligent Machines

Tomas Ramanauskas
Published in Mic.ai blog
May 30, 2017

Initially, I planned this blog post to be out in May 2016, which is, yes, one year ago! The plan was to write about Machine Learning: what made it so popular, what tools and frameworks are available, and so on. Rather than repeat what many people have already written in thousands of blog posts, though, I’ll give an overview of our journey. It’s very likely that this route is not unique, but let’s start at the very beginning.

The Artificial Intelligence journey for me didn’t begin in 2016 or 2015. It began when I chose an optional “Artificial intelligence methods” course at university in the late 1990s/early 2000s. It was prepared and taught by Šarūnas Raudys, a professor of Computer Science at Vilnius University. He specialises in Artificial Intelligence and Machine Learning, and his publications (sorted by citations) are available here.

Šarūnas Raudys

If I’m honest, I don’t remember much from this particular course. However, I do remember one application of AI that the professor shared with the class. He worked on a project with a company in Japan that took the sound coming from washing machines and predicted whether a machine was about to fail, before it actually failed. That was a truly “wow” moment, and it has been stuck in my head ever since.

Fast forward 15 years or so, and we end up in 2015, when nearly everyone in technology is talking about Artificial Intelligence and Machine Learning. Of course, a lot happened between 2000 and 2015; it’s just that in November 2015 I decided to take a deep dive into Machine Learning and see what the buzz was all about.

I always was, and still am, a huge fan of Stanford University, and when a quick check showed a course from this university available on Coursera, taught by Andrew Ng, it was a no-brainer. After three months of learning (sometimes after work, at times during weekends or while waiting for a flight), I was ready to roll up my sleeves and get my hands dirty.

Taught by: Andrew Ng, Associate Professor, Stanford University; Chief Scientist, Baidu; Chairman and Co-founder, Coursera

I enjoyed this course a lot, and Andrew Ng gave me an excellent start. Following Andrew on Twitter also helped me build a network of other people to follow who have played a significant role in Machine Learning.

So let’s have a look at the people who contributed most to helping me get to know the world of ML.

  1. Šarūnas Raudys, Professor of Computer Science, Vilnius University. I put him at the top of the list as he was the first person to introduce me to artificial intelligence.
  2. Andrew Ng is a Chinese American computer scientist. He is the former chief scientist at Baidu, where he led the company’s Artificial Intelligence Group. I added Andrew Ng to this list for creating the “Machine Learning” course at Stanford University and making it available for free via Coursera.
  3. Andrej Karpathy is a Research Scientist at OpenAI working on Deep Learning in Computer Vision, Generative Modeling and Reinforcement Learning. Andrej is on this list for designing and teaching a new Stanford class on Convolutional Neural Networks for Visual Recognition (CS231n). The class was the first Deep Learning course offering at Stanford and has grown from 150 enrolled in 2015 to 330 students in 2016, and 750 students in 2017.
  4. Yann LeCun is a computer scientist with contributions to machine learning, computer vision, mobile robotics and computational neuroscience. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs) and is a founding father of convolutional nets. Yann is on this list for his overall impact on the community and his contributions to the science of machine learning, mobile robotics and computational neuroscience. In 2016, Yann LeCun received The Lovie Lifetime Achievement Award. There’s also a video of Yann accepting the award available on YouTube.
  5. Jeff Dean is an American computer scientist and software engineer. He is currently a Google Senior Fellow in the Systems and Infrastructure Group. Starting in 2011, Google Brain built DistBelief as a proprietary machine learning system based on deep learning neural networks. Its use grew rapidly across various Alphabet companies in both research and commercial applications. Google assigned various computer scientists, including Jeff Dean, to simplify and refactor the codebase of DistBelief into a faster, more robust, application-grade library, which became TensorFlow. Jeff is on this list mainly for his impact on making TensorFlow open source and widely adopted. Here are the videos from the TensorFlow Dev Summit 2017. It was also an honour to see Jeff Dean speak live and give his “Building Machine Learning Systems that Understand” presentation at the ACM SIGMOD/PODS conference in June 2016 in San Francisco.
  6. Denny Britz is currently a resident on the Google Brain team. He studied Computer Science at Stanford University, where he worked on probabilistic models for NLP, and UC Berkeley, where he worked on a cluster-computing framework called Spark. Denny is on this list for his fantastic blog site — WildML.
  7. François Chollet is the author of Keras, one of the most widely used libraries for deep learning in Python. He has been working with deep neural networks since 2012. François is currently doing deep learning research at Google. I put François on this list based on his work on Keras. You can also watch his “Integrating Keras & TensorFlow: The Keras workflow expanded” presentation from TensorFlow Dev Summit 2017.
  8. Clement Farabet is VP of AI Infrastructure at NVIDIA. Before joining NVIDIA, he ran a team at Twitter called Cortex Core, which focused on building a high-leverage Machine Learning/Deep Learning platform to power every aspect of the Twitter product (recommendation systems, search, timeline ranking, etc.). Clement is an active member of the AI/ML community, and he also follows our project on Twitter!

So these are the people who had the biggest impact. However, there are many others I have probably forgotten to mention, who no doubt contributed a lot to AI/ML research and deserve a lot of respect.

In parallel with building a network of influential people in AI & ML, I was hunting for articles and blog posts related to what we’re trying to do: make sense of the environmental sound from public places. As a very natural and very practical first task, we decided to build a machine learning model for musical instrument recognition.

There was a choice between Caffe, Theano, TensorFlow, Torch and other deep learning libraries. I also remember reading Kenneth Tran’s (a Research Engineer in the Deep Learning Technology Center, Microsoft Research) overview of all the frameworks on GitHub. It is a great resource! TensorFlow had only recently been open-sourced back then, although the feedback in the community was already very convincing. People also had high hopes because Google, a big company, was behind the project. So the decision was made: TensorFlow is the way to go!

The blog post “Classifying Bees With Google TensorFlow” by Philippe Dagher was an excellent way to touch the code and try TensorFlow out. Next, I found Tom Herold’s (from scalableminds.com) GitHub repository and played with his BerlinNet neural network architecture. Some things just worked; some things almost worked. I wanted to train a model on a large machine with a GPU and then run predictions on a much smaller computer: TensorFlow on an Intel Edison. If you wonder why Intel Edison, have a look at our previous blog post: “Building Our First Smart Microphone Prototype”.
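
As a hedged sketch of that workflow, here is the TensorFlow 1.x-style save/restore split we were after: train and checkpoint on the big GPU machine, then rebuild the same graph on the small device and restore the weights for inference. The toy linear model and file paths below are illustrative, not the actual BerlinNet setup.

```python
import tensorflow as tf  # TensorFlow 1.x-era API

# A toy linear model standing in for the real network.
x = tf.placeholder(tf.float32, [None, 4096], name="spectrogram")
w = tf.Variable(tf.zeros([4096, 10]), name="weights")
b = tf.Variable(tf.zeros([10]), name="bias")
logits = tf.add(tf.matmul(x, w), b, name="logits")

saver = tf.train.Saver()

# On the large GPU machine: train, then write a checkpoint to disk.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps would go here ...
    saver.save(sess, "/tmp/model.ckpt")

# On the small device (the Intel Edison): rebuild the same graph,
# restore the trained weights, and run predictions only.
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    # predictions = sess.run(logits, feed_dict={x: spectrogram_batch})
```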

The blog post from John Ramey was also a great help: “Installing TensorFlow on an AWS EC2 Instance with GPU Support”. Nevertheless, after receiving a $628.93 (around £490) monthly bill from Amazon AWS, it was clear that it would be more cost-effective to buy a computer with a GPU card. The total cost of the PC components (almost identical specs to a p2.xlarge instance) was just above £2,000, and the electricity consumption is around £8 per month, so the machine pays for itself in a little over four months (£2,000 / (£490 - £8) ≈ 4.1).

Intel Skylake Xeon E3-1225 V5 Quad Core Server CPU, 64GB RAM, Titan X GPU card, two 2TB SATA disks

As a next step, we downloaded some videos from the internet, stripped out the video track and kept the audio, then labelled the clips, converted them to spectrograms and trained a model using the AlexNet CNN architecture in TensorFlow. It wasn’t perfect, but most importantly, it worked!
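
For readers who want a concrete picture of that preprocessing step, here is a rough sketch of turning an audio clip into a spectrogram image. It uses librosa, one common choice for this; our actual pipeline may have differed in the details, and the file names and image size are placeholders.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def audio_to_spectrogram(wav_path, png_path):
    """Convert an audio clip into a spectrogram image for a CNN."""
    y, sr = librosa.load(wav_path, sr=22050)           # load mono audio
    s = librosa.feature.melspectrogram(y=y, sr=sr)     # mel-scaled spectrogram
    s_db = librosa.power_to_db(s, ref=np.max)          # power -> decibels
    plt.figure(figsize=(2.27, 2.27), dpi=100)          # ~227x227 px, AlexNet-sized
    librosa.display.specshow(s_db, sr=sr)
    plt.axis("off")
    plt.savefig(png_path, bbox_inches="tight", pad_inches=0)
    plt.close()

audio_to_spectrogram("clip_0001.wav", "clip_0001.png")  # hypothetical files
```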

Illustrating the principle of musical instrument recognition using neural networks
Demonstrating musical instrument recognition

The next use case where we applied Machine Learning was classifying photos from various places (bars, restaurants, cafes) as suitable or not suitable (“Good” or “Bad”). The approach we took was to build two models: the first generates an image caption (a CNN followed by an RNN), and the second classifies the caption text using another CNN. For the captioning step, we initially tried NeuralTalk2 and a little later switched to “Show and Tell: A Neural Image Caption Generator”, released by Google. For the text classification step, we used the example Denny Britz shared on his WildML site, “Implementing a CNN for Text Classification in TensorFlow”.
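
To make the control flow of the two-step approach clear, here is a hedged sketch in which both models are stubbed out. The real captioner (Show and Tell) and the WildML-style text CNN are large trained networks; every function name below is ours, purely for illustration.

```python
def caption_image(image_path: str) -> str:
    """Step 1: image captioning (a CNN followed by an RNN).
    Stub standing in for the Show and Tell model."""
    return "a group of people sitting at a table in a restaurant"

def classify_caption(caption: str) -> str:
    """Step 2: classify the caption text as 'Good' or 'Bad'.
    Stub (a trivial keyword rule) standing in for the trained text CNN."""
    bad_words = {"empty", "dark", "blurry"}
    return "Bad" if bad_words & set(caption.split()) else "Good"

def label_photo(image_path: str) -> str:
    """Chain the two models: image -> caption -> 'Good'/'Bad' label."""
    return classify_caption(caption_image(image_path))

print(label_photo("bar_photo_01.jpg"))  # hypothetical file -> "Good"
```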

Piotr Mirowski at London Machine Learning Meetup, 7th of November 2016

Piotr Mirowski (Research Scientist at Google DeepMind) gave a presentation at the London Machine Learning Meetup in November last year, where he covered a few topics, including how image captioning works. You can download his presentation slides here.

The diagram below explains how we get from the image to the answer of whether the picture is suitable or not. You can see a few screenshots of the UI below the chart.

The “Caption” column displays the caption of an image, and the “Class Auto” column contains the automatically generated label.

All four images got labelled as “Good”
One image got labelled as “Good” and other three as “Bad”

So what are we working on now, and what are the next steps? As I mentioned at the beginning of this post, we started using ML a year ago, and along the way we learned a lot about the technology itself and read articles and blog posts by other people working on a similar set of problems. While working on the first model for automatic musical instrument classification, we looked at the paper “Automatic Instrument Recognition in Polyphonic Music using Convolutional Neural Networks” by Peter Li, Jiyuan Qian, and Tian Wang. Later, after we had already built a working model, we found a great post by Aaqib Saeed, “Urban Sound Classification”. It consists of part one and part two.

Another fascinating post was by Eugenio Culurciello. It is a comparison of convolutional neural network architectures.

AlexNet vs. Inception-v3 and other CNN architectures

The actual post on Neural Network architectures can be found here.

So the plan is to retrain the model using the Inception-v3 or Inception-v4 architecture, this time with Keras (which we didn’t use when building our first model) and TensorFlow as the backend. Also, when we trained the first model, we didn’t use TensorFlow Serving, a flexible, high-performance serving system for ML models designed for production environments; now we have the experience to put it to work. Another bit of good news is that we now have access to a much larger dataset: a few months ago, Google released “a large-scale dataset of manually annotated audio events” and published an article about it, “Audio Set: An ontology and human-labeled dataset for audio events”. By choosing a better-performing neural network architecture and a larger dataset, we have high hopes for improving the accuracy of predictions. We also plan to train for musical genre recognition and get a bit more information on what the vibe is like in bars, restaurants and cafes.
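
To make that plan concrete, here is a minimal sketch of fine-tuning Inception-v3 in Keras on a TensorFlow backend, following the standard Keras fine-tuning pattern. The number of classes and the data feed are placeholders, not our final setup.

```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

NUM_CLASSES = 10  # placeholder: e.g. instrument or genre classes

# Load Inception-v3 pre-trained on ImageNet, without the classification head.
base = InceptionV3(weights="imagenet", include_top=False)

# Add a new head for our spectrogram classes.
x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, activation="relu")(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

# First pass: freeze the convolutional base and train only the new head.
for layer in base.layers:
    layer.trainable = False

model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(...) would then be fed batches of spectrograms derived
# from the AudioSet recordings mentioned above.
```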

Finally, thanks for reading! If you found this blog post useful, or would like to leave a comment, share it or start a discussion, you’re very welcome to do so. Happy machine learning!
