In the summer of 2013, as Matthew Zeiler was close to finishing a Ph.D. in artificial intelligence at New York University, he seemed to have every tech giant in the palm of his hand. Zeiler had left an internship with a Google AI group a few weeks earlier when he got a call from an unknown number while he was running along the Hudson River. It was Alan Eustace, then a senior vice president of engineering at Google, who had heard about Zeiler’s AI chops. Eustace wanted Zeiler to join permanently. To entice him, Eustace told him he would make an offer that was among the highest Google had ever made to a new graduate, Zeiler recalls. Zeiler won’t say how much he was offered, and Google declined to comment. But offers for top recruits with specific expertise can add up to several million dollars over four years, according to people with knowledge of the matter. Regardless, Google’s offer kicked off a bidding war for Zeiler and his know-how in deep learning, the vaunted branch of AI that’s driving major breakthroughs in computing.
Within days, Zeiler received a bigger offer from Microsoft, which Google promptly matched. Apple also wanted to chat, and when Zeiler flew out to Silicon Valley, Mark Zuckerberg personally sought to persuade him to join a new AI research group at Facebook. Zeiler respectfully turned them all down, deciding instead to start a company with an audacious goal: to compete with the giants that were courting him. “It was a crazy period,” Zeiler remembers. “I had this low-risk opportunity of joining a tech giant versus doing my own startup.” Zeiler says he knew that some of his algorithms worked better than Google’s on certain AI problems. “I knew I had to follow my gut,” he says.
Four years later, Zeiler’s New York City-based startup, Clarifai, is widely seen as one of the most promising in the crowded, buzzy field of machine learning. Clarifai offers image- and video-recognition tools for developers that rival those from Google, Microsoft and others. Much as Stripe and Twilio make it easy for programmers to tap into payments and communications capabilities, Clarifai gives its customers access to cutting-edge AI techniques that would cost millions to replicate. Companies like Unilever, BuzzFeed, Ubisoft and Staples U.K., as well as makers of medical devices and drones, use Clarifai to automatically analyze millions of images and videos. One of the company’s 100 or so customers, i-Nside, makes a smartphone accessory for imaging the inside of an eardrum and diagnosing ear diseases. Revenue, while still small, is expected to reach $10 million as early as next year, according to people close to the company.
That Clarifai has made it this far is, in and of itself, remarkable. In the past few years, AI, in particular a form of it called deep learning or deep neural networks, has emerged as the Next Big Thing in tech. Deep-learning techniques work loosely like the brain, with layers of “neurons” connected with “synapses.” The techniques are leading to substantial breakthroughs in areas like image and speech recognition, which in turn are ushering in advances in everything from medicine to self-driving cars to robotics.
But there’s a problem: Amid the scramble for talent, the richest companies in tech have consumed entire university departments and acquired just about every AI startup they could get their hands on. Google has been the hungriest, with at least 11 AI-related acquisitions, spending upwards of $1 billion for just two of those, DeepMind and api.ai. Nearly all the upstarts that competed with Clarifai have been bought: Amazon acquired Orbeus; Salesforce got MetaMind; IBM snapped up AlchemyAPI. When it comes to image recognition, Clarifai is perhaps the only one left that can compete with Amazon, Google, IBM and Microsoft, all of which offer AI image-recognition tools to their cloud-computing customers. Clarifai has already rebuffed several multimillion-dollar acquisition offers, according to an early employee. Zeiler says he is determined to keep the company independent.
Clarifai has none of the might or reach of its rivals, but Zeiler insists, convincingly, that playing Switzerland in a global AI war is a valuable asset. Many large companies that want to incorporate AI into their products are fearful of handing over their data to giants like Google and Amazon. Photobucket is a case in point. After assessing competing tools from Amazon, Google and IBM, the image- and video-hosting service became one of Clarifai’s largest customers in terms of image volume. “Any time you’re dealing with Google, you have to wonder if they’re taking your data and training their own system,” says Mike Knowles, senior infrastructure developer at Photobucket. With its Photos app, Google competes with Photobucket. Zeiler says many other potential customers are at risk of colliding with the ever-expanding ambitions of tech’s biggest companies. “They open new divisions that compete with their customers,” Zeiler says. “That’s what we don’t do.”
At 30, Zeiler, who grew up in Beausejour, a small town in Canada some 40 miles northeast of Winnipeg, seems an unlikely challenger to tech’s powerhouses. With slicked-back hair that he cuts only a couple of times a year, he retains the disheveled air of a college student.
But Zeiler’s obsession with AI put him on a path to be mentored by some of the field’s biggest luminaries. Oddly enough, his interest in the field started with a video of a flickering flame that he saw while an undergraduate at the University of Toronto. The video, shown to him by a grad student, looked startlingly real, yet it was generated by a computer using an AI technique. Zeiler had just learned the basics of computer programming but hadn’t taken to it. The flame represented something different. No human had explicitly programmed it to move around in predetermined ways. Instead, a computer had been fed video data, deduced a pattern and generated the video on its own. “I was completely blown away,” Zeiler says. “It was a whole new way to get computers to do what you wanted. I had to learn more.”
Graham Taylor, the Ph.D. candidate who had shown him the video, brought Zeiler into a research lab that was run by Geoffrey Hinton, widely considered one of the godfathers of neural networks. Taylor liked the ambitious yet amiable Zeiler. “He was smart but wasn’t a jerk,” Taylor says. In Hinton’s lab Zeiler worked on using AI techniques to track pigeons’ mating rituals, resulting in his first paper, “Learning Pigeon Behaviour Using Binary Latent Variables.” He graduated at the top of his class.
Zeiler then headed to NYU for a Ph.D., following Taylor, who was a postdoctoral researcher there. Taylor worked under Yann LeCun, another pioneer in deep learning, who now heads Facebook’s AI efforts. Eventually, Zeiler did two internships at Google and worked for Jeff Dean, the head of a then-new deep-learning research group called Google Brain. Hinton, who now works at Google and retains a position at the University of Toronto, was part of that 20-person AI skunkworks. (Google Brain has since grown into one of the most high-profile and vital groups within Google.)
Zeiler founded Clarifai in November 2013 after his second internship, just as he was finishing his Ph.D. The company got off to an auspicious start. Zeiler tested his image-recognition algorithms in a highly regarded contest called ImageNet. The 2012 ImageNet had shaken the AI world when a team from Hinton’s lab in Toronto, using deep-learning techniques, cracked a huge barrier in accuracy: Its error rate was 15%, far better than the 25% attained with earlier AI approaches. In 2013, Zeiler beat out the competition with an error rate of just 12%.
For the next few months, Zeiler worked alone, pushing the limits of his neural networks and rewriting the code to turn it into a commercially viable product. He installed four servers in his apartment to crawl the Web for images to train his algorithms. At one point, his apartment got so hot that he had to leave his windows open in the middle of winter. By April 2014, Zeiler hired a second employee, and the two moved the servers to a New Jersey data center, where Clarifai continues to expand. In October 2014, he made the service available to developers. His first customer was a wedding lifestyle website called Style Me Pretty, which uses Clarifai to identify and categorize thousands of user-uploaded pictures and serves ads based on what’s in an image.
In 2015, Clarifai landed its first sizable investment: a $10 million round led by Union Square Ventures. The corporate coinvestors, who clearly understood the potential of what Zeiler was building, included Qualcomm, AI chip specialist Nvidia and, interestingly, Google’s venture arm. The following year, in a round led by Menlo Ventures, Clarifai raised another $30 million, at a valuation of $120 million, according to PitchBook. “Tech giants are working on similar products, but they don’t wake up every day living and dying on building the best image-recognition service,” Menlo partner Matt Murphy says. Clarifai now has 55 employees, including 10 dedicated to digging through the latest AI research so the company can stay current. Last year, it hired a veteran sales executive from Google’s enterprise unit as its chief customer officer.
A recent study by the consulting firm CapTech shows Clarifai remains competitive with, and in some cases outperforms, tech giants like Amazon, Google and Microsoft in image recognition. But finding and keeping AI talent to maintain that position, let alone expand into new areas like audio recognition and beyond, won’t be easy. In February, Clarifai scored a longtime Google AI researcher, Andrea Frome, as its head of research, but she abruptly departed after only four months. Frome declined to speak about her departure, and Zeiler says it was the result of differing priorities. Access to data (lots of it) to “train” algorithms is also an area where Clarifai is likely to find itself at a permanent disadvantage compared with its much larger rivals.
Clarifai’s latest tool trains AI models on smartphones, not in the cloud, where most AI systems do the bulk of their computing. On a recent day, in a San Francisco hotel lobby, Zeiler pulls out his cracked iPhone 6. As he moves the camera, the phone identifies all the objects around it: chairs, a fireplace, people and cars, as well as a MacBook that Zeiler had just trained it to recognize. It’s a tantalizing demonstration of the potential for deep learning as it moves into the most important device in people’s lives. “We’re only seeing the tip of the iceberg of what these systems will be able to do,” Zeiler says.