
Computer vision – how video will be the new text

Christin Löhr has combined her obsession with artificial intelligence & her job at movingimage by digging deep into computer vision. In this interview the Chief Product Owner explains how computer vision will change the way we communicate and how Google will have to adapt to that.

Christin, when scrolling through your Twitter feed, one realizes pretty quickly that you are tech-obsessed. There are a lot of visionary ideas, but day-to-day stuff as well. For example, I can integrate videos directly into my Outlook mails?

Obsessed is actually a pretty accurate description. Sometimes I feel overwhelmed by all the progress we are making in science and technology. I mean, I still remember having to use a modem to connect to the internet, and now even my scale at home has Wi-Fi and is connected to my health app. It’s crazy when you think about it. But this is what fascinates me most: how we can use the progress we’re making in these fields to make our everyday lives easier and more fun.
That was actually the main motivation behind the Outlook video add-in. I’ve been working on this project for the last couple of months, after growing really frustrated that it is nearly impossible to send an email with a video attachment in an enterprise environment. You run into either size or security restrictions. But I love to communicate with videos. Not only can you explain complex problems much more easily using screen capture, it’s also much faster and thus more efficient. Also, it’s easier for me to detect people’s emotions when I see their faces, something that can be challenging for me when I talk to someone on the phone.
Enabling a program that you use every day to record and share videos, and so simplify your daily business, felt like a logical consequence of that. Luckily, a lot of our customers feel the same way, so I could actually bring that idea to life. With the add-in you can create videos, give them basic metadata, upload them to the cloud and share a link to a landing page without having to leave your familiar environment.

You are an expert on computer vision. What does this field include? Where do we use it already? And where will it be used in the future?

I wouldn’t go as far as to call myself an expert; I’ve just always been curious. I was really into science fiction growing up and always fascinated by robots. So, naturally, I ended up being obsessed with artificial intelligence. I work for a software company that provides secure enterprise video platform services, and I tried to find a way to combine those two passions. That’s when I came across computer vision.
Computer vision is, in short, the area of machine learning that tries to replicate the human visual system. It is amazing what our brain can do based on the data it gets from our optic nerves: recognizing familiar faces, detecting the emotional state a person is in, or identifying and classifying objects. And although neuroscience still doesn’t know exactly how our brain does that, we are able to train machines to analyze images or video files using face detection, pattern recognition and emotion recognition.
A couple of months ago I came across a Watson-based humanoid robot that is used to sell a delivery service product. It uses emotion recognition to “see” when a potential customer is bored or interested and adjusts its sales strategy accordingly. How cool is that?
Another example of where we already use computer vision is automated passport control at airports. Both your biometric passport and your face are scanned, and their key features are compared in real time.
One of my favorite use cases, though, is a Google Glass app that helps people with autism identify human emotions, which makes communication much easier for them.
In the future, robots will do tasks that are way too dangerous for humans, like disaster relief or mining. They will be able to get medicine to remote places way faster than humans ever could and will help remind Alzheimer’s patients to take their medicine. And for all of that they need the ability to see (to gather data and make sense of it): they need computer vision.

You’re saying computer vision will also change the way we communicate. What do you mean by that?

Well, there has already been a change in the way we communicate. Look at your Facebook or Twitter news feed or open your Instagram app: text has almost entirely been replaced by pictures and videos. Video is everywhere! Companies no longer use video only for marketing purposes but also for internal communication, training and social intranets. They ask their customers to participate in “user generated content campaigns” and so accumulate a huge amount of data every single day, data that has to be managed. Soon it will be too much for us to handle manually. Computer vision will automate processes and trigger certain workflows. Keywords can be generated automatically using face recognition or object detection, and the video can be processed based on them, allowing us to produce even more data.
In fact, automation will be one of the key benefits of computer vision. You won’t have to go to an embassy anymore to register for an election, thanks to face recognition. As mentioned earlier, people who have difficulties with emotion detection themselves will be able to join conversations. Sales pitches will be done by cute little robots that change their strategy depending on your facial expression (something I am looking forward to and fear at the same time, because I will probably end up buying a lot of stuff that I don’t need).

If everything becomes more video-intensive, we will use more data. Will mobile internet and its bandwidth have to adapt to this shift first?

Yes and no. Yes, there will be more data that needs to be transferred, and the mobile internet will have to adapt to that. However, there is also another approach to solving this problem: if your bandwidth is limited, you can reduce the amount of data that is transferred in the first place. HLS and MPEG-DASH, for example, already manage to do so: instead of the whole file being transferred at once, videos are split into segments. Each segment is available in several resolutions, so the player loads only the segment it needs at that moment and chooses the appropriate quality level depending on the available bandwidth. It’s called adaptive bitrate streaming, and in contrast to progressive download (the technology that was used before) it makes sure that you can watch the newest Stranger Things episode on Netflix buffer-free and bandwidth-friendly, even on the S-Bahn.
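The quality selection described above can be sketched in a few lines of Python. This is only an illustration, not how any particular HLS or MPEG-DASH player is implemented: the rendition names, the bitrates in `LADDER`, the `safety` factor and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str          # label of the quality level (hypothetical)
    bitrate_kbps: int  # bitrate needed to play this level smoothly


# Hypothetical quality ladder; real services publish their own
# ladder in the stream's playlist or manifest.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
]


def choose_rendition(measured_kbps, ladder=LADDER, safety=0.8):
    """Pick the highest rendition that fits the measured bandwidth.

    Only a fraction of the measured bandwidth (the assumed safety
    factor) is budgeted, so small drops do not cause buffering.
    """
    budget = measured_kbps * safety
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    best = ordered[0]  # always fall back to the lowest quality
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best


def plan_playback(throughput_per_segment):
    """Choose a quality level independently for every segment."""
    return [choose_rendition(kbps).name for kbps in throughput_per_segment]


if __name__ == "__main__":
    # Bandwidth drops as the train enters a tunnel.
    print(plan_playback([6500, 3000, 900, 300]))
```

Because the decision is made per segment, the player can step down in quality when the connection gets worse and step up again later, instead of stalling on one large file.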

In an age where a top Google ranking for an important keyword is like a lottery jackpot, the written word is often still more powerful than video. Do you think this will change?

Definitely! Google’s search algorithm is already using pattern recognition for its image search ranking. In the past it was enough to give an image enough meta information to convince Google that a picture of a duck is actually a picture of, let’s say, a Porsche. Now, with the help of computer vision, Google can’t be fooled that easily anymore. It’s only a matter of time until the same applies to video files as well. Imagine you could search for keywords that don’t have to be in the video’s metadata but are actually part of the video’s content. Think of all the treasures you would find hidden deep in YouTube because the creators didn’t care to add a proper title or description. One of Google’s ranking criteria could then be how often the keyword appears IN the video, and not how many times you can fit it into the video’s metadata or the website’s content and source code.
Funnily enough, someone told me a few weeks ago that his children no longer use Google for research but YouTube. Soon enough video will be the new text, and Google will have to adapt to that.
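As a rough illustration of ranking by what appears in a video rather than by its metadata, here is a minimal Python sketch. It assumes an object detector has already produced a list of labels for each sampled frame; the `VIDEOS` data, the video names and the scoring rule (share of frames in which the keyword was detected) are invented for this example and say nothing about how Google actually ranks results.

```python
# Hypothetical detector output: for each video, the labels that were
# recognized in each sampled frame. No titles or descriptions involved.
VIDEOS = {
    "untitled_clip_01": [["duck", "pond"], ["duck"], ["duck", "tree"], ["tree"]],
    "my_porsche": [["car", "porsche"], ["porsche"], ["road"]],
    "holiday": [["beach"], ["duck"], ["beach", "sea"]],
}


def keyword_share(frames, keyword):
    """Fraction of sampled frames in which the keyword was detected."""
    if not frames:
        return 0.0
    hits = sum(1 for labels in frames if keyword in labels)
    return hits / len(frames)


def rank_videos(videos, keyword):
    """Return (video, score) pairs for matching videos, best first."""
    scored = [(name, keyword_share(frames, keyword))
              for name, frames in videos.items()]
    matching = [(name, score) for name, score in scored if score > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_videos(VIDEOS, "duck"):
        print(f"{name}: {score:.2f}")
```

In this toy data set the clip without a proper title ranks first for "duck", because the keyword shows up in most of its frames.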


This article is part of Ada’s Heiresses 2017, an interview collection with speakers of Ada Lovelace Festival 2017. Are you interested in more #womenintech content? You can download Ada’s Heiresses here for free.

Christin Löhr can be described in one word: visionary. As the Chief Product Owner at movingimage, Löhr has been the driving force behind their groundbreaking Live Webcast and a strong presence in the Leadership team. Her background in digital audio/video grew exponentially whilst studying Media Technology at the Technische Universität Ilmenau, where she developed workflow engines for media content management systems. In 2014 Christin joined movingimage as a Project Manager, spearheading the development of innovative video solutions for enterprise-grade customers. As an active member of both the STEMinist community and Geekettes network, Löhr embodies the new wave of women in tech reshaping a landscape where women have traditionally been underrepresented, until now.