The goal of humanizing software and systems is a long-standing one. Decades ago, smart minds were already arguing about what would make a computer behave more like a person. Turing, Minsky, Searle, and other thought leaders offered insights, tests, and (don’t run away, now) philosophies about the possibility of a computer’s psyche. Artificial intelligence grew with various programming languages, tools, systems, and robots. Nonetheless, a big divide remained between the academic theorizing and the mass-market applications. Now, Microsoft Cognitive Services might just change all of that.
Providing the Missing Link between Technology and Users
In “end-user-speak”, Microsoft Cognitive Services are tools that allow intelligent features to be added to software applications. Examples of such features are emotion detection, speech and face recognition, and language understanding, as well as knowledge management and information search capabilities. Microsoft Cognitive Services are also comprehensive. They span most, if not all, of the aspects we take for granted in interactions between human beings, and that we have so far found lacking in our interactions with computing machines. They are organized around human faculties such as vision, speech, and language, and they help improve human decisions. A big step forward has been made in the packaging of these tools and algorithms, which require relatively little extra software to turn them into practical, useful applications.
The Power of IT and AI Packaged for Practical Results
Underneath the layers that now seem so much more human, the services are still software-driven. IT systems are still built on bits and bytes, and Microsoft Cognitive Services belong to the realm of IT. In fact, they are a collection of APIs (application programming interfaces), software development kits (SDKs), and hosted services. Application developers may well find this reassuring. They can apply their knowledge and understanding of programming to these APIs, feeding the services input and receiving processed output in return. For example, the Microsoft Computer Vision API gives developers access to algorithms that process images and return results. Developers can upload an image or specify a URL for an image. They can then have the algorithm analyze the image in different ways, according to choices made at the input stage. Developers are also likely to breathe a sigh of relief on hearing that Microsoft Cognitive Services work across devices and platforms, including Android, iOS, and Windows, for broad compatibility and ease of implementation.
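To make the image example more concrete, here is a minimal Python sketch of what such a request might look like. It only builds the HTTP request and does not send it. The endpoint region, the API version in the path, the feature names, and the placeholder key are assumptions for illustration, so check the current Computer Vision documentation for the values that apply to your subscription.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed values for illustration; replace them with those from your own subscription.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze"
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"  # placeholder, not a real key


def build_analyze_request(image_url, features=("Description", "Tags")):
    """Build (but do not send) a POST request asking the service to analyze an image URL."""
    # The analysis choices made "at the input stage" travel as a query parameter.
    query = urlencode({"visualFeatures": ",".join(features)})
    # The image is passed by URL here; uploading raw bytes is the other option.
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )


request = build_analyze_request("https://example.com/photo.jpg")
print(request.get_method(), request.full_url)
```

With a valid key, passing the request to urllib.request.urlopen would send it, and the JSON response could then be parsed with json.loads.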
Putting Microsoft Cognitive Services to Use
After the concepts and the principles, the next question is: what, specifically, can you do with these services? Possible use cases are numerous. The services can help analyze call center or customer service interactions in real time in order to detect different emotions. Face detection in security surveillance systems is another application, as is the analysis of footage from body and car cameras. Speech can be translated between different languages in real time, including between spoken and signed languages for people with disabilities.
Microsoft demonstrates its artificial intelligence capabilities in its Cortana application. Among other things, Cortana is built into the latest version of the Windows operating system and is available through the Edge browser. It uses search API capabilities to retrieve information, offers a music recognition service, simulates dice rolls and coin flips, and recognizes natural voice commands. The Cortana Analytics Suite builds on these capabilities with business-oriented functions for real-time sales recommendations, customer churn predictions, predictive maintenance, and more. In manufacturing, financial services, retail, and healthcare, for instance, Cortana can help enterprises get closer to their customers in ways that are more proactive, helpful, and natural.
Building Blocks to Make All Kinds of Smart Apps
The Microsoft Cognitive Services building blocks can be organized into five categories: vision, speech, language, knowledge, and search. Here’s a brief list of the APIs in each category and their main functions:
Vision:
- Computer Vision API. Returns actionable information from images.
- Emotion API. Recognizes emotions and allows personalization of interactions.
- Face API. Detects, analyzes, and identifies faces in photos, and organizes and tags them.
- Video API. Analyzes and processes videos within an app.
Speech:
- Bing Speech API. Converts speech to and from text, and understands intentions.
- Custom Recognition Intelligent Service (CRIS). Provides fine tuning of speech recognition.
- Speaker Recognition API. Tells an app who is talking.
Language:
- Bing Spell Check API. Identifies and corrects spelling mistakes in an app.
- Language Understanding Intelligent Service (LUIS). Interprets commands from users.
- Linguistic Analysis API. Interprets (parses) complex text through language analysis.
- Text Analytics API. Identifies language, topics, key phrases, and sentiment from text.
- Web Language Model API. Uses language models developed from web-scale data.
Knowledge:
- Academic Knowledge API. Explores links between academic articles, journals, and authors.
- Entity Linking Intelligence Service. Adds contextual knowledge for people, events, and locations.
- Knowledge Exploration Service. Adds interactive search capability to structured data.
- Recommendations API. Provides customers with personalized product recommendations.
Search:
- Bing Autosuggest API. Offers smart autosuggest options for searches in an app.
- Bing Image Search API. Adds advanced image and metadata search to an app.
- Bing News Search API. Offers pertinent news searches to users.
- Bing Video Search API. Searches for trending videos and other rich media results.
- Bing Web Search API. Adds powerful search function to apps.
Different combinations of these APIs have been used to build apps for sales (Computer Vision API and Bing Speech API to record customer information without typing), for security management (Face API to control access via a door), and for describing the contents of images (Computer Vision API, Emotion API, and Bing Image Search API).
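As a rough illustration of the door-access idea, the sketch below shows how an app might act on a face verification result. It assumes the result arrives as a JSON object with `isIdentical` and `confidence` fields. The `should_unlock` helper, the 0.8 threshold, and the sample response are hypothetical and not part of any Microsoft SDK.

```python
def should_unlock(verify_result, min_confidence=0.8):
    """Return True only when the service reports a match with enough confidence."""
    # Missing fields are treated as "no match" so the door stays locked by default.
    is_match = bool(verify_result.get("isIdentical", False))
    confidence = float(verify_result.get("confidence", 0.0))
    return is_match and confidence >= min_confidence


# Hand-written example response, not real service output.
sample = {"isIdentical": True, "confidence": 0.91}
print(should_unlock(sample))
```

Keeping the decision in a small function of its own makes it easy to tighten or loosen the threshold without touching the code that calls the service.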
The only limit is likely to be the imagination of the user. Microsoft Cognitive Services are rich enough to support rapid prototyping of new AI applications for many user needs or wants. That, finally, may also be one of the smartest things about these services overall: they let developers start with real end-user requirements for intuitive, easy-to-use functionality and build an app that suits the users, instead of trying to make the users suit the app.
Post written by Jason Milgram, Director Software Development, Champion Solutions Group / MessageOps
Microsoft Azure MVP (2010-current)