Deep Learning | Adobe Media and Data Science Research (MDSR) Laboratory

MobiVSR - Mobile Application for Visual Speech Recognition

Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio. VSR has many …

Nilay Srivastava, Astitwa Saxena, Yaman Kumar Singla, Debanjan Mahata, Rajiv Ratn Shah, Amanda Stent, Roger Zimmermann

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate’s speaking proficiency in a language. ASS …

Yaman Kumar Singla, Avyakt Gupta, Shaurya Bagga, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation

Structure extraction from document images has been a long-standing research topic due to its high impact on a wide range of practical …

Mausoom Sarkar, Milan Aggarwal, Arneh Jain, Hiresh Gupta, Balaji Krishnamurthy

Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution

As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned …

Nikaash Puri, Sukriti Verma, Piyush Gupta, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh

Multi-Modal Association based Grouping for Form Structure Extraction

Document structure extraction has been a widely researched area for decades. Recent work in this direction has been deep …

Milan Aggarwal, Mausoom Sarkar, Hiresh Gupta, Balaji Krishnamurthy

Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks

Deep neural networks (DNNs) are powerful learning machines that have enabled breakthroughs in several domains. In this work, we …

Surgan Jandial, Ayush Chopra, Mausoom Sarkar, Piyush Gupta, Balaji Krishnamurthy, Vineeth Balasubramanian

SieveNet: A Unified Framework for Robust Image-based Virtual Try-On

Image-based virtual try-on for fashion has attracted considerable attention recently. The task requires trying on the desired clothing …

Ayush Chopra, Surgan Jandial, Kumar Ayush, Mayur Hemani, Balaji Krishnamurthy

SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation

Few-shot segmentation (FSS) methods perform image segmentation for a particular object class in a target (query) image, using a small …

Siddhartha Gairola, Mayur Hemani, Ayush Chopra, Balaji Krishnamurthy

Lipper: Synthesizing Thy Speech using Multi-View Lipreading

Lipreading has many potential applications, for example in surveillance and video conferencing. Despite this, most of the …

Yaman Kumar Singla, Rohit Jain, Khwaja Mohd. Salik, Rajiv Ratn Shah, Yifang Yin, Roger Zimmermann