Research

2020

Project TextMall: Text-Mining Made Simple for All

2 minute read

Published:

This research aims to develop TextMall, a general-purpose text analytics platform that lets real-world users explore the power of text mining in a simple and interactive fashion, without worrying about the underlying details of Natural Language Processing.

Project Annotate: Ad-Hoc Unsupervised Concept Annotation

3 minute read

Published:

Information retrieval and knowledge mining become much easier when data is categorized and annotated precisely. With the rapid growth of big data, manual annotation is infeasible because it is slow and expensive. Although text annotation is no longer a nascent area, it has not been well studied from a user-centric point of view. Filling that gap is the goal of this project.

Project A2I-MOOC: Artificially Intelligent and Interactive MOOCs

3 minute read

Published:

MOOCs have abysmal retention rates (5-15%) and high student failure rates (7-13%). In this project, we propose two ways to increase student engagement in MOOCs and other online courses through an artificially intelligent system that leverages machine learning and natural language processing. During live lectures, this system will (1) process, prioritize, and organize students’ questions in real time and surface the most relevant ones for instructors to answer, and (2) automate the creation of breakout rooms (which have recently become popular in Zoom classes) based on high-interest topics emerging from student questions, populating each room with like-minded students.

Project Robust IR and NLP Evaluation

3 minute read

Published:

We propose a new framework for IR evaluation that normalizes traditional metrics with both an upper and a lower bound (UL normalization), and we systematically study the effect of UL normalization on three popular evaluation metrics. We also propose three variations of the UL-normalized evaluation framework and experiment with each of them on the three metrics individually, creating nine new evaluation metrics in total. We show how to compute more realistic query-specific lower bounds for evaluation metrics by computing their expected values for each query under a randomized ranking of the corresponding documents. We also prove their correctness theoretically.
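To make the lower-bound idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual formulation: it assumes DCG@k as the metric, the ideal ranking as the upper bound, and a simple linear rescaling `(score - lower) / (upper - lower)`. The function names are hypothetical. Under a uniformly random ranking, each position is equally likely to hold any document, so by linearity of expectation the expected DCG@k is the mean gain times the sum of the position discounts.

```python
import math


def dcg(rels, k):
    """DCG@k for graded relevance labels given in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ideal_dcg(rels, k):
    """Upper bound: DCG@k of the best possible ordering of these documents."""
    return dcg(sorted(rels, reverse=True), k)


def expected_random_dcg(rels, k):
    """Query-specific lower bound: expected DCG@k under a uniformly random ranking.

    Every position holds each document with equal probability, so the
    expected gain at any position is the mean gain over all documents.
    """
    n = len(rels)
    if n == 0:
        return 0.0
    mean_gain = sum(2 ** r - 1 for r in rels) / n
    return sum(mean_gain / math.log2(i + 2) for i in range(min(k, n)))


def ul_normalized_dcg(rels, k):
    """Assumed linear UL normalization: 0 at the random baseline, 1 at the ideal."""
    lower = expected_random_dcg(rels, k)
    upper = ideal_dcg(rels, k)
    if upper == lower:  # all documents equally relevant; ranking cannot matter
        return 0.0
    return (dcg(rels, k) - lower) / (upper - lower)
```

With this rescaling, a ranking that performs no better than chance scores about 0, the ideal ranking scores 1, and a ranking worse than chance receives a negative score, for example `ul_normalized_dcg([0, 1, 2, 3], 4)`.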

Project Deep-MD: Molecular Dynamics Modeling with Deep Time series Forecasting

1 minute read

Published:

We explore machine-learning methodologies for predicting the outcomes of MD simulations while preserving their accurate time labels. This approach would greatly reduce the computational expense of performing MD, making it broadly accessible beyond the current user base of scientific researchers to high schools and colleges, where computational resources are scarce.

2018

Research Statement 2018. (Archived)

19 minute read

Published:

Broadly, my research interests lie at the intersection of Text Mining, Natural Language Processing, and Information Retrieval. More specifically, I have been studying how to mine Big Text Data across different application domains to find interesting patterns that can provide novel insights to domain experts, insights that are otherwise difficult to perceive due to the scale of the data.