A Study of Feature Construction for Text-based Forecasting of Time Series Variables.

Published in ACM CIKM [Short Paper] , 2017

Download paper here

Image not Loading

Time series are ubiquitous in the world since they are used to measure various phenomena (e.g., temperature, spread of a virus, sales, etc.). Forecasting of time series is highly beneficial (and necessary) for optimizing decisions, yet is a very challenging problem; using only the historical values of the time series is often insufficient. In this paper, we study how to construct effective additional features based on related text data for time series forecasting. Besides the commonly used n-gram features, we propose a general strategy for constructing multiple topical features based on the topics discovered by a topic model. We evaluate feature effectiveness using a data set for predicting stock price changes where we constructed additional features from news text articles for stock market prediction. We found that: 1) Text-based features outperform time series-based features, suggesting the great promise of leveraging text data for improving time series forecasting. 2) Topic-based features are not very effective stand-alone, but they can further improve performance when added on top of n-gram features. 3) The best topic-based feature appears to be a long-term aggregation of topics over time with high weights on recent topics.