Published in: COMMUNICATIONS IN STATISTICS. THEORY AND METHODS, vol. -, pp. 1-19 (ISSN 0361-0926)
Abstract: A formal statistical framework is proposed for synthesizing text information into sentiment indicators. Each text document is treated as an exchangeable collection of word stems (tokens), and a multinomial inverse regression approach is used to efficiently summarize the information content of the documents. The methodology is illustrated by constructing sentiment indicators from Twitter text published by news outlets. These quantitative indicators can be built across disciplines to capture changes in economic, financial, and social conditions, and can also reveal heterogeneity across countries, sectors, or markets. The proposed approach is computationally fast and allows for time variation in the indexes.
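As a rough illustration of the two ideas named in the abstract, the sketch below first reduces a document to token counts, which discards word order and so reflects the exchangeability assumption, and then projects the token frequencies onto a vector of loadings to obtain a scalar score. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `token_counts` and `sentiment_score`, the loadings in `PHI`, and the example sentences are all hypothetical; tokens are plain lowercase words rather than stems; and the loadings are set by hand, whereas in a multinomial inverse regression they would be estimated from the data.

```python
from collections import Counter


def token_counts(text):
    """Map a document to token counts; word order is discarded."""
    return Counter(text.lower().split())


def sentiment_score(counts, loadings):
    """Project token frequencies onto loadings: z = sum_j phi_j * x_j / m."""
    m = sum(counts.values())  # document length in tokens
    if m == 0:
        return 0.0  # empty document carries no information
    return sum(loadings.get(tok, 0.0) * n for tok, n in counts.items()) / m


# Hypothetical loadings, chosen by hand for illustration (not estimated).
PHI = {"growth": 1.0, "gain": 0.5, "crisis": -1.0, "loss": -0.5}

doc = "growth gain crisis growth"
shuffled = "crisis growth growth gain"  # same tokens, different order

# Exchangeability: both orderings give identical counts, hence identical scores.
score = sentiment_score(token_counts(doc), PHI)
score_shuffled = sentiment_score(token_counts(shuffled), PHI)
```

Averaging such document-level scores within each time period is one simple way to turn them into an index that varies over time.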