Deception in Language

Language is a medium for conveying information, but it is also often used to deceive. We have explored the detection of such language in online reviews and found that the stylometric aspects of language play an important role in exposing the deceptive intent of writers (Feng, Banerjee, and Choi; 2012a). We have also studied how such language is produced by investigating the differences in how people type when writing truthful as opposed to deceptive texts, and found interesting parallels between typing patterns and speech patterns when people lie (Banerjee et al. 2014). On a related note, we also investigated stylometric aspects of language to identify the traits of individual writers (Feng, Banerjee, and Choi; 2012b).

Research Group

Research Products

[Feng, Banerjee, and Choi; 2012a]
  • Song Feng, Ritwik Banerjee, and Yejin Choi. Syntactic Stylometry for Deception Detection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Vol. 2: Short Papers), pp. 171-175. Association for Computational Linguistics, 2012.
[Feng, Banerjee, and Choi; 2012b]
  • Song Feng, Ritwik Banerjee, and Yejin Choi. Characterizing Stylistic Elements in Syntactic Structure. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1522-1533. Association for Computational Linguistics, 2012.
[Banerjee et al. 2014]
  • Ritwik Banerjee, Song Feng, Jun Seok Kang, and Yejin Choi. Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1469-1473. Association for Computational Linguistics, 2014.
  • The dataset contains truthful and deceptive writings from two domains: business reviews and essays on two topics of social interest (gun control and gay marriage). The data is available for download as compressed tar.bz2 files:
The uncompressed dataset consists of tab-separated value files. The keystroke log data is in the last column, titled ReviewMeta. This field contains a list of KeyUp, KeyDown, and MouseUp event logs. Note that the first event timestamp is not always zero. The event logs have the following formats:
  [timestamp] KeyUp/KeyDown [JavaScript keycode]
  [timestamp] MouseUp [begin-index] [end-index]
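The event log format described above can be read with a short script. What follows is a minimal Python sketch, not the tooling used in the paper. It assumes that all fields are integers, that the square brackets in the format are placeholders rather than literal characters, and that consecutive events are separated by whitespace or some other non-digit delimiter (the exact delimiter is not documented here). The names `Event` and `parse_review_meta` are hypothetical. Because the first timestamp is not always zero, the sketch shifts every timestamp so that the first event starts at zero.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed event syntax (hypothetical; the delimiter between events is not documented):
#   "<timestamp> KeyUp <keycode>" / "<timestamp> KeyDown <keycode>"
#   "<timestamp> MouseUp <begin-index> <end-index>"
EVENT_RE = re.compile(
    r"(?P<ts>\d+)\s+(?:(?P<key>KeyUp|KeyDown)\s+(?P<code>\d+)"
    r"|MouseUp\s+(?P<begin>\d+)\s+(?P<end>\d+))"
)


@dataclass
class Event:
    time: int                      # timestamp, shifted so the first event is at 0
    kind: str                      # "KeyUp", "KeyDown", or "MouseUp"
    keycode: Optional[int] = None  # JavaScript keycode (key events only)
    begin: Optional[int] = None    # begin index (MouseUp events only)
    end: Optional[int] = None      # end index (MouseUp events only)


def parse_review_meta(field: str) -> List[Event]:
    """Parse one ReviewMeta value into a list of events (hypothetical helper)."""
    events: List[Event] = []
    for m in EVENT_RE.finditer(field):
        ts = int(m.group("ts"))
        if m.group("key"):
            events.append(Event(ts, m.group("key"), keycode=int(m.group("code"))))
        else:
            events.append(Event(ts, "MouseUp",
                                begin=int(m.group("begin")),
                                end=int(m.group("end"))))
    if events:
        # The first timestamp is not always zero, so make times relative to it.
        t0 = events[0].time
        for e in events:
            e.time -= t0
    return events
```

For example, on the made-up string `"1500 KeyDown 72 1580 KeyUp 72 2100 MouseUp 0 2"` (not taken from the dataset), the sketch yields events at relative times 0, 80, and 600. For one data row of an uncompressed file, the field can be taken as the last tab-separated value, e.g. `parse_review_meta(line.rstrip("\n").split("\t")[-1])`, after skipping the header row.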