
Exploring Evolution of Public Opinions on Tianya Club Using Dynamic Topic Models
Zhihua YAN, Xijin TANG
Journal of Systems Science and Information, 2020, Vol. 8, Issue 4: 309-324.
Online media have brought tremendous changes to civic life, public opinions, and government administration. Compared with traditional media, online media not only allow individuals to browse news and express their views more freely, but also accelerate the spread of opinions and expand their influence. Because public opinions may arouse societal unrest, detecting their primary topics and uncovering their evolution trends is valuable for societal administration. Various algorithms have been developed to handle the huge volume of unstructured online media data. In this study, a dynamic topic model (DTM) is employed to explore the evolution of topic content and topic prevalence, using the original posts published from 2013 to 2017 on the Tianya Zatan Board of Tianya Club, one of the most popular bulletin board systems (BBS) in China. Based on semantic similarities, topics are grouped into three themes: family life, societal affairs, and government administration. The evolution of both topic prevalence and topic content is affected by emergent incidents. Topics on family life gain popularity, while the themes "societal affairs" and "government administration", which have larger standard deviations, are more likely to be influenced by emergent hot events. Representing content evolution as a monthly pairwise distance matrix makes change points in topic content easy to identify.
Keywords: topic modeling / dynamic topic models / text mining / topic evolution
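The abstract describes content evolution as a monthly pairwise distance matrix from which change points can be read. The following is a minimal sketch of building such a matrix for one topic, assuming Jensen-Shannon distance between the topic's monthly word distributions (the distance measure is not specified here) and a hypothetical fixed threshold on consecutive-month distances as a crude change-point rule. The function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance with base-2 logs, so the result lies in [0, 1]."""
    def kl(a: Sequence[float], m: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute 0 by convention
        return sum(ai * math.log2(ai / mi) for ai, mi in zip(a, m) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error


def pairwise_distance_matrix(monthly: List[Sequence[float]]) -> List[List[float]]:
    """monthly[t] is one topic's word distribution in month t (assumed input)."""
    n = len(monthly)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = js_distance(monthly[i], monthly[j])
    return d


def change_points(d: List[List[float]], threshold: float) -> List[int]:
    """Months t whose distance to month t-1 exceeds a hypothetical threshold."""
    return [t for t in range(1, len(d)) if d[t - 1][t] > threshold]


# Toy example: a three-word vocabulary whose distribution shifts in month 2
months = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
D = pairwise_distance_matrix(months)
print(change_points(D, threshold=0.3))
```

Jensen-Shannon distance is symmetric and bounded, so matrices for different topics share a common scale; block structure along the diagonal of `D` corresponds to stable periods, and a sharp boundary between blocks marks a candidate change point.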
Table 1 Statistics of the Tianya Zatan dataset after preprocessing
Year | Original posts # | Corpus # (thousand) | Words in dictionary # |
2013 | 437,806 | 64,845 | 66,761 |
2014 | 383,573 | 59,176 | 66,811 |
2015 | 258,060 | 39,580 | 66,817 |
2016 | 200,742 | 30,156 | 66,806 |
2017 | 124,453 | 18,158 | 66,691 |
Total | 1,404,634 | 211,915 | 66,844 |
Table 2 Example topics from the Tianya Zatan corpus, January 2013
No. | Label | Top words (Jan. 2013) | Theme | Trend |
1 | Institution management | Work, Management, System, Supervision, Organization | 2 | + |
5 | Product quality safety | Production, Food, Milk powder, Product, Criterion | 2 | C |
9 | Company operations | Work, Management, System, Supervision, Organization | 2 | C |
18 | Marriage | Woman, Man, Girl, Marriage, Divorce | 1 | + |
27 | Financial regulation | Capital, Estate, Immigration, Investment, Gong Aiai | 3 | + |
34 | Job | Job, Poster, Friend, Tianya, Feeling, Graduation | 2 | C |
45 | Traffic accident | Driver, Vehicle, Yellow light, Traffic police | 3 | - |
52 | Telecom fraud | Phone, Swindler, Contact, Information, ID card | 3 | - |
Theme: 1 = Family life; 2 = Societal affairs; 3 = Government administration. Trend: + = up trend; - = down trend; C = constant trend.
Table 3 Summary of topics generated by the DTM
Theme | Details | Topics # |
Family life | Marriage, Friends, Feelings, Travel, Entertainment, Traditional culture, Diet, Religion, Job, etc. | 15 |
Societal affairs | Social media, Urban environment, Product safety, Decoration, Livelihood, School, Spiritual civilization, etc. | 27 |
Government administration | State-owned firm, Building demolition, Official corruption, Civil rights, Criminal offence, Migrant workers, etc. | 18 |
Figure 3 Heat map of the thirty topics with the highest average prevalence, 2013-2017
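Figure 3 ranks topics by average prevalence. As a hedged sketch, assuming a topic's monthly prevalence is the mean of its proportion over that month's documents (the exact definition is not given here), the ranking and the matrix behind such a heat map could be computed as follows. All function names are hypothetical.

```python
from typing import List, Sequence


def monthly_prevalence(doc_topic: Sequence[Sequence[float]]) -> List[float]:
    """Assumed definition: mean proportion of each topic over one month's documents."""
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(doc[k] for doc in doc_topic) / n_docs for k in range(n_topics)]


def top_topics(prevalence: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k topics with the highest prevalence averaged over all months.

    prevalence[m][t] is the prevalence of topic t in month m.
    """
    n_months = len(prevalence)
    n_topics = len(prevalence[0])
    avg = [sum(row[t] for row in prevalence) / n_months for t in range(n_topics)]
    return sorted(range(n_topics), key=lambda t: avg[t], reverse=True)[:k]


def heatmap_matrix(prevalence: Sequence[Sequence[float]], k: int = 30) -> List[List[float]]:
    """Rows are the top-k topics (highest average first); columns are months."""
    return [[row[t] for row in prevalence] for t in top_topics(prevalence, k)]


# Toy example: two months, two documents each, three topics
months = [
    monthly_prevalence([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]),
    monthly_prevalence([[0.1, 0.6, 0.3], [0.3, 0.4, 0.3]]),
]
print(top_topics(months, 2))
print(heatmap_matrix(months, 2))
```

The default `k=30` mirrors the thirty topics shown in Figure 3; the resulting rows can be passed to any plotting library's heat-map routine.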
Table 4 Topic trends using the Cox-Stuart trend test with
Type | Family life | Societal affairs | Government administration | Total |
Up trend | 9 | 5 | 4 | 18 |
Down trend | 1 | 7 | 13 | 21 |
Constant | 4 | 16 | 1 | 21 |
Total | 15 | 27 | 18 | 60 |
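Table 4 classifies each topic's prevalence series as up, down, or constant with the Cox-Stuart trend test. The following is a minimal sketch of the standard test: pair each value in the first half of the series with its counterpart in the second half (dropping the middle point when the length is odd), discard ties, and compare the number of increases with a Binomial(m, 0.5) distribution. The two-sided p-value and the significance level of 0.05 are illustrative assumptions, not necessarily the study's settings; the labels follow the +/-/C notation of Table 2.

```python
import math
from typing import Sequence, Tuple


def cox_stuart(series: Sequence[float]) -> Tuple[int, int, float]:
    """Return (#increases, #decreases, two-sided p-value) for the Cox-Stuart test."""
    n = len(series)
    half = n // 2
    first, second = series[:half], series[n - half:]  # middle point dropped if n is odd
    diffs = [b - a for a, b in zip(first, second) if b != a]  # ties discarded
    m = len(diffs)
    if m == 0:
        return 0, 0, 1.0
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = m - n_pos
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(m, 0.5), doubled for a two-sided test
    tail = sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m
    return n_pos, n_neg, min(1.0, 2 * tail)


def trend_label(series: Sequence[float], alpha: float = 0.05) -> str:
    """'+' (up), '-' (down) or 'C' (constant), using an assumed significance level."""
    n_pos, n_neg, p = cox_stuart(series)
    if p < alpha:
        return "+" if n_pos > n_neg else "-"
    return "C"


# A monthly series over 2013-2017 has 60 points
print(trend_label([0.01 * t for t in range(60)]))
```

Because the test uses only the signs of paired differences, it is robust to the bursty spikes that emergent incidents produce in monthly prevalence series.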
The authors gratefully acknowledge the editor and two anonymous referees for their insightful comments and helpful suggestions that led to a marked improvement of the article.