Video Content Popularity Prediction Using Machine Learning Methods
Ilya L. Shafirov
National Research University – Higher School of Economics, Moscow, Russia
Journal of Economic Regulation, 2020, Vol. 11 (no. 2)
This paper deals with the problem of predicting the popularity of newly created video content. The machine learning task is formulated as binary classification of videos into “popular” and “unpopular”. Following the Pareto principle, “popular” videos are defined as those in the top 20% most viewed. The article provides an overview of studies that address video content popularity prediction with machine learning methods, including deep learning. The author explores the applicability of various modifications of existing methods to the research problem and also develops a new method based on a combination of tree ensembles and neural networks. Each method is tested on a sample of data on 11,000 YouTube videos collected with purpose-built parsing software. Based on the test results, the method combining tree ensembles and neural networks is recommended. Its prediction quality is characterized by the following metrics: 87% of videos are correctly classified (Accuracy); 63% of the videos classified as popular are actually popular (Precision); 49% of truly popular videos are correctly identified (Recall). The findings indicate the characteristics most likely to influence the popularity of a newly created video: the number of views and dislikes of the last video published on the channel; the number of channel subscribers; the publishing time of the last video; the title of the new video; and the channel creation date. The limitations of the method and directions for its improvement are outlined, and the author argues for interdisciplinary research encompassing the interests of marketers, data analysts, linguists and psychologists.
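The labelling rule and the three reported metrics can be illustrated with a minimal Python sketch. It assumes that "top 20% most viewed" is computed by ranking all collected videos by view count (ties broken by collection order) and that predictions are 0/1 labels. The function names `pareto_labels` and `classification_metrics` and the toy view counts are hypothetical; they are not taken from the author's software or data, and the sketch does not reproduce the tree-ensemble and neural-network model itself.

```python
from typing import Dict, List, Sequence


def pareto_labels(views: Sequence[int], top_share: float = 0.2) -> List[int]:
    """Return 1 ("popular") for videos in the top `top_share` fraction by views, else 0."""
    n = len(views)
    if n == 0:
        return []
    k = max(1, round(n * top_share))  # how many videos count as "popular"
    # Rank indices by view count, highest first; sorted() is stable, so ties keep input order.
    ranked = sorted(range(n), key=lambda i: views[i], reverse=True)
    labels = [0] * n
    for i in ranked[:k]:
        labels[i] = 1
    return labels


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for the positive ("popular") class; assumes 0/1 labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # popular, predicted popular
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unpopular, predicted popular
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # popular, missed by the classifier
    tn = len(pairs) - tp - fp - fn                      # unpopular, predicted unpopular
    return {
        # share of all videos classified correctly
        "accuracy": (tp + tn) / len(pairs),
        # share of videos predicted popular that are truly popular
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # share of truly popular videos that were identified
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy data, not from the paper's sample of YouTube videos.
    views = [120, 5000, 300, 80, 45000, 900, 60, 2500, 150, 700]
    y_true = pareto_labels(views)
    print(y_true)  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    y_pred = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0]  # output of some hypothetical classifier
    print(classification_metrics(y_true, y_pred))
    # {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5}
```

With only about one video in five labelled popular, a classifier that predicts "unpopular" for every video would already score about 80% accuracy while having zero recall; this is one reason precision and recall are informative alongside accuracy for this task.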
Keywords:
digital economy; marketing; management; Big data; video content; machine learning methods
Publisher: Ltd. "Humanitarian perspectives"
Founder: Ltd. "Humanitarian perspectives"
Online ISSN: 2412-6047
ISSN: 2078-5429