Papers
arxiv:2509.11057

YTCommentVerse: A Multi-Category Multi-Lingual YouTube Comment Corpus

Published on Sep 14, 2025
Authors:
,

Abstract

YTCommentVerse is a large-scale multilingual dataset of YouTube comments spanning 15 content categories and over 50 languages, providing rich resources for analyzing sentiment, toxicity, and engagement patterns across diverse cultural contexts.

AI-generated summary

In this paper, we introduce YTCommentVerse, a large-scale multilingual and multi-category dataset of YouTube comments. It contains over 32 million comments from 178,000 videos contributed by more than 20 million unique users spanning 15 distinct YouTube content categories such as Music, News, Education and Entertainment. Each comment in the dataset includes video and comment IDs, user channel details, upvotes and category labels. With comments in over 50 languages, YTCommentVerse provides a rich resource for exploring sentiment, toxicity and engagement patterns across diverse cultural and topical contexts. This dataset helps fill a major gap in publicly available social media datasets particularly for analyzing video sharing platforms by combining multiple languages, detailed categories and other metadata.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.11057 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.11057 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.11057 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.