In a past professional life, I once took on a task that examined the quality and occurrence of text. No one asked for it. I was a Business Analyst who kept seeing the same comments come up, and I was concerned that poor quality notes were being left on customer accounts.
I was trying to ascertain how good the complaint resolution notes left on customer cases were, based on their length, uniqueness, and frequency. Now that I find myself temporarily unemployed, I thought it would be fun (if you can call data fun - I do) to run a similar study on HIVE and do some objective analysis of the comments left there.
Because I am feeling lazy for this analysis, I am using Power Query and Excel, and I'll include the step-by-step methodology as I go.
Firstly, some parameters about the data used:
- Extracted from HIVE SQL, I am looking at the Comments table.
= DBHive{[Schema="dbo",Item="Comments"]}[Data]
- I am then looking only at a week's worth of content
= Table.SelectRows(dbo_Comments, each [created] >= #datetime(2025, 5, 18, 0, 0, 0) and [created] <= #datetime(2025, 5, 24, 0, 0, 0))
- I am interested only in comments, not top level posts. Therefore I am filtering OUT content that does not have a parent author. I'm also keeping everything with a "blank" title, as this appears to get me actual comments.
= Table.SelectRows(#"Filtered Rows", each ([parent_author] <> null and [parent_author] <> "") and ([title] = ""))
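For anyone who would rather follow along outside Power Query, here is a minimal Python sketch of the same two filters. The column names (`created`, `parent_author`, `title`) and the date bounds come from the steps above; the sample rows are made up for illustration. One thing worth noticing: as written, the upper bound is midnight at the start of 24 May, so a comment made later that day falls outside the window.

```python
from datetime import datetime

# Bounds copied from the Power Query step above.
START = datetime(2025, 5, 18, 0, 0, 0)
END = datetime(2025, 5, 24, 0, 0, 0)

def is_sampled_comment(row):
    """True for rows inside the window that have a parent author and a blank title."""
    in_window = START <= row["created"] <= END
    has_parent = row["parent_author"] not in (None, "")
    blank_title = row["title"] == ""
    return in_window and has_parent and blank_title

# Hypothetical rows, for illustration only.
rows = [
    {"created": datetime(2025, 5, 19, 8, 0), "parent_author": "galenkp", "title": ""},
    {"created": datetime(2025, 5, 19, 9, 0), "parent_author": "", "title": "A top level post"},
    {"created": datetime(2025, 5, 24, 12, 0), "parent_author": "meno", "title": ""},
]

sample = [r for r in rows if is_sampled_comment(r)]
print(len(sample))
```

Only the first row survives: the second has no parent author, and the third is after the upper bound.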
This leaves me with 98,655 comments to work with as a sample set, looking at a period of a week. The first thing I want to check is the integrity of the data, and given that I know my own data best, let me test to see what I've been doing and who I've been talking to most on the blockchain in the last week:
holoz0r replies to user | this many times |
---|---|
riverflows | 16 |
galenkp | 8 |
cryptoandcoffee | 4 |
jorgebgt | 3 |
creativemary | 3 |
abh12345 | 3 |
meno | 3 |
hivewatchers | 3 |
azircon | 3 |
fastchrisuk | 2 |
mattclarke | 2 |
acidyo | 2 |
steevc | 2 |
menati | 2 |
beatminister | 2 |
raceline | 2 |
buggedout | 2 |
vatman | 2 |
edicted | 2 |
vimukthi | 2 |
Looks about right, given that I know my activity.
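The sanity check above is a group-by on the parent author for a single commenting account. A rough Python equivalent, assuming each row carries `author` and `parent_author` fields (the rows here are invented, not the real data):

```python
from collections import Counter

def reply_counts(rows, author):
    """Count how many times `author` replied to each parent author."""
    return Counter(r["parent_author"] for r in rows if r["author"] == author)

# Hypothetical rows, for illustration only.
rows = [
    {"author": "holoz0r", "parent_author": "riverflows"},
    {"author": "holoz0r", "parent_author": "galenkp"},
    {"author": "holoz0r", "parent_author": "riverflows"},
    {"author": "galenkp", "parent_author": "holoz0r"},
]

counts = reply_counts(rows, "holoz0r")
for user, n in counts.most_common():
    print(f"{user} | {n} |")
```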
So my next step is to figure out which account made the most replies in the sampled period (because, as we all should know by now, not every account is a human, and it is pretty obvious from some of the account names that appear in the list).
The next thing I want to learn about is users who are not me, because they are typically more interesting than myself. The thing I love about data is that it hides absolutely nothing, and we can see that there are a lot of bots or tokens...
User making comment | count of comments |
---|---|
hivebuzz | 3634 |
lolzbot | 2000 |
actifit | 988 |
worldmappin | 940 |
luvshares | 822 |
beerlover | 700 |
splinterboost | 621 |
pizzabot | 616 |
ladytoken | 596 |
bpcvoter1 | 452 |
roswelborges | 448 |
aquarius.academy | 448 |
chi4god | 442 |
hug.bot | 435 |
hivebits | 418 |
u89gw | 415 |
xcv47 | 413 |
w7ngc | 412 |
jkl65 | 411 |
w95hj | 409 |
sor31 | 409 |
hk14d | 407 |
fgh87 | 407 |
asd09 | 407 |
f76wz | 405 |
vmn31 | 404 |
dw38h | 404 |
wiv01 | 403 |
x6oc5 | 402 |
zxc43 | 401 |
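
To put the table in proportion, this small sketch works out what share of the 98,655 sampled comments came from the five busiest accounts. The counts are copied from the table above; the choice of "top five" is just an illustrative cut-off.

```python
SAMPLE_SIZE = 98_655  # total comments in the sampled week

# Counts copied from the first five rows of the table above.
top_five = {
    "hivebuzz": 3634,
    "lolzbot": 2000,
    "actifit": 988,
    "worldmappin": 940,
    "luvshares": 822,
}

top_five_total = sum(top_five.values())
top_five_share = round(100 * top_five_total / SAMPLE_SIZE, 1)
hivebuzz_share = round(100 * top_five["hivebuzz"] / SAMPLE_SIZE, 1)

print(top_five_total, top_five_share, hivebuzz_share)
```

So five accounts account for roughly 8.5% of the week's comments, and hivebuzz alone for about 3.7%.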
What I am interested in next is probably a futile exercise, but I want to know what the most commonly left ... comment is and what percentage that IDENTICAL comment makes up of all the comments left during the week.
I am pleased to report that this simple analysis reveals that:
Over 10% of the comments left on HIVE are entirely meaningless
Data doesn't lie. Here are the top 100 most commonly left comments.
Furthermore, once I exclude non-duplicate comments, we find that 32,068 of the comments left on HIVE for the week are non-unique. Therefore, from our original sample of 98,655 comments, a whopping 32.5% of comments left on the HIVE blockchain are NOT UNIQUE!
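A minimal sketch of how that non-unique count can be produced, assuming "duplicate" means an exact match on the comment body (no trimming or case folding). The sample bodies are invented; the last line just re-checks the 32.5% figure from the numbers quoted above.

```python
from collections import Counter

def duplicate_stats(bodies):
    """Count comments whose exact body text appears more than once."""
    counts = Counter(bodies)
    non_unique = sum(n for n in counts.values() if n > 1)
    top_body, top_n = counts.most_common(1)[0]
    return {
        "non_unique": non_unique,
        "non_unique_pct": round(100 * non_unique / len(bodies), 1),
        "top_body": top_body,
        "top_pct": round(100 * top_n / len(bodies), 1),
    }

# Hypothetical comment bodies, for illustration only.
bodies = ["!LOL", "Thanks!", "!LOL", "Great photo of the lake", "Thanks!", "!LOL"]
stats = duplicate_stats(bodies)
print(stats)

# Re-checking the figure reported in the post.
reported_pct = round(100 * 32_068 / 98_655, 1)
print(reported_pct)
```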
This means, in aggregate, for every comment that you see on HIVE, there is about a one in three chance that it is a duplicate of another. Context is important though, so we've got to consider common phrases that appear at the top of the list:
When I look through the duplicate comments, I can see that we're a grateful bunch, with the string "thank" appearing in 12,861 comments, or 13% of replies.
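The "thank" count is a plain substring search. A sketch of that check, assuming the match is case-insensitive (the post does not say either way) and using made-up bodies:

```python
def share_containing(bodies, needle):
    """Return (hits, percent) for bodies containing `needle`, ignoring case."""
    needle = needle.lower()
    hits = sum(1 for body in bodies if needle in body.lower())
    return hits, round(100 * hits / len(bodies), 1)

# Hypothetical comment bodies, for illustration only.
bodies = ["Thank you!", "thanks for sharing", "Gracias", "Great photo"]
hits, pct = share_containing(bodies, "thank")
print(hits, pct)

# Re-checking the figure reported in the post.
reported_pct = round(100 * 12_861 / 98_655, 1)
print(reported_pct)
```

Note that a substring match also catches words like "thankfully", which may or may not be what you want.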
I plan on interrogating this data in more depth, but I think this is a good starting point to build a future "dashboard" of comment health on HIVE.
What would you like to see in such a dashboard?
My thoughts are as follows:
- Is the comment unique?
- How many comments by x user?
- Who swears the most?
- What comments are just calling bots to give tokens?
- Is the comment longer or shorter than the average comment?
- Who had the most interactions with whom?
- Does the comment contain picture(s)?
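A few of those dashboard ideas reduce to per-comment flags. Here is a rough sketch covering uniqueness, length against the average, and pictures. The image check is an assumption: it looks for markdown image syntax or an `<img` tag, which may not cover every way a picture ends up in a comment.

```python
import re
from collections import Counter

# Assumed patterns for an embedded picture: markdown image or an HTML img tag.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]+\)|<img\b", re.IGNORECASE)

def comment_features(bodies):
    """Return one dict of flags per comment body."""
    counts = Counter(bodies)
    lengths = [len(body.split()) for body in bodies]
    average = sum(lengths) / len(lengths)
    return [
        {
            "is_unique": counts[body] == 1,
            "words": n,
            "longer_than_average": n > average,
            "has_image": bool(IMAGE_PATTERN.search(body)),
        }
        for body, n in zip(bodies, lengths)
    ]

# Hypothetical comment bodies, for illustration only.
bodies = [
    "Thanks!",
    "Thanks!",
    "Lovely light in this shot ![sunset](https://example.com/a.jpg)",
]
features = comment_features(bodies)
for f in features:
    print(f)
```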
Open to suggestions. Give me stuff to do.
Thanks for this analysis 😃
The bots are bad, nice to know about one in three bad!
I see what you did there. That won't come up in the report until I do more analysis on newer data. ;)
I am hoping to publish some stuff in a few hours exploring some more insights that I am digging into using the data set.
Surprise surprise, more like negative surprise surprise at how many bot accounts and auto-generated comments from token call commands there are!
Some great statistics there for a starting point of what you are looking into.
A dashboard would be great to share a bulk of those dot points you mentioned.. start off including some of the most useful points:
This will provide some solid data and insights into comments and can later be built upon further.
I agree with data being fun, seeing the numbers and statistics showing how you want, from what you worked on is a good feeling!
Image google search sourced and original source is from Reddit
Thanks mate, I don't want to overload hivesql too much with my queries, so I'm going to try and do as much local processing as possible.
I have used PowerBI professionally, and could build such a thing and perhaps distribute the PBIX file for people to run it on their local systems, as I don't intend on paying for a pro licence. I might use PowerBI to prototype, then...
Do the data processing on my machine(s), then figure out a way to display JSON / CSV data in a web interface and host it as a github page or some such...
Good idea, as it will be a lot for it to process along with everything already being processed, through the mass transactions the chain is going through daily.
PowerBI is so useful, I haven't dealt with it much and mostly used Excel but PowerBI is on my list for learning.
Distributing it would be good for people to see the stats, same as hosting on Github and displaying on a web interface. Keen to see more posts on this, as you feel the need to share progress for anyone else that is interested.
PBI is useful, but can be a royal pain to use. I think for this, I'll have a data pipeline something like...
Each week, grab the comments table for the most recent period. (sql)
Append it to the existing data. (python)
Complete my feature engineering and analysis (pbi)
Join it to any other relevant data (python)
Identify any new edge cases. (my brain)
Run the analytical compute. (probably python)
Update table, refresh dashboard, publish interesting bits on hive. (not sure on what tech i'll use)
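The "append it to the existing data" step could look something like the sketch below. It assumes each row is identified by its `author` and `permlink` pair, so a comment pulled twice (or edited between pulls) is kept once, with the newest copy winning. The rows are invented.

```python
def append_week(existing, new_rows):
    """Merge a new weekly extract into the stored rows, keyed on (author, permlink)."""
    merged = {(r["author"], r["permlink"]): r for r in existing}
    for row in new_rows:
        # A row seen before is overwritten, so the newest copy wins.
        merged[(row["author"], row["permlink"])] = row
    return list(merged.values())

# Hypothetical rows, for illustration only.
existing = [{"author": "holoz0r", "permlink": "re-a", "body": "first"}]
new_rows = [
    {"author": "holoz0r", "permlink": "re-a", "body": "first (edited)"},
    {"author": "galenkp", "permlink": "re-b", "body": "second"},
]

combined = append_week(existing, new_rows)
print(len(combined))
```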
I find this analysis very true to my experience when browsing Hive. Every time I post, I get dozens of comments like “Awesome post!”, “Thanks for sharing”, and “Great content,” – which I know 90% are bots. As a new user, I feel a bit discouraged because I don’t know if the interactions on Wacky Flip are genuine or just “bait” to earn tokens. If Hive wants to retain real users, the quality of comments should be prioritized over quantity!
Well, there will be some future analysis that I perform on those comments from different users that are the same.
A lot of people might say "thank you", "thanks", "great photo", "good post", "Really interesting", or other stuff like that.
I have three pages built so far following on from this post to break down some stuff, and will just present it as is, without any commentary, aside from sharing my methodology.
The best way to get genuine followers for your account is to find other content creators and offer genuine comments yourself. Otherwise, yeah, nothing but bots trying to farm votes.
Good analysis! This is the kind of data that I find fascinating. A dashboard is a great idea but even updates about these figures would be helpful. Hivewatchers keeps an eye out for SPAM, but I have spent a lot of time wondering how we can examine and verify some of the content that has already been created. The first 6 days are key for upvotes, but most upvotes are usually made within the first hour of being created.
For some more stat suggestions;
I honestly have a lot more ideas.
I see other good suggestions in the other comments as well.
https://hiveuprss.github.io/hiveisbeautiful/
Thanks for sharing that link! I watched it for maybe a little bit too long, it was a pretty nice implementation of sitting back and watching the world go by.
I have some logic written to detect any calls to bots that have an exclamation mark in front of them. I think some sort of hybrid metric that looks at comment length and complexity (of the language used) will find those that aren't just calling the bot for people to farm the hive engine tokens.
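This is not the logic mentioned above, just one way such a check might be sketched. It treats any whitespace-delimited word starting with an exclamation mark as a bot call, then measures what is left: the word count and the share of distinct words, as a crude stand-in for "complexity". Both the pattern and the metric are assumptions.

```python
import re

# Assumed shape of a bot call: "!" followed by letters/digits, at the start of a word.
BOT_CALL = re.compile(r"(?<!\S)!([A-Za-z][A-Za-z0-9]*)\b")

def describe(body):
    """Split a comment into its bot calls and a simple measure of the rest."""
    calls = [c.upper() for c in BOT_CALL.findall(body)]
    remaining = BOT_CALL.sub(" ", body)
    words = re.findall(r"\w+", remaining.lower())
    diversity = len(set(words)) / len(words) if words else 0.0
    return {
        "calls": calls,
        "words": len(words),
        "diversity": diversity,
        "only_bot_call": bool(calls) and not words,
    }

bare = describe("!BEER")
mixed = describe("Yep, I saw the post, and it explains a lot to me. !LOL")
print(bare)
print(mixed)
```

Markdown images such as `![alt](url)` are not picked up as bot calls, because the character after the exclamation mark is not a letter.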
I think the other interesting thing (if it is there in the data) - I can probably work on this as a goal - is ... what was the oldest post that got a comment that week?
Will require me to do one big pull of data from hivesql, or maybe download and interrogate the block log for historical data, but it would definitely be very interesting!
That IS an interesting link for sure! Thanks for sharing!
What is the Podping thing btw?
I believe https://3speak.tv/ uses Podping. @brianoflondon knows about this.
"Podping leverages Hive to solve the RSS polling problem."
Recent post --> https://peakd.com/podping/@brianoflondon/podping-on-hive-is-quietly-announcing-3000-podcast-episodes-per-hour
Cool! Had never heard about it, but saw it popping up. Thanks for explaining! ;-)
!INDEED
Very interesting! Maybe to find out how comments are rewarded, or if posts from authors with higher stake get on average more comments. I have the strong feeling that there are many comment farmers out there who don't contribute much but at least try to be not entirely repetitive (unlike the "well done sir" from past times). And whether those comment farmers get more upvotes on their comments than users with fewer, but more meaningful, comments.
Or a separate analysis of long comments (e.g. >50 words). Are they more rewarded than shorter comments? Which communities have longer comments? Are there communities with significantly more comments than others? There is a lot to do...
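The questions about long comments and communities come down to a group-by on word count. A minimal sketch, using the 50-word threshold suggested above and a hypothetical `community` field on each row (the real column name may differ):

```python
from collections import defaultdict

LONG_THRESHOLD = 50  # words; the ">50 words" cut-off suggested above

def length_by_community(rows):
    """Per community: comment count, mean word count, and share of long comments."""
    word_counts = defaultdict(list)
    for row in rows:
        word_counts[row["community"]].append(len(row["body"].split()))
    return {
        community: {
            "comments": len(counts),
            "mean_words": sum(counts) / len(counts),
            "long_share": sum(1 for n in counts if n > LONG_THRESHOLD) / len(counts),
        }
        for community, counts in word_counts.items()
    }

# Hypothetical rows, for illustration only.
rows = [
    {"community": "hive-a", "body": "nice post"},
    {"community": "hive-a", "body": "word " * 60},
    {"community": "hive-b", "body": "thanks"},
]

summary = length_by_community(rows)
print(summary)
```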
Comment length by community is a very interesting idea, and I am already wondering if high stakeholders get more engagement on account of lower vested accounts thinking they will get noticed and get some crumbs.
Comment depth is also a factor. If there’s a free flowing discourse, that’s awesome, if length is consistent.
You’ve given me a lot to think about and I appreciate that.
I guess this is what happens when you've got plenty of time 😂
Only for now... this has been circulating in my thoughts for some time. Maybe I'll make a scoring algorithm and get people grades for their comment performance.
There's plenty of junk on Hive. I wondered what all the accounts were with random looking 5 letter names. Looks like they are part of Waivo and may just be used to store data for their system. Anyone can use Hive as a database and they don't even have to worry about downvotes really. I wonder if this will expand. I was actually thinking about some data I could store on the blockchain, but I know there are ways to do it without creating posts or comments.
There's a certain idiot who leaves loads of spammy comments on my posts, but we can't stop that. At least they are mostly hidden. You don't need that much HP to allow a bot to post all day.
Cheers for the interesting data.
There will be some more data to come!
The definition of spam varies from user to user. There is also a list of accounts that I'm curating to flag as "non human" in my data set.
Longer term, we might need to see "competition" for inclusion in blocks, perhaps based on resource credits.
There is a lot of spam, but the resource credits you get for HP are an absolute bargain compared to the amount of crap you can inject into the chain.
A question. If I like to give commentors a token for a comment, instead of a few cents. At least in my mind, a token is more valuable than some cents. So, if someone farms tokens like Lol, Pizza, Beer, or any other, would it appear that I am a partner of the farmers?
If you're making a good comment that references the material and tries to get some genuine discussion going with the author, that is very different from just hitting them with a "token call" comment.
I suppose if you're thanking someone for their comment and that's all, but instead of the reward being HIVE, and being the token instead via the token call, I suppose it is going to be more than just the token call, right? :)
Yeah, you are right, and short replies to comments seem to be in fashion nowadays. Thankfully there are still people around who one can have a good conversation with. Replies like "very nice pictures" and such do not augur well for long engagements.
!BEER
I just published a new post with some new insights into just that.. :)
https://peakd.com/hive-133987/@holoz0r/further-analytics-regarding-comments-on-hive
Yep, I saw the post, and it explains a lot to me. !LOL
lolztoken.com
No one, it happens Autumnatically.
Credit: reddit
@holoz0r, I sent you an $LOLZ on behalf of papilloncharity
(1/10)
Hey @holoz0r, here is a little bit of BEER from @papilloncharity for you. Enjoy it!
Learn how to earn FREE BEER each day by staking your BEER.
Did you know that you can use BEER at dCity game (https://dcity.io/city) to buy cards to rule the world.
The bilpcoin spam farm (at least 10 accounts, although not all are active spammers) has been a source of irritation for me over the last year. The repetition combined with the length is insane.
Another account, @aquarius.academy, seems to feed from a list of dubiously-sourced "inspirational" quotations and has taken to replying to old posts long past payout. I try to keep an eye on it and downvote any upvotes while alerting those voters to the scam. Can you filter accounts rotating through such lists?
Finally, the ol' "nice post" and "follow for follow?" stuff has declined, but it still seems to crop up from new users unfamiliar with our netiquette or bad actors trying the laziest possible methods.
I have a list of accounts that I'm tagging as human or not human, and it is my intent to focus on "human" interaction, but yes, I will be publishing the list of accounts that I am tagging as non human in my next post on this topic.
There is a filter in my data set for this, as I've set it up as a flag in the table I'm building.
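A flag like that can be as simple as a lookup against a curated set of account names. The sketch below is illustrative only: the three names are picked from the table in the post because they look like bots, and they are not the curated list itself.

```python
# Illustrative only: three account names from the post's table that look like bots.
NON_HUMAN = {"hivebuzz", "lolzbot", "pizzabot"}

def flag_humans(rows, non_human=NON_HUMAN):
    """Return copies of the rows with an is_human flag added."""
    return [{**row, "is_human": row["author"] not in non_human} for row in rows]

# Hypothetical rows, for illustration only.
rows = [
    {"author": "hivebuzz", "body": "Congratulations!"},
    {"author": "holoz0r", "body": "There will be some more data to come!"},
]

flagged = flag_humans(rows)
human_only = [r for r in flagged if r["is_human"]]
print([r["author"] for r in human_only])
```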
I am really keen to see how many posts are just "Thank You" and the number of these, and how many unique users do things like that. That will be my next group by command on my data set, I am just processing some more rows to get a bit of a longer term picture than the week I sampled here.
A week is just a snapshot, whereas a few months will be meaningful, and enable week-on-week tracking for trends, word clouds, key words, overall sentiment analysis and more.
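For week-on-week tracking, one simple option is to bucket comments by ISO week. This is a sketch of that idea, not a description of the actual setup. ISO weeks start on a Monday, so the sampled period (which begins on Sunday 18 May 2025) straddles two of them.

```python
from collections import Counter
from datetime import date

def comments_per_iso_week(created_dates):
    """Count comments per (ISO year, ISO week), sorted chronologically."""
    weeks = Counter()
    for d in created_dates:
        iso = d.isocalendar()
        weeks[(iso[0], iso[1])] += 1
    return dict(sorted(weeks.items()))

# Hypothetical comment dates inside the sampled period.
dates = [date(2025, 5, 18), date(2025, 5, 19), date(2025, 5, 24)]
weekly = comments_per_iso_week(dates)
print(weekly)
```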
There's so much that can be done.
It's nice we have an open system for that kind of analysis. Web2 social media is full of the same kind of nonsense, but transparency is antithetical to their model.
Thanks for sharing! Data is fun! Now this makes me want to learn how to access HiveSQL to play around with. :)
It is easy. There's a tutorial on WWW.HIVESQL.IO
You need to pay a 1 HBD fee. You can then connect using Power Query, SQL, Python, or whatever other programming language you'd like to use that supports SQL statements.
I'm doing some more in depth analysis on this data set at the moment, and I expect to have a few other posts around this data in the coming days. :)
This is interesting information. One thing made me scratch my head: the WUSANG comment being the most commonly mentioned. Wasn't sure what to make of that, I run the bot and HBIT seems to get called more often, and LUV even more often. So, why WUSANG? I wondered. Then, I saw that your sample week was 5/18 to 5/24. On 5/21, dead center (of course!), the Hivebits account was attacked. Evidently WUSANG was the command of choice. I wrote about it at https://peakd.com/hivebits/@crrdlx/hbit-resource-credits (initially thinking it was just a resource credits situation).
I've seen the same a while ago, when I fixed the BBHbot. A bot account was simply spreading the !BBH tipping call in comments all over the place, without even owning BBH (and thus not having tipping powers). No idea what the use of it was, but it sucked bigtime. It was at that time that I started reworking the comments of the tipping bots I operate, and slowly went towards the daily summary system (or weekly now for large part of my other bots !INDEED), and have added a pretty strong blacklisting system to my bots.
I have the impression that it is not even always about gaining things, but just the "fun" of trying to break things...
Agree, it's the thrill
They'd better grow some real skills and write something useful! There's thrill in that too...
Well said. Almost everything I've ever made code or computer-wise is for one of two reasons: to make something for me to use personally, or just for the challenge of seeing if I can make it and get it to work.
Exactly that! Totally share those intentions, to learn or to use! And once it is built, to share.
Very important for the data to line up with our temporal observations! I hadn't heard about your project prior to this, and my discovery of the WUSANG command is what led me to follow your account, which may be what led you to come to this post, I'm not sure!
I plan to exclude such calls to bots in future analysis of comment quality, because, well, they have their place for the communities that enjoy using them and speculating on the Hive Engine tokens and various side chains that they're integrated with.
I won't lie, my query on those token call commands is sort of linked with @aggroed recently posting about changing the ability for bulk transactions to occur on the sidechains.
As with all things, it is all interconnected. I just want to learn more about the over arching comment quality on HIVE :)
Nice use of HiveSQL and thanks for sharing the results on top repeated comments for the recent week.
I see the @tokenfaucet bot call of !TF on the list. That gives HBD, HIVE, LOH or PEPE. And it only works on tokenfaucet posts so it is more self-contained. Also it doesn't reply anything in certain cases, rather than giving error messages.
I have read that there is plenty of room in the blocks we produce every 3 seconds but it's still good practice to limit what we put in there, I would think, especially for bot related comments.
But also, HIVE is more than people posting and commenting. So it is expected to see people come up with other use cases for this blockchain.
Thanks again for the post and data. It is getting good interaction here in the comments too!
It is definitely true that HIVE is more than people just posting and commenting. There are other uses, but they should really be custom json transactions like all the Splinterlands battles were once upon a time. :)
While there might be plenty of room in those blocks, one day, there might not be, and it is important, for me at least, to understand what makes it up, so that one day when there might not be enough room in those blocks, we can look back at things like this as a community and decide what we want to assign the valuable space in our blocks to.
Furthermore, I am not sure how to feel about curation projects leaving a comment on everything they vote on. That information is already recorded by the chain as the vote action, and I'm not sure what "value" a comment alerting the author to it adds. Particularly for some communities and hives where it seems almost each and every post gets one of those "curation" comments.
Thanks for the reply!
I have always thought the exact same thing. I am not convinced that all HIVE needs is 100X the number of MAUs (Monthly Active Users) and all will be well, although I have read several people who have said that on here. Having that many new people posting and commenting has its own social issues and abuse potential and token depreciation, as I think most noobs are mainly sellers. I don't think the reward pool can handle a big increase in people here. And I don't think there is enough room in the blocks for 100X people, but I could be wrong. I just think, be careful what you wish for! I have seen niche online communities last decades and I would be fine if we stay small. I don't know how many other people I could follow on here anyway. I'm already on here too many minutes a day!
I also agree. They take up blockchain space and presentation space on the page, especially ones with graphics. But then again, people like the dopamine hit from these comments. I guess we're all like rats in a Skinner box who perform an action for a reward. I'm not sure what that says about modern society but that's a topic for another day.
At tokenfaucet, a reply is necessary because we randomly give one of 4 tokens with one of a few multipliers so we tell them what they've won. I'm actually thinking of moving the "game" to its own website so as not to be a part of potential over use of the blockchain.
Well thanks again for the interaction which got me thinking further, I appreciate you! :)
I don't. I like the dopamine hit from publishing my rambling content and then having further conversations about it. Having discourse and discussing what ifs, and whys and whats and whos and wheres.
If there's the same amount of HIVE, going to less people, I think we could reasonably forecast that the patterns of behavior would likely mirror our existing userbase. Some would HODL for ever, some would be 100% liquid, others would be a bit of a mix, and people like me would sell some, then buy stuff that helps them recontribute to hive.
I put a significant amount of HIVE toward my dream camera lens for example, and I will be pretty much posting every shoot I ever do with that lens on HIVE! (excepting events like weddings / functions where I don't hold explicit reproduction rights to the images!).
I like to hold strongly onto the delusion that people will engage themselves with something that they are passionate about whether there is an incentive or not.
Twitch moderators for people who are making lots of money are often volunteers. People willfully spout their knowledge about various topics on Reddit.
There's oodles of Facebook groups where people discuss very specific things in exhausting technical detail. They all do this for free. They get the reward of engagement with their fellow humans (or, in the case of those who have failed the Turing Test, the AI and bots of the Dead Internet Theory), and all they get is served ads every 4th post (sometimes more) for the pleasure.
"Data doesn't lie" — very true. There are a lot of comment bots out there, and if they drop generic comments on every post, they’re bound to get small votes here and there. That adds up over time, so it’s clearly profitable for minimal (or zero) effort.
What’s wild is that these low-effort spam comments don’t seem to get flagged by the same people who claim to uphold the “quality post” standard. Makes you wonder how consistent those standards really are.
What do you think is stopping those accounts from being penalized more often? Is it because they fly under the radar, could one of those "big boss" accounts be the one running it, or is it just not worth the effort to deal with them?
Probably because I haven't seen this analysis done publicly before. :P Comments are also much more difficult to systematically punish, probably because they don't get the same visibility as top-level posts do. I'm not so much interested in the reward, for a good comment, because to me, the endorphins and serotonin I get from having a genuine interaction with someone is the "Drug" I am chasing.
I don't want to punish people who make shitty comments, that would discourage them from improving. Instead, I think we should reward people who make good comments, and try to raise everyone's ability to create, good, engaging comments.
They don't even have to be in English. I'm seeing lots of other languages in the plaintext I've extracted, and I see a lot of "gracias", which I know means Thanks, or something :P It is just that English is the only language I know fluently, other than the various sounds I make when I bend over as I continue to age and lift heavy things at the gym. :D
I found it really interesting how you were able to collect all this data using the magic you call “HiveSQL” — lol.
On the language side, I'm fluent in both Spanish and English (I live in Argentina), so I'm actually thinking of running similar analytics on the Spanish-speaking community to see if the same patterns show up there. Could be cool to compare and see if the issue of low-value comments translates across languages just as clearly. What do you think?
Yeah, that is probably doable. I plan to publish my methodology once I figure it out, and it should apply regardless of the language, though it will be simple.
There's a whole bunch of other things you can do with natural language processing and data to extract insights. I am really looking forward to finding out who swears most on the blockchain :P
Or maybe, we could find out what the most frequently mentioned colour is? blue, red, green, yellow, orange, purple, brown, or something else? :D
As an appendix to this comment, Microsoft Word tells me my grade level for this comment is...
I like to try and keep things mostly simple on HIVE, as not everyone has English as a first language.
I’m really looking forward to a post where you break down the process a bit more. I’m still kind of new to programming and data stuff, but I pick things up pretty fast, and this kind of analysis is super interesting to me.
Also, I was thinking — have you ever considered using ChatGPT or the GPT API to help analyze comments? Not sure what your take on it is, but I’ve used it a lot to help me understand topics that I’ve always struggled with. I just tell it to explain things like I’m five, and honestly, it helps a lot.
Obviously it’s not a replacement for real thinking or research, but as a tool, it can be surprisingly useful.
There's a lot of stuff that pre-dates LLM analytics which is pretty good for text. The whole area of NLP (Natural Language Processing) is probably very foundational to the existence of LLMs like GPT.
I have an idea that I can run a local model to identify some trends in the comments. That is a longer term goal. The problem with doing such analysis is that there's a random seed generated each time you ask an LLM a question, so each time, you'll get a slightly different answer.
As a result, you can't really run tests with different data over different time periods and expect it to treat the analytical process in exactly the same manner, even if you're using the same prompt in the same session, with a different data set.
There's a few commercially available models that focus on repeatable, more empirical analysis, I trialed a few at my old job, but there's a lot of stuff you can do to exhaust the data before you get to "AI" giving you new insights or new paths to go down.
I plan to exhaust all those traditional methods before I dive headlong into whatever stuff I can cook up for an LLM to do with these data sets.
It will be able to, at a guess - that would be more difficult in traditional NLP methods, if not normally possible:
I am not formally qualified in any of this data science or data reporting stuff. I've just worked with it for years at my old job, and hope to find a new job (I was made redundant about 6 weeks ago) where I can use these skills to make people do things with more integrity, ethics, respect, or.... worst case, y'know capitalism, cos a man like me loves pizza.
$PIZZA slices delivered:
@darkflame(1/15) tipped @holoz0r
Come get MOONed!
Oh I really want to know who swears the most - thankyou! ;)
This doesn't surprise me. I think @tarazkp told me years ago I didn't have to reply to every comment ... 🤯🤯🤯🤯 which made me feel better about the whole thing 😂
Grateful for our meaningful - I think - conversations this week!
They'll always be meaningful. So long as at least 50% of the participants choose to assign value to it. Or.. someone comes along and joins our conversations.