Further analytics regarding comments on HIVE

holoz0r (76) in Hive Statistics • last month

Hello everyone! I have continued to analyse the data set I began looking into a few days ago, and have some new analysis to present to the community.

Before I commence with showing the analytics, a few important distinctions.

Not Complete

This is by no means a true and complete understanding of the commenting behaviours on the HIVE blockchain. It covers just a week of comments, with the data set used being for the period of 18 May through to 24 May. I will be pulling a larger sample of data to further investigate trends.

Classification of Account Types can be improved

For a discussion of this, see the "Non Human accounts for the purpose of this analysis" section in the appendices of this post.

Not every suggestion is yet implemented

I am very grateful to everyone who took the time to discuss, in the comments of my previous post, other things that could be investigated and analysed. This is an interim analysis based on what has interested me and what has been quickest to take from MVP (my first post) to something more polished and repeatable.

I'm okay with being proven wrong

If you see stuff in the data that doesn't line up with what you perceive to be facts about the chain, please let me know, and I am happy to investigate further. Anyway, without further ado, let's look at the data.

I have completed this analysis using Power Query and Power BI.

I will begin with the homepage of the report, which provides an overview. For the purpose of this example, and to explain the metrics displayed, I will start with only the information relating to my own comments for the period:

On the left hand side, we can see the overall information about my comments. The range of dates in which they were found, the total payout value, the average pay out value, the number of comments made, the average length of my comments, how many replies they got, and how long the... longest comment was.

Because my own account is tagged as "human", the account type shows as human. Below that is the user search function.

The table to the right shows this in a table view, which makes more sense when we want to compare users to each other for the purpose of analysis.
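As a rough illustration of how these overview metrics could be reproduced outside Power BI, here is a small Python sketch. The field names (`author`, `body`, `payout`, `replies`) and the sample rows are made up for the example and are not the columns used in the report.

```python
from statistics import mean

# Made-up sample rows; the field names are assumptions for this sketch.
comments = [
    {"author": "alice", "body": "Great photos, thanks for sharing!", "payout": 0.10, "replies": 2},
    {"author": "alice", "body": "Agreed.", "payout": 0.00, "replies": 0},
    {"author": "bob", "body": "Interesting analysis.", "payout": 0.05, "replies": 1},
]


def summarise(rows, author):
    """Return the overview metrics for one author's comments, or None if there are none."""
    own = [r for r in rows if r["author"] == author]
    if not own:
        return None
    lengths = [len(r["body"]) for r in own]
    payouts = [r["payout"] for r in own]
    return {
        "comments": len(own),
        "total_payout": sum(payouts),
        "average_payout": mean(payouts),
        "average_length": mean(lengths),
        "replies_received": sum(r["replies"] for r in own),
        "longest_comment": max(lengths),
    }


def summary_table(rows):
    """One summary per author, so that users can be compared side by side."""
    authors = sorted({r["author"] for r in rows})
    return {a: summarise(rows, a) for a in authors}


for author, metrics in summary_table(comments).items():
    print(author, metrics)
```

Comment length here is simply the number of characters in the body, which may differ from how the report measures it.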

The table to the right breaks down my conversations with people on the chain. You can see who I replied to the most, and what length my average reply to that person was. I've also added how many votes that reply has gotten.

In my example, you can see that my most common conspirators are @riverflows and @galenkp, along with a whole host of other people.
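The conversation breakdown boils down to grouping one author's replies by the person being replied to. A minimal Python sketch, assuming each row carries `author`, `parent_author`, `body` and `net_votes` (assumed names; the sample data is invented):

```python
from collections import defaultdict

# Invented sample replies; field names are assumptions for this sketch.
replies = [
    {"author": "alice", "parent_author": "bob", "body": "Nice one!", "net_votes": 3},
    {"author": "alice", "parent_author": "bob", "body": "I see your point.", "net_votes": 1},
    {"author": "alice", "parent_author": "carol", "body": "Thanks", "net_votes": 0},
    {"author": "bob", "parent_author": "alice", "body": "Cheers", "net_votes": 2},
]


def conversation_partners(rows, author):
    """List (partner, reply count, average reply length, total votes) for one author."""
    stats = defaultdict(lambda: {"replies": 0, "chars": 0, "votes": 0})
    for r in rows:
        if r["author"] != author:
            continue
        s = stats[r["parent_author"]]
        s["replies"] += 1
        s["chars"] += len(r["body"])
        s["votes"] += r["net_votes"]
    table = [
        (partner, s["replies"], s["chars"] / s["replies"], s["votes"])
        for partner, s in stats.items()
    ]
    # Most frequent conversation partner first; ties broken alphabetically.
    return sorted(table, key=lambda row: (-row[1], row[0]))


for row in conversation_partners(replies, "alice"):
    print(row)
```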

Comments in General

Now, if I remove the filter showing just myself on the report, things will start to get a little more interesting:

We can now see that I've flagged about 30% of comments as coming from "non human" accounts and when sorting by pay out value, we can see that three non human accounts have received the majority of rewards for comments.

There is, however, an opportunity to improve this analysis, as I am not (at this stage) actively checking whether a beneficiary is defined for a user's comments. For instance, redditposh sometimes sets a beneficiary on its comments so that the rewards go to the author of the top-level post.

This is something in my "backlog": investigating where the rewards for such accounts "go". Take it with a grain of salt for the time being. However, if I flick back over to accounts that I've comfortably defined as human (again, see the appendices at the bottom of this post for further information on what is used to determine that), then we can start to examine "human" users in more detail.

I can then sort by the other dimensions to find out, for instance who made the most comments in the period:

Then, I can see who got the most votes, where I notice my first BIG "HMMM". There are a number of users with a low number of comments, but a HIGH number of votes for relatively short comments. Comments that are around average length, but... why on Earth would these comments be seeing ~200 votes per comment?

200 VOTES PER COMMENT? WHAT?

Here is what a brief investigation uncovers for some of the users with this high vote:comment ratio. It appears that there is an app on hive called skatehype, and well, it looks like there's an incredibly long curation trail for comments on posts...

Why? I don't know. Does this add value? I don't know. Should I ignore it in future releases of this report? You tell me.

It appears that almost all users of this app / front end / site have around 200 votes per comment. I feel like that is pretty spammy on the chain, but the app looks like it rewards and onboards people for uploading skateboard tricks to hive.
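One way to surface accounts like these automatically is to compute votes per comment for each author and flag anything above a cut-off. The sketch below is only an illustration: the field names, the sample accounts and the threshold of 100 are all assumptions, not values taken from the report.

```python
# Made-up rows; "author" and "net_votes" are assumed field names.
sample = [
    {"author": "skater1", "net_votes": 210},
    {"author": "skater1", "net_votes": 190},
    {"author": "writer1", "net_votes": 12},
    {"author": "writer1", "net_votes": 8},
    {"author": "writer1", "net_votes": 10},
]


def votes_per_comment(rows):
    """Average number of votes per comment for each author."""
    totals = {}
    for r in rows:
        count, votes = totals.get(r["author"], (0, 0))
        totals[r["author"]] = (count + 1, votes + r["net_votes"])
    return {author: votes / count for author, (count, votes) in totals.items()}


def unusually_high(ratios, threshold=100.0):
    """Authors whose votes-per-comment ratio meets an arbitrary threshold."""
    return sorted(author for author, ratio in ratios.items() if ratio >= threshold)


ratios = votes_per_comment(sample)
print(ratios)
print(unusually_high(ratios))
```

A fixed threshold is crude; comparing each author against the median ratio for the period would probably be a fairer test.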

I didn't know this app existed before this analysis, but investigation like this lets us learn new things. The first person who appears to be a genuine author on this list, with solid engagement across their comments (in terms of net positive votes), is @davideownzall.

Average length of Comments

Here, a balance between long comments and lots of comments is probably what we're looking for to find people that engage deeply with the content of other people. Already, I am seeing a few accounts listed that may not be classified as "human" appearing in this list. A reminder that this is not complete analytics or research, and will be a work in progress for some time to come.

Average Depth of Comments:

Who leaves the longest comment chains? Who talks to people a lot more than others? Again, here, I'm seeing some accounts that are likely not to meet the definition of human, but there are definitely a few humans that I can see in this list. I will add this to the list of things to further investigate at a later time.

Longest Comment

Who had the most "points" to make? Or, who just unloaded half a novella onto the chain in someone's comments section, or ... perhaps is quoting the entire content of a really long chain by other users?

I love the fact that a user by the name of "short segments" has the longest comment here. There are a lot of human accounts in this list, but again, at a glance, there are a few accounts that may not be human showing up, for instance "dcityrewards". This is something that I will need to look into further to make adjustments to the "likely not human" list in the appendices to this post.

Who talks to who the most?

Here is a screenshot of who talks to who the most, taken from the right hand side of the report, taking into account the frequency of replies.

Now, moving onto some more exciting stuff....

Duplicate Comments Left by the same author

This will be a series of screenshots, in order to demonstrate the content of the comment, along with the number of times it has appeared on the chain. Please remember that this is just a week's worth of data for the purposes of this analysis, so it is likely that these accounts have made these identical comments many more times than depicted here.

As a discussion point: does the HIVE community as a whole feel that these identical comments add any value to the chain? Or do they serve only to bloat the block log and make the chain harder to preserve, given that much of the content in these comments is already visible elsewhere on chain where it serves a purpose, such as in the voting ledger?

Anyhow, I present to you the most prevalent comments by a SINGLE user that are repeated many times on HIVE. This is to say, a user that leaves the EXACT same comment more than once.
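For anyone wanting to reproduce this kind of count outside the report, a minimal Python sketch follows. It treats two comments as identical when the author matches and the body matches after lowercasing and trimming. The account names and bodies are invented, and the normalisation is an assumption rather than the exact rule used in the report.

```python
from collections import Counter

# Invented sample data; "curator.bot" is a hypothetical account name.
rows = [
    {"author": "curator.bot", "body": "Your post has been curated!"},
    {"author": "curator.bot", "body": "Your post has been curated! "},
    {"author": "curator.bot", "body": "your post has been curated!"},
    {"author": "alice", "body": "Thank you"},
    {"author": "bob", "body": "Thank you"},
    {"author": "alice", "body": "thank you"},
]


def repeated_comments(rows, min_count=2):
    """Return (author, normalised body, count) for bodies one author posted repeatedly."""
    counts = Counter((r["author"], r["body"].strip().lower()) for r in rows)
    return [
        (author, body, n)
        for (author, body), n in counts.most_common()
        if n >= min_count
    ]


for author, body, n in repeated_comments(rows):
    print(f"{n}x {author}: {body}")
```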

I'll stop here, but it is evident that this page is NOT FILTERED to human creators, and we can see a lot of curation projects voting on content. In the future, I plan to use this page of the report to identify users who make the identical comment multiple times on chain and potentially add them to the "not human" user class in further analysis.

But what about the same comment that may be left by more than one user?

This is something that is interesting ... I guess - how many users have left a comment like "thank you", or "great post" or something like that? For this step, I am moving out of Power BI and into KNIME to use its powerful group by function and aggregation / concatenation features. I've exported my table as a CSV, then completed a group by (body of comment) - so we know what the comment is, followed by the sum (so we know how many times that comment was published), then a concatenation of each of the "authors" who posted that comment, and how many times each one contributed.
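The same group by, count and concatenate step can be sketched in a few lines of Python. This is not the KNIME workflow itself, just an equivalent illustration with invented rows; matching bodies after lowercasing and trimming is an assumption.

```python
from collections import Counter, defaultdict

# Invented sample rows for illustration.
rows = [
    {"author": "alice", "body": "Thank you"},
    {"author": "bob", "body": "thank you "},
    {"author": "alice", "body": "thank you"},
    {"author": "carol", "body": "Great post"},
]


def group_by_body(rows):
    """Return (body, total count, 'author (count), ...') sorted by total count."""
    per_body = defaultdict(Counter)
    for r in rows:
        per_body[r["body"].strip().lower()][r["author"]] += 1
    result = []
    for body, authors in per_body.items():
        total = sum(authors.values())
        concatenated = ", ".join(f"{a} ({n})" for a, n in sorted(authors.items()))
        result.append((body, total, concatenated))
    # Most frequently published comment first; ties broken alphabetically.
    return sorted(result, key=lambda row: (-row[1], row[0]))


for body, total, authors in group_by_body(rows):
    print(total, repr(body), authors)
```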

This screenshot is pretty hard to read, so here's a table of anything that has appeared more than fifty times. There are probably some accounts here that should be flagged as non human; I'll add that to my list of things to do for future iterations of reporting.

I have stripped out any HTML and images (hopefully), but the table below is a bit of a nightmare owing to the large number of users making calls to bots. I should probably exclude those calls from future iterations.

Comments Published more than ten times:

Finally, Comment Reward Distribution:

This needs further investigation to see if the "non human comment payments" involve beneficiaries.
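For reference, a split like this can be computed by tagging each comment's author against the "non human" list from the appendices and summing by account type. The sketch below uses only three names from that list to stay short; the rows, the `payout` field name and the figures are invented, and beneficiaries are ignored.

```python
# Three names taken from the "non human" list in the appendices, to keep the example short.
NON_HUMAN = {"beerlover", "hivebuzz", "pizzabot"}

# Invented rows; "payout" is an assumed field name.
rows = [
    {"author": "hivebuzz", "payout": 0.5},
    {"author": "alice", "payout": 0.5},
    {"author": "bob", "payout": 0.25},
    {"author": "alice", "payout": 0.75},
]


def account_type(author):
    """Classify an account purely by membership of the exclusion list."""
    return "non human" if author in NON_HUMAN else "human"


def shares_by_type(rows):
    """Percentage of comments and of payout attributed to each account type."""
    comments = {"human": 0, "non human": 0}
    payouts = {"human": 0.0, "non human": 0.0}
    for r in rows:
        kind = account_type(r["author"])
        comments[kind] += 1
        payouts[kind] += r["payout"]
    total_comments = sum(comments.values())
    total_payout = sum(payouts.values())
    return {
        kind: {
            "comment_share": 100 * comments[kind] / total_comments if total_comments else 0.0,
            "payout_share": 100 * payouts[kind] / total_payout if total_payout else 0.0,
        }
        for kind in comments
    }


print(shares_by_type(rows))
```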

Appendices:

"Non Human" accounts for the purpose of this analysis

The definition of "human" here is a single person acting on their own, not a part of a curation team or community. (And certainly not a bot!)

To create this list, I looked at comments that appeared more than once (where the text of each comment was identical, and the author was also the same).

Most of these accounts relate to curation projects stating that they've voted on something. I'm not counting them as "human", as they're often a collective of humans, which is by definition, not "human".

If you are a human, and you appear on this list, please comment and I will investigate further based on your comment history. This list is likely to grow as I look at data on a longer term basis.

"a-colmena", "actifit", "airhawk-project", "ajolote", "ajolte", "aliveandthriving", "amazingdrinks", "aquarius.academy", "asd09", "asean.hive", "beerlover", "bilpcoinbpc", "bot-bdbhueso", "bpcvoter1", "bpcvoter2", "bpcvoter3", "ccceo.voter", "celf.magazine", "centtoken", "chessbrotherspro", "cinnccf", "commentrewarder", "coolmonsters", "digital.hub", "discovery-it", "diyhub", "dmhafiz", "dookbot", "douglas.life", "drawmatic", "duo-tip", "dw38h", "ecency.waves", "enlace", "entropia", "es-literatos", "f76wz", "fallen.angels", "fgh87", "foodiesunite", "fulldeportes", "guest06", "guest07", "guest08", "guest09", "guest10", "helios.daily", "helios.notify", "helios.voter", "hiq.smartbot", "hispapro", "hive-103505", "hive-106316", "hive-106444", "hive-118507", "hive-124452", "hive-134572", "hive-174680", "hive-177745", "hive-179017", "hive-br.voter", "hive-lu", "hive=14396", "hiveargentina", "hivebits", "hivebuzz", "hivecurious", "hivepakistan", "hivewatchers", "hk14d", "hug.bot", "india-leo", "indiaunited", "indonesianhiver", "innerblocks", "itharagaian", "jamerussell", "jkl65", "keys-defender", "la-colmena", "ladiesofhive", "ladytoken", "leothreads", "liketu.moments", "lilybee", "lolzbot", "lovesniper", "luvshares", "meme.bot", "minimalistliving", "music-community", "musiczone", "nowayjosecuisine", "osomar357", "pandex", "peak.snaps", "pixbee", "pizzabot", "poshtoken", "qurator", "redditposh", "rutablockchain", "sahi1", "scifimultiverse", "sor31", "splinterboost", "ssg-community", "steemmonsters", "stemsocial", "strava2hive", "swc-curation", "terraboost", "theinkwell", "thepimpdistrict", "thoughtfulposts", "tippybot", "tipu", "tokenfaucet", "topcomment", "travelfeed", "tydynrain", "u89gw", "upme.notify", "vibes-voter", "vmn31", "w7ngc", "w95hj", "waivio", "waivio.updates01", "waivio.updates02", "waivio.updates03", "waivio.updates04", "waivio.updates05", "waivio.updates06", "waivio.updates07", "waivio.updates08", "waivio.updates09", "waivio.updates10", "wine.bot", 
"witnessbot", "wiv01", "womentribe", "worldmappin", "x6oc5", "xcv47", "youhive", "zxc43"

Comments Published More than Once (HTML, PNG, GIF, JPEG, JPG, "<", ">", "/" "" "@" "1" filtered out):

https://pastebin.com/JCzgWukX

Things to still do

- Investigate beneficiary rewards

- Find more accounts that "aren't human", investigating duplicate comments and curation hive accounts

- Who Swears the most?

- Operationalise the dataset

Power Query for extraction and tagging:

```
let
    // Connect to HiveSQL and open the Comments table
    Source = Sql.Databases("vip.hivesql.io"),
    DBHive = Source{[Name="DBHive"]}[Data],
    dbo_Comments = DBHive{[Schema="dbo",Item="Comments"]}[Data],
    // Date window. Note that the upper bound is midnight at the start of 24 May, so comments created later on 24 May are not included
    #"Select Date" = Table.SelectRows(dbo_Comments, each [created] >= #datetime(2025, 5, 18, 0, 0, 0) and [created] <= #datetime(2025, 5, 24, 0, 0, 0)),
    // Keep replies only: a parent author is present and the title is empty
    #"Only Comments" = Table.SelectRows(#"Select Date", each ([parent_author] <> null and [parent_author] <> "") and ([title] = "")),
    #"Removed Columns" = Table.RemoveColumns(#"Only Comments",{"author_rewards", "promoted", "body_language", "TS"}),
    // Work on a lowercased copy of the body so the text checks are case-insensitive
    #"Duplicated Column" = Table.DuplicateColumn(#"Removed Columns", "body", "body - Copy"),
    #"Lowercased Text" = Table.TransformColumns(#"Duplicated Column",{{"body - Copy", Text.Lower, type text}}),
    // Tag each comment: swearing, thanks, exclamation marks and a handful of known bot commands
    #"Added Conditional Column" = Table.AddColumn(#"Lowercased Text", "Swearing?", each if Text.Contains([#"body - Copy"], "fuck") then "Contains Swearing" else if Text.Contains([#"body - Copy"], "shit") then "Contains Swearing" else if Text.Contains([#"body - Copy"], "cunt") then "Contains Swearing" else if Text.Contains([#"body - Copy"], "wanker") then "Contains Swearing" else "No Swearing"),
    #"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "Thankful?", each if Text.Contains([#"body - Copy"], "thank") then "Uses Thank" else "Doesn't Thank"),
    #"Added Conditional Column2" = Table.AddColumn(#"Added Conditional Column1", "Uses Exclamation", each if Text.Contains([#"body - Copy"], "!") then "Uses Exclamation" else "Doesn't use exclamation"),
    #"Added Conditional Column3" = Table.AddColumn(#"Added Conditional Column2", "Calls Bot", each if Text.Contains([#"body - Copy"], "!bbh") then "Yes" else if Text.Contains([#"body - Copy"], "!beer") then "Yes" else if Text.Contains([#"body - Copy"], "!wusang") then "Yes" else if Text.Contains([#"body - Copy"], "!lady") then "Yes" else if Text.Contains([#"body - Copy"], "!wine") then "Yes" else "Probably Not"),
    // True when any token (split on spaces and punctuation) starts with "!"
    #"Added Custom" = Table.AddColumn(#"Added Conditional Column3", "Custom", each
        let
            textValue = [#"body - Copy"],
            splitWords = Text.SplitAny(Text.Lower(textValue), " .,;?()[]{}-"),
            startsWithExclamation = List.AnyTrue(List.Transform(splitWords, each Text.StartsWith(_, "!")))
        in
            startsWithExclamation),
    // Trimmed and cleaned copy of the lowercased body
    #"Duplicated Column1" = Table.DuplicateColumn(#"Added Custom", "body - Copy", "body - Copy - Copy"),
    #"Trimmed Text" = Table.TransformColumns(#"Duplicated Column1",{{"body - Copy - Copy", Text.Trim, type text}}),
    #"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"body - Copy - Copy", Text.Clean, type text}}),
    #"Renamed Columns1" = Table.RenameColumns(#"Cleaned Text",{{"body - Copy - Copy", "body trim clean"}}),
    #"Renamed Columns" = Table.RenameColumns(#"Renamed Columns1",{{"Calls Bot", "Starts with Exclamation Calls Bot"}, {"Custom", "Tokenised Call Bots"}})
in
    #"Renamed Columns"
```
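The "Swearing?" column in the query only records whether a comment contains one of four substrings. For the "Who swears the most?" item, two different measures could be built on the same word list: how often an author's comments contain a swear, and how many swears there are per comment. The Python sketch below illustrates both. The names `frequency` and `density`, their definitions and the sample rows are assumptions made for this example, not part of the report.

```python
# The same four substrings that the query's "Swearing?" column looks for.
SWEARS = ("fuck", "shit", "cunt", "wanker")

# Invented sample rows.
rows = [
    {"author": "alice", "body": "Fuck yeah, fucken oath."},
    {"author": "alice", "body": "Nice post"},
    {"author": "bob", "body": "No shit."},
]


def swear_stats(rows):
    """Per author: share of comments containing a swear, and swears per comment."""
    stats = {}
    for r in rows:
        text = r["body"].lower()
        # Substring counting, mirroring Text.Contains in the query.
        hits = sum(text.count(word) for word in SWEARS)
        s = stats.setdefault(
            r["author"], {"comments": 0, "swearing_comments": 0, "swears": 0}
        )
        s["comments"] += 1
        s["swearing_comments"] += 1 if hits else 0
        s["swears"] += hits
    for s in stats.values():
        s["frequency"] = s["swearing_comments"] / s["comments"]
        s["density"] = s["swears"] / s["comments"]
    return stats


for author, s in swear_stats(rows).items():
    print(author, s)
```

Because this matches substrings rather than whole words, harmless words that happen to contain one of the strings would be counted too.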

#comments #analysis #hive #powerquery #powerbi #data

$42.47

mobbs (74) last month

Overwhelming amount of data!

For me the bigger problem I've had are the kind of maybe-maybenot bot type of humans who seem to use AI summaries to just spam comments. Y'know, the function on PeakD that you can just click a drop down menu and select the 'write me an appropriate reply to this blog'.

At least, a lot certainly come across that way. On the handful of times I've clicked into their profile to investigate, their responses elsewhere seem to be identical in that AI style. I think @peakd should abolish it entirely. It isn't helping anyone or anything. At the very least, limit its functionality.

Literally turning humans into bots like some kind of loophole around the anti-bot sentiment.

$0.06

2 votes

holoz0r (76) last month

I don't even think some of those replies are using that feature. The grammar isn't as tidy as GPT, it doesn't contain the enthusiastic SEO-word-salad-friendly-blogger tone.

But yeah, I've seen that feature, and it's a thing. It would be no loss if it were gone.

$0.08

2 votes

borniet (71) last month

I can confirm that @tydynrain is human, and NOT a bot ;-) He does comment a lot though :-D But always in a good and meaningful way ;-)
As I am the technical guy behind the BBHbot, I'm of course very interested in your results! I saw that it was the second most identical comment. We've been working hard on battling spam and abuse of our bots (together with @tydynrain, we also have some other tipping bots, such as the indeed bot), and have stopped posting comments from the bots quite soon (I think even first, but not sure). If spotting any issue with these bots, feel free to contact me about it!
I'm interested to see how many of the BBH calls are coming from the same (group of) accounts. One of the issues we had in the past, was a bot, just commenting the tip-call on every post that passed by, even though it didn't have any BBH holdings, and thus no tipping power. Not sure what the idea behind the bot was, but it was spamming like hell, so down the blacklist he went. I might have to do the same again, based on your data!
I could almost write a book in this comment, just by looking at & thinking about your visualisations (I also work in Data ;-) and was thinking about similar things ;-) ), as it clearly shows a lot of issues. Some of the accounts I saw in the lists are very spammy indeed, even though they may appear human. On the other hand, some may be bots, but have their use (as you mentioned).
Anyway, great work! Love the pictures! Keep it up! Thanks for sharing! !BBH ;-)

$0.03

2 votes

holoz0r (76) last month

I can get you a list of the users who only commented the bot call without any content behind it. Give me a few days though: I have a busy weekend ahead, and a job interview next week, followed by some photography work... so it might not be until late next week actually.

I think the same problem will happen with a lot of bots, but it's a good call to investigate those who are making calls without having the resources to actually elicit the token response.

There's no issue, people can do whatever they like with their resource credits, but one day, we might need to prevent this sort of stuff from being in blocks to let comments like yours get through :P

$0.55

3 votes

borniet (71) last month

Can't but agree.... if we want to grow, we better use the resources wisely.
That list of users would be useful indeed! But no need to hurry, photography first! Oh, and Live before Hive :-) And good luck with the job interview!!!
!HOPE

$0.03

1 vote

holoz0r (76) last month

Thank you on all accounts! I have just now, at 9:30 PM after a busy Friday night and a busy Saturday, gotten to sit down and check my email / messages / phone / HIVE ... It's been a fun time!

Tomorrow, Gym, and dropping off my photographic work to the art gallery, then presenting some of the analytics I did for the gallery admin team to them for further feedback - basically, looking into visitor numbers, visitors per sale, average revenue per visitor, etc

It's a fun new way to apply my skills after having worked in telecoms / customer service / complaints / churn / analytics and business improvement for 10+ years!

$0.59

4 votes

friendlymoose (74) last month

Interesting stats again.
It gives quite a good insight into what is happening on Hive.

I wonder why leoquoter is not only spamming their own threads microblogging platform, but also the others.

Overall I think that automated comments don't add value to the chain, but that is my opinion.

Thanks for sharing this!

$0.02

1 vote

holoz0r (76) last month

I am glad that I am not the only one that feels that automated comments add no value. :)

I have a further 1.7 million rows of data extracted that I will perform the further analysis on in the coming... future, but my main focus right now is on finding some paid employment. While I'm not doing that... I am doing things like this to keep me busy and to stop the ongoing pain from my journeys to the gym, lol.

$0.07

1 vote

friendlymoose (74) last month

Important things first!

$0.02

1 vote

galenkp (83) last month

Who swears the most...The last time I saw results on this one by @abh12345 it was me. Lol. I'll be interested to see where I land this time.

$0.00

1 vote

holoz0r (76) last month

In your words, "fucken oath it will be me" ;)

$0.17

1 vote

galenkp (83) last month

Lol, I better fucken get a few fucken swear fucken words on the fucken blockchain to make sure I fucken win. Fuck yeah.

Dude, I need to unpack this post some more...it looks interesting though.

$0.00

holoz0r (76) last month

I look forward to whatever insights you may find.

Regarding the swearing, I wasn't planning on going for a "swears per comment" measure, but because you just made this post, I might have to.

$0.17

1 vote

riverflows (80) last month

You might have to group the various swear words though, because there might be many fucks given, but what about the cunts and bollocks? The shits and ass wipes? The bumfucks and dicks and nobheads?

Is your fucking data analysis capable of cunting differentiating?

PS: Clearly trying to give Galen a run for his fucking money.😂

$0.06

1 vote

holoz0r (76) last month

I can either count the posts that have "swears", or count the total number of swears found. I haven't decided. Swearing frequency or swearing density. I could do both, but that would upset the wave-particle duality of the number of fucks I have to give, once I observe it, then the particle-wave form collapses and I'm left with certainty.

$0.00

riverflows (80) last month

🤣🤣🤣🤣

$0.00

galenkp (83) last month

Haha, one must swear...or not I suppose.

$0.00

davideownzall (69) last month (edited)

Yes make a ranking of the most swearing users!

How do you determine if a user is human or not? Do you check some comments manually, or? Not sure if you have already, or how hard it would be to integrate an AI checker; quite a few comments are obviously AI

I guess @hive-197333 snaps work gave me a good boost on there

$0.00

holoz0r (76) last month

I have read the duplicate comments of the user, and manually determined whether they're human, or just a curation service :P

In terms of AI checker, that will be something for a later revision, I will need to find a way to process the comments on a local LLM for the "confidence" that it is AI generated, as I cannot afford an API end point to check each comment, the computational costs would be astronomical, but if I can do it on my local GPU - then I will :P (When the sun is shining and I have solar power to do it for free!)

So I think my definition of "Human" is "Someone who doesn't make the same comment multiple times, constantly, and isn't on the exclusion list that @friendlymoose gave me that is used for the @topcomment project."

There's still much work to be done on this analysis, in order to make it simple and easy to digest for everyone on HIVE, not just nerdy analysts like myself.

$0.08

2 votes

davideownzall (69) last month

A local LLM could be good to go, I'm unsure how "heavy" it is, I only used Stable Diffusion locally... Worth a try though

A first version can never be complete but it's pretty clear, you can keep elaborating over time

$0.02

1 vote

holoz0r (76) last month

I have a 4090, it should be no problem. Have also run SD, Flux, and other models locally in the past. It is fun to play with. Comfy UI is a bit of a nightmare though. LM Studio is pretty easy to use for LLMs, though.

$0.00

1 vote

davideownzall (69) last month

Yep, I like the local as you can tweak and add lora and such how you prefer rather than be forced to use what is offered online... Oh comfy, I never tried it, it looked complicated but people showed good results with it

$0.00

holoz0r (76) last month

I used all of the three major front ends for image gen locally, Automatic1111, Invoke, DrawThings on ipad / MacOS, but Comfy UI is ... the most flexible, and I think has the most extensibility.

You do tend to end up with spaghetti string node combinations, reminiscent of any node based software, which can often end up looking like...

$0.00

1 vote

davideownzall (69) last month

🤣

I only used A1111, my pc is not really new, it has some years it takes like 20 minutes per image considering I added ADetailer too... I tried forge, it's way faster but it's buggy for me with some models and lora

$0.03

1 vote

papilloncharity (81) last month

Wow, this is intense, and one can see that you have done a lot of work here. I bet that your future analytics will be very interesting. We also hope that you will be successful in your search for employment.
!BEER

$0.00

holoz0r (76) last month

Thank you very much! I've had a few interviews, but the time period to wait for an outcome is quite lengthy. In the interim, things like this keep me entertained and off the streets :D

$0.00

papilloncharity (81) last month

Well, you found a great way to stay busy in the interim, and let's hope that one or more of the interviews will be positive. 🙏

!BEER

$0.03

1 vote

beerlover (64) last month

View or trade BEER.

Hey @holoz0r, here is a little bit of BEER from @papilloncharity for you. Enjoy it!

Did you know that you can use BEER at dCity game (https://dcity.io/city) to buy cards to rule the world.

$0.00

beerlover (64) last month

View or trade BEER.

Hey @holoz0r, here is a little bit of BEER from @papilloncharity for you. Enjoy it!

Learn how to earn FREE BEER each day by staking your BEER.

$0.00

beerlover (64) last month

View or trade BEER.

Hey @holoz0r, here is a little bit of BEER from @papilloncharity for you. Enjoy it!

We love your support by voting @detlev.witness on HIVE .

$0.00

beerlover (64) last month

View or trade BEER.

Hey @holoz0r, here is a little bit of BEER from @papilloncharity for you. Enjoy it!

Did you know that you can use BEER at dCity game (https://dcity.io/city) to buy cards to rule the world.

$0.00

tokenfaucet (59) last month

The tokenfaucet account is run by a human, but 99+% of the comments it makes are generated by a bot as replies to our giveaway posts, telling repliers what they were given. So for your purposes, you are correct to classify tokenfaucet as "non human". Thanks and best wishes! :)

$0.00

holoz0r (76) last month

Cheers, thank you for the clarification here, it is much appreciated!

$0.00

gadrian (76) last month

I occasionally drop a "damn" here and there, so I might find myself on that swearing list, lol. Good to see you went deeper with your analysis!

$0.00

holoz0r (76) last month

As an Aussie, I wouldn't come close to treating "damn" as a swear :P