Update on Blocktrades work and the results of HF24

in HiveDevs, 3 years ago


I’m publishing this post a little early, as I expect to be pretty busy on Monday. Before I go into my normal reporting on the detailed coding issues that the BlockTrades team worked on last week and our plans for the upcoming week, I first want to give a brief overview of the hardfork process as it happened last week, since that’s what has been driving our workflow over the past week.

Review of hardfork 24 (and an “unplanned” hardfork before that)

On the date of hardfork 24 (Oct 14th), the apps developers were still making hardfork 24 related changes and reporting API problems they were finding with hivemind, while the BlockTrades team was working on fixing those bugs as they were reported (I’ll discuss the bug fixes later in this post). Meanwhile, the top 20 witnesses were standing by, waiting for an “all clear” signal that enough apps were stable that we could safely execute the hardfork by upgrading their nodes to the hardfork 24 code (tagged in the hived node repository as either v1.24.2 or v1.24.3).

Several of the top 20 witnesses had already updated their code to the hardfork 24 code, but this was considered OK by me and other devs, as the HF24 code requires a super-majority of the top 20 witnesses to switch to it to trigger the new HF24 protocol it contains. This allowed some top 20 witnesses to not have to hang around as we waited for the apps/hivemind integration to get to an acceptable level before we triggered the hardfork.

Unfortunately, this led to an unexpected side-effect: the HF24 code contained a protocol change that wasn’t properly guarded against execution before hardfork 24. This means that a HF24 node could produce a block that wouldn’t be accepted by HF23 nodes. We had never seen this bug triggered before, because it could only cause a problem when a HF24 node produced a block and only then under special circumstances.
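To make the bug pattern concrete, here is a minimal sketch of a guarded versus an unguarded rule change. Everything in it is hypothetical: the names (`ChainState`, `validate_block_guarded`), the rule itself (a per-block operation limit), and the numbers are made up for illustration and are not taken from hived. The actual unguarded change in the HF24 code is not described in this post.

```python
from dataclasses import dataclass

HF23, HF24 = 23, 24

# Hypothetical limits, used only to illustrate a protocol rule that changes at a hardfork.
OLD_MAX_OPS = 10
NEW_MAX_OPS = 20


@dataclass
class ChainState:
    current_hardfork: int  # last hardfork activated by consensus

    def has_hardfork(self, n: int) -> bool:
        return self.current_hardfork >= n


@dataclass
class Block:
    op_count: int


def validate_block_guarded(state: ChainState, block: Block) -> bool:
    # Correct pattern: the new rule only applies once HF24 is active.
    limit = NEW_MAX_OPS if state.has_hardfork(HF24) else OLD_MAX_OPS
    return block.op_count <= limit


def validate_block_unguarded(state: ChainState, block: Block) -> bool:
    # Bug pattern: the new rule is applied no matter which hardfork is active.
    return block.op_count <= NEW_MAX_OPS


state = ChainState(current_hardfork=HF23)
block = Block(op_count=15)
print(validate_block_unguarded(state, block))  # new-code node accepts the block
print(validate_block_guarded(state, block))    # a node following HF23 rules rejects it
```

In this toy setup, a block with 15 operations is valid for the unguarded new code but invalid under the pre-hardfork rules, which is the kind of disagreement that lets a new-code node produce a block that old-code nodes refuse.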

There have been a few times prior to the hardfork date when we’ve run a HF24 node as a producing node, but in the past, such a node was in the minority, so the worst thing that might have happened if this bug got triggered was that the HF24 node would temporarily fork, then fall back into consensus with the chain when the block it generated wasn’t accepted by the HF23 nodes.

But on the hardfork date, even though we didn’t have a super-majority of top 20 nodes running HF24, we did have a majority running HF24. And a majority is enough to determine how chain forks get resolved. So when one of the HF24 nodes produced a block that was rejected by HF23 nodes but accepted by HF24 nodes, the fork resolution logic kept the HF24 nodes on a separate, unplanned hard fork from the HF23 nodes (effectively splitting the chain into two forks).
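The difference between "majority" and "super-majority" is what created the risky window. The sketch below classifies an upgrade state from the number of upgraded block producers. The function name and the specific threshold of 17 are assumptions chosen for illustration; the post only says that a super-majority is required to activate the hardfork.

```python
def classify_upgrade_state(total: int, upgraded: int, supermajority: int) -> str:
    """Classify how far an upgrade has progressed among block producers."""
    if upgraded >= supermajority:
        # Enough producers run the new code for the hardfork to activate.
        return "hardfork-active"
    if 2 * upgraded > total:
        # New code decides fork resolution, but the hardfork is not active yet:
        # an unguarded protocol change can now split the chain.
        return "split-risk"
    # New-code nodes are outvoted; a bad block would simply be orphaned.
    return "minority"


# Assumed numbers: 20 producers and a threshold of 17.
for upgraded in (5, 11, 17):
    print(upgraded, classify_upgrade_state(total=20, upgraded=upgraded, supermajority=17))
```

The "split-risk" band is the situation described above: more than half of the producers run the new code, but fewer than the activation threshold.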

The top 20 witnesses quickly realized what was happening, so they decided to execute the hardfork by upgrading the remaining HF23 nodes to HF24, so that all nodes rejoined the majority fork. This also required all the API node operators to upgrade their API nodes to HF24, and all Hive apps to switch to their HF24 versions in order to use those API nodes.

Because hardfork 24 was executed a little sooner than we would have liked due to the chain split, we still hadn’t resolved all the bugs and performance issues in hivemind and Hive apps at the time of the hardfork. This led to various glitches and slowdowns experienced by apps users over the past few days. But Hive devs have been working hard to resolve the issues as fast as possible and things are already looking much better, and I expect the remaining issues to be resolved quite soon.

One thing for the future: I want to look at ways to detect problems like the chain split before they happen. One possibility would be to set up a special secondary witness node running the new code that signs blocks as a top 20 witness, but whose blocks are only broadcast to one isolated node running the old code. That old-code node would report if it was unable to accept any of the blocks it received from the new node. We can also reduce the chance of this problem occurring in practice by having most of the top 20 witnesses upgrade at very nearly the same time, but that can only get us so far. The ideal solution is a better test method to detect such problems, and I think some variation on my proposal above should work.
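The core of that test idea can be sketched in a few lines: feed every block produced by the new code to a validator that follows the old rules, and report the first one it refuses. This is only a toy model under stated assumptions. The block format, the `old_rules` check, and the function names are hypothetical, and a real setup would involve actual hived nodes and networking rather than in-process function calls.

```python
from typing import Callable, Iterable, Optional

Block = dict
Validator = Callable[[Block], bool]


def first_rejected_block(blocks: Iterable[Block], old_validator: Validator) -> Optional[Block]:
    """Return the first block the old-code validator rejects, or None if all pass."""
    for block in blocks:
        if not old_validator(block):
            return block
    return None


def old_rules(block: Block) -> bool:
    # Hypothetical pre-hardfork rule: at most 10 operations per block.
    return block["op_count"] <= 10


# Blocks as they might come from the new-code producer (made-up values).
produced = [
    {"num": 1, "op_count": 3},
    {"num": 2, "op_count": 15},
    {"num": 3, "op_count": 4},
]

rejected = first_rejected_block(produced, old_rules)
if rejected is not None:
    print(f"old-code node rejected block {rejected['num']}")
```

Any rejection reported by the isolated old-code node before the hardfork activates would point to a protocol change that is not properly guarded.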

Hived work (blockchain node software)

We made several changes to API responses returned by hived, mostly in response to reports from apps developers:
https://gitlab.syncad.com/hive/hive/-/merge_requests/125
https://gitlab.syncad.com/hive/hive/-/merge_requests/126
https://gitlab.syncad.com/hive/hive/-/merge_requests/133

We also did general cleanup to docker, scripts, and configuration files for hived:
https://gitlab.syncad.com/hive/hive/-/merge_requests/130

We also fixed a problem with the cli-wallet: it was still using the old chain ID after the hardfork, so it couldn’t generate valid transactions. It seems there were very few tests, if any, written previously for the cli-wallet.
https://gitlab.syncad.com/hive/hive/-/merge_requests/128
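As a rough illustration of why a stale chain ID breaks transactions, the sketch below assumes the scheme commonly used in Graphene-derived chains, where the digest that gets signed is a hash over the chain ID followed by the serialized transaction. The chain ID values, the transaction bytes, and the function name are placeholders, not the real Hive values or the cli-wallet code.

```python
import hashlib


def signing_digest(chain_id_hex: str, serialized_tx: bytes) -> bytes:
    """Digest to be signed: SHA-256 over chain ID bytes followed by the transaction bytes."""
    return hashlib.sha256(bytes.fromhex(chain_id_hex) + serialized_tx).digest()


# Placeholder chain IDs (32 bytes each), not the actual pre- and post-hardfork values.
OLD_CHAIN_ID = "00" * 32
NEW_CHAIN_ID = "11" * 32

tx = b"example-serialized-transaction"  # stand-in for real serialized transaction bytes

old_digest = signing_digest(OLD_CHAIN_ID, tx)
new_digest = signing_digest(NEW_CHAIN_ID, tx)

# A signature made over old_digest cannot verify against new_digest,
# so nodes on the new chain ID would reject the transaction.
print(old_digest != new_digest)
```

Under that assumption, a wallet that keeps signing with the old chain ID produces signatures over a digest that upgraded nodes never compute, so every transaction it creates fails verification.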

The cli-wallet fix was necessary for exchanges, so we tagged a new version v1.24.4 that includes this change (and the other fixes above). Note that none of the above changes are needed by consensus witnesses, which is why witnesses are primarily still running 1.24.2. These changes are only needed by API nodes and exchanges.

We started a full replay yesterday to check all the above changes (this takes around 18 hours). We don’t expect any issues, since the changes were designed as non-consensus changes, only changes to the API, but better safe than sorry.

Hivemind (2nd layer social media microservice)

Most of our time was still spent on hivemind, but we made very good progress.

Our improvements to hivemind can be separated into two categories: bug fixes (wrong or missing data in API responses) and slow queries that result in unacceptable response times. Our bug fixes are usually made in response to reports from apps devs, but slow queries are usually detected by observing the performance of the postgres servers used by our API node with the pghero tool. We’ve found pghero to be very handy for finding which SQL queries are consuming the most time to complete (it functions as a profiler). It’s also useful for finding duplicate and unnecessary indexes which can impact performance.
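The kind of ranking a query profiler provides can be sketched as follows. The sample statistics and query labels are invented for illustration and are not measurements from any hivemind node; in practice the numbers would come from PostgreSQL's query statistics (pghero reads them from the `pg_stat_statements` extension).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryStat:
    query: str       # normalized query text (placeholder labels here)
    calls: int       # number of executions
    total_ms: float  # cumulative execution time in milliseconds

    @property
    def mean_ms(self) -> float:
        return self.total_ms / self.calls if self.calls else 0.0


def top_by_total_time(stats: List[QueryStat], n: int = 3) -> List[QueryStat]:
    """Queries that consume the most cumulative time, highest first."""
    return sorted(stats, key=lambda s: s.total_ms, reverse=True)[:n]


# Made-up sample data.
sample = [
    QueryStat("query_a", calls=1000, total_ms=5000.0),
    QueryStat("query_b", calls=10, total_ms=9000.0),
    QueryStat("query_c", calls=50000, total_ms=20000.0),
]

for stat in top_by_total_time(sample, n=2):
    print(f"{stat.query}: total={stat.total_ms:.0f} ms, mean={stat.mean_ms:.2f} ms")
```

Sorting by cumulative time rather than mean time matters: a query that is fast per call but runs very often (like `query_c` here) can load the server more than a rare slow one.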

Here’s a list of improvements and bug fixes we made to hivemind:
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/281
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/282
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/286
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/287
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/290
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/294
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/298
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/297
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/302
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/295
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/307
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/306
https://gitlab.syncad.com/hive/hivemind/-/merge_requests/310

Decentralized list changes were also merged into the develop branch, after updates (the code had diverged a lot since these changes were made, so it required a decent amount of manual merging and testing): https://gitlab.syncad.com/hive/hivemind/-/merge_requests/275

Hivemind status

With the latest optimizations (the last big one was merge request 310, made on Saturday), hivemind seems to be working fairly well, but we still have a few more optimizations to make. We also need to re-enable reputation updating, which was temporarily disabled because it needed further optimization to avoid unacceptable sync slowdowns that put excess load on hivemind nodes receiving real-world traffic.

We have an optimized version of the reputation sync algorithm on a local dev system, but we’ll be testing it further on one of our experimental API servers with real-world API traffic before making it part of the official build.

We are currently running a full hivemind sync (this has generally been a 4-day process) to see if there are any problems, as we’ve been skipping this process for the past week and doing incremental upgrades to our existing hivemind database in order to test new changes quickly.

Experimenting with optimum API node configuration

Another thing we've been doing this week is experimenting with configuring our API node for optimal performance. I've been sharing some of what we've discovered in the API node operators channel, but I'll make a full report here later about our findings, once we've completed that work.

Condenser (open-source code for hive.blog)

We made some more changes to hive.blog and its wallet related to changes in hardfork 24 (mostly to the wallet, as we had already made several updates to condenser itself), especially removing usages of the get_state function, which is being obsoleted in favor of more efficient API calls.
https://gitlab.syncad.com/hive/wallet/-/merge_requests/38
https://gitlab.syncad.com/hive/wallet/-/merge_requests/39
https://gitlab.syncad.com/hive/wallet/-/merge_requests/40
https://gitlab.syncad.com/hive/wallet/-/merge_requests/42
https://gitlab.syncad.com/hive/wallet/-/merge_requests/43
https://gitlab.syncad.com/hive/wallet/-/merge_requests/45

https://gitlab.syncad.com/hive/condenser/-/merge_requests/118
https://gitlab.syncad.com/hive/condenser/-/merge_requests/121
https://gitlab.syncad.com/hive/condenser/-/merge_requests/119
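To show what moving off get_state looks like from a client's point of view, here is a small sketch that only builds the JSON-RPC request bodies without sending them. The choice of `bridge.get_ranked_posts` as the replacement call and the parameter values are illustrative assumptions; the merge requests above are the authoritative record of which calls condenser and the wallet actually switched to.

```python
import json
from typing import Any, Dict


def jsonrpc_request(method: str, params: Any, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


# Legacy style: one broad call that returns a large state object for a page path.
legacy = jsonrpc_request("condenser_api.get_state", ["/trending"])

# Targeted style: ask only for the ranked post list that the page needs.
targeted = jsonrpc_request(
    "bridge.get_ranked_posts",
    {"sort": "trending", "tag": "", "observer": ""},
)

print(json.dumps(legacy))
print(json.dumps(targeted))
```

The targeted request names exactly the data the page needs, so the server can answer it with a narrower query instead of assembling a catch-all state object.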

One fix we need to deploy soon is a change so that condenser correctly updates the vote button state after a user votes:
https://gitlab.syncad.com/hive/condenser/-/merge_requests/129

What’s next for the week?

We have a few more optimizations to make to hivemind, and I expect we’ll get a few more bug reports, plus we still need to deploy the final reputation calculation code. I expect that work to slow down in the next couple of days, although the full hivemind sync test likely won’t complete until near the end of the week (we already observed one slowdown today in the full sync with the latest changes that needs analysis).

We’ll also be testing condenser and the wallet and looking for fixes and optimizations we can make.


Not wanting to add to that workload but the patch to restore Hive balances to accounts excluded from the airdrop did not work correctly. Not sure what the problem was, some accounts got zero and some accounts got a little it would seem. I assume there must be a snapshot of the balance at the time of the initial fork to Hive that you can look at to see what went wrong.

Yes, it's a known issue. @howo has been looking into it and working on a solution. There should be some communication from @hiveio about it soon.

News on this in the next few days. There was an error restoring vested balances (HIVE Power). HIVE and HBD were successfully airdropped.

Thx for the update, I think to resolve the majority of issues of hardforks in the future, Hive needs a proper testnet for both layer 1 and layer 2 (Hivemind and Condenser frontend), so that users can test the update before it is activated by the Witnesses. If there would be a beta hive.blog site available with all the new code deployed to a small testnet (single nodes running new hived and hivemind code), most of the obvious bugs should be caught before going into production.

It was the first hardfork on Hive and it was done by the new team. Congratulations!

Do you think the development team is ready to launch the Smart Media Tokens?

How is communication with the exchanges handled? Most of the exchanges have deposits/withdrawals suspended. Do they have to follow along and do what's needed themselves, or does someone representing the witness community reach out and inform them once everything is complete?

We have several technical people keeping them informed and assisting with any issues they report.

That's great, I think, we should have some updates on that as well, like when can we expect them to enable based on these discussions. Also a list of all exchanges along with the progress, because that will be our message to everyone out there. We barely hear from the exchanges in a timely manner.

It's been a general policy not to discuss communications with exchanges in the normal course of things (from what I know, that's because of wishes of exchanges, but I haven't had direct confirmation of that). If someone wants to add to some report on the status of exchange deposit/withdrawal, it could be a useful service (maybe as part of one of those daily automated posts).

I think, those who are dealing with the matter are the most qualified to add some report. Can you please inform someone from that several technical people to start the initiative ?

Putting information from our side, should not hurt anyway to the exchanges. We can even choose not to publish about those exchanges, who wish not to share details, but at least it will be worth sharing about those who are fine.

Good to see time will be taken to examine the issue during the Hard Fork. I was never involved in any Hard Forks on steemit other than as a user, it seemed to me they did not do much post Hard Fork discussions, and that may have been part of the problems in some of the hard forks. A lessons learned meeting should help prevent or put people on alert for the next hard fork.

From a user only point of view, I understand that they all have issues. i don't think there has ever been a seamless hard fork in the three and half years I have been active on the chains.. I kind of missed being able to watch the discord chatter, but that is likely a good thing.

Hopefully all the post-test tweaking will not cause too many headaches, and people will realize that there are after-fork and behind-the-scenes things that need to be done before we all start jumping onto the HF25 wagon.

I noticed in the author rewards section, it no longer is showing the amount of HBD earned on a given post, is that something that will correct over the coming week?

Also, the HBD wallets have been down on Bittrex for close to a month at this point. I am assuming this was related to the HF, any idea when those might come back online?

I wasn't aware of author rewards bug, but I've filed an issue on it now.

Bittrex is fixing an issue now with their HBD support. My guess is that it will be fixed soon, but can't say for sure.

Hi @blocktrades, I'm just curious but I discovered that you have delegated over 500K Hive Power to the account @usainvote. This account is also downvoting content on Hives Trending Page.
What is your purpose doing so? Since you are one of the most influential accounts here on Hive.

I vote, you vote, we all vote!

Yeah, I get it, but it would help to get some transparency on why you downvote certain content on the trending page.
Since most users on Hive have the impression that @blocktrades acts as a developing company here on Hive, and not as an individual content manager.
At least that was my impression from what you guys have been doing. I don't mind if you use your HP to curate content, but somehow creating a second account @usainvote and then downvoting content on the trending page seems kind of fishy to me...sorry.

I am not blocktrades.

Ok, but again why is @blocktrades delegating 500K HP to your account. Thats not a small amount.
I can't imagine that @blocktrades is giving out 500K HP just for fun...just saying. And you guys also don't want to answer the question, which is ok...but that makes it seems even more fishy to me, sorry.

It is paid for like most delegations.

@usainvote Hello colleague, I have received downvotes in my account and I do not know why, can you explain me and help me improve if something I am doing incorrectly? From already thank you very much!

Brother i want sorry to say what is my Offense? I shall always grateful to you. Many many thanks.

I see you consistently upvoted @jrcornel, giving the impression that you valued his content. As you perhaps still do not know, a group campaign, based on their own selective criteria which they know not how to express, has downvoted him to zero and basically ruined his existence here on HIVE. As such, I just wanted you to know what one core member of this community had to say about @jrcornel and ask if you still think it’s appropriate not to come to his defence? Really come your own defence? Because not only was that said about him (and it's very representative of the group think, I might add), but it was also said about everyone who valued, curated and upvoted his content.

Yes, that’s how they feel about you too.

Best Regards


Thanks for your juicy support. I really appreciate. Happy new month to you. Greetings. Gracias.

Hello my friend, can you let me know why you downvoted several of my posts?
I put a lot of effort into creating posts, and getting negative feedback from you makes me very frustrated.

All things considered the unintentional hard fork wasn't a disaster and a good lesson was learned from it. Thank you for the explanation of the hiccups.

That unintentional fork could have gone so much worse. Pretty amazing that it went so smoothly. Great work from the witnesses there

There is a mistake with communities. You can't mute post in hive.blog or peakd


Communication during a HF needs to improve (maybe 2-3 announcements per day in the HIVE discord channel?) but other than that, there is not much to criticise.

Thanks for the updates. I can only imagine and appreciate all the works you guys are doing at the background to have everything up and running.

Good morning hope you have a wonderful night

Not the worst fork imaginable and for the endusers, it is more frustrating than anything else, as there always seemed to be one frontend or another working.

Good afternoon, good evening, good morning

Great to hear the postmortems.

Great work done. Hat's off to all witnesses. If communication frequency is increased, i believe we can also avoid small issues. Overall well done.

It's all a learning process and we can hope things will be better next time. At least the system was not totally down for any significant time. I just worry about less technical users who may not see posts like this. The front ends should somehow make them aware that there are ongoing issues and suggest solutions, such as switching node. User confusion can cause bigger issues.

Cheers.

Thank you all the hard work

Thanks for the update guys and keep up the amazing work

Changes always come with some headache tho

Keep up the amazing work, Blocktrades and team!

It wasn't the worst HF and quite smooth. Thanks for the update.

Looks overall like great success, despite some unexpected gobsmackers. Even I, incompetent as I am, am restored to Hive (although I've had to switch browsers).

Thanks!

Nice work.
Good explanation.
I think we are on the right track.

We appreciate your hard works @blocktrades

Oh well, mistakes happen. Monero had the reverse problem at the same time: two transactions validated according to the old protocol were included in a block after the hard fork.

Bitcoin Twitter's reaction: never take any risk whatsoever to improve anything at all...

Dear friend @blocktrades, your work is greatly appreciated. However, since the day of the update, I have noticed that the notifications are late, apart from that, I see that my Posts are less seen and voted, even from you, since it was you who at first gave me a lot of encouragement with your support for. I hope everything can be fixed soon. A greeting.

There's currently work being done to speed up processing of notifications.

Great to hear the news on the updates and improvements

Thank you for the update

Many thanks for the comprehensive explanation. It is good that the HardFork is now completed. Everything is running fine and there are still some bugs to be fixed that are not system critical. What I think is a pity that the Airdrop did not work. I would like to know why it did not work. Is a new HardFork necessary to give the Airdrop to the affected users or is this also possible with a proposal?

Greetings Michael

!invest_vote
!jeenger

My respect!

A good solution!

I voted for the proposal.

Michael

Talkin in a hole mima. This guy´s do not answer. :-) They are in their heaven.

Yep, you could never be wrong.

If i see i am wrong, i think about changes. Steps back to topic ways of life, looks like the best if i fly in the sky and loose energy on the point of no return. ;-)

@mima2606 thinks you have earned a vote from @investinthefutur!

Your contribution was curated manually by @mima2606
Keep up the good work!

