How to fix canonical URLs and links in your pre-fork posts

in HiveDevs, 4 years ago (edited)

As you can see here, I have some links in my blogs written before the hive-fork pointing to steemit.com. It's time to replace them all.

steemit link in my blog

Almost all blog posts written before the fork were written with apps that are not included in hivescript, which leads to problems with canonical URLs:

As can be seen here, my old post "update for beem: first release for HF 21" results in different canonical URLs on different front-ends. Search engines then treat this as duplicate content.
The post was written through palnet. As there is no entry for palnet in hivescript, the front-ends do not know how to build a proper canonical URL:
wrong canonical url

Fixing the mess

Fixing means:

  • replacing all steemit, steempeak, ... links with relative links
  • setting canonical_url for each post written before 2020-03-20, to fix canonical URLs.

Small update

The script now uses relative links: when it finds a link to steemit.com, steempeak.com, ..., it replaces it with a relative link. A relative link looks like: [holger80](/@holger80) and [this post](/hive-139531/@holger80/how-to-fix-canonical-urls-and-links-in-your-pre-fork-posts)
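To make the idea of a relative link concrete, here is a minimal sketch that turns a legacy front-end URL into its relative form. It is not the code used by the script further down: the script looks up the account or post on the blockchain and rebuilds the link from the post's category, while this sketch simply keeps the path of the URL. The names LEGACY_HOSTS and to_relative_link are made up for this illustration.

```python
from urllib.parse import urlparse

# Front-ends named in this post; the set itself is an illustrative assumption.
LEGACY_HOSTS = {"steemit.com", "steempeak.com", "busy.org", "partiko.app"}


def to_relative_link(url):
    """Return the path of a legacy front-end URL, or None if it does not qualify."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in LEGACY_HOSTS:
        return None
    # Account ("/@name") and post ("/category/@author/permlink") paths contain an "@".
    if "@" not in parsed.path:
        return None
    return parsed.path.rstrip("/")


print(to_relative_link("https://steemit.com/@holger80"))
print(to_relative_link("https://steemit.com/hive-139531/@holger80/how-to-fix-canonical-urls-and-links-in-your-pre-fork-posts"))
print(to_relative_link("https://example.com/@holger80"))
```

Because the sketch does not query the chain, it cannot tell whether the account or post really exists; the script below does that check before replacing anything.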

Small update 2

There are now three boolean parameters that control what the script does:

  • replace_steemit_links: when True, steemit, ... links will be replaced
  • use_relative_links: when True, relative links will be used (starting with /)
  • add_canonical_url: when True, a canonical_url is added to the metadata

Small update 3

It is now possible to use the same script to fix the canonical links on STEEM for all posts written before the fork.
When you want to use the script on STEEM:

  • set target_blockchain = "steem"

When you want to use the script on HIVE:

  • set target_blockchain = "hive"
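The parameters described above can be thought of as one small settings object. The following sketch groups them and repeats the same sanity checks the script performs with assert statements. The class name FixOptions is hypothetical; the script itself uses plain variables at the top of the file.

```python
from dataclasses import dataclass


@dataclass
class FixOptions:  # hypothetical container, not part of the script
    canonical_url: str = "https://hive.blog"
    replace_steemit_links: bool = True
    use_relative_links: bool = True
    add_canonical_url: bool = True
    target_blockchain: str = "hive"  # can be "hive" or "steem"

    def __post_init__(self):
        # At least one of the two actions must be enabled.
        if not (self.replace_steemit_links or self.add_canonical_url):
            raise ValueError("enable replace_steemit_links or add_canonical_url")
        if self.target_blockchain not in ("hive", "steem"):
            raise ValueError("target_blockchain must be 'hive' or 'steem'")
        # The canonical URL must not end with "/".
        self.canonical_url = self.canonical_url.rstrip("/")


# Example: only set canonical URLs on STEEM and leave in-post links untouched.
options = FixOptions(canonical_url="https://peakd.com/",
                     replace_steemit_links=False,
                     target_blockchain="steem")
print(options)
```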

Python code

The following script uses beem and does exactly this.

beem can be installed with

pip install beem

or

conda install beem

Store the following as fix_canonical_urls_hive.py:

#!/usr/bin/python
from beem import Hive, Steem
from beem.utils import addTzInfo
from beem.account import Account
from beem.comment import Comment
from beem.nodelist import NodeList
import time
from datetime import datetime
import getpass


if __name__ == "__main__":
    # Parameter
    canonical_url = "https://hive.blog"
    replace_steemit_links = True
    use_relative_links = True
    add_canonical_url = True
    target_blockchain = "hive" # can be hive or steem
    # ----
    # at least one option must be true
    assert replace_steemit_links or add_canonical_url
    assert target_blockchain in ["hive", "steem"]
    # Canonical url must not end with /
    if canonical_url[-1] == "/":
        canonical_url = canonical_url[:-1]
    nodelist = NodeList()
    nodelist.update_nodes()
    test_run_answer = input("Do a test run? [y/n]")
    if test_run_answer in ["y", "Y", "yes"]:
        test_run = True
        print("Doing a test run on %s!" % target_blockchain)
    else:
        test_run = False
    if test_run:
        if target_blockchain == "hive":
            blockchain_instance = Hive(node=nodelist.get_hive_nodes())
        else:
            blockchain_instance = Steem(node="https://api.steemit.com")
    else:
        wif = getpass.getpass(prompt='Enter your posting key for %s.' % target_blockchain)
        if target_blockchain == "hive":
            blockchain_instance = Hive(node=nodelist.get_hive_nodes(), keys=[wif])
        else:
            blockchain_instance = Steem(node="https://api.steemit.com", keys=[wif])
    if target_blockchain == "hive":
        assert blockchain_instance.is_hive
    else:
        assert blockchain_instance.is_steem

    account = input("Account name =")
    account = Account(account, blockchain_instance=blockchain_instance)
    if add_canonical_url:
        print("Start to fix canonical_url on %s for %s" % (target_blockchain, account["name"]))
    if replace_steemit_links:
        print("Start to replace steemit links on %s for %s" % (target_blockchain, account["name"]))
    
    apps_with_cannonical_url = ["hiveblog", "peakd", "esteem", "steempress", "actifit",
                                "travelfeed", "3speak", "steemstem", "leofinance", "clicktrackprofit",
                                "dtube"]
    hive_fork_date = addTzInfo(datetime(2020, 3, 20, 14, 0, 0))
    blog_count = 0
    expected_count = 100
    while expected_count - blog_count == 100:
        
        for blog in account.get_blog_entries(start_entry_id=blog_count, raw_data=False):
            blog_count += 1
            if blog["parent_author"] != "":
                continue
            if blog["author"] != account["name"]:
                continue
            if "canonical_url" in blog.json_metadata and canonical_url in blog.json_metadata["canonical_url"]:
                continue
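            app = blog.json_metadata["app"] if "app" in blog.json_metadata else ""
            # Some apps store a dict instead of the usual "name/version" string; treat those as unknown.
            app_name = app.split("/")[0] if isinstance(app, str) else ""
            if app_name in apps_with_cannonical_url and target_blockchain == "hive":
                continue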
            if blog["created"] > hive_fork_date:
                continue
            body = blog.body
            if "links" in blog.json_metadata:
                links = blog.json_metadata["links"]
            else:
                links = None
            if "links" in blog.json_metadata and replace_steemit_links:
                for link in blog.json_metadata["links"]:
                    if "steemit.com" in link or "steempeak.com" in link or "busy.org" in link or "partiko.app" in link:
                        authorperm = link.split("@")
                        acc = None
                        post = None
                        new_link = ""
                        if len(authorperm) == 1:
                            continue
                        authorperm = authorperm[1]
                        if authorperm.find("/") == -1:
                            try:
                                acc = Account(authorperm, blockchain_instance=blockchain_instance)
                                if use_relative_links:
                                    new_link = "/@" + acc["name"]
                                else:
                                    new_link = canonical_url + "/@" + acc["name"]
                            except Exception:
                                continue
                        else:
                            try:
                                post = Comment(authorperm, blockchain_instance=blockchain_instance)
                                if use_relative_links:
                                    new_link =  "/" + post.category + "/" + post.authorperm
                                else:
                                    new_link =  canonical_url + "/" + post.category + "/" + post.authorperm
                            except Exception:
                                continue
                        if new_link != "":
                            for i in range(len(links)):
                                if links[i] == link:
                                    links[i] = new_link
                            body = body.replace(link, new_link)
                            print("Replace %s with %s" % (link, new_link))
                            
            json_metadata = blog.json_metadata or {}
            if links is not None and replace_steemit_links:
                json_metadata["links"] = links            
            if add_canonical_url:
                json_metadata["canonical_url"] = canonical_url + "/" + blog["category"] + "/@" + blog["author"] + "/" + blog["permlink"]
                print("Edit post nr %d with canonical_url=%s" % (blog_count, json_metadata["canonical_url"]))
            print("---")
            if not test_run:
                try:
                    blog.edit(body, meta=json_metadata, replace=True)
                except Exception as e:
                    # Show the reason so that failed broadcasts can be diagnosed.
                    print("Skipping %s due to error: %s" % (blog.authorperm, e))
                time.sleep(6)

        expected_count += 100
    
    

You can now start the script with:

python fix_canonical_urls_hive.py

If you are on Linux, you should replace pip with pip3 and python with python3.

How does it work

The script goes through all blog posts written before the fork (2020-03-20, 14:00 in the script). Whenever a post was written with an app that is not properly handled by hivescript, a new canonical_url is set.
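The selection rule can be summarised as a small, chain-independent function. This is a sketch of the conditions, not a drop-in replacement for the loop in the script: the name needs_canonical_fix is made up, the fork date is assumed to be UTC, and a non-string "app" field (some apps store a dict there) is treated as an unknown app.

```python
from datetime import datetime, timezone

# Fork date as used in the script; the UTC timezone is an assumption of this sketch.
HIVE_FORK_DATE = datetime(2020, 3, 20, 14, 0, 0, tzinfo=timezone.utc)
# Same app list as in the script.
APPS_WITH_CANONICAL_URL = {"hiveblog", "peakd", "esteem", "steempress", "actifit",
                           "travelfeed", "3speak", "steemstem", "leofinance",
                           "clicktrackprofit", "dtube"}


def needs_canonical_fix(created, json_metadata, canonical_url, target_blockchain="hive"):
    """Return True when a post should get a new canonical_url."""
    if created > HIVE_FORK_DATE:
        return False  # written after the fork
    if canonical_url in str(json_metadata.get("canonical_url", "")):
        return False  # already points to the preferred front-end
    app = json_metadata.get("app", "")
    app_name = app.split("/")[0] if isinstance(app, str) else ""
    if target_blockchain == "hive" and app_name in APPS_WITH_CANONICAL_URL:
        return False  # the app is handled properly on Hive
    return True


created = datetime(2019, 8, 1, tzinfo=timezone.utc)
print(needs_canonical_fix(created, {"app": "palnet/1.0"}, "https://hive.blog"))
print(needs_canonical_fix(created, {"app": "peakd/2020.03"}, "https://hive.blog"))
```

The first call reports that the palnet post needs a fix, the second that the peakd post is left alone on Hive. The version strings after the slash are invented for the example.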

You can define your preferred front-end here:

canonical_url = "https://hive.blog"

If you prefer another front-end, you can replace this line with one of the following:

  • canonical_url = "https://peakd.com"
  • canonical_url = "https://leofinance.io"
  • canonical_url = "https://esteem.app"
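Whichever front-end you choose, the canonical URL of a post is assembled from the front-end address, the post's category, its author and its permlink. A minimal sketch of that step (the function name build_canonical_url is made up here; the script builds the string inline):

```python
def build_canonical_url(front_end, category, author, permlink):
    """Join the parts as <front_end>/<category>/@<author>/<permlink>."""
    # A trailing "/" on the front-end address would produce "//" in the result.
    return "%s/%s/@%s/%s" % (front_end.rstrip("/"), category, author, permlink)


print(build_canonical_url(
    "https://peakd.com/",
    "hive-139531",
    "holger80",
    "how-to-fix-canonical-urls-and-links-in-your-pre-fork-posts",
))
```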

In the next step, all links used in the post are checked. Whenever a link to steemit.com, steempeak.com, busy.org or partiko.app points to a valid Hive post or a valid Hive user, it is replaced by a relative URL.
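Once the old and new links are known, both the post body and the "links" list in the metadata have to be updated. The sketch below shows that step in isolation. It deviates from the script in one detail, which is my own choice: longer URLs are replaced first, so that a profile URL cannot overwrite the beginning of a post URL that starts with the same text. The function name, the category "beem" and the permlink "some-post" are invented for the example.

```python
def apply_link_replacements(body, links, replacements):
    """Return (new_body, new_links) with every old URL swapped for its new one."""
    # Longest URLs first: "https://steemit.com/@holger80" is a prefix of
    # "https://steemit.com/@holger80/some-post".
    for old in sorted(replacements, key=len, reverse=True):
        body = body.replace(old, replacements[old])
    new_links = [replacements.get(link, link) for link in links]
    return body, new_links


replacements = {
    "https://steemit.com/@holger80": "/@holger80",
    "https://steemit.com/@holger80/some-post": "/beem/@holger80/some-post",
}
body = "[me](https://steemit.com/@holger80) and [post](https://steemit.com/@holger80/some-post)"
links = ["https://steemit.com/@holger80", "https://steemit.com/@holger80/some-post"]

new_body, new_links = apply_link_replacements(body, links, replacements)
print(new_body)
print(new_links)
```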

Test run

You can do a test run and check what will be changed by the script:
test_run

This now shows the following information:

test result
The canonical URL that will be set is shown, as well as all links that will be replaced.
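The test run follows a simple pattern: every planned change is printed, but the broadcast is only executed when the test run is switched off. A minimal sketch of this pattern, with a stand-in broadcast function instead of a real blockchain call (run_edits, fake_broadcast and the example post are all hypothetical):

```python
def run_edits(planned_edits, broadcast, test_run=True):
    """Print each planned edit; call broadcast only when test_run is False."""
    sent = 0
    for authorperm, canonical_url in planned_edits:
        print("Edit %s with canonical_url=%s" % (authorperm, canonical_url))
        if not test_run:
            broadcast(authorperm, canonical_url)
            sent += 1
    return sent


sent_log = []


def fake_broadcast(authorperm, canonical_url):
    # Stand-in for the real edit; it only records what would be sent.
    sent_log.append(authorperm)


planned = [("@holger80/some-post", "https://hive.blog/beem/@holger80/some-post")]
print(run_edits(planned, fake_broadcast, test_run=True))
print(sent_log)
```

With test_run=True the list sent_log stays empty, so nothing is changed on the chain and no posting key is needed.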

Fixing your posts

We can now start to fix all old posts:
starting the script

Results

All changes have been broadcast:
Broadcasted posts

The links have been corrected, as shown here:

There seems to be a bug on hive.blog: steemit.com links are shown as internal links, while hive.blog links are shown as external links.

The canonical url is also fixed:

canonical urls

It seems that esteem.app has not updated its canonical URL yet. As far as I know, esteem.app should read the canonical_url parameter (works for steempress), so it may correct the canonical URLs later.

After a fix on esteem.app, it now uses the correct canonical URL:
canonical url from esteem.app

Results on STEEM

Setting canonical_url also works on steemit:

canonical url on steemit

I used seoreviewtools to check the canonical urls.


If you like what I do, consider casting a vote for me as witness on Hivesigner or on PeakD


This changes the canonical link on the Hive blockchain. Shouldn't we do the same for the Steem blockchain? Otherwise we may have two canonical links set, one on Hive and one on Steem, and search engines will either ignore both, or consider the domain with the highest authority as the source and penalize the others, or penalize all domains.

I updated the script, it can now be used on Steem to set canonical urls.

Hi @holger80! I tried the script for Steem. It works in the test run (with a minor update, to catch unexpected json metadata fields - for example the app parley did set a dict for "app" with more details, rather than the standard string).

But none of the actual post edits are successful (I checked on steemd). When printing the error(s), it says it's (<class 'AttributeError'>, AttributeError("'PointJacobi' object has no attribute '_Point__x'"), <traceback object at 0x7fc4b9dc7aa0>).

I'm not experienced with Python, but after a search this is what I used to get info about the error within the except block of the post edits broadcasts: e = sys.exc_info().

Any idea what could generate this error?

There is a package missing. Which operating system do you use?

Here I'm using Ubuntu 18.04.

EDIT: I also have beem 0.23.9, the latest pip upgraded to.

This is strange, can you double check that you are using the newest version?

beempy about

In the meantime, I will test beem on a newly installed machine.

Yeah, I was right:

beempy version: 0.23.9

By @holger80

EDITS:
I also have Anaconda 4.8.3, if it matters:

conda --version
conda 4.8.3

Python version is 3.7.7:

python --version
Python 3.7.7

That's true, I will prepare a script that will set the canonical on all steem posts.

Just asked the same question, before I saw yours. I agree we want to update canonical links in as many places as possible.

What are the chances of this becoming an online tool that simple users can use without having to run Python scripts on their own machines?

Don't use canonical URLs. Use relative URLs. If Hive is supposed to be distributed, if we open a Hive link in some other front end, the Hive link should open in the same front end. Centralizing over one front end, is a problem with our culture.

For example, my blog is here

I agree relative links are better. I will change the script and it will replace steemit.com links with relative links now.

Short test if it does also work for posts:
my post

I think it would be best to separate the functionality of updating in-post links and the canonical link. For example, I want to update the canonical link on all my posts, but not sure I'm ready to update in-post links, since oftentimes I chose certain frontends for certain reasons when linking to things.

Would be good to have an option to do each of the actions.

I added parameters at the top of the script, which can be used to define the behavior of the script.

Good idea, I will make it optional.

This is great to help Hive with SEO rankings... But I have no clue how to use this script.

Do you have some tutorials for learning from scratch how to use python to interact with the blockchain?

Which operating system do you use?

I use Windows 10 on my computer, which is the OS I use the most, but I also have a partition with a version of Ubuntu installed on it.

The easiest way to start on windows 10 is:

  • download and install anaconda python
  • Open the anaconda prompt
  • Enter conda install beem
  • Store the script as the file fix_canonical_urls.py
  • go to the directory with cd inside the anaconda prompt
  • run the script with python fix_canonical_urls.py inside the anaconda prompt

Sweet! I'll do this later today (I mostly work on my laptop during nighttime). And what about tutorials on learning to code in Python with a focus on tools for interacting with the blockchain? Have you written some, or do you know of somebody who did?

Do you have a specific topic in mind? It is on my todo list to write some tutorials using the beem library.

Well, I think about what you built as an airplane: I want to know how to pull the levers so I can take off and find out where it gets me lol... More seriously, I've been thinking about developing two projects... A mobile game for Hive and a site where we can bet on sports. I'm not the guy who knows how to code tho, I can only do 3D modeling and I'm fairly good at project management... Nevertheless it's never too late to learn, and this lockdown hell pushes me to seek challenges to keep my brain cells from suffering boredom decay lol.

So far I've been working behind the scenes on the betting site, but a mobile game would be cool as well.

Maybe too ambitious, but they say that it's easy to learn if you already have goals about what you want to create

Thanks for the implementation. Looking forward to trying it out once the Steem updates are also figured out.

The code is a bit complex. You might enjoy this talk from PyCon 2019. I learned a lot about code complexity from it!

Thanks for the video, I will watch it :).

Wow ! Thank you this is very very useful.

Thanks very much for this! So this will only replace the canonical links and will not break my blog, right? (Sorry for the questions, just want to be sure :D)

Yes, it will set canonical links and replace steemit links. I tested the script on my posts and it worked :).

So this is updating posts on the Hive blockchain only? Is it possible to also update posts on Steem to set their canonical link to hive.blog?

I improved the script, it is now possible to set parameters in the header. When setting target_blockchain to steem, it adds canonical_urls to blog posts on Steem.

Awesome. Running it now. Here's an example transaction.

I checked on steemit.com and it sets the canonical URL to Hive. However, @steempeak hasn't yet switched the canonical URL to the hive.blog one.

Does the script create a totally new post body in the transaction? It seems like it shouldn't be necessary to repost the entire text, but just edit the json_metadata instead. Is that possible?

Lol, I have like 9000 posts. How much fucking Hive Power mana would I need?

I am being punished for having used Steem the most. I have crazy people like pfunk running around asking people in the steemspeak.com Discord to DELETE their Steem blogs, all because he was wronged. It's such irrationally self-centered, spoiled-brat behaviour.

Now everyone expects me to take all this time to erase Steem from everything after I invested all that time into promoting it? Fuck that, and the shitty anarchist philosophies that got this place nowhere.

Top witnesses literally got scammed and no one will admit that it was always a ponzi? Let's hope Hive can set up some form of actual governance like Telos did.... I know behind the scenes top Telos and HiveDevs are all working, especially with scot bot.

If we can get scot bot for Telos and move Hive to Telos, we can have free accounts and just use Hive for a social media dapp inside Telos. The old Hive chain can still function as Hive Classic but will just be used for social networking front ends etc.

We should migrate to Telos or merge with Telos, and then you can apply for block.one angel funding from the billions at EOS VC https://eos.vc. I can't do it alone, but if you all teamed up and MERGED HIVE with Telos then it would qualify for EOSIO funding. Millions of dollars for advertising can go in, and Telos will have all the governance and support and dapp ecosystem, and Hive will have all the social networking. But come on, let's admit we need help, that's the first step.

If Hive actually merged with Telos, we could stand a chance at having the world care, and Hive could be a front-page top 20 coin. We just need the users of Hive with the back-end infrastructure and funding access of Telos EOSIO and its 2 million+ free wallets. Imagine not having to pay millions of dollars just to sign up a million users.
