Canonical links of Hive posts and duplicate content

in #seo, 4 years ago (edited)

Canonical links can be used when the same content is available through different links. On the Hive chain, the same content can be viewed through several frontends. Normally, the frontend used to write the post records itself in the app or canonical_url field of the post's json_metadata. All Hive frontends have agreed to derive the canonical URL in the same way from these two parameters.

E.g. when writing a post with peakd, the app parameter is set to peakd and all hive front-ends will use a canonical URL starting with https://peakd.com.
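The rule described above can be sketched in a few lines of Python. This is only an illustration, not the code any frontend actually runs: the function name, the fallback to hive.blog, the assumption that the app field looks like "peakd/<version>" and the /category/@author/permlink URL layout are all my assumptions. Only the peakd entry in the mapping comes from the example above.

```python
import json

# Hypothetical mapping from app name to frontend base URL.
# Only the peakd entry is taken from the post; extend as needed.
APP_BASE_URLS = {"peakd": "https://peakd.com"}

# Assumed fallback when the app is unknown (an assumption, not part of any standard).
DEFAULT_BASE_URL = "https://hive.blog"


def canonical_url(json_metadata, category, author, permlink):
    """Derive a canonical URL from a post's json_metadata (string or dict)."""
    if isinstance(json_metadata, str):
        meta = json.loads(json_metadata) if json_metadata.strip() else {}
    else:
        meta = dict(json_metadata or {})
    if not isinstance(meta, dict):
        meta = {}

    # An explicit canonical_url takes precedence over the app field.
    explicit = meta.get("canonical_url")
    if explicit:
        return explicit

    # Assumes the app field has the form "name/version", e.g. "peakd/2020.05.1".
    app = str(meta.get("app", "")).split("/")[0].lower()
    base = APP_BASE_URLS.get(app, DEFAULT_BASE_URL)
    return f"{base}/{category}/@{author}/{permlink}"


print(canonical_url('{"app": "peakd/2020.05.1"}', "seo", "holger80", "my-post"))
```

If every frontend evaluates the same two fields in the same order, they all end up with the same URL, regardless of which frontend renders the page.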

When all frontends use the same canonical URL, search engines can

  • tell which front-end should be shown in search results
  • verify that there is no duplicate content

When a frontend is not doing this, the search engine decides, based on various metrics, which front-end should be shown.

Let's check this on my post howto-to-use-beempy-for-cold-transaction-signing, which I wrote with the peakd-frontend.

Searching with google

When entering some text from my post into Google search, I receive the following result:
image.png

Google.com returns only a link to the hive.blog frontend. This is strange as the canonical link points to the peakd.com site.

Check for duplicate content

I'm using the duplicate content checker from seoreviewtools.

I will check my post for duplicate content:

image.png

Two sites are identified as duplicate content:
image.png

Frontends with wrong canonical links

It seems that @busy.org and @steem.leo did not use the correct canonical link. It seems that busy.org still works. As busy.org is using anyx.io as its RPC node, it is showing Hive content now.

Check for canonical links

I'm using seoreviewtools for checking the canonical links of several hive frontends:

image.png

image.png
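The same check can be done without an online tool. The following is a minimal sketch using only the Python standard library. It extracts the canonical link from an HTML document passed in as a string. The class and function names are mine, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL found in the HTML, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Made-up sample page, for illustration only.
sample = '<html><head><link rel="canonical" href="https://peakd.com/seo/@holger80/my-post"/></head></html>'
print(find_canonical(sample))
```

To compare several frontends, one could download the same post from each of them (for example with urllib.request) and pass the HTML to find_canonical. Note that a frontend which builds its pages in the browser may not include the tag in the raw HTML.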

Results

hive.blog, peakd.com and esteem.app are using the correct canonical link, whereas the frontends of @steemstem, @steem.leo and @busy.org are not pointing to peakd. https://leofinance.io is even using steemit.com as canonical link.

@steemstem, @steem.leo and @busy.org could you please fix your canonical links? That would be great! You can find the standard for hive apps here: https://github.com/ledgerconnect/hivescript

Canonical URLs on blog posts written before the hive fork

I wrote a Python script to set the canonical link for all blog posts written before the Hive fork. You can find it here: How to fix canonical URLs and links in your pre-fork posts

Currently the same content is available on both the HIVE and the STEEM blockchain. Blog posts written before the fork are not handled well by the front-ends, and each front-end shows a different canonical link.

The Python script lets the user decide which blockchain and which frontend should be used, and broadcasts the choice to the blockchain by setting canonical_url.
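The core of such an update can be sketched as follows. This is not the script linked above, only a minimal illustration under my own assumptions: the function name is hypothetical, the URL layout /category/@author/permlink is assumed, and the broadcast itself (editing the post with the new metadata) is left out.

```python
import json


def set_canonical_url(json_metadata, base_url, category, author, permlink):
    """Return a json_metadata string with canonical_url set to the chosen frontend.

    All other metadata fields are kept unchanged.
    """
    meta = json.loads(json_metadata) if json_metadata else {}
    # Assumed URL layout: <frontend>/<category>/@<author>/<permlink>
    meta["canonical_url"] = f"{base_url.rstrip('/')}/{category}/@{author}/{permlink}"
    return json.dumps(meta)


# Example: the author picks peakd.com for a pre-fork post (values are illustrative).
old_meta = '{"app": "steemit/0.1", "tags": ["seo"]}'
new_meta = set_canonical_url(old_meta, "https://peakd.com", "seo", "holger80", "my-post")
print(new_meta)
```

The returned string would then be written back to the chain as the post's metadata, so that every frontend honouring canonical_url shows the same link.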

Thanks for doing the research on this @holger80. We're working on a fix now 🦁

Posted using LeoFinance

Update, should be good now :)

image.png

image.png

Posted Using LeoFinance

As I wrote https://leofinance.io/seo/@holger80/canonical-links-of-hive-posts-and-duplicate-content on peakd.com, the canonical URL should point to peakd.com and not to hive.blog.

I think, @eonwarped is on it and will fix it soon.

Maybe you edited it from hive.blog... you know, when you edit a post from one front end, it automatically changes the canonical link to that front end.

Canonical links can be used when the same content is available through different links. They are important for search engines:

  • indicate which link should be shown in search results
  • prevent duplicate content detection

As much as it may bode ill for my potential profits, @heimindanger is right at a certain level. The statement which you have made above is factually incorrect.

Canonical links are specifically intended to reference the most referenced content when a particular target piece is referenced from multiple places. It's a form of aliasing. (At least within the context of search engine design.)

If you write an article and it is referenced by three people, two of them on PeakD and one of them on Steemit.com, the proper canonical link should be the PeakD link. If the numbers happened to break the other way, the canonical link should be the Steemit.com link.

The real problem for your position is that the idea of canonicity doesn't make any sense for a distributed social blockchain interface. None of those postings have the blessing of canon because none of them is more authoritative than any of the others. They are all equivalent. The only thing that matters, from a user perspective, is that some of the interfaces through which they can be accessed have different feature sets – and that is not a concern for a search engine.

Trying to force canonicity on platforms you have no control over seems like an aggressive overreach. With content equivalence, none of that matters.

There's a strong argument that those who decided to aggressively split the original blockchain and reproduce the original content they had no rights to created a situation in which canonicity became meaningful to the people on the inside even as it remains unimportant to the wider world. Or, more explicitly, that the Hive creators' decision to create a situation in which it matters who blessed certain content was a terrible, game-breaking idea.

Search engine maintainers shouldn't have to keep up with the petty politics of every social media blockchain split that happens to make us happy. From their perspective, canonicity is where that content was originally found by link – and honestly, that's probably the best anyone can hope for.

Thank you for your reply. I think that content producers themselves should be able to decide on the canonical link of their content.

I rewrote the introduction and created a Python script, How to fix canonical URLs and links in your pre-fork posts, that lets users decide which front-end and which blockchain should be set as the canonical URL for their content.

I know this issue has existed with stem.openhive for months, and I have already tried to patch it in several ways (this includes the canonical links and other stuff too, all related to SEO). Everything failed. I still have a few things to try out before having to entirely rewrite the routing system of the app (which is a left-over from the initial developer, i.e. a part of the code I didn't write myself).

It is on the to-do list, and in a quite high position (item number 2).

It has to be fixed, so that more and more links will be searched through Google. Let's see when it will be fixed.

Just here for pinging @lemouth because he might be interested in reading this post. He is developing the STEMsocial frontend as far as I know.

Thanks! I indeed saw the post and replied ;)

Great info...tnx

Great work Holger!

Thanks that is a good update. Kudos for the good work.

$rewarding 10 %

The reward of this comment goes 100 % to the author holger80. This is done by setting the beneficiaries of this comment to 100 %.

@holger80, does this manual delayed vote follow the rules I have set in settings? For example, if my VP is lower than what I set, or if I unchecked 'enable voting', will rewarding not use my vote to vote on this beneficiary comment? (It seems it did not vote here, so I did so manually just now.)

Is there a way to make it not listen to that when choosing to do a manual delayed vote?

Just lost a week's work through an editing glitch: updating a totally different post somehow replaced another, much more important one.

How do I go back into the Edit History?

hive.blog mikewick77: full edit history

thank you.

Why would peakd be the canonical link instead of busy or hive.blog or leofinance?

I think the idea is that the canonical link is whatever site the user used to post the content. I'd be in favor of this being the default, but also allowing the user to override it from the interface.

For example, I might post using @peakd for its powerful interface, but because it is not open source and I have less confidence in its longterm persistence compared to hive.blog, I might want the canonical link to be hive.blog.

I doubt any project will want to put a canonical link going to a website they don't own, as it would reduce their "google power".

If I'm not mistaken, sites are penalized in terms of SEO ranking for posting duplicate content without specifying the canonical link. Once most Hive frontends are setting the proper canonical, it will be clear to search engines which version is the true canonical. Therefore, versions that don't specify a canonical and are not the true canonical might be penalized.

This is somewhat speculative, but I'd be surprised if Google doesn't do something similar.

I would side with your speculation.