
Should we ignore old spec versions with robots.txt? #506

Closed
derberg opened this issue Dec 13, 2021 · 18 comments

Comments

@derberg
Member

derberg commented Dec 13, 2021

Reason/Context

This is a Google Search Console summary for November:

[Screenshot: Google Search Console performance summary for November]

So https://www.asyncapi.com/docs/specifications/v2.0.0 is the top-performing page, which is not ideal; it should be either https://www.asyncapi.com/docs/specifications/latest or https://www.asyncapi.com/docs/specifications/v2.2.0.

This is what I get, not even in Google Search but in Brave Search:
[Screenshot: Brave Search results for the AsyncAPI specification]

Description

One possible solution could be to add the lines below to robots.txt (we would actually need to add this file too):

Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
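
For completeness, a full robots.txt with these rules would also need a User-agent line, roughly like this (the Sitemap line is just an assumption, only relevant if we publish a sitemap):

# hypothetical robots.txt served from https://www.asyncapi.com/robots.txt
User-agent: *
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
Sitemap: https://www.asyncapi.com/sitemap.xml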

The problem I see is that if someone then searches for asyncapi specification 2.0, they will not find this page. Now, is it really a problem? They can just access the latest version and then navigate to the older one, right?

For sure, the current situation is a real problem, as not all users will notice they ended up reading old content.

The alternative would be a huge banner on 2.0, 2.1, and other older versions saying: this is old, we suggest you use the latest.

Thoughts?

@derberg derberg transferred this issue from asyncapi/.github Dec 13, 2021
@mcturco
Member

mcturco commented Dec 21, 2021

I definitely agree with you that we need to get Google to crawl the latest version rather than the others. The disallow rules that you have suggested above are perfect for achieving this, ESPECIALLY because it looks like there is duplicate content across all of the versions (there are probably only subtle differences, but Google can still tell that it is duplicated if there are enough similarities).

In addition to this, we can use canonical URLs to help Google know which page is the main page (or most recent version). This way we can keep the other pages indexed, but it's our way of telling Google "hey, these pages are duplicate content, please only look at the main one" (here is the docs page from GSC for reference).

For example:
On the following pages
https://www.asyncapi.com/docs/specifications/v2.1.0
https://www.asyncapi.com/docs/specifications/v2.0.0

We will add this tag in the <head>:
<link rel="canonical" href="https://www.asyncapi.com/docs/specifications/v2.2.0" />

And over time, Google will know to only crawl the canonical url, which will be the latest version of the spec. I wouldn't use: https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect. It would be best practice to use the main URL.

*Disclaimer: I am not an SEO expert but these are just things that I learned as a developer at an agency working with an SEO team

@smoya
Member

smoya commented Dec 22, 2021

And over time, Google will know to only crawl the canonical url, which will be the latest version of the spec. I wouldn't use: https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect. It would be best practice to use the main URL.

That's right but wrong at the same time. Let me explain. Theoretically, Google won't stop crawling those URLs but will crawl them less frequently. But we don't know exactly, because in fact Google could consider them equal content or rather different (because they actually are different) and skip those tags.

I think we should spend some time understanding what is currently happening on those paths via the Google Search Console (only the owners of the domain can do that, unfortunately). That tool shows you information about which URL Google decided is canonical, etc.

@mcturco
Member

mcturco commented Dec 22, 2021

@smoya yeah, I guess Google's algorithm in general is a toss-up these days 😆 I wonder too if it would consider those pages duplicate content or if it knows to detect that there are minor differences? Maybe, yeah, due to this we should analyze the GSC before we do anything; agree with you there!

@derberg
Member Author

derberg commented Jan 3, 2022

but isn't robots.txt enough? I guess this is still a standard for all bots, where you can tell the bot exactly which pages to ignore. I used it in my previous project -> https://kyma-project.io/robots.txt. It must work, as they restructured their docs but forgot to update robots.txt, and now their docs are not indexed properly.

@smoya
Member

smoya commented Jan 4, 2022

@derberg But you don't want to stop those pages (older spec versions) from being indexed.

@derberg
Member Author

derberg commented Jan 11, 2022

well, this is my main question in the issue title 😄
so I think we should ignore them, but I'm probably missing some downsides 🤔

@smoya
Member

smoya commented Jan 11, 2022

Let's say I'm a developer of a project that uses AsyncAPI 2.1.0. If I need to check the documentation, I want to quickly get the one for 2.1.0. I will type AsyncAPI spec 2.1.0 documentation into Google and I expect a result that drives me to it, not to another version.

@derberg
Member Author

derberg commented Jan 12, 2022

the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2.0 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.

maybe we should experiment with meta tags?

for sure, improve the description so it is unique per version; the current one is:

<meta name="description" content="AsyncAPI Specification
Disclaimer
Part of this content has been taken from the great work done by the folks at the OpenAPI Initiative. Mainly because it&#39;s a great work and we want to keep as mu">

and then also add keywords?
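
Something like this per version page, for example (the exact wording below is just a suggestion, not what the site generates today):

<meta name="description" content="AsyncAPI Specification 2.1.0 – the reference documentation for version 2.1.0 of the AsyncAPI specification.">
<meta name="keywords" content="AsyncAPI, specification, 2.1.0, event-driven, API, documentation">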

@smoya
Member

smoya commented Jan 12, 2022

the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2.0 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.

This is exactly the concern I mentioned a few comments above: #506 (comment)

maybe we should experiment with meta tags?

I think we could try it, but I would say in combination with what @mcturco suggested.

and then also add keywords?

You mean on the title? Around the copy? Not sure about the usage of keywords nowadays TBH.

@derberg
Member Author

derberg commented Jan 13, 2022

This is exactly the concern I mentioned a few comments above: #506 (comment)

oh, sorry, now I got it

You mean on the title? Around the copy? Not sure about the usage of keywords nowadays TBH.

yeah, just add the version number as a keyword, that's it; we have no keywords now, so 🤷🏼

@smoya @mcturco
ok, folks, we had a neat discussion here, thanks!
let's just try different solutions one by one, monitor the next Search Console reports, and see 👀

  • I suggest we first try with metadata, as we anyway need to fix the descriptions; they are not great across spec versions
  • then we try with the canonical link; this one I think is trickier to add. I mean more complex, as it is not as simple as manually changing just HTML files (rough sketch below)
  • if the above changes nothing, we go back to the topic of robots.txt and just block old versions from indexing
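
For the canonical link, a rough sketch of how it could look in the website code (assuming the spec pages are rendered with Next.js and next/head; the component name and the hardcoded URL are just placeholders, not actual code from the repo):

// Hypothetical sketch: emit a canonical link pointing at the latest spec version
// from every older spec version page.
import Head from 'next/head';

const LATEST_SPEC_URL = 'https://www.asyncapi.com/docs/specifications/v2.2.0';

export default function SpecCanonicalHead() {
  return (
    <Head>
      <link rel="canonical" href={LATEST_SPEC_URL} />
    </Head>
  );
}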

Thoughts?

@mcturco
Member

mcturco commented Jan 13, 2022

@derberg sounds good to me!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under an open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️

@github-actions github-actions bot added the stale label May 14, 2022
@derberg derberg removed the stale label May 16, 2022
@github-actions github-actions bot added the stale label Apr 14, 2023
@smoya
Member

smoya commented Apr 15, 2023

still relevant

@github-actions github-actions bot removed the stale label Apr 15, 2023
@sambhavgupta0705
Member

@smoya @derberg we currently have the 3.x version, so what should we do with this issue?

@smoya
Member

smoya commented Apr 2, 2024

This got solved organically then 😆. But the reality is that it will probably hit us again in the future when v4 gets released.

@sambhavgupta0705
Member

closing this for now
We will open an issue if this hits again
