
Should we ignore old spec versions with robots.txt? #506

Closed
derberg opened this issue Dec 13, 2021 · 18 comments

Comments

@derberg
Member

derberg commented Dec 13, 2021

Reason/Context

This is a Google Search Console summary for November:

[Screenshot: Google Search Console performance summary for November]

So https://www.asyncapi.com/docs/specifications/v2.0.0 is the top-performing page, which is not ideal; it should be either https://www.asyncapi.com/docs/specifications/latest or https://www.asyncapi.com/docs/specifications/v2.2.0.

This is what I get, not even in Google Search but in Brave Search:
[Screenshot: Brave Search results for the AsyncAPI specification]

Description

One possible solution could be to add the lines below to robots.txt (we would actually need to add this file too):

Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
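
For completeness, a full robots.txt with these rules would also need a User-agent line, roughly like this (the Sitemap line is just an assumption, only relevant if we publish a sitemap):

# hypothetical robots.txt served from https://www.asyncapi.com/robots.txt
User-agent: *
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
Sitemap: https://www.asyncapi.com/sitemap.xml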

The problem I see is that if someone then searches for asyncapi specification 2.0, they will not find this page. Now, is it really a problem? They can just access the latest version and then navigate to the older one, right?

For sure, the current situation is a real problem, as not all users will notice they ended up reading old content.

The alternative would be a huge banner on 2.0, 2.1, and other older versions saying: this is old, we suggest you use the latest.

Thoughts?

@derberg derberg transferred this issue from asyncapi/.github Dec 13, 2021
@mcturco
Member

mcturco commented Dec 21, 2021

I definitely agree with you that we need to get Google to crawl the latest version rather than the others. The disallow rules that you have suggested above are perfect for achieving this, ESPECIALLY because it looks like there is duplicate content across all of the versions (there are probably only subtle differences, but Google can still tell that it is duplicated if there are enough similarities).

In addition to this, we can use canonical URLs to help Google know which page is the main page (or most recent version). This way we can keep the other pages indexed, but it's our way of telling Google "hey, these pages are duplicate content, please only look at the main one" (here is the docs page from GSC for reference).

For example:
On the following pages
https://www.asyncapi.com/docs/specifications/v2.1.0
https://www.asyncapi.com/docs/specifications/v2.0.0

We will add this tag in the <head>:
<link rel="canonical" href="https://www.asyncapi.com/docs/specifications/v2.2.0" />

And over time, Google will know to only crawl the canonical url, which will be the latest version of the spec. I wouldn't use: https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect. It would be best practice to use the main URL.

*Disclaimer: I am not an SEO expert but these are just things that I learned as a developer at an agency working with an SEO team

@smoya
Member

smoya commented Dec 22, 2021

And over time, Google will know to only crawl the canonical url, which will be the latest version of the spec. I wouldn't use: https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect. It would be best practice to use the main URL.

That's right but wrong at the same time. Let me explain. Theoretically, Google won't stop crawling those URLs but will crawl them less frequently. But we don't know exactly, because in fact Google could consider them equal content or rather different (because they actually are different) and skip those tags.

I think we should spend some time understanding what is currently happening on those paths via the Google Search Console (only the owners of the domain can do that, unfortunately). That tool shows you information about which URL Google decided is canonical, etc.

@mcturco
Member

mcturco commented Dec 22, 2021

@smoya yeah, I guess Google's algorithm in general is a toss-up these days 😆 I wonder too if it would consider those pages duplicate content or if it knows to detect that there are minor differences? Maybe, yeah, due to this we should analyze the GSC before we do anything; agree with you there!

@derberg
Member Author

derberg commented Jan 3, 2022

but isn't robots.txt enough? I guess this is still a standard for all bots, where you can tell the bot exactly which pages to ignore. I used it in my previous project -> https://kyma-project.io/robots.txt. It must work, as they restructured their docs but forgot to update robots.txt, and now their docs are not indexed properly.

@smoya
Member

smoya commented Jan 4, 2022

@derberg But you don't want to stop those pages (older spec versions) from being indexed.

@derberg
Member Author

derberg commented Jan 11, 2022

well, this is my main question in the issue title 😄
so I think we should ignore them, but I'm probably missing some downsides 🤔

@smoya
Member

smoya commented Jan 11, 2022

Let's say I'm a developer of a project that uses AsyncAPI 2.1.0. If I need to check the documentation, I want to quickly get the one for 2.1.0. I will type AsyncAPI spec 2.1.0 documentation into Google and I expect a result that drives me to it, not to another version.

@derberg
Member Author

derberg commented Jan 12, 2022

the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2.0 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.

maybe we should experiment with meta tags?

for sure, improve the description so it is unique per version; the current one is:

<meta name="description" content="AsyncAPI Specification
Disclaimer
Part of this content has been taken from the great work done by the folks at the OpenAPI Initiative. Mainly because it&#39;s a great work and we want to keep as mu">

and then also add keywords?
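
Something like this per version page, for example (the exact wording below is just a suggestion, not what the site generates today):

<meta name="description" content="AsyncAPI Specification 2.1.0 – the reference documentation for version 2.1.0 of the AsyncAPI specification.">
<meta name="keywords" content="AsyncAPI, specification, 2.1.0, event-driven, API, documentation">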

@smoya
Member

smoya commented Jan 12, 2022

the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2.0 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.

This is exactly the concern I mentioned a few comments above: #506 (comment)

maybe we should experiment with meta tags?

I think we could try it, but I would say in combination with what @mcturco suggested.

and then also add keywords?

You mean on the title? Around the copy? Not sure about the usage of keywords nowadays TBH.

@derberg
Member Author

derberg commented Jan 13, 2022

This is exactly the concern I mentioned a few comments above: #506 (comment)

oh, sorry, now I got it

You mean on the title? Around the copy? Not sure about the usage of keywords nowadays TBH.

yeah, just add the version number as a keyword, that's it; we have no keywords now, so 🤷🏼

@smoya @mcturco
ok, folks, we had a neat discussion here, thanks!
let's just try different solutions one by one, monitor the next Search Console reports, and see 👀

  • I suggest we first try with metadata, as we anyway need to fix the descriptions; they are not great across spec versions
  • then we try with the canonical link; this one I think is trickier to add. I mean more complex, as it is not as simple as manually changing just HTML files (rough sketch below)
  • if the above changes nothing, we go back to the topic of robots.txt and just block old versions from indexing
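
For the canonical link, a rough sketch of how it could look in the website code (assuming the spec pages are rendered with Next.js and next/head; the component name and the hardcoded URL are just placeholders, not actual code from the repo):

// Hypothetical sketch: emit a canonical link pointing at the latest spec version
// from every older spec version page.
import Head from 'next/head';

const LATEST_SPEC_URL = 'https://www.asyncapi.com/docs/specifications/v2.2.0';

export default function SpecCanonicalHead() {
  return (
    <Head>
      <link rel="canonical" href={LATEST_SPEC_URL} />
    </Head>
  );
}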

Thoughts?

@mcturco
Member

mcturco commented Jan 13, 2022

@derberg sounds good to me!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under an open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️

@github-actions github-actions bot added the stale label May 14, 2022
@derberg derberg removed the stale label May 16, 2022
@github-actions github-actions bot added the stale label Apr 14, 2023
@smoya
Member

smoya commented Apr 15, 2023

still relevant

@github-actions github-actions bot removed the stale label Apr 15, 2023
@sambhavgupta0705
Member

@smoya @derberg we currently have the 3.x version, so what should we do with this issue?

@smoya
Member

smoya commented Apr 2, 2024

This got solved organically then 😆. But the reality is that it will probably hit us again in the future when v4 gets released.

@sambhavgupta0705
Member

closing this for now
We will open an issue if this hits again
