Should we ignore old spec versions with robots.txt? #506
Comments
I definitely agree with you that we need to attempt to crawl the latest version versus the others. The disallow rules that you have suggested above are perfect for achieving this, ESPECIALLY because it looks like there is duplicate content across all of the versions (there are probably only subtle differences, but Google can still tell that it is duplicated if there are enough similarities). In addition to this, we can use canonical URLs to help Google know which page is the main page (or most recent version). This way we can keep the other pages indexed, but it's our way of telling Google "hey, these pages are duplicate content, please only look at the main one" (here is the docs page from GSC for reference). For example, we would add the canonical tag, and over time, Google will know to only crawl the canonical URL, which will be the latest version of the spec. I wouldn't use https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect. It would be best practice to use the main URL. *Disclaimer: I am not an SEO expert, but these are just things that I learned as a developer at an agency working with an SEO team. |
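The canonical-tag idea above could look something like the following. This is a sketch only: the exact markup was not preserved in this thread, and the v2.2.0 URL is an assumption based on the versions mentioned later in the discussion.

```html
<!-- Hypothetical: placed in the <head> of each older spec version page,
     e.g. /docs/specifications/v2.0.0, pointing at the latest version. -->
<link rel="canonical" href="https://www.asyncapi.com/docs/specifications/v2.2.0" />
```

With this in place, the older pages can stay indexed while signaling to Google which page to treat as the primary one.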
That's right but wrong at the same time. Let me explain. Theoretically, Google won't stop crawling those URLs but will decrease their periodicity. But we don't know exactly, because in fact Google could consider them equal content or rather different (because they are actually different) and skip those tags. I think we should spend some time understanding what is currently happening on those paths via the Google Search Console (only the owners of the domain can do that, unfortunately). That tool shows you information about which URL Google decided is canonical, etc. |
@smoya yeah I guess Google's alg in general is a toss up these days 😆 I wonder too if it would consider those pages duplicate content or if it knows to detect that there are minor differences? Maybe yeah due to this we should analyze the GSC before we do anything, agree with you there! |
But isn't robots.txt enough? I guess this is still a standard for all bots, where you can tell the bot exactly which pages to ignore. I used it in my previous project -> https://kyma-project.io/robots.txt. It must work: they restructured their docs but forgot to update robots.txt, and now their docs are not indexed properly. |
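As an aside, the way a compliant crawler interprets such Disallow rules can be checked with Python's standard-library parser. The rule below is hypothetical, modeled on the old v2.0.0 path discussed in this thread, not the site's actual robots.txt.

```python
from urllib import robotparser

# Hypothetical rules mirroring the Disallow approach discussed above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /docs/specifications/v2.0.0",
])

# A compliant crawler would skip the old version but still fetch the rest.
print(rp.can_fetch("*", "https://www.asyncapi.com/docs/specifications/v2.0.0"))  # False
print(rp.can_fetch("*", "https://www.asyncapi.com/docs/specifications/latest"))  # True
```

Note that robots.txt only controls crawling, not indexing, which is what the comment below gets at.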
@derberg But you don't want to stop those pages (older spec versions) from being indexed. |
well this is my main question in the issue title 😄 |
Let's say I'm a developer of a project that uses AsyncAPI |
The thing is that the content is almost the same, so when you google it... maybe we should experiment with meta tags? For sure improve
and then also add |
This is exactly the concern I mentioned a few comments above: #506 (comment)
I think we could try it, but I would say in combination with what @mcturco suggested.
You mean on the title? Around the copy? Not sure about the usage of keywords nowadays TBH. |
oh, sorry, now I got it
yeah, just add the version number to the keywords, that is it, we have no keywords now, so 🤷🏼 @smoya @mcturco
Thoughts? |
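The suggestion above might look something like this. A sketch only: the exact tag was not preserved in the thread, and (as noted in the discussion) meta keywords carry little weight with modern search engines.

```html
<!-- Hypothetical: added to each versioned spec page so the version
     number appears as an explicit keyword. -->
<meta name="keywords" content="asyncapi, specification, 2.0.0" />
```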
@derberg sounds good to me! |
This issue has been automatically marked as stale because it has not had recent activity 😴 It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation. There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model. Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here. Thank you for your patience ❤️ |
still relevant |
This got solved organically then 😆. But the reality is that it will probably hit us again in the future when v4 gets released. |
closing this for now |
Reason/Context
This is a Google Search Console summary for November:
So https://www.asyncapi.com/docs/specifications/v2.0.0 is the top-performing page, which is not ideal; it should be either https://www.asyncapi.com/docs/specifications/latest or https://www.asyncapi.com/docs/specifications/v2.2.0
This is what I get, not even in Google Search but in Brave Search:
Description
One possible solution could be adding Disallow rules for the old spec versions to robots.txt (we would actually need to add this file too). The problem I see is that then if someone queries for asyncapi specification 2.0, they will not find this page. Now, is it really a problem? They can just access the latest version and then navigate to the older one, right? For sure, the current issue is a real one, as not all users will notice they ended up reading old stuff.
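A sketch of the kind of Disallow rules being discussed. The actual file content was not preserved in this thread; the versioned paths below are assumptions based on the URLs mentioned above.

```
# Hypothetical robots.txt: ask crawlers to skip older spec versions.
User-agent: *
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
```

Note that these rules stop compliant crawlers from fetching the pages at all, which is why the indexing concern below matters.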
The alternative would be a huge banner on 2.0, 2.1, and the other older versions saying
this is old, we suggest you use the latest
Thoughts?