paul van genuchten edited this page May 23, 2016 · 1 revision

Making spatial data and GeoNetwork crawlable by search engines is a challenge. In the geo4web testbed, organised by Geonovum, we've done some work to improve crawlability.

Search engines generally can't follow AJAX links. The suggested approach is to improve the existing nojs interface and encourage catalogue administrators to register the nojs interface (or the sitemap) with search engines. Introduce a formatter that displays a nicely formatted HTML page of the full metadata, and link to it from the sitemap and the nojs interface.

In the HTML page, embed schema.org/Dataset annotations so the HTML representation can also be read as structured data.
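As a rough illustration of the idea, the annotation can be generated as JSON-LD and embedded in the page inside a `<script type="application/ld+json">` element. This is a minimal sketch; the field values and the `dataset_jsonld` helper are illustrative, not taken from GeoNetwork's actual formatter.

```python
import json

def dataset_jsonld(title, description, landing_url, keywords):
    """Build a minimal schema.org/Dataset annotation as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "url": landing_url,
        "keywords": keywords,
    }
    return json.dumps(doc, indent=2)

# Example record; in practice these values come from the metadata record.
snippet = dataset_jsonld(
    "Example dataset",
    "An example record exposed as structured data.",
    "https://example.org/metadata/abc123",
    ["geospatial", "example"],
)
print(snippet)
```

Search engines that support schema.org will pick this up from the rendered page without needing to understand the underlying ISO metadata.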

The sitemap should be paginated to manage the full catalogue content. A single page should not contain too many references, since that causes memory issues and slow responses.
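The pagination can be sketched as splitting the record URLs into fixed-size sitemap pages plus a sitemap index that references them. The 1000-per-page limit and URL pattern below are assumptions for illustration (the sitemap protocol itself allows up to 50,000 URLs per file):

```python
def paginate(urls, page_size=1000):
    """Split a list of record URLs into sitemap pages (assumed page size)."""
    return [urls[i:i + page_size] for i in range(0, len(urls), page_size)]

def sitemap_page(urls):
    """Render one sitemap page listing the given record URLs."""
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>")

def sitemap_index(base, n_pages):
    """Render the index that points crawlers at every sitemap page."""
    entries = "\n".join(
        f"  <sitemap><loc>{base}/sitemap-{i}.xml</loc></sitemap>"
        for i in range(n_pages))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</sitemapindex>")

# Hypothetical catalogue of 2500 records -> 3 pages of at most 1000 URLs.
urls = [f"https://example.org/metadata/{i}" for i in range(2500)]
pages = paginate(urls)
print(len(pages))
```

Catalogue administrators then register only the index URL with the search engine; the crawler discovers the pages from there.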

In the scope of the project we've also added proxy functionality that enables GeoNetwork to transform any CSW response to schema.org-enriched HTML.
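The proxy idea can be sketched as parsing the CSW response and re-emitting it as crawlable HTML. A real CSW GetRecords response uses the full ISO 19139 schema with namespaces; the element names below are trimmed to a toy form purely for illustration:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified stand-in for a CSW GetRecords response.
CSW_SAMPLE = """<records>
  <record><title>Rivers</title><abstract>River network</abstract></record>
  <record><title>Roads</title><abstract>Road network</abstract></record>
</records>"""

def csw_to_html(xml_text):
    """Transform a (simplified) CSW response into a crawlable HTML page."""
    root = ET.fromstring(xml_text)
    items = []
    for rec in root.findall("record"):
        title = escape(rec.findtext("title", ""))
        abstract = escape(rec.findtext("abstract", ""))
        items.append(f"<li><h2>{title}</h2><p>{abstract}</p></li>")
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"

print(csw_to_html(CSW_SAMPLE))
```

In the same pass the proxy can embed the schema.org/Dataset annotations described above, so each proxied record is both human-readable and machine-readable.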

In the project we've also developed software that acts as a WFS-to-HTML proxy, so any WFS record can be viewed as HTML (and crawled by search engines). For those metadata records that are registered in ldproxy, the metadata detail page should display a link to the proxied WFS, so the search engine is able to crawl through the link.
