Consolidate documentation around fingerprint file identity in the Filestream output #42264

Open
rdner opened this issue Jan 8, 2025 · 1 comment
Labels
docs Filebeat Filebeat Team:Docs Label for the Observability docs team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@rdner
Member

rdner commented Jan 8, 2025

Describe the enhancement:

We switched to the fingerprint-based file identity in Filestream by default (see #40197), so we should now improve the documentation around it.

Prior to the new default, due to the nature of the implementation, enabling the fingerprint-based file identity required setting multiple parameters in 2 separate config sections:

Currently the documentation is scattered between these 2 sections, which is far from ideal.

A consolidated documentation section would be a better solution. It should cover:

  • Reasons behind using the fingerprint-based file identity instead of other options (a blog post about it can serve as a source for the docs: https://www.elastic.co/blog/introducing-filestream-fingerprint-mode)
  • Edge cases that might confuse users new to the concept:
    • The default length of the fingerprint header is 1024 bytes and the default offset is 0 – both are configurable
    • Changing either of these two settings changes the file IDs, which causes all files to be re-ingested and produces duplicates
    • Decreasing the length makes the identity less reliable (higher likelihood of a collision)
    • Any file smaller than offset + length will not be ingested until it grows past this threshold
    • For users who plan to ingest files smaller than 1 KB, the default configuration will no longer work
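
To make the edge cases above concrete, here is a minimal Python sketch of how a fingerprint-style identity behaves. It is not the Filebeat implementation: it assumes the identity is a SHA-256 hash over the byte range [offset, offset + length), and the names `fingerprint`, `write_temp`, `DEFAULT_OFFSET` and `DEFAULT_LENGTH` are hypothetical, introduced only for illustration.

```python
import hashlib
import os
import tempfile
from typing import Optional

# Defaults taken from the issue text: 1024-byte header, offset 0.
DEFAULT_OFFSET = 0
DEFAULT_LENGTH = 1024


def fingerprint(path: str,
                offset: int = DEFAULT_OFFSET,
                length: int = DEFAULT_LENGTH) -> Optional[str]:
    """Return a hex digest of bytes [offset, offset + length) of the file.

    Returns None when the file is smaller than offset + length, which
    models "not ingested until it grows past the threshold".
    Assumption: SHA-256 is used here only as an illustrative hash.
    """
    if os.path.getsize(path) < offset + length:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) < length:
        # The file shrank between the size check and the read.
        return None
    return hashlib.sha256(data).hexdigest()


def write_temp(data: bytes) -> str:
    """Hypothetical helper: write data to a new temp file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    # A 500-byte file is below the default 1024-byte threshold.
    print(fingerprint(write_temp(b"x" * 500)))

    # Same file, different length setting: the identity changes.
    big = write_temp(bytes(range(256)) * 8)
    print(fingerprint(big, length=512) == fingerprint(big, length=1024))

    # Two files sharing a 64-byte header collide at length=64 only.
    a = write_temp(b"h" * 64 + b"A" * 960)
    b = write_temp(b"h" * 64 + b"B" * 960)
    print(fingerprint(a, length=64) == fingerprint(b, length=64))
    print(fingerprint(a) == fingerprint(b))
```

Under these assumptions, the sketch shows three of the points the docs should call out: a file below offset + length has no identity yet, changing the length yields a different ID for the same file, and a shorter length lets files with a common header (for example, identical log preambles) collide.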
@rdner rdner added docs Filebeat Filebeat Team:Docs Label for the Observability docs team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jan 8, 2025
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@rdner rdner changed the title Consolidated documentation around fingerprint file identity in the Filestream output Consolidate documentation around fingerprint file identity in the Filestream output Jan 8, 2025