[BUG] Timeout on force_merge #1193

Open
disaster37 opened this issue Jun 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@disaster37

What is the bug?
I have an ISM policy for a hot / warm / delete topology and use a data stream to ingest logs.
In the warm state, I have a force_merge action with max_num_segments set to 1, but this step always ends with a timeout.

How can I find out why this step is stuck and times out?

How can one reproduce the bug?

I have tested it on OpenSearch 2.14.0 (running in a Docker container).

Here is my policy:

{
    "id": "policy-log",
    "seqNo": 177028,
    "primaryTerm": 10,
    "policy": {
        "policy_id": "policy-log",
        "description": "Policy for logs index",
        "last_updated_time": 1718199759103,
        "schema_version": 21,
        "error_notification": null,
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "rollover": {
                            "min_index_age": "1d",
                            "min_primary_shard_size": "5gb",
                            "copy_alias": false
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "warm",
                        "conditions": {
                            "min_index_age": "0d"
                        }
                    }
                ]
            },
            {
                "name": "warm",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "allocation": {
                            "require": {
                                "temp": "warm"
                            },
                            "include": {},
                            "exclude": {},
                            "wait_for": false
                        }
                    },
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "index_priority": {
                            "priority": 50
                        }
                    },
                    {
                        "timeout": "1d",
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "force_merge": {
                            "max_num_segments": 1
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "1d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "retry": {
                            "count": 3,
                            "backoff": "exponential",
                            "delay": "1m"
                        },
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": [
                    "logs-log-*"
                ],
                "priority": 100,
                "last_updated_time": 1718199759103
            }
        ]
    }
}
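To see at a glance which actions in a policy like the one above carry a timeout and a retry budget, a small script can walk the policy JSON. This is a minimal sketch in Python, not part of ISM itself: the function name summarize_actions is hypothetical, and it assumes each action object holds exactly one action key alongside the optional timeout and retry settings, as every action in this policy does.

```python
from typing import Any, Optional

# Keys that configure an action rather than name it. Assumption based on this
# policy: every action object has one action key plus optional timeout/retry.
SETTING_KEYS = {"timeout", "retry"}


def summarize_actions(
    policy_doc: dict[str, Any],
) -> list[tuple[str, str, Optional[str], Optional[int]]]:
    """Return (state, action_type, timeout, retry_count) for each action."""
    rows = []
    for state in policy_doc["policy"]["states"]:
        for action in state["actions"]:
            # The action type is the single key that is not a generic setting.
            action_type = next(key for key in action if key not in SETTING_KEYS)
            timeout = action.get("timeout")  # e.g. "1d", or None if unset
            retries = action.get("retry", {}).get("count")  # None if no retry block
            rows.append((state["name"], action_type, timeout, retries))
    return rows
```

Loading the policy above with json.load and passing the resulting dict in would show that the warm-state force_merge action is the only one with a timeout ("1d"), with 3 retries, like every other action in the policy.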

What is the expected behavior?

It merges the segments down to 1 instead of getting stuck and failing the step with a timeout.

What is your host/environment?

  • OS: not specified
  • Version: 2.14.0
  • Plugins: not specified

disaster37 added the bug (Something isn't working) and untriaged labels on Jun 17, 2024
@dblock
Member

dblock commented Jul 8, 2024

Thanks for opening this issue.

[Catch All Triage, attendees 1, 2, 3, 4, 5, 6, 7]

dblock removed the untriaged label on Jul 8, 2024
@disaster37
Author

Hi, I just got the timeout error again. It does not seem to have the same cause as previously described (disk space): all nodes have free disk space.
I just get a timeout error when I run GET _plugins/_ism/explain/.ds-logs-log-default-000030, with no further detail.

The only suspicious log entry I have found is:

{"type": "json_logger", "timestamp": "2024-10-14T15:37:59,195Z", "level": "DEBUG", "component": "o.o.i.i.ManagedIndexRunner", "cluster.name": "logmanagement2-prd", "node.name": "opensearch-hot1-os-1", "message": "Could not acquire lock [null] for .ds-logs-log-default-000030", "cluster.uuid": "ZVJvcMA3TmK00qqKawBTjg", "node.id": "2phZZeScRRCyHuZmDjKWLA"  }
