Docs: Update smart_open example code to prevent decompression error (fixes #59). #60

Merged
merged 2 commits into from
May 27, 2024
Changes from 1 commit
11 changes: 9 additions & 2 deletions README.md
@@ -203,15 +203,22 @@ session = boto3.Session(

 url = 's3://clp-example-s3-bucket/example.clp.zst'
 # Using `smart_open.open` to stream the encoded CLP IR:
-with smart_open.open(url, "rb", transport_params={'client': session.client('s3')}) as istream:
+with smart_open.open(url, mode="rb", compression='disable',
+        transport_params={'client': session.client('s3')}) as istream:
     with ClpIrStreamReader(istream, allow_incomplete_stream=True) as clp_reader:
         for log_event in clp_reader:
             # Print the log message with its timestamp properly formatted.
             print(log_event.get_formatted_message())
```

 Note:
-When `allow_incomplete_stream` is set to False (default), the reader will raise
+- Setting `compression='disable'` is necessary because smart_open by default
+infers the compression format from the file extension and decompresses the file
+automatically. However, the ClpIrStreamReader expects the input stream to be in
+the original compressed format. Disabling the automatic decompression ensures
+that the stream passed to ClpIrStreamReader remains compressed, preventing
+decompression errors.
+- When `allow_incomplete_stream` is set to False (default), the reader will raise
 `clp_ffi_py.ir.IncompleteStreamError` if the stream is incomplete (it doesn't end
 with the byte sequence indicating the stream's end). In practice, this can occur
 if you're reading a stream that is still being written or wasn't properly
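
The failure mode that the `compression='disable'` note describes can be illustrated without S3 or smart_open. The sketch below is only an analogy under stated assumptions: `gzip` from the standard library stands in for Zstandard, and `open_stream` and `read_compressed` are hypothetical helpers, not part of smart_open or clp_ffi_py. It shows that a reader which decompresses its own input fails once the transport layer has already decompressed the bytes.

```python
import gzip
import io


def open_stream(raw: bytes, auto_decompress: bool) -> io.BytesIO:
    """Hypothetical stand-in for a transport layer that may decompress for us."""
    if auto_decompress:
        return io.BytesIO(gzip.decompress(raw))
    return io.BytesIO(raw)


def read_compressed(stream: io.BytesIO) -> bytes:
    """Hypothetical reader that expects compressed input and decompresses it itself."""
    return gzip.decompress(stream.read())


payload = b"example log event\n"
stored = gzip.compress(payload)

# Automatic decompression disabled: the reader receives compressed bytes
# and decompresses them exactly once.
ok = read_compressed(open_stream(stored, auto_decompress=False))

# Automatic decompression enabled: the reader's own decompression step
# receives plain bytes and fails (gzip.BadGzipFile is a subclass of OSError).
try:
    read_compressed(open_stream(stored, auto_decompress=True))
    double_decompress_failed = False
except OSError:
    double_decompress_failed = True
```

In this analogy, `auto_decompress=False` plays the role of `compression='disable'` in the updated README example.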