Add filesize when using the Rest Service #1868
As reported in https://discuss.elastic.co/t/358630, the filesize is missing when using the Rest service.
When I use the following command, the result is: """, filesize = -1. |
Argh! I need to do more testing 😬 |
That's weird! When I'm executing the tests from the Java code it works well but from the CLI I can reproduce the behavior you are seeing... Checking... |
I use the Python client, and the filesize = -1. |
The bug exists in the newest package, fscrawler-distribution-2.10-20240503.070246-354.zip. Please use the following command:
and use the Python client like
|
I know. I can reproduce the problem...
|
So apparently:
curl -v --trace-ascii debug.txt -F "file=@test.txt" "http://127.0.0.1:8080/fscrawler/_document?simulate=true"
cat debug.txt
Gives
That's why I can reproduce this... I will do some other checks. |
Apparently the size field does not seem mandatory as per the RFC... See curl/curl#13527. So I need to compute it if not provided. |
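The fallback described here could look roughly like the sketch below. It is written in Python for brevity rather than in FSCrawler's own Java code, and `effective_filesize` is a hypothetical helper name, not something taken from the project. The only assumption about the server is that a missing size shows up as `None` or as a negative value such as the `-1` seen in this thread.

```python
from typing import Optional


def effective_filesize(declared_size: Optional[int], content: bytes) -> int:
    """Return the client-declared size, or the byte count of the upload.

    `declared_size` stands for the optional `size` parameter of the
    Content-Disposition header; None or a negative value means "not provided".
    """
    if declared_size is not None and declared_size >= 0:
        return declared_size
    # The client (curl, for example) did not send a size: count the bytes received.
    return len(content)
```

With this shape, a client that does send `size` keeps its declared value, and only uploads without it pay the cost of measuring the received content.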
So there is a workaround using tags:
echo "This is my text" > test.txt
curl -F "file=@test.txt" \
-F "tags={\"file\":{\"filesize\":$(ls -l test.txt | awk '{print $5}')}}" \
"http://127.0.0.1:8080/fscrawler/_document"
Let me know if this works for you. |
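For Python clients, the same workaround might be sketched as follows. The thread does not show which Python client is in use, so this only builds the value of the `tags` form field with the standard library; `build_filesize_tags` is a hypothetical helper name.

```python
import json
import os


def build_filesize_tags(path: str) -> str:
    """Build the JSON for the `tags` form field, carrying the real file size."""
    # os.path.getsize returns the size in bytes, like the fifth column of `ls -l`.
    return json.dumps({"file": {"filesize": os.path.getsize(path)}})
```

The resulting string would then be sent as a `tags` form field next to the `file` part, the same way the curl command above does with its second `-F` option.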
See discussion at: curl/curl#13527

Calling curl with a file does not provide the `size` field for the file:

```sh
curl --trace-ascii debug.txt -F "file=@test.txt" "http://127.0.0.1:8080/fscrawler/_document"
```

Gives:

```txt
== Info: Trying 127.0.0.1:8080...
== Info: Connected to 127.0.0.1 (127.0.0.1) port 8080
=> Send header, 224 bytes (0xe0)
0000: POST /fscrawler/_document?simulate=true HTTP/1.1
0032: Host: 127.0.0.1:8080
0048: User-Agent: curl/8.4.0
0060: Accept: */*
006d: Content-Length: 214
0082: Content-Type: multipart/form-data; boundary=--------------------
00c2: ----VzJBwyDNXJA2IVvgyzIvvA
00de:
=> Send data, 214 bytes (0xd6)
0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
0032: Content-Disposition: form-data; name="file"; filename="test.txt"
0074: Content-Type: text/plain
008e:
0090: This is my text.
00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
== Info: We are completely uploaded and fine
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 21 bytes (0x15)
0000: Content-Length: 489
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 489 bytes (0x1e9)
0000: {. "ok" : true,. "filename" : "test.txt",. "url" : "https://1
0040: 27.0.0.1:9200/rest/_doc/dd18bf3a8ea2a3e53e2661c7fb53534",. "doc
0080: " : {. "content" : "This is my text\n\n",. "meta" : { },.
00c0: "file" : {. "extension" : "txt",. "content_type" :
0100: "text/plain; charset=ISO-8859-1",. "indexing_date" : "2024-
0140: 05-03T10:39:47.685+00:00",. "filesize" : -1,. "filenam
0180: e" : "test.txt". },. "path" : {. "virtual" : "test.tx
01c0: t",. "real" : "test.txt". }. }.}
== Info: Connection #0 to host 127.0.0.1 left intact
```

Important part is:

```txt
0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
0032: Content-Disposition: form-data; name="file"; filename="test.txt"
0074: Content-Type: text/plain
008e:
0090: This is my text.
00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
== Info: We are completely uploaded and fine
```

We can see that the `size` of the file is not provided.

But when calling the same endpoint using Java `jakarta.ws.rs.client` client, the `size` is provided:

```
1 > PUT http://127.0.0.1:8080/fscrawler/_document/1234
1 > Accept: multipart/form-data,application/json
1 > Content-Type: multipart/form-data
--Boundary_1_46114008_1714750065797
Content-Type: application/octet-stream
Content-Disposition: form-data; filename="test.txt"; modification-date="Fri, 03 May 2024 15:27:44 GMT"; size=30; name="file"

This file contains some words.
--Boundary_1_46114008_1714750065797--
```

The [RFC-2183](https://datatracker.ietf.org/doc/html/rfc2183#section-2.7) does not make this parameter mandatory.

So the workaround is to compute it from the CLI and send it as a tag:

```sh
echo "This is my text" > test.txt
curl -F "file=@test.txt" \
  -F "tags={\"file\":{\"filesize\":$(ls -l test.txt | awk '{print $5}')}}" \
  "http://127.0.0.1:8080/fscrawler/_document"
```

Related to #1868
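To make the difference between the two `Content-Disposition` headers quoted above concrete, here is a small sketch that extracts the optional `size` parameter using Python's standard `email.message.Message` parser. The function name `declared_size` is hypothetical, and this is only an illustration of the header difference, not how FSCrawler itself reads the header.

```python
from email.message import Message
from typing import Optional


def declared_size(content_disposition: str) -> Optional[int]:
    """Return the `size` parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_param returns None when the parameter is absent from the header.
    raw = msg.get_param("size", header="content-disposition")
    return int(raw) if raw is not None else None


# Header values copied from the curl trace and the Java client log above.
curl_header = 'form-data; name="file"; filename="test.txt"'
java_header = ('form-data; filename="test.txt"; '
               'modification-date="Fri, 03 May 2024 15:27:44 GMT"; '
               'size=30; name="file"')
```

Calling `declared_size(curl_header)` yields `None`, while `declared_size(java_header)` yields `30`, which matches the behavior reported in this issue.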