filesize is not provided by curl #1871

Merged: dadoonet merged 3 commits into master from doc/add-filesize on May 13, 2024
Conversation

@dadoonet (Owner) commented May 3, 2024

See discussion at: curl/curl#13527

Uploading a file with curl does not send the size parameter in the Content-Disposition header of the file part:

curl --trace-ascii debug.txt -F "file=@test.txt" "http://127.0.0.1:8080/fscrawler/_document"

Gives:

== Info:   Trying 127.0.0.1:8080...
== Info: Connected to 127.0.0.1 (127.0.0.1) port 8080
=> Send header, 224 bytes (0xe0)
0000: POST /fscrawler/_document?simulate=true HTTP/1.1
0032: Host: 127.0.0.1:8080
0048: User-Agent: curl/8.4.0
0060: Accept: */*
006d: Content-Length: 214
0082: Content-Type: multipart/form-data; boundary=--------------------
00c2: ----VzJBwyDNXJA2IVvgyzIvvA
00de:
=> Send data, 214 bytes (0xd6)
0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
0032: Content-Disposition: form-data; name="file"; filename="test.txt"
0074: Content-Type: text/plain
008e:
0090: This is my text.
00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
== Info: We are completely uploaded and fine
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 21 bytes (0x15)
0000: Content-Length: 489
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 489 bytes (0x1e9)
0000: {.  "ok" : true,.  "filename" : "test.txt",.  "url" : "https://1
0040: 27.0.0.1:9200/rest/_doc/dd18bf3a8ea2a3e53e2661c7fb53534",.  "doc
0080: " : {.    "content" : "This is my text\n\n",.    "meta" : { },.
00c0:    "file" : {.      "extension" : "txt",.      "content_type" :
0100: "text/plain; charset=ISO-8859-1",.      "indexing_date" : "2024-
0140: 05-03T10:39:47.685+00:00",.      "filesize" : -1,.      "filenam
0180: e" : "test.txt".    },.    "path" : {.      "virtual" : "test.tx
01c0: t",.      "real" : "test.txt".    }.  }.}
== Info: Connection #0 to host 127.0.0.1 left intact

The important part is the multipart body that curl sends:

0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
0032: Content-Disposition: form-data; name="file"; filename="test.txt"
0074: Content-Type: text/plain
008e:
0090: This is my text.
00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
== Info: We are completely uploaded and fine

The Content-Disposition header of the part carries no size parameter, and the response above reports "filesize" : -1.

By contrast, when calling the same endpoint with the Java jakarta.ws.rs.client API, the size parameter is provided:

1 > PUT http://127.0.0.1:8080/fscrawler/_document/1234
1 > Accept: multipart/form-data,application/json
1 > Content-Type: multipart/form-data
--Boundary_1_46114008_1714750065797
Content-Type: application/octet-stream
Content-Disposition: form-data; filename="test.txt"; modification-date="Fri, 03 May 2024 15:27:44 GMT"; size=30; name="file"

This file contains some words.
--Boundary_1_46114008_1714750065797--
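
The difference between the two requests can be checked mechanically. The following is a minimal sketch, not part of FSCrawler or curl: `has_size_param` is a hypothetical helper name, and the two header values are copied from the traces above. It only looks for a `size=` parameter at a parameter boundary and does not attempt full RFC 2183 parsing (for example, it would be fooled by a quoted value that itself contains `; size=`).

```shell
# Succeed (exit 0) when a Content-Disposition value carries a "size" parameter.
# Hypothetical helper for illustration only.
has_size_param() {
  # Match "size=" only at the start of a parameter, so "filename=" does not match.
  printf '%s\n' "$1" | grep -Eqi '(^|;)[[:space:]]*size='
}

# Header values copied from the curl and Jakarta traces above.
curl_header='form-data; name="file"; filename="test.txt"'
jakarta_header='form-data; filename="test.txt"; modification-date="Fri, 03 May 2024 15:27:44 GMT"; size=30; name="file"'

if has_size_param "$curl_header"; then echo "curl: size present"; else echo "curl: size missing"; fi
if has_size_param "$jakarta_header"; then echo "jakarta: size present"; else echo "jakarta: size missing"; fi
```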

RFC 2183 (https://datatracker.ietf.org/doc/html/rfc2183#section-2.7) does not make the size parameter mandatory. The workaround is to compute the file size on the command line and send it as a tag:

echo "This is my text" > test.txt
curl -F "file=@test.txt" \
  -F "tags={\"file\":{\"filesize\":$(ls -l test.txt | awk '{print $5}')}}" \
  "http://127.0.0.1:8080/fscrawler/_document"
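
Parsing `ls -l` output depends on the column layout of the local `ls`. As a possible alternative, the sketch below computes the size with `wc -c`, which is specified by POSIX. The helper names `file_size` and `tags_json` are made up for this example; the tag structure is the same one used in the command above.

```shell
# Print the size of a file in bytes. The arithmetic expansion strips the
# leading padding that some wc implementations add.
file_size() {
  echo $(( $(wc -c < "$1") ))
}

# Build the JSON document passed in the "tags" form field.
tags_json() {
  printf '{"file":{"filesize":%s}}' "$(file_size "$1")"
}

echo "This is my text" > test.txt
tags_json test.txt

# To send it (assumes the FSCrawler REST service is running at this address):
# curl -F "file=@test.txt" -F "tags=$(tags_json test.txt)" "http://127.0.0.1:8080/fscrawler/_document"
```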

Related to #1868

@dadoonet added the bug and component:rest labels May 3, 2024
@dadoonet added this to the 2.10 milestone May 3, 2024
@dadoonet self-assigned this May 3, 2024

sonarcloud bot commented May 13, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@dadoonet merged commit 32113f9 into master May 13, 2024
13 checks passed
@dadoonet deleted the doc/add-filesize branch May 13, 2024 14:50