See the discussion at curl/curl#13527.
When uploading a file with `curl -F`, curl does not send the `size` parameter in the `Content-Disposition` header of the file part:
```sh
curl --trace-ascii debug.txt -F "file=@test.txt" "http://127.0.0.1:8080/fscrawler/_document?simulate=true"
```
This produces the following trace:
```txt
== Info: Trying 127.0.0.1:8080...
== Info: Connected to 127.0.0.1 (127.0.0.1) port 8080
=> Send header, 224 bytes (0xe0)
0000: POST /fscrawler/_document?simulate=true HTTP/1.1
0032: Host: 127.0.0.1:8080
0048: User-Agent: curl/8.4.0
0060: Accept: */*
006d: Content-Length: 214
0082: Content-Type: multipart/form-data; boundary=--------------------
00c2: ----VzJBwyDNXJA2IVvgyzIvvA
00de:
=> Send data, 214 bytes (0xd6)
0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
0032: Content-Disposition: form-data; name="file"; filename="test.txt"
0074: Content-Type: text/plain
008e:
0090: This is my text.
00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
== Info: We are completely uploaded and fine
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 21 bytes (0x15)
0000: Content-Length: 489
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 489 bytes (0x1e9)
0000: {. "ok" : true,. "filename" : "test.txt",. "url" : "https://1
0040: 27.0.0.1:9200/rest/_doc/dd18bf3a8ea2a3e53e2661c7fb53534",. "doc
0080: " : {. "content" : "This is my text\n\n",. "meta" : { },.
00c0: "file" : {. "extension" : "txt",. "content_type" :
0100: "text/plain; charset=ISO-8859-1",. "indexing_date" : "2024-
0140: 05-03T10:39:47.685+00:00",. "filesize" : -1,. "filenam
0180: e" : "test.txt". },. "path" : {. "virtual" : "test.tx
01c0: t",. "real" : "test.txt". }. }.}
== Info: Connection #0 to host 127.0.0.1 left intact
```
The important part is the multipart body:
```txt
0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
0032: Content-Disposition: form-data; name="file"; filename="test.txt"
0074: Content-Type: text/plain
008e:
0090: This is my text.
00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
== Info: We are completely uploaded and fine
```
The `Content-Disposition` header of the file part carries no `size` parameter, and the response reports `"filesize" : -1`.
By contrast, when the same endpoint is called with the Jakarta REST client (`jakarta.ws.rs.client`), the `size` parameter is sent:
```txt
1 > PUT http://127.0.0.1:8080/fscrawler/_document/1234
1 > Accept: multipart/form-data,application/json
1 > Content-Type: multipart/form-data
--Boundary_1_46114008_1714750065797
Content-Type: application/octet-stream
Content-Disposition: form-data; filename="test.txt"; modification-date="Fri, 03 May 2024 15:27:44 GMT"; size=30; name="file"
This file contains some words.
--Boundary_1_46114008_1714750065797--
```
[RFC 2183, section 2.7](https://datatracker.ietf.org/doc/html/rfc2183#section-2.7) defines `size` as an optional parameter, so curl is not required to send it.
As a workaround, compute the file size on the command line and send it in the `tags` field:
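Because the parameter is optional, a receiver has to cope with its absence. The following is a minimal shell sketch, not FSCrawler's actual implementation: the hypothetical `disposition_size` helper extracts the `size` value from a `Content-Disposition` header line and falls back to `-1` when it is missing. The `-1` fallback is an assumption chosen to mirror the `"filesize" : -1` seen in the curl response above.
```sh
# Hypothetical helper (not FSCrawler code): print the "size" parameter of a
# Content-Disposition header line, or -1 when the optional parameter is absent.
disposition_size() {
  # Capture the digits following "; size=" (lowercase only, as sent by the
  # Java client above). RFC 2183 gives the approximate file size in octets.
  size=$(printf '%s\n' "$1" | sed -n 's/.*;[[:space:]]*size=\([0-9][0-9]*\).*/\1/p')
  # ${size:--1} expands to -1 when nothing was captured.
  printf '%s\n' "${size:--1}"
}
```
Applied to the header sent by curl it prints `-1`; applied to the header sent by the Java client it prints `30`.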
```sh
echo "This is my text" > test.txt
# Column 5 of `ls -l` is the file size in bytes; it is injected into the "tags" JSON.
curl -F "file=@test.txt" \
-F "tags={\"file\":{\"filesize\":$(ls -l test.txt | awk '{print $5}')}}" \
"http://127.0.0.1:8080/fscrawler/_document"
```
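If uploads are scripted, the same workaround can be wrapped in a reusable helper. This is a sketch under stated assumptions: `filesize_tags` and `upload_with_size` are hypothetical names, and the size is computed with `wc -c` instead of parsing `ls -l`, so the result does not depend on the column layout of `ls`.
```sh
# Hypothetical helpers (not part of FSCrawler or curl).
# Print the "tags" JSON carrying the file size in bytes.
filesize_tags() {
  # Reading from stdin makes wc print only the byte count; tr removes the
  # leading padding that some wc implementations (e.g. BSD/macOS) add.
  size=$(wc -c < "$1" | tr -d ' ')
  printf '{"file":{"filesize":%s}}' "$size"
}

# Upload a file and pass its size explicitly through the "tags" field.
# Usage: upload_with_size test.txt [url]
upload_with_size() {
  curl -F "file=@$1" \
    -F "tags=$(filesize_tags "$1")" \
    "${2:-http://127.0.0.1:8080/fscrawler/_document}"
}
```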
Related to #1868