filesize is not provided by curl #1871

Merged
merged 3 commits into from
May 13, 2024

Commits on May 13, 2024

  1. filesize is not provided by curl

    See discussion at: curl/curl#13527
    
    When curl uploads a file with `-F`, it does not send a `size` parameter in the `Content-Disposition` header of the file part:
    
    ```sh
    curl --trace-ascii debug.txt -F "file=@test.txt" "http://127.0.0.1:8080/fscrawler/_document?simulate=true"
    ```
    
    This gives the following trace:
    
    ```txt
    == Info:   Trying 127.0.0.1:8080...
    == Info: Connected to 127.0.0.1 (127.0.0.1) port 8080
    => Send header, 224 bytes (0xe0)
    0000: POST /fscrawler/_document?simulate=true HTTP/1.1
    0032: Host: 127.0.0.1:8080
    0048: User-Agent: curl/8.4.0
    0060: Accept: */*
    006d: Content-Length: 214
    0082: Content-Type: multipart/form-data; boundary=--------------------
    00c2: ----VzJBwyDNXJA2IVvgyzIvvA
    00de:
    => Send data, 214 bytes (0xd6)
    0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
    0032: Content-Disposition: form-data; name="file"; filename="test.txt"
    0074: Content-Type: text/plain
    008e:
    0090: This is my text.
    00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
    == Info: We are completely uploaded and fine
    <= Recv header, 17 bytes (0x11)
    0000: HTTP/1.1 200 OK
    <= Recv header, 32 bytes (0x20)
    0000: Content-Type: application/json
    <= Recv header, 21 bytes (0x15)
    0000: Content-Length: 489
    <= Recv header, 2 bytes (0x2)
    0000:
    <= Recv data, 489 bytes (0x1e9)
    0000: {.  "ok" : true,.  "filename" : "test.txt",.  "url" : "https://1
    0040: 27.0.0.1:9200/rest/_doc/dd18bf3a8ea2a3e53e2661c7fb53534",.  "doc
    0080: " : {.    "content" : "This is my text\n\n",.    "meta" : { },.
    00c0:    "file" : {.      "extension" : "txt",.      "content_type" :
    0100: "text/plain; charset=ISO-8859-1",.      "indexing_date" : "2024-
    0140: 05-03T10:39:47.685+00:00",.      "filesize" : -1,.      "filenam
    0180: e" : "test.txt".    },.    "path" : {.      "virtual" : "test.tx
    01c0: t",.      "real" : "test.txt".    }.  }.}
    == Info: Connection #0 to host 127.0.0.1 left intact
    ```
    
    The important part is:
    
    ```txt
    0000: --------------------------VzJBwyDNXJA2IVvgyzIvvA
    0032: Content-Disposition: form-data; name="file"; filename="test.txt"
    0074: Content-Type: text/plain
    008e:
    0090: This is my text.
    00a2: --------------------------VzJBwyDNXJA2IVvgyzIvvA--
    == Info: We are completely uploaded and fine
    ```
    
    The `Content-Disposition` header of the file part carries no `size` parameter, which is why the response above reports `"filesize" : -1`.
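    
    The same check can be scripted against a trace file written by `curl --trace-ascii`. This is only a sketch: the `has_size_param` helper name is hypothetical and is not part of curl or FSCrawler. It assumes the trace prints each `Content-Disposition` header on a single line, as in the output above.
    
    ```sh
    # Hypothetical helper: succeeds (exit status 0) when at least one
    # Content-Disposition line in the given trace file has a size parameter.
    has_size_param() {
      # Keep only the Content-Disposition lines, then look for "; size=".
      # Requiring the leading ";" avoids matching words such as "filesize".
      grep -i 'Content-Disposition:' "$1" | grep -Eqi ';[[:space:]]*size='
    }
    ```
    
    For example, `has_size_param debug.txt || echo "size missing"` should print `size missing` for the curl trace shown above.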
    
    However, when the same endpoint is called with the Java `jakarta.ws.rs.client` client, the `size` parameter is provided:
    
    ```txt
    1 > PUT http://127.0.0.1:8080/fscrawler/_document/1234
    1 > Accept: multipart/form-data,application/json
    1 > Content-Type: multipart/form-data
    --Boundary_1_46114008_1714750065797
    Content-Type: application/octet-stream
    Content-Disposition: form-data; filename="test.txt"; modification-date="Fri, 03 May 2024 15:27:44 GMT"; size=30; name="file"
    
    This file contains some words.
    --Boundary_1_46114008_1714750065797--
    ```
    
    [RFC 2183](https://datatracker.ietf.org/doc/html/rfc2183#section-2.7) does not make this parameter mandatory.
    The workaround is therefore to compute the size on the command line and send it as a tag:
    
    ```sh
    echo "This is my text" > test.txt
    curl -F "file=@test.txt" \
      -F "tags={\"file\":{\"filesize\":$(ls -l test.txt | awk '{print $5}')}}" \
      "http://127.0.0.1:8080/fscrawler/_document"
    ```
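    
    As a variation on the command above, the size can be computed without parsing `ls` output. The sketch below wraps that in a small function. The `filesize_tags` name is hypothetical, and the JSON shape is the one used in the `tags` field above.
    
    ```sh
    # Hypothetical helper: print the tags JSON for a file, based on its byte count.
    filesize_tags() {
      # wc -c counts bytes; tr removes the padding some wc implementations add.
      size=$(wc -c < "$1" | tr -d '[:space:]')
      printf '{"file":{"filesize":%s}}' "$size"
    }
    
    echo "This is my text" > test.txt
    filesize_tags test.txt   # prints {"file":{"filesize":16}}
    ```
    
    The result can then be passed to curl as `-F "tags=$(filesize_tags test.txt)"` alongside `-F "file=@test.txt"`.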
    
    Related to #1868
    dadoonet committed May 13, 2024
    Commit 178ed41
  2. Commit 70d915c
  3. Commit 994161e