Skip to content

Latest commit

 

History

History
203 lines (161 loc) · 7.28 KB

s3.md

File metadata and controls

203 lines (161 loc) · 7.28 KB

S3

Simple Storage Service

  • object based, 0 to 5 TB per object
  • unlimited storage per bucket
  • names must be globally unique
  • key value store
  • API and CLI return HTTP 200 on success
  • success returns HTTP 200
  • minimum file size ????
  • soft limit 100 buckets per account
  • namespace
    • virtual
      • http://{bucketname}.s3.amazonaws.com
      • http://{bucketname}.s3-{region}.amazonaws.com
    • path deprecated for all new buckets Sept 2020
    • prefix, simulates directories
    • should use DNS safe names, otherwise, services that access the objects with RESTful paths will fail, like transfer acceleration
Storage class Designed for Availability Zones Min storage duration Min billable object size Monitoring and automation fees Retrieval fees Latency
Standard Frequently accessed data ≥ 3 - - - ms
Intelligent-Tiering Long-lived data with changing or unknown access patterns ≥ 3 30 days - Per-object fees apply - ms
Standard-IA Long-lived, infrequently accessed data ≥ 3 30 days 128KB - Per-GB fees apply ms
One Zone-IA Long-lived, infrequently accessed, non-critical data ≥ 1 30 days 128KB - Per-GB fees apply ms
Glacier Archive data with retrieval times ranging from minutes to hours ≥ 3 90 days 40KB - Per-GB fees apply min to hours
Glacier Deep Archive Archive data that rarely, if ever, needs to be accessed with retrieval times in hours ≥ 3 180 days 40KB - Per-GB fees apply hours
Reduced Redundancy (Not recommended) Frequently accessed, non-critical data ≥ 3 - - - - ms

Objects

  • values are byte streams
  • version ID, when versioning is enabled
  • meta data
    • http header variables
  • subresources
    • bucket polices,
    • ACL
    • CORS
    • transfer acceleration
  • built for 99.99% availability, guarantee 99.9%
  • 11 x 9's durability
  • can have tags
    • can be used to
      • monitor
      • ACL
      • lifecycle methods
    • key 128 characters
    • values 256 characters
    • 10 tags per object

Consistency models

  • read after write for PUTS of new objects is consistent
  • eventual consistency for overwrite PUTS and DELETES
  • reads can get old data or deleted data
  • latest timestamp wins on multiple writes

Storage Tiers/Classes

Standard

  • sustain the loss of 2 facilities

IA - Infrequently Accessed

  • lower cost, but charge for every access

One Zone IA (RRS)

  • min storage 30 days
  • min billable object size 128KB
  • One zone, 99.5% available, cost 20% less than IA

Reduced redundancy storage

  • 4 9s only. Data than can be recreated.
  • going away, costs more than standard

Glacier

  • min storage 90 days
  • min billable object size 40KB
  • archiving, very cheap.
  • access takes 3*5 hours to retrieve

Glacier Deep Archive

Intelligent Tiering

  • frequent, or infrequent access tiers
  • data moved between as you access
  • access not accessed for 30 days, data is moved to infrequent tier
  • small fee for monitoring objects
  • optimizes cost

Charges

  • per GB
  • requests
  • free to add data to s3, charged to download
  • storage tiers pricing differences
  • transfer acceleration charged
  • cross region replication is used
  • object tagging

Security

  • created private by default, only owner can access objects
  • access logs can be created and written to another s3 bucket
  • policies are for the bucket, cannot create policy for an object
  • ACL is applied at the object level

Encryption

  • In Transit
    • SSL/TLS
  • At Rest
    • Server Side Encryption (SSE), 256 bit
      • S3 Managed Keys - SSE-S3
      • AWS KMS - SSE-KMS, provides audit trail
      • Customer provided keys - SSE-C
    • Client Side Encryption
  • PUT requests
    • Expect: 100-continue
    • x-amz-server-side-encryption
      • AES256 -> SSE-S3
      • ams:kms -> SSE-KMS

CORS

  • put CORS rules into permissions/CORS configuration on the bucket that has the files being loaded/accesed

    <CORSConfiguration>
        <CORSRule>
            <AllowedOrigin>*</AllowedOrigin>
            <AllowedMethod>GET</AllowedMethod>
            <MaxAgeSeconds>3000</MaxAgeSeconds>
            <AllowedHeader>Authorization</AllowedHeader>
        </CORSRule>
    </CORSConfiguration>
    

CDN/CloudFront

  • Edge Location - cache servers, not an AWS Region/AZ (lot more edge locations)
  • Origin - S3 bucket, EC2 instance, ELB, Route53, or any web server including on premise.
  • Distributions
    • web content
    • RTMP - media streaming, realtime
  • TTL - Time to live expires and the cache is cleared. If you clear it yourself, you get charged.
  • first 1000 invalidation paths are free
  • whitelist or blacklist countries
  • upload and download

Transfer Acceleration

  • fast, easy, secure transfer of files over long distances. Users <-> S3
  • can use CloudFront as edge location for data

Performance

  • normal usage
    • 3,500 PUT/LIST/DELETE per sec
    • 5,500 GET per sec
  • heavy usage - use CloudFront
  • [no longer needed] use non-sequencial keynames for objects, allows distributed objects into differernt partition

Versioning

  • store all versions, including delete
  • cannot be disabled once enabled, must delete bucket
  • great backup tool
  • works with lifecycle rules
  • can use MFA delete capability
  • make public only works for specific version
  • delete of file just creates a delete marker as a new version
  • can delete individual versions

Lifecyle Management

  • automate moving objects between storage tiers
  • can filter by tag, file prefix
  • can be used with/without versioning
  • can also delete files after period of time
  • rules
    • transition - move to different tier after X days
    • expiration - delete after X days
  • useful for objects that have defined lifecycles

Replication

  • sent changed objects to bucket in different region
  • must have versioning turned on source and destination bucket
  • can filter by tag, file prefix
  • can be different storage class
  • can object change owner to destination bucket owner
  • can be different accounts
  • when started, does not replicate existing objects in the bucket
  • deletes are not replicated