
Running the worker in the Amazon AWS EC2 cloud

Dark42ed edited this page Aug 24, 2021 · 11 revisions

[Note that this page might be somewhat outdated, as it reflects the state in ~2018/2019]

It's possible to run the fishtest worker on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) compute instances. The cheaper option is a Spot instance, which gives you spare CPU capacity at a reduced price. AWS may terminate a Spot instance at any time, but this is no problem for fishtest. Usage is billed to your credit card at a rate that you can monitor and control (typically $0.30-0.45/hour for the high-end instance type needed for fishtest).
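
To get a feel for what the hourly rate means in practice, the arithmetic can be sketched as below. This is only an illustration: `spot_cost` is a hypothetical helper name, the rates are the typical range quoted above, and actual Spot prices vary by region and over time.

```shell
#!/bin/bash
# Hypothetical helper: estimate the cost in USD of running Spot instances.
# usage: spot_cost HOURS RATE_USD_PER_HOUR [INSTANCES]
spot_cost() {
  local hours=$1 rate=$2 instances=${3:-1}
  # LC_ALL=C keeps the decimal point independent of the user's locale
  LC_ALL=C awk -v h="$hours" -v r="$rate" -v n="$instances" \
    'BEGIN { printf "%.2f\n", h * r * n }'
}

spot_cost 24 0.30   # one instance for a day at the low end: prints 7.20
spot_cost 24 0.45   # one instance for a day at the high end: prints 10.80
```

Passing a third argument multiplies the result by the number of instances, for example `spot_cost 24 0.45 2`.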

Get your personal account and password for fishtest

http://tests.stockfishchess.org/signup

Make a note of your username and password; you will need them later.

Get your personal account at Amazon Web Services (AWS)

http://aws.amazon.com/

Creating your Amazon Web Services (AWS) account is free, but requires a credit card. Register for AWS using a username and password that are different from your fishtest ones! Keep your AWS username and password secure.

Starting instances to contribute to fishtest

With the above two usernames and passwords, you can launch AWS Elastic Compute Cloud (EC2) instances relatively easily. Ignore the wide variety of options offered by AWS; the defaults will work fine.

Create your instance

  1. After logging in to AWS, go to the EC2 dashboard.
  2. From the top right-hand corner, select the geographic region in which to launch your instance. Pricing depends on the region; the lowest prices are typically found in Asia Pacific (Seoul), Canada (Central), and EU (Ireland).
  3. Go to INSTANCES / Spot Requests : Pricing History. To optimize your cost, iterate between steps 2 and 3 to find the region and availability zone with the lowest price for the instance type c4.8xlarge.
  4. Go to INSTANCES / Spot Requests : Request Spot Instances.
  5. Go through the various instance configuration steps:
    1. Request type: Request

    2. Target capacity: start with 1. The maximum AWS allows by default is 20, but new AWS users have a lower limit.

    3. AMI: Canonical, Ubuntu, 14.04 LTS. Don't choose Ubuntu 16.04 LTS because the tuning branch triggers a GCC bug.

    4. Instance type: c4.8xlarge. Choosing this type is essential: the script below assumes its 18 physical cores (36 vCPUs).

    5. Maximum price: set your maximum price according to your budget. With a higher bid, your instance will start sooner and run longer. Currently $0.30-0.45/hour seems to work well.

    6. A crucial step: fill in the User data section with the following script (just copy and paste), replacing USERNAME and PASSWORD (at the start of the script) with your fishtest username and password. Note that spaces, line breaks, etc. matter. The script updates the software and installs the essential tools, including fishtest. It also disables hyperthreading, which is preferred for running fishtest.

      #!/bin/bash
      # replace USERNAME and PASSWORD with your fishtest username and password
      # the double quotes deal with symbols in username or password: don't delete them
      username="USERNAME"
      password="PASSWORD"
      
      # update software
      apt update && apt full-upgrade -y && apt autoremove -y && apt clean        
      apt install -y python3 build-essential unzip
      useradd -m fishtest
      sudo -i -u fishtest wget https://github.com/glinscott/fishtest/archive/master.zip
      
      # disable hyperthreads
      for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un); do
           echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
      done
      
      # if the 5min load average gets below 2.5 poweroff the instance
      # currently poweroff leads to (the expected) termination of the instance
      (
        # ignore first 5min
        sleep 300
        while true; do
          sleep 60
          loaded=$(awk '{if ($2 > 2.5) {print 1} else {print 0}}' /proc/loadavg)
          if (( loaded )); then
             echo "$(date) $(cat /proc/loadavg)" >> loadlog
          else
             poweroff
          fi
        done
      )&
      
      # generate a script for running fishtest
      cat << EOF > runscript.sh
      #!/bin/bash
      partitionid=0
      # partition the available cores in independent fishtest workers
      # the sum of the partitions should be 17 = the number of physical cores - 1
      # the choice 14 3 is optimized for allowing both 7 threads and 1 thread runs
      # start the workers in descending order wrt the number of cores
      # to minimize the time control error for the first batch of games
      # (the time control is adjusted according to the CPU load)
      for partition in 14 3
      do
        partitionid=\$((partitionid+1))
        mkdir part_\$partitionid
        cd part_\$partitionid
        unzip ../master.zip
        (
        # put fishtest in a loop restarting every 4h to deal with eventual hangs or crashes
        while true; do
          # some random delay to avoid starting too many fishtest workers at the same time
          sleep "\$((RANDOM%20)).\$((RANDOM%100))"
          timeout 4h python3 fishtest-master/worker/worker.py --concurrency \$partition "$username" "$password" >& out
        done
        ) &
        cd ..
      done
      wait
      EOF
      chmod a+rx runscript.sh
      
      # execute script
      sudo -i -u fishtest /bin/bash `pwd`/runscript.sh
      
      # if we come here, terminate
      poweroff
  6. Click Review and tick the checkbox to ignore the warnings about the key pair.
  7. Click Launch.
  8. View your Spot instances. Here you can see if and when your bid succeeds; it typically takes a few minutes for the instance to start (use the reload button to refresh the page as needed). Wait until the state turns green: 'active'.
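
The "disable hyperthreads" loop in the user-data script of step 5.6 relies on a short text pipeline to decide which CPUs to take offline. The sketch below isolates that pipeline so its behavior can be checked without touching any CPU. The function name `offline_candidates` and the sample input are made up for illustration; the pipeline itself is copied from the script.

```shell
#!/bin/bash
# Read thread_siblings_list-style lines on stdin and print the CPU numbers
# that the user-data script would take offline: every sibling except the
# first one listed for each core.
offline_candidates() {
  cut -s -d, -f2- | tr ',' '\n' | sort -un
}

# Made-up sample for a 2-core, 4-thread machine; each core's sibling list
# appears once per logical CPU, so duplicates are expected.
printf '0,2\n1,3\n0,2\n1,3\n' | offline_candidates   # prints 2 and 3, one per line

# On a real Linux host, this read-only preview lists the CPUs the loop would disable:
# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | offline_candidates
```

Because `cut -s` skips lines that contain no comma, a host that reports siblings as a range (such as `0-1`) would yield no candidates with this pipeline, and nothing would be disabled.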

Manage your instance

  1. See http://tests.stockfishchess.org/tests. Within a few minutes of your Spot instance starting, your username should appear in the list of Active machines with two workers, one with 14 cores and one with 3 (depending on the partition chosen in the script, and provided there are pending or running tests). The flag shown depends on the region you selected in AWS EC2. If your instance does not appear after a couple of minutes, or does not show a non-zero MNps, consider terminating it.
  2. The instance terminates itself if no tests are available or if fishtest is down: the script powers it off once the 5-minute load average drops below 2.5.
  3. To manually terminate a running instance (and stop the billing), go to the AWS dashboard / instances / instances, right-click the instance, and select instance state / terminate. Confirm that it is OK to lose all storage.
  4. Note that AWS can terminate Spot instances at any time if your maximum price is no longer competitive.
  5. To get an overview of your EC2 usage and costs, go to the AWS dashboard / reports / EC2 Instance Usage Report.
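
The automatic shutdown mentioned in step 2 comes from the watchdog loop in the user-data script, which reads the second field of `/proc/loadavg` (the 5-minute load average) once a minute. A minimal sketch of that decision rule, using a hypothetical function name `would_poweroff` and made-up sample lines, might look like this:

```shell
#!/bin/bash
# Succeed (exit status 0) if the watchdog would power off for the given
# /proc/loadavg-style line, i.e. if field 2 (5-minute average) is not above 2.5.
would_poweroff() {
  printf '%s\n' "$1" | awk '{ exit ($2 > 2.5) }'
}

busy="16.80 17.02 15.41 18/412 2345"   # made-up sample: workers are running
idle="0.10 1.90 9.75 1/398 2399"       # made-up sample: no tests for a while

would_poweroff "$busy" && echo "poweroff" || echo "keep running"   # prints: keep running
would_poweroff "$idle" && echo "poweroff" || echo "keep running"   # prints: poweroff
```

In the actual script the check only starts after an initial 5-minute grace period, so a freshly booted instance is not shut down before the workers have had time to pick up tests.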