This repository has been archived by the owner on Jan 21, 2019. It is now read-only.

Update readme, add build script
mholt committed Jan 12, 2017
1 parent 3e265c9 commit cd767dc
Showing 3 changed files with 55 additions and 12 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
.DS_Store
_gitignore/
photos_backup/
photos_backup/
builds/
49 changes: 38 additions & 11 deletions README.md
@@ -1,16 +1,17 @@
Photobak
========

Photobak is a media archiver. It downloads your photos and videos from cloud services like Google Photos so you have a local copy of your content. Run it on a regular basis to make sure you have all your new memories.
Photobak is a media archiver. It downloads your photos and videos from cloud services like Google Photos so you have a local copy of your content. Run it on a regular basis to make sure you own all your memories.

Features:

- Integrity checks
- De-duplication
- Fast
- Organizes photos on disk by album
- Fast downloads in parallel
- Updates photos if changes are detected
- Can run on a schedule
- Concurrent downloads
- Supports multiple accounts per service
- Idempotent operations
- Separate additive and destructive commands
Expand All @@ -35,8 +36,30 @@ $ go get github.com/mholt/photobak/cmd/photobak

For help:

```bash
```plain
$ photobak -help
Usage of photobak:
-authonly
Obtain authorizations only; do not perform backups
-concurrency int
How many downloads to do in parallel (default 5)
-every string
How often to run this command, blocking indefinitely
-everything
Whether to store all metadata returned by API for each item
-googlephotos value
Add a Google Photos account to the repository
-log string
Write logs to a file, stdout, or stderr (default "stderr")
-maxalbums int
Maximum number of albums to process (-1 for all) (default -1)
-maxphotos int
Maximum number of photos per album to process (-1 for all) (default -1)
-prune
Clean up removed photos and albums
-repo string
The directory in which to store the downloaded media (default "./photos_backup")
-v Write informational log messages to stdout
```

## Usage
@@ -63,15 +86,15 @@ A photo or video may appear in more than one album. This is fine, but Photobak w

After a full backup has completed, future backups will be much quicker. Because of this, you can run Photobak as often as you like (I usually do once per day, see below for running on a schedule). Remote items will be checked for changes each time you run a backup. If the service's API reports any changes to a photo from when you downloaded it, Photobak will update the item on disk.

By default, photobak only stores what it needs to do its archiving functions. You can tell it to store everything the cloud service returns with the `-everything` flag, but be aware it will increase the size of the index. For Google Photos, this extra information is things like links to thumbnails of various sizes, whether comments are enabled, license details, etc. You do not need to use this flag to store photo captions, names, or GPS coordinates from EXIF, because Photobak extracts and saves those regardless (they are considered valuable metadata).
By default, Photobak stores only what it needs for its archiving functions, plus a few valuable metadata fields. You can tell it to store everything the cloud service returns with the `-everything` flag, but be aware that this increases the size of the database. For Google Photos, the extra data includes links to thumbnails of various sizes, whether comments are enabled, license details, etc. You do not need this flag to store photo captions, names, or GPS coordinates from EXIF, because Photobak extracts and saves those regardless (they are considered valuable metadata).

Repositories are portable. You can move them around, back them up, etc, so long as you do not disturb the structure or contents within a repository.

Photobak never mutates your cloud storage. It is read-only for the online service.
Photobak never mutates your cloud storage. It is read-only to the online service.

## Additive vs. Destructive

By default, Photobak runs backup operations: it only adds to the local index. Photobak will not delete photos or albums once they have been downloaded.
By default, Photobak runs backup operations: it only adds to the local index. Photobak will not delete or move photos or albums once they have been downloaded.

However, you can use the `-prune` flag to delete items locally that no longer appear in your cloud service. With this flag, Photobak will NOT perform a regular backup operation. Instead, it will query the API and delete items locally that have disappeared remotely. This way, you can keep disk space under control.
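
The prune step described above amounts to a set difference: anything in the local index that the remote listing no longer contains is a deletion candidate. Here is a minimal Go sketch of that idea. The function name `staleItems` and the use of plain string IDs are assumptions made for illustration, not Photobak's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// staleItems returns the IDs that are present in the local index but absent
// from the remote listing. These are the items a prune pass would delete.
// Hypothetical helper for illustration; not taken from Photobak.
func staleItems(localIDs, remoteIDs []string) []string {
	// Build a set of remote IDs for constant-time membership checks.
	remote := make(map[string]struct{}, len(remoteIDs))
	for _, id := range remoteIDs {
		remote[id] = struct{}{}
	}

	var stale []string
	for _, id := range localIDs {
		if _, ok := remote[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale) // deterministic order for logging and review
	return stale
}

func main() {
	local := []string{"a1", "b2", "c3", "d4"}
	remote := []string{"a1", "c3"}
	fmt.Println(staleItems(local, remote)) // prints [b2 d4]
}
```

Note that an empty remote listing makes every local item look stale, so a caller built this way would want to abort on API errors rather than treat them as an empty result.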

@@ -81,7 +104,9 @@ The `-prune` option is destructive, so make sure you trust that the API is healt

Photobak can run indefinitely and perform its backup operations on a regular schedule with the `-every` option. For example, `-every 1d` runs the command every 24 hours. Valid units are `m`, `h`, and `d` for minute, hour, and day, respectively. Run it in the background, since it blocks forever.

You could also use cron, just don't use the `-every` option with a cron command. If a backup is still running when the next cron executes, the second cron command will fail since the database is locked.
You could also use cron, but don't combine the `-every` option with a cron command. If a backup is still running when the next cron job starts, the second command will fail because the database is locked (this is normal).

To get an idea of execution time: my photo library of ~4,000 items downloaded on a fast network with `-concurrency 20` finished in a little over an hour. The final repository size was 16 GB (after de-duplication).

## Logging and Error Handling

@@ -99,23 +124,25 @@ You can get informational log messages with the `-v` flag. This will output a lo

## Running Headless

Photobak must be authorized to access your accounts before it can be of any use, and obtaining authorization for services that use OAuth requires opening a browser tab for the user to grant access. This does not work so well over SSH.
Photobak must be authorized to access your accounts before it can be of any use. Obtaining authorization for services that use OAuth requires opening a browser tab for the user to grant access. This does not work so well over SSH.

On your local machine, run photobak with the `-authonly` flag, and it will obtain any needed credentials for all configured accounts and store them in the database. You can then copy the database to your remote machine and use its folder as the repository; the credentials in the repo's database that you already obtained will be used.

## Caveats

This program is designed to work with various cloud providers in a generic way, and each one has little things to be aware of.
This program is designed to work with various cloud providers in a generic way, and each service will have its quirks. These shouldn't be dealbreakers (otherwise I wouldn't add support for the service) but you should be aware of them.

### Google Photos

- There is no Google Photos API; it uses a zombied version of the [Picasa Web Albums API](https://developers.google.com/picasa-web/docs/2.0/developers_guide_protocol) which is somewhat crippled. It still works for now, and one advantage is that you don't have to mirror your Google Photos in Google Drive for this program to work.

- Some users [have reported](https://code.google.com/p/gdata-issues/issues/detail?id=7004) that a [maximum of ~10,000 photos can be downloaded](https://github.com/camlistore/camlistore/issues/874) per album. It is still unclear why this is; even Google employees are hitting this. Google Photos puts all your "instant upload" (auto backup) photos into a single album called "Auto Backup". So if you take most of your photos on your phone and they get uploaded to Google Photos, you may hit this limit and there is no way to get photos older than the most recent 10k unless you put them into albums you create. This issue becomes irrelevant as you run backups regularly, assuming later you don't go way back and add really old photos to your cloud service that you don't already have locally.

- Unbelievably, Google Photos does _not_ assign unique IDs to photos in your account. It assigns IDs to unique photos _in albums_, but this is "too" unique, since the same photo may appear in multiple albums. Here, we rely on Photobak's de-duplication features. After a duplicate file is downloaded, it will be replaced with an entry in a text file that points to where it can already be found on disk. We could use another ID I found in the exif tag supplied by the API: the exif ID. This ID is more correctly unique per-photo, except sometimes it is _not unique enough_. But I only saw overlap on an edit (from an external editing app/program) of the same photo, so if one was overwritten (which it was), I still had the picture, just one variant instead of two. This actually works better as far as saving bandwidth and disk space and I was torn for days trying to decide which to use. But for now we use Google Photos' ID field.
- Unbelievably, Google Photos does not expose unique IDs to photos in your account. It assigns IDs to unique photos _in albums_, but this is "too" unique, since the same photo may appear in multiple albums. Here, we rely on Photobak's de-duplication features. After a duplicate file is downloaded, it will be replaced with an entry in a text file that points to where it can already be found on disk. We could use another ID I found in the exif tag supplied by the API: the exif ID. This ID is more correctly unique per-photo, except sometimes it is _not unique enough_. But I only saw overlap on an edit (from an external editing app/program) of the same photo, so if one was overwritten (which it was), I still had the picture, just one variant instead of two. This actually works better as far as saving bandwidth and disk space and I was torn for days trying to decide which to use. But for now we use Google Photos' ID field.

- Sometimes, I've noticed that the same, unedited photo in my stream that is shared in different albums can not only have a different ID as mentioned above, but also a different checksum! Bizarre. Visually they looked identical, and they had the same dimensions, but when I inspected the bytes, one was a few hundred bytes shorter than the other. What's more perplexing is that both photos were exactly identical, byte-for-byte, until line 88443 of the hexdump. Then they were completely different. I've also seen sometimes that photos shared from other accounts that you add to your library can sometimes have different sizes depending on the download URL.

- Media may be available in several formats and sizes for a single item. Photobak will try to get the largest .mp4 video file, if available. If not, it will get the largest video even if it is a .flv file. If there is no video available, it tries the highest-resolution _anything_ it can find.
- Media may be available in several formats and sizes for a single item. Photobak will try to get the largest .mp4 video file, if available. If not, it will get the largest video even if it is a .flv or other type of file. If there is no video available, it tries the highest-resolution _anything_ it can find.

- Filenames for albums and photos are sanitized to remove special characters that sometimes appear but may not play nicely with the file system. For example, "5:5.jpg" becomes "55.jpg".
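
As an illustration of the filename sanitization mentioned in the last caveat, here is a small Go sketch that reproduces the "5:5.jpg" to "55.jpg" example. The set of characters it keeps (letters, digits, space, `.`, `-`, `_`) is an assumption; the exact rules Photobak applies are not stated here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeName drops characters that may not play nicely with file systems.
// The allow-list below is an assumption made for illustration; it is not
// Photobak's actual rule set.
func sanitizeName(name string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case unicode.IsLetter(r), unicode.IsDigit(r):
			return r
		case r == '.', r == '-', r == '_', r == ' ':
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, name)
}

func main() {
	fmt.Println(sanitizeName("5:5.jpg")) // prints 55.jpg
}
```

An allow-list is used here rather than a deny-list so that characters nobody anticipated are removed by default.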

15 changes: 15 additions & 0 deletions cmd/photobak/build.bash
@@ -0,0 +1,15 @@
#!/usr/bin/env bash
set -ex

# This script builds photobak for most common platforms.

export CGO_ENABLED=0

mkdir -p builds

GOOS=linux GOARCH=386 go build -o builds/photobak_linux_386
GOOS=linux GOARCH=amd64 go build -o builds/photobak_linux_amd64
# Set GOARM explicitly so the binary matches its arm7 name.
GOOS=linux GOARCH=arm GOARM=7 go build -o builds/photobak_linux_arm7
GOOS=darwin GOARCH=amd64 go build -o builds/photobak_mac_amd64
GOOS=windows GOARCH=386 go build -o builds/photobak_windows_386.exe
GOOS=windows GOARCH=amd64 go build -o builds/photobak_windows_amd64.exe
