Wednesday, August 31, 2016

Transfer Data from Amazon's AWS S3 Servers to Google Cloud

A while back I decided it would be a smart idea to archive my Aperture photos (now Apple Photos), since my MacBook, a 2008 model, is nearing the end of its hardware life. A little research landed me at Amazon's cloud storage, S3 (Simple Storage Service), at just pennies per GB. I have around 100 GB of data within Aperture, so I needed something that was less expensive than using Google Drive.

As of this writing, Google Drive is $1.99/mo for 100 GB, and the next tier is 1 TB for $9.99/mo, with no scaling between the two tiers. S3, on the other hand, is $0.03/GB per month for standard storage ($3 per 100 GB, but scalable), and $0.007/GB per month for their long-term, infrequent-access "Glacier" storage. My plan was to upload all of my data from my Aperture library to S3, and then have it automatically archive to Glacier, costing me about 70 cents per month (100 GB at $0.007/GB) to safely stow away several years' worth of photos. Sounds like a pretty good deal, right?
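To sanity-check those numbers, here is a minimal sketch of the arithmetic. It uses only the August 2016 prices quoted above (not current pricing), and the function names are made up for illustration. It also ignores request, retrieval, and transfer fees, which both providers charge separately.

```python
# Monthly prices quoted in the post (USD, August 2016); not current pricing.
S3_STANDARD_PER_GB = 0.03
GLACIER_PER_GB = 0.007
DRIVE_TIERS = [(100, 1.99), (1000, 9.99)]  # (capacity in GB, flat monthly price)


def per_gb_cost(size_gb, price_per_gb):
    """Pay-as-you-go storage: the bill scales linearly with the data stored."""
    return round(size_gb * price_per_gb, 2)


def drive_cost(size_gb):
    """Flat tiers: you pay for the smallest tier that fits the data."""
    for capacity_gb, price in DRIVE_TIERS:
        if size_gb <= capacity_gb:
            return price
    raise ValueError("larger than the biggest tier listed in the post")


library_gb = 100  # approximate size of the Aperture library
print("S3 standard:", per_gb_cost(library_gb, S3_STANDARD_PER_GB))
print("Glacier:    ", per_gb_cost(library_gb, GLACIER_PER_GB))
print("Drive:      ", drive_cost(library_gb))
```

The tier function also shows why the lack of scaling matters: a library of 101 GB would jump straight to the $9.99 tier on Drive, while the per-GB services would only charge for one more gigabyte.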

The problem with Glacier is that it is meant to be a long-term storage solution with very infrequent access. Once the files are in S3, I'd have to run a script to archive them to Glacier, followed by removing the files from S3 to save money. The next time I wanted to archive, I would have to restore everything from Glacier to S3 so that I could compare source and destination (computer to S3) for changes on upload, and then repeat the whole archiving sequence. However, I don't have that much time to dedicate to optimizing my backup plan. Rather, I'd just like a cloud storage solution that I can easily access whenever I want, without having to worry about where all the data is spread across the AWS platform.

I recently discovered Google Cloud, which I surprisingly had not heard of before, considering how Google-centric all my stuff is. I mean, I have a Nexus phone and a Chromebook, this website is hosted on Google (including using Google's DNS services), and I use Blogger as well as Drive. I'm pretty much a Google fanboy at this point. But it never crossed my mind to see if Google had a solution. So I started to compare AWS to Google Cloud, and, for my purposes, they are surprisingly similar, yet Google's services seem more intuitive.

There are three tiers with Google Cloud Storage: Standard ($0.026/GB), Durable Reduced Availability ($0.02/GB) and Nearline ($0.01/GB). Although Nearline is more expensive than Glacier, its ease of use far beats that of AWS's option. What's more, Google has a transfer service that talks to Amazon's S3, so I can easily transfer over my bucket - which is what a storage container is called in both services. Google has key term explanations if this is all new to you. But, basically, you create your Project (Google Cloud account), create a bucket (Standard, DRA, or Nearline), and then put objects (or files) into that bucket.

Once you have your Project (your Google Cloud Storage account), follow these steps to initiate a transfer from S3 to Google Cloud:

  1. Create a user access policy within AWS IAM (Identity and Access Management)
    1. Create a user in IAM (e.g., GoogleTransfer)
      1. Download the credentials 
      2. Copy the Access Key and Secret Access Key to somewhere you won't lose them (I created a Google Sheets file to keep track of users and their access keys). You will need both of these while creating the transfer later over on Google, and you will not be able to view the Secret Access Key again once you leave this page.
    2. Give that user an Inline Policy
      1. Policy Generator
        1. Effect: Allow
        2. AWS Service: Amazon S3
        3. Actions: All Actions, minus the 5 Delete* options at the top of the list
          1. This is probably overkill, but I ran into access permission problems while trying to use Groups instead of an inline policy, so I just gave blanket permission.
        4. Amazon Resource Name: arn:aws:s3:::*
          1. This gives the user access to everything on your S3, including all your buckets. If you want to restrict it further, have a read here.
      2. Add Statement
      3. Next Step
      4. Apply Policy
  2. Create a Bucket in Google Cloud Storage
    1. Give it a unique name (e.g., aperture-backup-benmctee)
    2. Select your storage class (pricing and explanations)
    3. Select your storage location. I would stick to multi-regional unless you have a good reason not to.
  3. In your Google Cloud Console, create a new Transfer
    1. Amazon S3 Bucket: s3://bucket-name (this is your unique bucket name, e.g., benmctee-aperture-archive)
    2. Access Key ID: This is the public key generated in IAM.
    3. Secret Access Key: The secret key generated in IAM - you saved it, right??
    4. Continue
    5. Select the bucket you created
    6. If this is the first time you are transferring, you should not need to select further options. If you are trying it again because a transfer failed, you may want to select "Overwrite destination with source, even when identical".
    7. Continue
    8. Give it a unique name, if desired
    9. Choose Run Now, and Create
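
The console form in step 3 maps fairly directly onto a transfer job in Google's Storage Transfer Service REST API. The sketch below only builds the JSON body and sends nothing. The field names are what I understand the v1 `transferJobs` resource to use, so treat them as an assumption to verify against the API reference. The project ID, the function name, and the placeholder credentials are hypothetical.

```python
import datetime
import json
import os


def build_transfer_job(project_id, s3_bucket, gcs_bucket, access_key_id,
                       secret_access_key, overwrite=False, run_date=None):
    """Build a one-time S3 -> Cloud Storage transfer job body (no network calls)."""
    run_date = run_date or datetime.date.today()
    day = {"year": run_date.year, "month": run_date.month, "day": run_date.day}
    return {
        "description": f"{s3_bucket} to {gcs_bucket}",
        "status": "ENABLED",
        "projectId": project_id,
        # Same start and end date: the job runs once instead of repeating.
        "schedule": {"scheduleStartDate": day, "scheduleEndDate": day},
        "transferSpec": {
            # Bucket names are given bare here, without the s3:// prefix.
            "awsS3DataSource": {
                "bucketName": s3_bucket,
                "awsAccessKey": {
                    "accessKeyId": access_key_id,
                    "secretAccessKey": secret_access_key,
                },
            },
            "gcsDataSink": {"bucketName": gcs_bucket},
            # Mirrors the "Overwrite destination with source" checkbox.
            "transferOptions": {
                "overwriteObjectsAlreadyExistingInSink": overwrite,
            },
        },
    }


job = build_transfer_job(
    project_id="my-project-id",  # hypothetical project ID
    s3_bucket="benmctee-aperture-archive",
    gcs_bucket="aperture-backup-benmctee",
    # Read the IAM keys from the environment rather than hard-coding them.
    access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "PLACEHOLDER-KEY-ID"),
    secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "PLACEHOLDER-SECRET"),
)
# Print only the non-secret parts of the job.
print(job["description"])
print(json.dumps(job["schedule"], indent=2))
```

For a retry after a failed transfer, passing `overwrite=True` corresponds to the overwrite option in step 3.6.
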
The beauty of cloud computing is that this will all happen without you having to stay on that page to monitor it. If you want to come back later and check the progress, just log back into your Google Cloud Console, go to Transfers, and click the job to see how far along it is. The transfer from Amazon to Google should be relatively quick, depending on the volume of files ("objects") you are transferring.
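
One follow-up on the AWS side: the blanket policy from step 1 grants far more than a one-way copy needs. As a rough sketch, and assuming the transfer only has to list the bucket and read its objects (the three action names below are an assumption about the minimum set, so check them against Google's transfer documentation before relying on them), a narrower policy scoped to a single bucket could be generated like this. The function name is hypothetical.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document allowing list/read on one S3 bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# Paste the printed JSON into the inline policy editor instead of using
# the Policy Generator.
print(json.dumps(read_only_bucket_policy("benmctee-aperture-archive"), indent=2))
```

Compared with `arn:aws:s3:::*`, this keeps the GoogleTransfer user away from every other bucket in the account and leaves out write and delete permissions entirely.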