DRAFT - How to upgrade project from MySQL to Redshift

This is the best practice of how to proceed when upgrading an existing MySQL-backed project to Redshift backend. There are five steps:

  1. Create RS transformations
  2. Reconcile data
  3. Provision backend
  4. Migrate storage
  5. Finish and clean up

First and foremost, it is recommended to go about this in isolated fashion. Try to avoid "improving" the logic of the project while doing this, or you will give yourself lots of headache when something goes wrong. Ideally, the transition should be made in as mechanical fashion as possible. Any changes should be done (and QC'd) either BEFORE or AFTER the RS transition.

Second, you can not use the shared redshift environment for anything that production depends on, it is only for dev/R&D/sandbox purposes. Make sure you have RS provisioned for the project by Keboola no later than by the storage migration point, and do not make RS-backed transformations part of any production orchestration before that point (they're not guaranteed to run on shared backend and you can also cause trouble to others).

Create a new sys.bucket/transformation structure that is 1:1 to the old one. Use clear naming convention, such as rs_xxxx, where xxxx stands for the name or the original transformation bucket. The names of the tables and hence the transformations may remain the same. 
Create a new output structure. You will need new output buckets for your final data. Again, something like out.rs_xxxx may be a good idea. Unless you have RS already provisioned for the project, make these MySQL buckets. Move any historical data, where needed (when your transformations are using incremental outputs)  from your current output structure over using snapshots. THIS IS NOT A GOOD TIME TO MAKE ANY CHANGES TO THE STRUCTURE OR LOGIC. If you didn't make any planned changes before, leave them for now until the transition to RS is done.
Change the output mapping of the rs_ transformations. Clear, just point all your rs_ transformations to rs_buckets.
Redshift-ize the transformations. (change backend, query code, edit any mapping items etc.)

If you followed the above mechanically, once you run your new transformation group, you should have two output structures containing exactly the same data (one from your old transformations, one from the rs_ tree in the out.rs_ buckets. Compare the data. The simplest and most robust way would be to create a new transformation that takes the two versions of the same table and produces a diff output. Run that in RS sandbox. If you keep the transformation you can use it repeatedly as a test as you're fixing up the tree.

To do anything more than what you've done before, RS will need to be turned on for your project. If you haven't already, contact support@keboola.com, and DO NOT PROCEED FURTHER unless you have confirmation that it is safe to do so in your project.

So far you worked only on transformations. To fully utilize the power of RS backend, storage buckets need to be migrated as well. It is a multi-step process, but can be scripted using the Storage API CLI for larger projects. The manual process is:
  • Create a temp bucket (any name, any backend)
  • Copy tables from your source MySQL bucket into the temp using snapshots
  • Delete tables from the source bucket
  • Delete the source bucket
  • Re-create bucket with the same name, but with RS backend
  • Copy tables from the temp bucket into the new RS bucket using snapshots
  • Repeat for remaining buckets
Now, you're pretty much done. If you did your reconciliation along the way, there will be no surprises. Now, just change the output mapping of the rs_ transformations to the original bucket names (which are now RS backed as well), run the whole thing and enjoy the disappearance of data transfer times. Delete the temp buckets and the temporary out.rs_ buckets (unless, of course, you decided to keep using those). Change your orchestrations to use the new transformations. When brave enough, delete the old transformations to keep the board neat. Congrats!