As part of deploying the Metaflow UI we’ll want to create a replica of our Postgres metadata DB to ensure the UI is isolated from the core Metaflow runtime.
To do this, we’ll create and configure a logical replica from a snapshot of the Metaflow metadata service’s database. Much of the credit goes to the Instacart tech blog post listed in the references, which this walkthrough leans on heavily.
Caveat/Caution per Romain Cledat from the Netflix Metaflow team:
Great @russell; very nicely done and exactly what I did within Netflix. I would add though that it is not the recommended way to start a replica (from a snapshot) and if your DB is not super large, it may be best to start from scratch (takes a little longer since all tables have to sync) but is potentially safer. See here for some caveats: https://ardentperf.com/2021/07/26/postgresql-logical-replicas-and-snapshots-proceed-carefully/. That said, I didn’t have a choice and did do what you mention here and it worked fine; I did double check to make sure that all tables were continuous (ie: no missing data) at the end.
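The quote does not say how that final check was done, so the following is only a minimal sketch of one way to do it. It assumes you have already fetched the primary-key values of a table from both the primary and the replica (for example with a `SELECT` over the same ID range on each side). The function name `missing_ids` is hypothetical and not part of Metaflow or Postgres.

```python
from typing import Iterable, List


def missing_ids(primary_ids: Iterable[int], replica_ids: Iterable[int]) -> List[int]:
    """Return, in sorted order, the IDs present on the primary but absent on the replica.

    Comparing against the primary (rather than looking for numeric gaps) avoids
    false alarms, since Postgres sequences can legitimately skip values.
    """
    return sorted(set(primary_ids) - set(replica_ids))


if __name__ == "__main__":
    # Illustrative values only: row 3 never made it to the replica.
    print(missing_ids([1, 2, 3, 4], [1, 2, 4]))
```

An empty result for every table suggests nothing was dropped in the compared range. Rows written to the primary while the check runs may show up as missing until the replica catches up.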
Diagrams stolen from the Metaflow UI release presentation.
In the simple model, the UI service is deployed as a separate, standalone container to have strong isolation between the UI and the underlying metadata service. This helps ensure that the UI service cannot have adverse effects on the core Metaflow runtime.
Having both containers share a DB is great for getting realtime updates in the UI, but it carries its own set of risks. If a bunch of people use the UI at once, the number of queries hitting the DB can ramp up quickly. Similarly, someone could run an expensive query over a broad time range while navigating the UI, or interact with a large number of artifacts, degrading the performance of the DB.
Me playing around with the UI while still using a t2.small primary DB for everything.
To be super paranoid and have even stronger guarantees that the UI cannot adversely affect the core Metaflow runtime, we can create a logical replica that receives a stream of all changes from the primary DB, and point the UI service at that replica instead.
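In Postgres, logical replication is built from a publication on the primary and a subscription on the replica, and the primary has to run with `wal_level = logical`. As a rough sketch of that pairing, the helpers below only build the two SQL statements; they do not connect to any database. The names `metaflow_pub` and `metaflow_sub`, the host, and the helper functions themselves are assumptions made for illustration. The sketch shows the plain form, which copies existing table data when the subscription is created. Starting from a snapshot needs extra subscription options that are not shown here.

```python
import re

# Only allow simple lowercase identifiers so they are safe to use unquoted.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def _check_ident(name: str) -> str:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def publication_sql(pub_name: str) -> str:
    """Statement to run on the primary DB: publish changes from every table."""
    return f"CREATE PUBLICATION {_check_ident(pub_name)} FOR ALL TABLES;"


def subscription_sql(sub_name: str, pub_name: str, conninfo: str) -> str:
    """Statement to run on the replica DB: subscribe to the primary's publication."""
    # Double any single quotes so the SQL string literal stays well-formed.
    escaped = conninfo.replace("'", "''")
    return (
        f"CREATE SUBSCRIPTION {_check_ident(sub_name)} "
        f"CONNECTION '{escaped}' "
        f"PUBLICATION {_check_ident(pub_name)};"
    )


if __name__ == "__main__":
    # Hypothetical names and connection details.
    print(publication_sql("metaflow_pub"))
    print(
        subscription_sql(
            "metaflow_sub",
            "metaflow_pub",
            "host=primary.example.internal dbname=metaflow user=replicator",
        )
    )
```

Once the subscription exists, the UI service’s DB connection settings would point at the replica rather than the primary, so UI queries never touch the DB the Metaflow runtime depends on.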