On Database Refreshes
Index
Why Refresh?
Developing processes for database refreshes has many benefits including:
- Consistency across environments (which means PROD bugs are more likely to be replicable on lower environments)
- Reveals data which should be moved to environment configs
- Reveals environment specific data and URLs within your items
- Time savings (if automation is used)
- Can help identify areas for performance improvements
- Can help identify problematic customizations
- Helps get local development environments set up
The Caveman Refresh
Perhaps the quickest and easiest method of refreshing a database is to generate a content package from PROD and install it on the downstream site(s). In many cases, this is sufficient, and it's a good starting point that will help you identify problem areas and automation opportunities.
PRO TIP
When generating a content package on PROD, you can add items dynamically instead of statically. This way, you can filter items based on language, modification / create date, and more. The performance of this approach is also much better than using a static list of items.
However, there are some areas to watch out for with the package approach:
- Item buckets can be a problem; especially if you have custom tree structure rules. There could be a big performance hit while installing the package. You also may need to resync buckets after installing the package.
- See if you have any custom save / update / event handlers. These can cause performance issues and lead to unexpected issues due to bulk item adding / editing / moving / etc.
Now let's discuss a more comprehensive approach to database refreshes.
Preparing for the New Process
- Define a schedule for when the refreshes should occur
- Identify who the users are and consider that user passwords should not be the same across environments (unless you are using SSO)
- Identify which items may need to be updated after the refresh. Scan the database for environment-specific URLs, values, and API keys. SXA site settings items often contain environment specific values. Content such as blogs are also frequent culprits for containing environment specific URLs due to content authors not using dynamic sitecore links. You do not want environment specific URLs in most places, because it can cause confusion if users think they are viewing a staging site but click a link and are actually then viewing the live site. See the script below for an example of scanning items for a specific value. This script can be run in the Sitecore PowerShell Extensions (SPE) console. If you have a large database, you may want to modify this script to search against indexes rather than the database directly.
Once you've done your initial preparation, consider these pre and post-refresh steps.
Pre-refresh Steps
General Steps
- Ensure there aren't pending changes or test items on the downstream environment. It's always a good idea to fully back up the database prior to refreshing.
- Schedule a blackout period in which content authors should not attempt to access the site.
Database Backup and Restoration
Understand that exporting databases can result in a DTU hit, which can have performance implications and incur additional charges. You can temporarily scale up your databases while performing such operations.
Ideally, you want to base your refreshes on a point in time backup, else data may still be changing while you are exporting.
Post-refresh steps
Connection Strings and Database Credentials
Adjust your connection string names as necessary, depending on the names of your newly created backup DBs. You will also need to ensure that the databases are accessible with the user credentials you specify (ex. masteruser
, coreuser
, webuser
). The easiest way to do this on downstream environments is to use your sa
(system admin) credentials when connecting to your master, core, and web databases. This is, less secure than having individual users for each database, but it's a fast and effective way of avoiding authentication / permissions issues.
App Pool Recycle
Make sure to restart all of your sites after you restore the DB and connect to it. If you don't, the refresh may appear to have been unsuccessful (data may appear unchanged).
Database Cleanup
After restoring the databases, and prior to reindexing, you will want to clear the event and publish queues as well as property data which doesn't need to be retained during the pulldown. Not doing this cleanup can result in performance issues. The queries below can either be run in the Azure portal query editor or by connecting to the DBs via SSMS:
If database access is an issue, the publish and event queues can also be cleared on these admin pages:
/sitecore/admin/PublishQueueStats.aspx
/sitecore/admin/EventQueueStats.aspx
Run the Content Update Script
If you identified items with environment-specific URLs or values, you'll need to update those items after the database refresh. You can do this manually, but it's best to automate this process. The following script is an example of how you can do this using Sitecore PowerShell Extensions (SPE). This script will prompt you to select an environment, and then it will update the specified items with the appropriate values. You can modify this script to suit your needs.
Reindexing
You need to reindex after restoring a database, else the indexes will be out of sync with the restored database. Downstream environments may be underpowered compared to PROD, so if reindexing takes too long or consistency fails, you may need to scale up the environment resources (especially when using App Services).
Now you just need to refresh. A good way to speed up this process with a more surgical approach is this solution by Ben Golden (https://bgolden.digital/2017/01/21/re-index-tree-for-selected-index-with-spe/):
This script lets you select one index to refresh for the selected subtree. Note that it just starts the job and closes the window. It does not monitor the job's progress. You could add this if you desire or simply use the jobs admin page. The script needs to be saved under /sitecore/system/Modules/PowerShell/Script Library/Content Editor/Context Menu after you have enabled the context menu integration point in the module.
Restore Users
If you are not using SSO for authentication, remember that restoring the core database will also restore user credentials from PROD. You'll need to update these credentials to match the credentials used in the environment you are restoring to, and your passwords should be different across environments.
Cleanup
Remember to delete any database exports, backups, or databases that are no longer needed.
KEEP BUILDING,
MG