Zen and the Art of Sitecore Maintenance / Audits
Introduction
Let's talk about maintaining and auditing on-prem Sitecore implementations that run on Azure.
Consider this guide as:
- A checklist for new implementations.
- A maintenance checklist for existing implementations.
- A list of potential upsells or value-adds for your clients.
- Ideas for making your manual auditing services more comprehensive / valuable.
The guide below is part case study and part checklist. It is based on my unique experience maintaining a number of Sitecore 8, 9, and 10 implementations that run on virtual machines or App Services in Azure.
Asking the Right Questions
Whenever I maintain or audit a Sitecore implementation, I ask the following questions:
- Is the implementation running smoothly?
- Is the implementation secure?
- Is the implementation performant?
- Is the implementation scalable?
- Is the implementation reliable?
- Is the implementation maintainable?
- Is the implementation cost effective?
- Is the implementation future proof?
- Is the implementation easy to use?
- Is the implementation easy to extend?
- Is the implementation easy for new developers to get started with?
- Is the implementation easy to deploy?
- Is the implementation easy to test?
- Is the implementation easy to debug?
- Is the implementation easy to monitor?
The Priority Framework
I use the following framework to prioritize tasks / findings. Assign a value for each of the following:
- Status: Acceptable / Suboptimal / Action Required
- Risk: Beneficial / Important / Critical
- Complexity: Low / Medium / High
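One way to put the framework to work is to record each finding with its three values and sort the backlog. The sketch below is only an illustration: the `Finding` class, the numeric weights, and the rule of sorting by status, then risk, then lowest complexity are assumptions made for the example, not part of the framework itself.

```python
from dataclasses import dataclass

# Hypothetical orderings: a higher number means more urgent (status, risk)
# or more work (complexity).
STATUS = {"Acceptable": 0, "Suboptimal": 1, "Action Required": 2}
RISK = {"Beneficial": 0, "Important": 1, "Critical": 2}
COMPLEXITY = {"Low": 0, "Medium": 1, "High": 2}


@dataclass(frozen=True)
class Finding:
    title: str
    status: str
    risk: str
    complexity: str

    def sort_key(self):
        # Most urgent status first, then highest risk, then cheapest fix.
        return (-STATUS[self.status], -RISK[self.risk], COMPLEXITY[self.complexity])


def prioritize(findings):
    """Return findings ordered from most to least pressing."""
    return sorted(findings, key=Finding.sort_key)


# Example findings, invented for illustration.
findings = [
    Finding("Build warnings", "Suboptimal", "Beneficial", "Low"),
    Finding("SSL certificate expires soon", "Action Required", "Critical", "Low"),
    Finding("Helix violations", "Suboptimal", "Important", "High"),
]
for f in prioritize(findings):
    print(f"{f.status:16} {f.risk:10} {f.complexity:6} {f.title}")
```

A different tie-break (for example, quick wins first regardless of risk) only requires changing `sort_key`.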
Developer Onboarding
IMO, the ease of developer onboarding should be part of a Sitecore audit and part of the maintenance process. If it is difficult to onboard new developers onto your implementation, it will be difficult to maintain. This is especially true if you have a high turnover rate. No one likes burning valuable productive hours getting a site running.
- Is it hard for new developers to get started?
- Could your documentation be better?
- Could your development environment setup be automated / standardized (using Docker containers)?
- Should you update your solution to support development in new versions of Visual Studio?
Local Development Particulars
- Does your solution build quickly?
- Does Sitecore start quickly?
- Are any npm libraries outdated and in need of security updates?
- Is your version of node getting old? Aim to keep your node version reasonably up to date. Otherwise, even though you have a package-lock.json, libraries can still break if those libraries have dependencies that change, which CAN occur retroactively. If you see strange errors during your builds, that could be why. If you try to update package versions to overcome the issue, you will likely find that they require a higher version of node. You don't want to discover this in the middle of a crucial build / deploy cycle.
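A small guard at the start of a build can flag an aging node version before it bites. This is a minimal sketch: the function names and the `MINIMUM_MAJOR` floor are hypothetical, and the floor is a team policy you would choose yourself, not an official cutoff.

```python
import re


def parse_node_major(version_output):
    """Extract the major version from `node --version` output such as 'v18.17.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"Unrecognized node version string: {version_output!r}")
    return int(match.group(1))


def node_is_outdated(version_output, minimum_major):
    """True when the installed major version is below the chosen floor."""
    return parse_node_major(version_output) < minimum_major


# Hypothetical policy value; pick whatever your team treats as "too old".
MINIMUM_MAJOR = 18

print(node_is_outdated("v14.21.3", MINIMUM_MAJOR))  # below the floor
print(node_is_outdated("v20.11.0", MINIMUM_MAJOR))  # above the floor
```

On a real build agent the version string could come from `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout`.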
Build Server / DevOps Servers
- Enough disk space?
- Automated file cleanup scripts?
- Sufficient build artifact history?
- Software licenses still valid (Octopus, TeamCity)?
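The first two checks are easy to script. The sketch below uses only the Python standard library; the function names, the age threshold, and the dry-run default are assumptions for illustration, and any real cleanup should be pointed at a folder you are sure is safe to prune.

```python
import os
import shutil
import time


def free_space_ratio(path):
    """Fraction of the volume containing `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def find_stale_files(root, max_age_days, now=None):
    """List files under `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            if os.path.getmtime(full_path) < cutoff:
                stale.append(full_path)
    return sorted(stale)


def delete_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only report what would go."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            os.remove(path)
    return len(paths)
```

Running with `dry_run=True` first shows what a scheduled job would remove, which also helps confirm that enough build artifact history survives the cleanup.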
Windows Updates
The nice thing about App Services is that you don't need to worry about this. If you have VMs, Azure has a good dashboard (Azure Update Management) through which you can manage OS updates.
Make sure to consider Microsoft "patch Tuesday" when scheduling updates.
Infrastructure Maintenance Checklist
- Service health dashboard https://portal.azure.com/#view/Microsoft_Azure_Health/AzureHealthBrowseBlade/~/healthHistory
- Sitecore security updates https://support.sitecore.com/kb?id=kb_search&kb_knowledge_base=44035465db70dc109e54320a689619bf
- Windows Updates
- Backups
- Disk Space
- CPU Utilization
- Server Logs
- DTU utilization (for databases)
- SSL Certificate Expiry
- WAF logs
- Sitecore audit logs (if applicable) -- ensure they are working
- Azure audit logs (what changes have users made / actions have been performed recently?)
- Search / indexing infrastructure
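Some of these checks lend themselves to a scheduled script. As one example, SSL certificate expiry can be read straight from the server with the Python standard library. This is a sketch, not a full monitoring solution: the function names and the 30-day `warn_days` threshold are assumptions.

```python
import socket
import ssl
import time


def fetch_not_after(hostname, port=443, timeout=10):
    """Return the certificate's 'notAfter' string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days between `now` (epoch seconds) and the certificate expiry."""
    now = time.time() if now is None else now
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // 86400)


def needs_renewal(not_after, warn_days=30, now=None):
    """True when the certificate expires within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) < warn_days
```

Typical use would be `needs_renewal(fetch_not_after("www.site.com"))`. Note that a certificate that has already expired makes the handshake itself fail with `ssl.SSLCertVerificationError`, which is a signal worth alerting on too.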
Automation
- Sitecore PowerShell Extensions (SPE) is a fantastic tool that can be used to automate reports and cleanup.
- Partially automated SSL certificate updates via deploy / release pipelines
- WAF logs
- Automated tests
Sitecore Maintenance Tools
The Sitecore content management interface provides various useful tools for maintenance. These tools are often overlooked!
- Control Panel: /sitecore/client/Applications/ControlPanel.aspx?sc_bw=1
  - Broken link report
  - Rebuild link databases
  - Clean up databases
  - Display database usage
  - Indexing manager
- Admin Tools: /sitecore/admin/
  - Jobs Viewer: /sitecore/admin/Jobs.aspx
  - Publish Queue Stats: /sitecore/admin/PublishQueueStats.aspx
  - Event Queue Stats: /sitecore/admin/EventQueueStats.aspx
  - Database Cleanup: /sitecore/admin/DBCleanup.aspx
  - Rendering Statistics: /sitecore/admin/stats.aspx
DevOps
- Reducing local and upstream build / deploy times
- Improving build / deploy reliability
- Improving safety -- will a failed build / deploy cause the site to go down?
- Ideally, security updates will be applied during the build / deploy process without application / code changes
Database Refreshes
- Environment refreshes -- can this process be automated? What needs to happen to fully automate it? For example, some items may store config values that are environment specific. Can you move those into config files?
- Scheduled task items aren't great -- use config + code instead (a subject worthy of its own post).
Note that whenever a database refresh is performed, additional tasks are required to ensure proper functioning of indexes. Execute the following SQL statement on Core, Master, and Web DBs:
Execute this on the Web DB:
After running these, reindex as needed.
Now ask: can this entire process be automated with something like Octopus and PowerShell scripts?
Refactoring
Here are some ideas for your backlog:
- Field / template architecture improvements
- Field order improvements
- Automatic model generation
- Making hardcoded labels dynamic / translatable
- Self documenting code
- Readable variable names
- Reduction of cyclomatic complexity
- Insert options review
- Security roles (preventing item duplication, for example)
- Fix Helix violations
- Experience Editor usability improvements
- Visual Studio code linting / formatting standardization (a subject worthy of its own post)
- Addressing TODO comments in code
- Accessibility improvements / WCAG compliance -- steep fines for non-compliance
  - Accessibility isn't just important for people with disabilities; it's also important for:
    - People with short term injuries (repetitive strain injuries, broken arms / fingers, etc.)
    - People who are using mobile devices
    - People who are using older browsers
  - It's also important for SEO
- SEO improvements
- Improving docs / onboarding
- Fixes based on feedback from various scanning tools such as https://observatory.mozilla.org/analyze/www.site.com
- Sitecore cache size tuning. Look for Sitecore log entries such as "cache is cleared by". If you see these frequently it may be time to increase the size.
- Pipeline Profiler for performance tuning: /sitecore/admin/pipelines.aspx
- Fixing build warnings. This is valuable in that it reduces noise and makes it easier to spot real issues
- Adjusting Sitecore logs to not log certain types of events in order to reduce noise
- Browser console logs (JS errors / warnings)
- Usability / clarity improvements for content authors
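For the cache size tuning item above, counting how often the "cache is cleared by" phrase shows up per log file gives a rough sense of whether it is frequent. This is a minimal sketch: the function names, the log folder, and the `log*.txt` file name pattern are assumptions, so adjust them to wherever your instance writes its logs.

```python
from collections import Counter
from pathlib import Path

PHRASE = "cache is cleared by"  # the phrase to look for in the Sitecore logs


def count_cache_clears(lines):
    """Count lines that contain the cache-clear phrase (case-insensitive)."""
    return sum(1 for line in lines if PHRASE in line.lower())


def cache_clears_per_file(log_dir, pattern="log*.txt"):
    """Map each matching log file name to its number of cache-clear entries.

    Both the folder and the file name pattern are assumptions for this sketch.
    """
    counts = Counter()
    for log_file in sorted(Path(log_dir).glob(pattern)):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            counts[log_file.name] = count_cache_clears(handle)
    return counts
```

Calling `cache_clears_per_file(...).most_common(5)` on a week of logs shows the noisiest days at a glance.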
Azure Billing
Azure has a great billing dashboard, and you can automate with notifications as well.
If you have access to this area of the dashboard (and I would argue that any technical person maintaining a site should), review the billing regularly, as it can give you an early indication of potential infrastructure problems.
It can also provide insights into cost savings opportunities.
Notifications
Notifications are the backbone of any great maintenance setup.
- Have some fun with IFTTT services
- Use services such as iHook for monitoring API endpoints for approximate number of search results in order to identify search indexing issues
- Slack Webhook notifications
- Error logging / reporting services such as Sentry, Rollbar, BugSnag, Loggly are CRUCIAL for maintenance. They can help you catch errors before your client does. You can easily inspect all errors in one place, and you can even set up alerts to notify you when a certain error occurs. This is a huge time saver.
- Pingdom, etc. for health checks and uptime monitoring. You can also use these tools to monitor when domains and SSL certificates expire. I have seen far too many implementations that don't do this, and it never ends well.
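The search-results check and the Slack notification from the list above can be combined in a few lines. The sketch below makes several assumptions: the function names, the 20% tolerance, and the webhook URL are all hypothetical, and how you obtain the actual result count depends on your own search endpoint. The only Slack-specific detail relied on is that an incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def count_is_suspicious(actual, expected, tolerance=0.2):
    """True when `actual` deviates from `expected` by more than `tolerance` (a fraction)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(actual - expected) / expected > tolerance


def build_slack_request(webhook_url, message):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def notify_if_suspicious(webhook_url, actual, expected, tolerance=0.2, send=post):
    """Post a warning when the result count looks wrong; return True if sent."""
    if not count_is_suspicious(actual, expected, tolerance):
        return False
    message = f"Search returned {actual} results, expected about {expected}. Check indexing."
    send(build_slack_request(webhook_url, message))
    return True
```

Passing `send` as a parameter keeps the check testable without touching the network, and lets you swap Slack for another notification channel later.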
Conclusion
The industry is trending towards SaaS, which offloads most infrastructure maintenance. This can free up resources to work on higher value tasks. However, there is still a very healthy market for on-prem Sitecore implementations, and this will be the case for many years to come. If you are tasked with maintaining or auditing such an implementation, I hope this guide helps keep you profitable, helps you sleep peacefully at night, and keeps your clients happy.
Keep moving,
Marcel