Zen and the Art of Sitecore Maintenance / Audits

> A short overview

Introduction

Let's talk about maintaining and auditing on-prem Sitecore implementations that run on Azure.

Consider this guide as:

  • A checklist for new implementations.
  • A maintenance checklist for existing implementations.
  • A list of potential upsells or value-adds for your clients.
  • Ideas for making your manual auditing services more comprehensive / valuable.

The guide below is part case study and part checklist. It is based on my unique experience maintaining a number of Sitecore 8, 9, and 10 implementations that run on virtual machines or App Services in Azure.

Asking the Right Questions

Whenever I maintain or audit a Sitecore implementation, I ask the following questions:

  • Is the implementation running smoothly?
  • Is the implementation secure?
  • Is the implementation performant?
  • Is the implementation scalable?
  • Is the implementation reliable?
  • Is the implementation maintainable?
  • Is the implementation cost effective?
  • Is the implementation future proof?
  • Is the implementation easy to use?
  • Is the implementation easy to extend?
  • Is it easy to onboard new developers onto the implementation?
  • Is the implementation easy to deploy?
  • Is the implementation easy to test?
  • Is the implementation easy to debug?
  • Is the implementation easy to monitor?

The Priority Framework

I use the following framework to prioritize tasks / findings. Assign a value for each of the following:

  • Status: Acceptable / Suboptimal / Action Required
  • Risk: Beneficial / Important / Critical
  • Complexity: Low / Medium / High
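
One way to turn these three values into a sortable backlog is sketched below. The numeric weights and the scoring formula are my own assumptions for illustration (the framework itself only names the labels), and the sample findings are hypothetical.

```python
from dataclasses import dataclass

# Assumed numeric weights; the framework above only defines the labels.
STATUS = {"Acceptable": 0, "Suboptimal": 1, "Action Required": 2}
RISK = {"Beneficial": 1, "Important": 2, "Critical": 3}
COMPLEXITY = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Finding:
    title: str
    status: str
    risk: str
    complexity: str

    def score(self) -> float:
        # Worse status and higher risk raise priority; higher complexity lowers it.
        return STATUS[self.status] * RISK[self.risk] / COMPLEXITY[self.complexity]


def prioritize(findings):
    """Return findings sorted from highest to lowest score."""
    return sorted(findings, key=lambda f: f.score(), reverse=True)


# Hypothetical audit findings.
findings = [
    Finding("Field order cleanup", "Acceptable", "Beneficial", "Low"),
    Finding("Slow solution build", "Suboptimal", "Important", "Medium"),
    Finding("No SSL expiry monitoring", "Action Required", "Critical", "Low"),
]

for finding in prioritize(findings):
    print(f"{finding.score():.1f}  {finding.title}")
```

Dividing by complexity pushes quick wins to the top; if you would rather tackle the riskiest items first regardless of effort, drop the division.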

Developer Onboarding

IMO, the ease of developer onboarding should be part of both a Sitecore audit and the maintenance process. If it is difficult to onboard new developers onto your implementation, it will be difficult to maintain. This is especially true if you have a high turnover rate. No one likes burning valuable productive hours getting a site running.

  • Is it hard for new developers to get started?
  • Could your documentation be better?
  • Could your development environment setup be automated / standardized (using Docker containers)?
  • Should you update your solution to support development in new versions of Visual Studio?

Local Development Particulars

  • Does your solution build quickly?
  • Does Sitecore start quickly?
  • Are any npm libraries outdated and in need of security updates?
  • Is your version of Node getting old? Aim to keep your Node version reasonably up to date. Otherwise, even with a package-lock.json, builds can still break when the dependencies of your libraries change, which CAN occur retroactively. If you see strange errors during your builds, that could be why. If you then try to update package versions to work around the issue, you will likely find that they require a newer version of Node. You don't want to discover this in the middle of a crucial build / deploy cycle.
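
The Node version check above can be turned into a small guard that runs before a build. This is a minimal sketch: the minimum major version is a hypothetical value you would choose per project, and the function only parses the text that `node --version` prints (for example `v18.17.0`).

```python
import re


def node_major(version_output: str) -> int:
    """Extract the major version from `node --version` output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"Unrecognized Node version string: {version_output!r}")
    return int(match.group(1))


def is_behind(version_output: str, minimum_major: int) -> bool:
    """True when the installed Node major version is below the chosen minimum."""
    return node_major(version_output) < minimum_major


# Hypothetical policy: this project wants at least Node 18.
MINIMUM_MAJOR = 18
print(is_behind("v16.20.2", MINIMUM_MAJOR))
```

On a build agent, the version string could come from `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout`.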

Build Server / DevOps Servers

  • Enough disk space?
  • Automated file cleanup scripts?
  • Sufficient build artifact history?
  • Software licenses still valid (Octopus, TeamCity)?
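
The disk space and cleanup checks above are easy to script. The sketch below uses only the Python standard library; the 30-day retention window and the directory you point it at are assumptions, and it only reports candidates rather than deleting anything.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def free_space_ratio(path: str) -> float:
    """Fraction of the volume containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def stale_files(root: str, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Return files under `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Report only; review the list before wiring up any deletion.
    print(f"Free space: {free_space_ratio('.'):.0%}")
    for path in stale_files(".", max_age_days=30):
        print(f"Cleanup candidate: {path}")
```

Keeping the report and the deletion as separate steps makes it easier to confirm that the retention window still leaves sufficient build artifact history.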

Windows Updates

The nice thing about App Services is that you don't need to worry about this. If you have VMs, Azure has a good dashboard (Azure Update Management) through which you can manage OS updates.

Make sure to consider Microsoft "patch Tuesday" when scheduling updates.

Infrastructure Maintenance Checklist

Automation

  • Sitecore PowerShell Extensions (SPE) is a fantastic tool that can be used to automate reports and cleanup.
  • Partially automated SSL certificate updates via deploy / release pipelines
  • WAF logs
  • Automated tests

Sitecore Maintenance Tools

The Sitecore content management interface provides various useful tools for maintenance. These tools are often overlooked!

  • Control Panel: /sitecore/client/Applications/ControlPanel.aspx?sc_bw=1
    • Broken link report
    • Rebuild link databases
    • Clean up databases
    • Display database usage
    • Indexing manager
  • Admin Tools: /sitecore/admin/
    • Jobs Viewer: /sitecore/admin/Jobs.aspx
    • Publish Queue Stats: /sitecore/admin/PublishQueueStats.aspx
    • Event Queue Stats: /sitecore/admin/EventQueueStats.aspx
    • Database Cleanup: /sitecore/admin/DBCleanup.aspx
    • Rendering Statistics: /sitecore/admin/stats.aspx

DevOps

  • Reducing local and upstream build / deploy times
  • Improving build / deploy reliability
  • Improving safety -- will a failed build / deploy cause the site to go down?
  • Ideally, security updates will be applied during the build / deploy process without application / code changes

Database Refreshes

  • Environment refreshes -- can this process be automated? What needs to happen to fully automate it? For example, some items may store config values that are environment specific. Can you move those into config files?
  • Scheduled task items aren't great -- use config + code instead (a subject worthy of its own post).

Note that whenever a database refresh is performed, additional tasks are required to ensure proper functioning of indexes. Execute the following SQL statements on the Core, Master, and Web DBs:

-- If using Coveo, DO MORE RESEARCH BEFORE RUNNING THIS!
TRUNCATE TABLE [dbo].[EventQueue];
TRUNCATE TABLE [dbo].[PublishQueue];
TRUNCATE TABLE [dbo].[Properties];

Execute this on the Web DB:

TRUNCATE TABLE [dbo].[History];

After running these, reindex as needed.

Now ask: can this entire process be automated with something like Octopus and PowerShell scripts?
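
As a starting point for that automation, the sketch below builds the per-database list of statements described above so a pipeline step can execute them in order. It does not connect to SQL Server; the role names (`core`, `master`, `web`) and the idea of handing the output to a deploy step are assumptions for illustration.

```python
from typing import Dict, List, Sequence

# Tables truncated on every database after a refresh (from the statements above).
SHARED_TABLES = ["EventQueue", "PublishQueue", "Properties"]
# Table truncated on the Web database only.
WEB_ONLY_TABLES = ["History"]


def refresh_statements(role: str) -> List[str]:
    """Return the post-refresh TRUNCATE statements for one database role."""
    tables = list(SHARED_TABLES)
    if role.lower() == "web":
        tables += WEB_ONLY_TABLES
    return [f"TRUNCATE TABLE [dbo].[{table}];" for table in tables]


def build_plan(roles: Sequence[str] = ("core", "master", "web")) -> Dict[str, List[str]]:
    """Map each database role to the statements to run against it."""
    return {role: refresh_statements(role) for role in roles}


# If using Coveo, do more research before running any of these statements.
for role, statements in build_plan().items():
    print(f"-- {role}")
    for statement in statements:
        print(statement)
```

The printed script can be reviewed by a human first; once you trust it, the same plan can feed whatever step your deployment tool uses to run SQL, followed by the reindex.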

Refactoring

Here are some ideas for your backlog:

  • Field / template architecture improvements
  • Field order improvements
  • Automatic model generation
  • Making hardcoded labels dynamic / translatable
  • Self documenting code
  • Readable variable names
  • Reduction of cyclomatic complexity
  • Insert options review
  • Security roles (preventing item duplication, for example)
  • Fix Helix violations
  • Experience Editor usability improvements
  • Visual Studio code linting / formatting standardization (a subject worthy of its own post)
  • Addressing TODO comments in code
  • Accessibility improvements / WCAG compliance -- steep fines for non-compliance
    • Accessibility isn't just important for people with disabilities; it's also important for:
      • People with short term injuries (repetitive strain injuries, broken arms / fingers, etc.)
      • People who are using mobile devices
      • People who are using older browsers
      • Search engine optimization (SEO)
  • SEO improvements
  • Improving docs / onboarding
  • Fixes based on feedback from various scanning tools such as https://observatory.mozilla.org/analyze/www.site.com
  • Sitecore cache size tuning. Look for Sitecore log entries containing "cache is cleared by". If you see these frequently, it may be time to increase the cache size.
  • Pipeline Profiler for performance tuning: /sitecore/admin/pipelines.aspx
  • Fixing build warnings. This is valuable in that it reduces noise and makes it easier to spot real issues
  • Adjusting Sitecore logs to not log certain types of events in order to reduce noise
  • Browser console logs (JS errors / warnings)
  • Usability / clarity improvements for content authors
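
For the cache size tuning item above, a quick way to judge "frequently" is to count how often the phrase appears in your log files. This is a rough sketch: it only does a case-insensitive substring match, and the sample log lines are hypothetical, so adjust the phrase to whatever your logs actually contain.

```python
from collections import Counter
from typing import Iterable

PHRASE = "cache is cleared by"  # phrase mentioned in the list above


def count_cache_clears(lines: Iterable[str], phrase: str = PHRASE) -> int:
    """Count lines containing the phrase, ignoring case."""
    needle = phrase.lower()
    return sum(1 for line in lines if needle in line.lower())


def clears_per_file(paths: Iterable[str]) -> Counter:
    """Count matching lines in each log file."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts[path] = count_cache_clears(handle)
    return counts


# Hypothetical log lines, for illustration only.
sample = [
    "INFO  Some unrelated message",
    "WARN  example cache is cleared by ExampleCaller",
    "WARN  another Cache is cleared by ExampleCaller",
]
print(count_cache_clears(sample))
```

Running `clears_per_file` over a week of log files gives a per-day trend, which is more useful than a single total when deciding whether a size increase helped.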

Azure Billing

Azure has a great billing dashboard, and you can automate cost notifications as well.

I would argue that any technical person maintaining a site should have access to this area of the dashboard. If you do, review the billing regularly, as it can give you an early indication of potential infrastructure problems.

It can also provide insights into cost savings opportunities.
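
If you export daily costs, the "early indication" idea can be approximated with a simple spike check. This is a sketch under assumptions: the cost figures are made up, and the 7-day window and 1.5x threshold are arbitrary starting points rather than recommendations.

```python
from statistics import mean
from typing import List, Sequence


def cost_spikes(daily_costs: Sequence[float], window: int = 7, threshold: float = 1.5) -> List[int]:
    """Return indexes of days whose cost exceeds `threshold` times the trailing average."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            spikes.append(i)
    return spikes


# Hypothetical daily costs; day 8 jumps well above the trailing week.
costs = [10, 10, 10, 10, 10, 10, 10, 10, 25, 11]
print(cost_spikes(costs))
```

A flagged day is a prompt to look at what changed (a scaled-up resource, runaway logging, a forgotten environment), not proof of a problem.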

Notifications

Notifications are the backbone of any great maintenance setup.

  • Have some fun with IFTTT services
  • Use services such as iHook to monitor API endpoints for the approximate number of search results, which helps identify search indexing issues
  • Slack Webhook notifications
  • Error logging / reporting services such as Sentry, Rollbar, BugSnag, Loggly are CRUCIAL for maintenance. They can help you catch errors before your client does. You can easily inspect all errors in one place, and you can even set up alerts to notify you when a certain error occurs. This is a huge time saver.
  • Pingdom, etc. for health checks and uptime monitoring. You can also use these tools to monitor when domains and SSL certificates expire. I have seen far too many implementations that don't do this, and it never ends well.
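
If you don't have a monitoring service watching certificate expiry yet, a scheduled script can cover the gap. The sketch below uses Python's standard `ssl` module; the 30-day warning window is an assumption, and the host you pass in is up to you.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to the host and return its certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until(not_after, now) <= warn_days
```

A scheduled job could call `needs_renewal(fetch_not_after(host), datetime.now(timezone.utc))` for each site and post to a Slack webhook when it returns `True`. Note that an already-expired certificate makes the TLS handshake fail, so treat a connection error as an alert too.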

Conclusion

The industry is trending towards SaaS, which offloads most infrastructure maintenance. This can free up resources to work on higher value tasks. However, there is still a very healthy market for on-prem Sitecore implementations, and this will be the case for many years to come. If you are tasked with maintaining or auditing such an implementation, I hope this guide helps keep you profitable, helps you sleep peacefully at night, and keeps your clients happy.

Keep moving,

Marcel

