Continuous application performance monitoring (APM), with alerting on any adverse conditions, is the best place to start in understanding the performance, availability and capability of your applications.
Beyond the obvious benefit of knowing when performance is sub-optimal so that we can respond in real time, a continuous historical record of application performance helps correlate changes to the application (such as a new version or optimisations) with their effect on performance, and reveals any degradation over time.
Comparing the performance metrics of the applications against the resource use of the underlying compute, network and storage helps show where bottlenecks may exist under both expected and unexpected traffic volumes.
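One simple way to make this comparison is to correlate a latency series with each resource series sampled over the same intervals. The sketch below is illustrative only: the function names, the resource names and the sample figures are assumptions, and a strong correlation is a hint about where to look rather than proof of a bottleneck.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def likely_bottleneck(latency, resources):
    """Return the resource whose usage tracks latency most closely, plus all scores."""
    scores = {name: pearson(latency, series) for name, series in resources.items()}
    return max(scores, key=scores.get), scores


# Made-up samples: response time in ms and resource utilisation in percent.
latency_ms = [100, 120, 150, 200, 300]
usage = {
    "cpu": [20, 30, 45, 70, 95],
    "disk": [50, 48, 52, 49, 51],
}
name, scores = likely_bottleneck(latency_ms, usage)
print(name, scores)
```

With these made-up figures the CPU series rises with latency while disk usage stays flat, so CPU is the first candidate to investigate.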
Repeatedly testing your application, software and infrastructure stacks is a critical part of modern application development and maintenance.
Basic load simulation involves defining a set of actions typical of a user and executing those actions with volume and concurrency, while continually monitoring the resulting availability, performance and resource usage. These metrics typically help define a baseline capability of the stacks and identify where any basic bottlenecks exist for further exploration and optimisation.
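As a minimal sketch of the idea, the following runs a user action many times across a pool of concurrent workers and summarises the latencies. Everything named here is an assumption for illustration: `simulated_user_action` is a hypothetical stand-in that only sleeps, and in practice it would be replaced by real requests against your application, usually via a dedicated load-testing tool.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_user_action():
    """Hypothetical stand-in for one typical user action (e.g. an HTTP request)."""
    time.sleep(0.01)


def run_load(action, total_requests, concurrency):
    """Run `action` total_requests times across `concurrency` workers and summarise."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            action()
            ok = True
        except Exception:
            ok = False  # count any exception as a failed request
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "median_s": statistics.median(latencies),
        # Approximate 95th percentile by index into the sorted latencies.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    print(run_load(simulated_user_action, total_requests=200, concurrency=20))
```

The returned summary can be stored as the baseline against which later runs are compared.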
Smoke load testing is a set of simulated typical user actions executed on demand over a short period of time to verify that performance and capability have not changed after a change has been made within the stacks.
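The verification step can be sketched as a comparison of the smoke run's metrics against a stored baseline. The metric names, the figures and the 10% tolerance below are all assumptions, and the check assumes lower values are better for every metric.

```python
def smoke_regressions(baseline, current, tolerance=0.10):
    """Return names of metrics that are worse than baseline by more than `tolerance`.

    Assumes lower is better for every metric and that `current` contains
    every metric named in `baseline`.
    """
    regressions = []
    for name, base_value in baseline.items():
        if current[name] > base_value * (1 + tolerance):
            regressions.append(name)
    return regressions


# Made-up figures: latencies in seconds, error rate as a fraction of requests.
baseline = {"median_s": 0.120, "p95_s": 0.300, "error_rate": 0.01}
after_change = {"median_s": 0.125, "p95_s": 0.450, "error_rate": 0.01}

print(smoke_regressions(baseline, after_change))
```

Here only the 95th percentile latency has moved beyond the tolerance, so the change would be flagged for a closer look before release.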
Letting your application “soak” in a substantial but non-extreme amount of traffic over a long period of time can help identify runaway resource consumption that might not otherwise be apparent. Common examples include memory exhaustion due to application leaks, storage exhaustion due to missing log rotation and auto-increment exhaustion due to integer sizing.
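A slow leak shows up as a steady upward trend in samples taken during the soak. The sketch below fits a least-squares line to such samples and projects when a limit would be reached. The sample values and the 2048 MB limit are made up, and the linear projection is only a rough assumption about how the resource grows.

```python
def growth_per_hour(samples):
    """Least-squares slope of (seconds_since_start, value) samples, in units per hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    spread_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread_t == 0:
        raise ValueError("need samples taken at two or more distinct times")
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return covariance / spread_t * 3600


def hours_until_limit(samples, limit):
    """Hours until `limit` is reached at the current trend, or None if not growing."""
    slope = growth_per_hour(samples)
    if slope <= 0:
        return None
    latest_value = samples[-1][1]
    return (limit - latest_value) / slope


# Made-up hourly memory readings in MB taken during a soak test.
memory_mb = [(0, 500), (3600, 510), (7200, 520), (10800, 530)]
print(growth_per_hour(memory_mb))
print(hours_until_limit(memory_mb, limit=2048))
```

The same calculation applies to disk usage or to the current value of an auto-increment column measured against the maximum of its integer type.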
Stress and Spike
Simulating an extreme concurrency and/or volume of traffic over normal (stress) and short (spike) periods of time is intended to take your application up to and then beyond the point of unavailability. Only when the application has been broken in this way can one be certain that it will recover as expected, gracefully or otherwise.
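The two traffic shapes, and the checks for a breaking point and for recovery, can be sketched as follows. All function names, thresholds and observed error rates are hypothetical; the profiles only describe target concurrency per interval and would need to be fed into whatever tool generates the load.

```python
def stress_profile(start, step, steps):
    """Target concurrency per interval, rising steadily past normal levels."""
    return [start + step * i for i in range(steps)]


def spike_profile(baseline, peak, intervals, spike_start, spike_length):
    """Target concurrency per interval: baseline load with one short burst at `peak`."""
    spike_end = spike_start + spike_length
    return [peak if spike_start <= i < spike_end else baseline
            for i in range(intervals)]


def breaking_point(profile, error_rates, threshold=0.5):
    """First concurrency level whose error rate exceeds `threshold`, or None."""
    for level, rate in zip(profile, error_rates):
        if rate > threshold:
            return level
    return None


def recovered(error_rates, settle_intervals=2, threshold=0.01):
    """True if the last `settle_intervals` intervals are all at or below `threshold`."""
    tail = error_rates[-settle_intervals:]
    return all(rate <= threshold for rate in tail)


print(stress_profile(start=100, step=100, steps=5))

spike = spike_profile(baseline=50, peak=500, intervals=8, spike_start=3, spike_length=2)
# Made-up error rates observed in each interval of the spike run.
observed = [0.0, 0.0, 0.0, 0.8, 1.0, 0.3, 0.0, 0.0]
print(spike, breaking_point(spike, observed), recovered(observed))
```

In this made-up run the application fails during the burst and returns to a clean error rate once load drops back to baseline.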
Last modified: 2021/03/12 at 10:16 by Carl Heaton