How to monitor and troubleshoot NGINXPlus

December 22, 2022 · 8 min read

Technical Product Manager

As a continuation of our series on monitoring web servers such as NGINX and Apache, let us find out how to monitor and troubleshoot NGINXPlus effectively and easily using Netdata!

What is NGINXPlus

NGINXPlus is a commercial web server and load balancer built on the popular open source NGINX web server. It is the enterprise-grade version of NGINX, with additional features for scalability, performance, and monitoring. NGINXPlus includes features such as load balancing, content caching, and HTTP/2 support, and is suitable for applications of any size. It is highly customizable and can be used to deploy web applications in a variety of environments.

Monitoring NGINXPlus with Netdata

To monitor NGINXPlus with Netdata, you need both NGINXPlus and Netdata installed on your system.

Netdata auto-discovers hundreds of services, and for those it doesn't, turning on manual discovery is a one-line configuration. For more information on configuring Netdata for NGINXPlus monitoring, please read the collector documentation.
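Before pointing a collector at NGINXPlus, it can help to confirm that the NGINXPlus REST API is reachable. The Python sketch below is only an illustration: it assumes the API is exposed under `/api/` on the address you pass in (the location is something you configure in NGINXPlus yourself), and `parse_api_versions` and `latest_api_version` are hypothetical helper names, not part of Netdata or NGINXPlus.

```python
import json
from urllib.request import urlopen


def parse_api_versions(body: str) -> list:
    """Parse the JSON list of supported API versions returned by the API root."""
    versions = json.loads(body)
    if not isinstance(versions, list) or not all(isinstance(v, int) for v in versions):
        raise ValueError("expected a JSON list of integer API versions")
    return sorted(versions)


def latest_api_version(base_url: str, timeout: float = 2.0) -> int:
    """Fetch <base_url>/api/ and return the highest supported API version.

    Assumption: the NGINXPlus API is published at the /api/ location.
    """
    url = base_url.rstrip("/") + "/api/"
    with urlopen(url, timeout=timeout) as resp:
        return max(parse_api_versions(resp.read().decode("utf-8")))


# Example call (needs a reachable NGINXPlus instance, so it is left commented out):
# print(latest_api_version("http://127.0.0.1"))
```

If the call fails or raises, the API location is probably not enabled or not reachable from the host running Netdata, which is worth fixing before looking at the collector configuration.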

You should now see the NGINXPlus section on the Overview tab in Netdata Cloud already populated with charts about all the metrics you care about.

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

What NGINXPlus metrics are important to monitor?

The metrics that Netdata collects are organized into subsections within the nginxplus section for easier navigation. Each metric is represented by a composite chart that aggregates the data across multiple nodes, instances, and so on.

Below, you can find a brief description of the NGINXPlus metrics being collected and visualised on Netdata:

client_connections_rate

Accepted and dropped (not handled) connections. A connection is considered dropped if the worker process is unable to get a connection for the request by establishing a new connection or reusing an open one.

Client Connections rate
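To make the idea of a connections rate concrete, here is a minimal Python sketch of how a per-second rate could be derived from two samples of cumulative counters. It assumes each sample is a dictionary with cumulative `accepted` and `dropped` counts; the function name `per_second_rates` and the counter-reset handling are illustrative choices, not a description of how Netdata computes the chart.

```python
def per_second_rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Turn two cumulative counter samples into per-second rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for key in ("accepted", "dropped"):
        delta = curr[key] - prev[key]
        # A negative delta means the counter was reset (for example after a
        # restart), so count from zero instead of reporting a negative rate.
        if delta < 0:
            delta = curr[key]
        rates[key] = delta / interval_s
    return rates


previous = {"accepted": 1000, "dropped": 2}
current = {"accepted": 1150, "dropped": 5}
print(per_second_rates(previous, current, 10))
# prints {'accepted': 15.0, 'dropped': 0.3}
```

A sustained non-zero `dropped` rate is usually the value to watch, because it means clients are being turned away.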

client_connections_count

The current number of client connections. A connection is considered idle if there are currently no active requests.

Client Connections Count

ssl_handshakes_rate

Successful and failed SSL handshakes.

SSL Handshakes Rate

ssl_session_reuses_rate

The number of session reuses during SSL handshake.

SSL Session Reuse Rate

http_requests_rate

The number of HTTP requests received from clients.

HTTP Requests Rate

http_requests_count

The current number of client requests.

HTTP Requests Count

http_server_zone_requests_rate

The number of requests to the HTTP Server Zone.

HTTP Server Zone Requests Rate

http_server_zone_responses_per_code_class_rate

The number of responses from the HTTP Server Zone. Responses grouped by HTTP status code class.

HTTP Server Zone Responses
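Grouping by status code class simply means bucketing each response by the first digit of its status code. The Python sketch below illustrates that bucketing; `code_class` and `group_by_class` are hypothetical helper names used only for this example.

```python
from collections import Counter

CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def code_class(status: int) -> str:
    """Map an HTTP status code such as 404 to its class, such as '4xx'."""
    if not 100 <= status <= 599:
        raise ValueError(f"not a valid HTTP status code: {status}")
    return f"{status // 100}xx"


def group_by_class(statuses) -> dict:
    """Count responses per status code class, including empty classes."""
    counts = Counter(code_class(s) for s in statuses)
    return {cls: counts.get(cls, 0) for cls in CLASSES}


print(group_by_class([200, 201, 301, 404, 500, 503]))
# prints {'1xx': 0, '2xx': 2, '3xx': 1, '4xx': 1, '5xx': 2}
```

On the chart, a rising 5xx dimension points at server-side failures, while a rising 4xx dimension more often points at client or routing problems.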

http_server_zone_traffic_rate

The amount of data transferred to and from the HTTP Server Zone.

HTTP Server Zone Traffic Rate

http_server_zone_requests_processing_count

The number of client requests that are currently being processed by the HTTP Server Zone.

HTTP Zone Requests Processing Count

http_server_zone_requests_discarded_rate

The number of requests to the HTTP Server Zone completed without sending a response.

HTTP Zone requests discarded rate

http_location_zone_requests_rate

The number of requests to the HTTP Location Zone.

HTTP Location Zone Requests Rate

http_location_zone_responses_per_code_class_rate

The number of responses from the HTTP Location Zone. Responses grouped by HTTP status code class.

HTTP Location Zone Responses

http_location_zone_traffic_rate

The amount of data transferred to and from the HTTP Location Zone.

HTTP Location Zone Traffic Rate

http_location_zone_requests_discarded_rate

The number of requests to the HTTP Location Zone completed without sending a response.

HTTP Location Zone Discarded Rate

http_upstream_peers_count

The number of HTTP Upstream servers.

HTTP Upstream Peers Count

http_upstream_zombies_count

The current number of HTTP Upstream servers removed from the group but still processing active client requests.

HTTP Upstream Zombies Count

http_upstream_keepalive_count

The current number of idle keepalive connections to the HTTP Upstream.

HTTP Upstream Keepalive Count

http_upstream_server_requests_rate

The number of client requests forwarded to the HTTP Upstream Server.

Upstream Server Requests Rate

http_upstream_server_responses_per_code_class_rate

The number of responses received from the HTTP Upstream Server. Responses grouped by HTTP status code class.

Upstream Server Responses Rate

http_upstream_server_response_time

The average time to get a complete response from the HTTP Upstream Server.

Upstream Server Response Time

http_upstream_server_response_header_time

The average time to get a response header from the HTTP Upstream Server.

Upstream Server Response Header time

http_upstream_server_traffic_rate

The amount of traffic transferred to and from the HTTP Upstream Server.

Upstream Server Traffic Rate

http_upstream_server_state

The current state of the HTTP Upstream Server. A state is active if its value is 1.

Upstream Server State
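One way to read a state chart like this is as a one-hot encoding: every possible state is its own dimension, and only the current state carries the value 1. The Python sketch below illustrates that encoding. The state names `unavail`, `checking`, and `unhealthy` appear in this article; `up`, `down`, and `draining` are assumed here from the NGINXPlus API, and `state_dimensions` is a hypothetical helper name.

```python
# Assumed set of upstream server states; only three of them are named in the article.
STATES = ("up", "down", "draining", "unavail", "checking", "unhealthy")


def state_dimensions(current: str) -> dict:
    """Return one dimension per state, set to 1 only for the current state."""
    if current not in STATES:
        raise ValueError(f"unknown upstream server state: {current}")
    return {state: int(state == current) for state in STATES}


print(state_dimensions("checking"))
# prints {'up': 0, 'down': 0, 'draining': 0, 'unavail': 0, 'checking': 1, 'unhealthy': 0}
```

With this layout, an alert on "any dimension other than `up` equals 1" is enough to catch a server leaving the healthy state.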

http_upstream_server_connections_count

The current number of active connections to the HTTP Upstream Server.

Upstream Server Connections Count

http_upstream_server_downtime

The time the HTTP Upstream Server has spent in the unavail, checking, and unhealthy states.

Upstream Server Downtime

http_cache_state

HTTP cache current state. Cold means that the cache loader process is still loading data from disk into the cache.

HTTP Cache State

http_cache_iops

HTTP cache IOPS in responses per second.
- Served - valid, expired, and revalidated responses read from the cache.
- Written - miss, expired, and bypassed responses written to the cache.
- Bypassed - miss, expired, and bypass responses.

HTTP Cache IOPS

http_cache_io

HTTP cache IO in bytes per second.
- Served - valid, expired, and revalidated responses read from the cache.
- Written - miss, expired, and bypassed responses written to the cache.
- Bypassed - miss, expired, and bypass responses.

HTTP Cache IO

http_cache_size

The current size of the cache.

HTTP Cache Size

stream_server_zone_connections_rate

The number of accepted connections to the Stream Server Zone.

Stream Server Zone Connections Rate

stream_server_zone_sessions_per_code_class_rate

The number of completed sessions for the Stream Server Zone. Sessions grouped by status code class.

Stream Server Sessions Rate

stream_server_zone_traffic_rate

The amount of data transferred to and from the Stream Server Zone.

Stream Server Zone Traffic Rate

stream_server_zone_connections_processing_count

The number of client connections to the Stream Server Zone that are currently being processed.

Stream Server Zone Connections Processing Count

stream_server_zone_connections_discarded_rate

The number of connections to the Stream Server Zone completed without creating a session.

Stream Server Zone Connections Discarded Rate

stream_upstream_peers_count

The number of Stream Upstream servers.

Stream Upstream Peers Count

stream_upstream_zombies_count

The current number of Stream Upstream servers removed from the group but still processing active client connections.

Stream Upstream Zombies Count

stream_upstream_server_connections_rate

The number of connections forwarded to the Stream Upstream Server.

Stream Upstream Server Connections Rate

stream_upstream_server_traffic_rate

The amount of traffic transferred to and from the Stream Upstream Server.

Stream Upstream Server Traffic Rate

stream_upstream_server_state

The current state of the Stream Upstream Server. A state is active if its value is 1.

Stream Upstream Server State

stream_upstream_server_downtime

The time the Stream Upstream Server has spent in the unavail, checking, and unhealthy states.

Stream Upstream Server Downtime

stream_upstream_server_connections_count

The current number of connections to the Stream Upstream Server.

Stream Upstream Server Connections Count

resolver_zone_requests_rate

Resolver zone DNS requests.
- Name - requests to resolve names to addresses.
- Srv - requests to resolve SRV records.
- Addr - requests to resolve addresses to names.

Resolver Zone Requests Rate

resolver_zone_responses_rate

Resolver zone DNS responses.
- NoError - successful responses.
- FormErr - format error responses.
- ServFail - server failure responses.
- NXDomain - host not found responses.
- NotImp - unimplemented responses.
- Refused - operation refused responses.
- TimedOut - timed out requests.
- Unknown - requests completed with an unknown error.

Resolver Zone Responses Rate

uptime

The time elapsed since the NGINX process was started.

Uptime

Troubleshooting NGINXPlus with Netdata

Alerts

Netdata has built-in alerts to reduce the monitoring burden for you.

If you would like to update the alert thresholds for any of these alerts or want to create your own alert for another metric – please follow the instructions here.

By default you will receive email notifications whenever an alert is triggered – if you would not like to receive these notifications you can turn them off from your profile settings.

Anomaly Advisor

Anomaly Advisor lets you quickly identify if the system you are monitoring has any anomalies and allows you to drill down into which metrics are behaving anomalously.

To learn more about how to use Anomaly Advisor to troubleshoot NGINXPlus, check out the documentation or visit the anomalies tab in the demo space to play with it right now.

Metric Correlations

Metric Correlations lets you quickly find metrics and charts related to a particular window of interest that you want to explore further. By displaying the standard Netdata dashboard, filtered to show only charts that are relevant to the window of interest, you can get to the root cause sooner.

Let us hear from you

If you haven’t already, sign up now for a free Netdata account!

We’d love to hear from you – if you have any questions, complaints, or feedback, please reach out to us on Discord or GitHub.

Happy Troubleshooting!
