Status Monitoring and Healthchecks Requirements

Project Title	Status Monitoring and Healthchecks
Target Release
Epic
Document Status	DRAFT
Document Owner
Document Sign-Off
Subject Matter Expert(s)
Technical Expert(s)

Background & Business Value

We would like to ensure that the API Proxy endpoints are up and running. We believe a healthcheck and monitoring tool can help with this by sending out notifications if any proxies become unavailable (upcheck) or start returning error messages (healthcheck).

Goals

Establish a monitoring system for the API Proxy Endpoints
- Ensure it can notify the Team
Establish Standards for types of Monitoring
- Upcheck (required)
  - Simply checks if the API Proxy is up and listening. It does not check anything further than the API Gateway.
- Healthcheck (opt-in)
  - Checks if the backend API service is healthy and responding. Check flows through the proxy to the backend application server.
Establish procedures for setting up uptime and healthcheck monitoring
- Document the features
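
The two monitoring types above can be sketched as a small data model. This is only an illustration under stated assumptions: the `MonitorSpec` class, the `monitors_for_proxy` helper, and the example base URL are hypothetical names and are not part of the API Gateway or UptimeRobot.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MonitorSpec:
    """Hypothetical description of one monitor for an API Proxy."""
    kind: str               # "upcheck" or "healthcheck"
    url: str                # full URL the monitoring tool would call
    reaches_backend: bool   # True only when the check flows through to the backend


def monitors_for_proxy(base_url: str, healthcheck_opt_in: bool = False) -> List[MonitorSpec]:
    """Return the monitors a proxy would get: upcheck always, healthcheck only on opt-in."""
    base = base_url.rstrip("/")
    # The upcheck is required on every proxy and stops at the API Gateway.
    specs = [MonitorSpec("upcheck", f"{base}/upcheck", reaches_backend=False)]
    if healthcheck_opt_in:
        # The healthcheck is opt-in and flows through the proxy to the backend.
        specs.append(MonitorSpec("healthcheck", f"{base}/healthcheck", reaches_backend=True))
    return specs
```

For example, `monitors_for_proxy("https://gateway.example.com/my-api")` would yield only the upcheck monitor, while passing `healthcheck_opt_in=True` would add the healthcheck monitor as well.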

Assumptions

Will use UptimeRobot.
- Ask Steven Maglio for credentials

Out of Scope

Being responsible for notifying the API service owners
Being responsible for the uptime of backend API services

Requirements

Title	User Story	Priority	Notes
/upcheck	As an API Gateway Admin, I want to know if one of our API Proxies is no longer available (is no longer deployed).	MUST HAVE	This is required on all API Proxies. Path: `/upcheck` This will be provided by the API Gateway system. This is a reserved endpoint and can't be defined in the API swagger doc. The response will be a simple `200 OK`. No security is required on this endpoint.
/healthcheck	As an API Gateway Admin, I want to give the API service developers a standardized way to monitor the health of their applications through the API Gateway. (Testing that a call going through the API Gateway makes it all the way to the backend service, and verifying that the backend service is responding correctly.)	MUST HAVE	This is an opt-in addition for API Proxies. Path: `/healthcheck` This is a reserved endpoint and can't be defined in the API swagger doc. The response will be a simple `200 OK` for success. All other responses will be considered failures and will trigger a notification. If the API developer has a particular response they expect to see, they can provide that at the time of configuration. A common Healthcheck API Key will be used to ensure that it's the UptimeRobot healthcheck system that is calling the endpoint.
Healthcheck should be Opt-In	As an API Developer, I don't want to be forced to provide a `/healthcheck` endpoint. I do want the ability to provide one in the future.	MUST HAVE	As part of the API Creation Request process, we should ask whether they would like a Healthcheck to be set up for them. If they do, they should provide: (1) Whether the normal `200 OK` response is enough to determine that the service is healthy (and no notification should be sent). (2) Whether there is a specific message body they want to see returned in order to determine if the service is healthy. The functionality in UptimeRobot checks for keywords and not exact bodies, so just a few keywords to search for is what will actually be used. (3) Whether the notifications should be sent to the functional account that is associated with the application.
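
The pass/fail rule described in the notes above can be sketched as follows. This is a minimal illustration and does not call UptimeRobot; the `is_healthy` function is a hypothetical name, and the matching behavior (case-sensitive substring search, with every configured keyword required) is an assumption that should be checked against how the keyword monitor is actually configured.

```python
from typing import Iterable, Optional


def is_healthy(status_code: int, body: str, keywords: Optional[Iterable[str]] = None) -> bool:
    """Decide whether a /healthcheck response counts as healthy.

    Assumption: keywords are matched as case-sensitive substrings and all must appear.
    """
    # Anything other than 200 OK is treated as a failure.
    if status_code != 200:
        return False
    # With no keywords configured, a plain 200 OK is enough.
    if not keywords:
        return True
    # Otherwise look for keywords in the body rather than comparing the exact body.
    return all(keyword in body for keyword in keywords)
```

A notification would be sent whenever this returns `False`. For instance, `is_healthy(200, '{"status": "UP"}', ["UP"])` is healthy, while a `503` response is a failure regardless of its body.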

User Interaction, Design & Architecture

Creating a new monitor

Examples and References

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question	Outcome	Decision Date
For the `/healthcheck` endpoints (the ones that flow through the API Gateway to the backend), should we secure them with an API key? Should it be a single API Key that we use on UptimeRobot for all healthchecks? Would this mean that the shared flow that accepted healthcheck requests would only check against that API Key; so other legitimate keys for the overall API Proxy would not work?
When a `/healthcheck` reports itself as down, should we standardize that the notification will only be sent to the functional/shared account address? Or do we want to be looser and let the API developer specify other addresses to send to?