idnits 2.17.1 draft-inadarei-api-health-check-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([2], [3], [4], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: o componentName: (optional) human-readable name for the component. MUST not contain a colon, in the name, since colon is used as a separator. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: o metricName: (optional) name of the metrics that the status is reported for. MUST not contain a colon, in the name, since colon is used as a separator and can be one of: -- The document date (March 24, 2018) is 2218 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '1' on line 565 -- Looks like a reference, but probably isn't: '2' on line 567 -- Looks like a reference, but probably isn't: '3' on line 569 -- Looks like a reference, but probably isn't: '4' on line 571 == Unused Reference: 'RFC6838' is defined on line 548, but no explicit reference was found in the text ** Obsolete normative reference: RFC 5988 (Obsoleted by RFC 8288) ** Obsolete normative reference: RFC 7234 (Obsoleted by RFC 9111) -- Obsolete informational reference (is this intentional?): RFC 7230 (Obsoleted by RFC 9110, RFC 9112) -- Obsolete informational reference (is this intentional?): RFC 7231 (Obsoleted by RFC 9110) Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group I. Nadareishvili 3 Internet-Draft March 24, 2018 4 Intended status: Informational 5 Expires: September 25, 2018 7 Health Check Response Format for HTTP APIs 8 draft-inadarei-api-health-check-01 10 Abstract 12 This document proposes a service health check response format for 13 HTTP APIs. 15 Note to Readers 17 *RFC EDITOR: please remove this section before publication* 19 The issues list for this draft can be found at 20 https://github.com/inadarei/rfc-healthcheck/issues [1]. 22 The most recent draft is at https://inadarei.github.io/rfc- 23 healthcheck/ [2]. 25 Recent changes are listed at https://github.com/inadarei/rfc- 26 healthcheck/commits/master [3]. 28 See also the draft's current status in the IETF datatracker, at 29 https://datatracker.ietf.org/doc/draft-inadarei-api-health-check/ 30 [4]. 
32 Status of This Memo 34 This Internet-Draft is submitted in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF). Note that other groups may also distribute 39 working documents as Internet-Drafts. The list of current Internet- 40 Drafts is at https://datatracker.ietf.org/drafts/current/. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 This Internet-Draft will expire on September 25, 2018. 49 Copyright Notice 51 Copyright (c) 2018 IETF Trust and the persons identified as the 52 document authors. All rights reserved. 54 This document is subject to BCP 78 and the IETF Trust's Legal 55 Provisions Relating to IETF Documents 56 (https://trustee.ietf.org/license-info) in effect on the date of 57 publication of this document. Please review these documents 58 carefully, as they describe your rights and restrictions with respect 59 to this document. Code Components extracted from this document must 60 include Simplified BSD License text as described in Section 4.e of 61 the Trust Legal Provisions and are provided without warranty as 62 described in the Simplified BSD License. 64 Table of Contents 66 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 67 2. Notational Conventions . . . . . . . . . . . . . . . . . . . 3 68 3. API Health Response . . . . . . . . . . . . . . . . . . . . . 3 69 4. The Details Object . . . . . . . . . . . . . . . . . . . . . 5 70 5. Example Output . . . . . . . . . . . . . . . . . . . . . . . 7 71 6. Security Considerations . . . . . . . . . . . . . . . . . . . 9 72 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 73 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 74 9. 
Creating and Serving Health Responses . . . . . . . . . . . . 10 75 10. Consuming Health Check Responses . . . . . . . . . . . . . . 11 76 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 77 11.1. Normative References . . . . . . . . . . . . . . . . . . 11 78 11.2. Informative References . . . . . . . . . . . . . . . . . 12 79 11.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 12 80 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12 82 1. Introduction 84 The vast majority of modern APIs driving data to web and mobile 85 applications use HTTP [RFC7230] as their protocol. The health and 86 uptime of these APIs determine availability of the applications 87 themselves. In distributed systems built with a number of APIs, 88 understanding the health status of the APIs and making corresponding 89 decisions, for failover or circuit-breaking, are essential for 90 providing highly available solutions. 92 There exists a wide variety of operational software that relies on 93 the ability to read health check response of APIs. There is 94 currently no standard for the health check output response, however, 95 so most applications either rely on the basic level of information 96 included in HTTP status codes [RFC7231] or use task-specific formats. 98 Usage of task-specific or application-specific formats creates 99 significant challenges, disallowing any meaningful interoperability 100 across different implementations and between different tooling. 102 Standardizing a format for health checks can provide any of a number 103 of benefits, including: 105 o Flexible deployment - since operational tooling and API clients 106 can rely on rich, uniform format, they can be safely combined and 107 substituted as needed. 109 o Evolvability - new APIs, conforming to the standard, can safely be 110 introduced in any environment and ecosystem that also conforms to 111 the same standard, without costly coordination and testing 112 requirements. 
114 This document defines a "health check" format using the JSON format 115 [RFC8259] for APIs to use as a standard point for the health 116 information they offer. Having a well-defined format for this 117 purpose promotes good practice and tooling. 119 2. Notational Conventions 121 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 122 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 123 document are to be interpreted as described in [RFC2119]. 125 3. API Health Response 127 The API Health Response Format (or, interchangeably, "health check 128 response format") uses the JSON format described in [RFC8259] and has 129 the media type "application/health+json". 131 Its content consists of a single mandatory root field ("status") and 132 several optional fields: 134 o status: (required) indicates whether the service status is 135 acceptable or not. API publishers SHOULD use the following values for 136 the field: 138 * "pass": healthy, 140 * "fail": unhealthy, and 142 * "warn": healthy, with some concerns. 144 The value of the status field is closely tied to the HTTP 145 response code returned by the health endpoint. For the "pass" and 146 "warn" statuses, an HTTP response code in the 2xx-3xx range MUST be 147 used. For the "fail" status, an HTTP response code in the 4xx-5xx range 148 MUST be used. In case of the "warn" status, the endpoint SHOULD 149 return an HTTP status in the 2xx-3xx range and additional information 150 SHOULD be provided, utilizing optional fields of the response. 152 A health endpoint is only meaningful in the context of the 153 component it indicates the health of. It has no other meaning or 154 purpose. As such, its health is a conduit to the health of the 155 component. Clients SHOULD assume that the HTTP response code 156 returned by the health endpoint is applicable to the entire 157 component (e.g. a larger API or a microservice).
This is 158 compatible with the behavior that current infrastructural tooling 159 expects: load balancers, service discovery systems, and other tools that use 160 health checks. 162 o version: (optional) public version of the service. 164 o releaseID: (optional) in well-designed APIs, backwards-compatible 165 changes in the service should not update a version number. APIs 166 usually change their version number as infrequently as possible, 167 to preserve a stable interface. However, the implementation of an API 168 may change much more frequently, which leads to the importance of 169 having a separate "release number" or "releaseID" that is different 170 from the public version of the API. 172 o notes: (optional) array of notes relevant to the current state of 173 health. 175 o output: (optional) raw error output, in case of "fail" or "warn" 176 states. This field SHOULD be omitted for the "pass" state. 178 o details: (optional) an object representing status of sub- 179 components of the service in question. Please refer to "The 180 Details Object" section for more information. 182 o links: (optional) an array of objects containing link relations 183 and URIs [RFC3986] for external links that MAY contain more 184 information about the health of the endpoint. Per web-linking 185 standards [RFC5988] a link relationship SHOULD either be a common/ 186 registered one or be indicated as a URI, to avoid name clashes. 187 If a "self" link is provided, it MAY be used by clients to check 188 health via HTTP response code, as mentioned above. 190 o serviceID: (optional) unique identifier of the service, in the 191 application scope. 193 o description: (optional) human-friendly description of the service. 195 4. The Details Object 197 The "details" object MAY have a number of unique keys, one for each 198 logical sub-component. Since each sub-component may be backed by 199 several nodes with varying health statuses, the key points to an 200 array of objects.
In case of a single-node sub-component (or if 201 presence of nodes is not relevant), a single-element array should be 202 used as the value, for consistency. 204 The key identifying an element in the object should be a unique 205 string within the details section. It MAY have two parts: 206 "{componentName}:{metricName}", in which case the meaning of the 207 parts SHOULD be as follows: 209 o componentName: (optional) human-readable name for the component. 210 MUST not contain a colon, in the name, since colon is used as a 211 separator. 213 o metricName: (optional) name of the metrics that the status is 214 reported for. MUST not contain a colon, in the name, since colon 215 is used as a separator and can be one of: 217 * Pre-defined value from this spec. Pre-defined values include: 219 + utilization 221 + responseTime 223 + connections 225 + uptime 227 * A common and standard term from a well-known source such as 228 schema.org, IANA or microformats. 230 * A URI that indicates extra semantics and processing rules that 231 MAY be provided by a resource at the other end of the URI. 232 URIs do not have to be dereferenceable, however. They are just 233 a namespace, and the meaning of a namespace CAN be provided by 234 any convenient means (e.g. publishing an RFC, Swagger document 235 or a nicely printed book). 237 On the value side of the equation, each "component details" object 238 in the array MAY have any of the following object keys: 240 o componentId: (optional) unique identifier of an instance of a 241 specific sub-component/dependency of a service. Multiple objects 242 with the same componentId MAY appear in the details, if they are 243 from different nodes. 245 o componentType: (optional) SHOULD be present if componentName is 246 present. Type of the component. Could be one of: 248 * Pre-defined value from this spec.
Pre-defined values include: 250 + component 252 + datastore 254 + system 256 * A common and standard term from a well-known source such as 257 schema.org, IANA or microformats. 259 * A URI that indicates extra semantics and processing rules that 260 MAY be provided by a resource at the other end of the URI. 261 URIs do not have to be dereferenceable, however. They are just 262 a namespace, and the meaning of a namespace CAN be provided by 263 any convenient means (e.g. publishing an RFC, Swagger document 264 or a nicely printed book). 266 o metricValue: (optional) could be any valid JSON value, such as: 267 string, number, object, array or literal. 269 o metricUnit: (optional) SHOULD be present if metricValue is 270 present. Could be one of: 272 * A common and standard term from a well-known source such as 273 schema.org, IANA, microformats, or a standards document such as 274 [RFC3339]. 276 * A URI that indicates extra semantics and processing rules that 277 MAY be provided by a resource at the other end of the URI. 278 URIs do not have to be dereferenceable, however. They are just 279 a namespace, and the meaning of a namespace CAN be provided by 280 any convenient means (e.g. publishing an RFC, Swagger document 281 or a nicely printed book). 283 o time: the date-time, in ISO 8601 format, at which the reading of 284 the metricValue was recorded. This assumes that the value can be 285 cached and the reading typically doesn't happen in real time, for 286 performance and scalability purposes. 288 o output: (optional) has the exact same meaning as the top-level 289 "output" element, but for the sub-component. 291 o links: (optional) has the exact same meaning as the top-level 292 "links" element, but for the sub-component. 294 5.
Example Output 296 GET /health HTTP/1.1 297 Host: example.org 298 Accept: application/health+json 300 HTTP/1.1 200 OK 301 Content-Type: application/health+json 302 Cache-Control: max-age=3600 303 Connection: close 305 { 306 "status": "pass", 307 "version": "1", 308 "releaseID": "1.2.2", 309 "notes": [""], 310 "output": "", 311 "serviceID": "f03e522f-1f44-4062-9b55-9587f91c9c41", 312 "description": "health of authz service", 313 "details": { 314 "cassandra:responseTime": [ 315 { 316 "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2", 317 "componentType": "datastore", 318 "metricValue": 250, 319 "metricUnit": "ms", 320 "status": "pass", 321 "time": "2018-01-17T03:36:48Z", 322 "output": "" 323 } 324 ], 325 "cassandra:connections": [ 326 { 327 "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2", 328 "type": "datastore", 329 "metricValue": 75, 330 "status": "warn", 331 "time": "2018-01-17T03:36:48Z", 332 "output": "", 333 "links": { 334 "self": "http://api.example.com/dbnode/dfd6cf2b/health" 335 } 336 } 337 ], 338 "uptime": [ 339 { 340 "componentType": "system", 341 "metricValue": 1209600.245, 342 "metricUnit": "s", 343 "status": "pass", 344 "time": "2018-01-17T03:36:48Z" 345 } 346 ], 347 "cpu:utilization": [ 348 { 349 "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227", 350 "node": 1, 351 "componentType": "system", 352 "metricValue": 85, 353 "metricUnit": "percent", 354 "status": "warn", 355 "time": "2018-01-17T03:36:48Z", 356 "output": "" 357 }, 358 { 359 "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227", 360 "node": 2, 361 "componentType": "system", 362 "metricValue": 85, 363 "metricUnit": "percent", 364 "status": "warn", 365 "time": "2018-01-17T03:36:48Z", 366 "output": "" 367 } 368 ], 369 "memory:utilization": [ 370 { 371 "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227", 372 "node": 1, 373 "componentType": "system", 374 "metricValue": 8.5, 375 "metricUnit": "GiB", 376 "status": "warn", 377 "time": "2018-01-17T03:36:48Z", 378 "output": "" 379 }, 380 
{ 381 "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227", 382 "node": 2, 383 "componentType": "system", 384 "metricValue": 5500, 385 "metricUnit": "MiB", 386 "status": "pass", 387 "time": "2018-01-17T03:36:48Z", 388 "output": "" 389 } 390 ] 391 }, 392 "links": { 393 "about": "http://api.example.com/about/authz", 394 "http://api.x.io/rel/thresholds": 395 "http://api.x.io/about/authz/thresholds" 396 } 397 } 399 6. Security Considerations 401 Clients need to exercise care when reporting health information. 402 Malicious actors could use this information for orchestrating 403 attacks. In some cases the health check endpoints may need to 404 require authentication and enforce role-based access control. 406 7. IANA Considerations 408 The media type for a health check response is application/health+json. 410 o Media type name: application 412 o Media subtype name: health+json 414 o Required parameters: n/a 416 o Optional parameters: n/a 418 o Encoding considerations: binary 420 o Security considerations: Health+JSON shares security issues common 421 to all JSON content types. See RFC 8259 Section #12 for 422 additional information. 424 Health+JSON allows utilization of Uniform Resource Identifiers 425 (URIs) and as such shares security issues common to URI usage. 426 See RFC 3986 Section #7 for additional information. 428 Since Health+JSON can carry a wide variety of data, some data may 429 require privacy or integrity services. This specification does 430 not prescribe any specific solution and assumes that concrete 431 implementations will utilize common, trusted approaches such as 432 TLS/HTTPS, OAuth2 etc. 434 o Interoperability considerations: None 435 o Published specification: this RFC draft 437 o Applications which use this media: Various 439 o Fragment identifier considerations: Health+JSON follows RFC6901 440 for implementing URI Fragment Identification standard to JSON 441 content types. 443 o Restrictions on usage: None 445 o Additional information: 447 1.
Deprecated alias names for this type: n/a 449 2. Magic number(s): n/a 451 3. File extension(s): .json 453 4. Macintosh file type code: TEXT 455 5. Object Identifiers: n/a 457 o General Comments: 459 o Person to contact for further information: 461 1. Name: Irakli Nadareishvili 463 2. Email: irakli@gmail.com 465 o Intended usage: Common 467 o Author/Change controller: Irakli Nadareishvili 469 8. Acknowledgements 471 Thanks to Mike Amundsen, Erik Wilde, Justin Bachorik and Randall 472 Randall for their suggestions and feedback. And to Mark Nottingham 473 for the blueprint for authoring RFCs easily. 475 9. Creating and Serving Health Responses 477 When making a health check endpoint available, there are a few 478 things to keep in mind: 480 o A health response endpoint is best located at a memorable and 481 commonly-used URI, such as "health", because it will help self- 482 discoverability by clients. 484 o Health check responses can be personalized. For example, you 485 could advertise different URIs, and/or different kinds of link 486 relations, to afford different clients access to additional health 487 check information. 489 o Health check responses must be assigned a freshness lifetime 490 (e.g., "Cache-Control: max-age=3600") so that clients can 491 determine how long they could cache them, to avoid overly frequent 492 fetching and unintended DDoS-ing of the service. 494 o Custom link relation types, as well as the URIs for variables, 495 should lead to documentation for those constructs. 497 10. Consuming Health Check Responses 499 Clients might use health check responses in a variety of ways. 501 Note that the health check response is a "living" document; links 502 from the health check response MUST NOT be assumed to be valid beyond 503 the freshness lifetime of the health check response, as per HTTP's 504 caching model [RFC7234].
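The freshness rule above can be illustrated with a small client-side sketch. It is not part of the format and only one possible approach: the names "HealthCache" and "max_age_seconds", and the "fetch" callable that returns a (headers, body) pair, are hypothetical and introduced here for illustration. The sketch assumes the server sends a "Cache-Control: max-age=N" header and ignores every other caching directive.

```python
import re
import time


def max_age_seconds(cache_control, default=0):
    """Extract max-age (in seconds) from a Cache-Control header value."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE)
    return int(match.group(1)) if match else default


class HealthCache:
    """Keeps one health check response until its freshness lifetime ends."""

    def __init__(self, fetch, clock=time.monotonic):
        # Assumption: fetch() performs the HTTP request and returns
        # (headers: dict, body: dict). It is supplied by the caller.
        self._fetch = fetch
        self._clock = clock
        self._body = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached response, refetching once it is stale."""
        now = self._clock()
        if self._body is None or now >= self._expires_at:
            headers, body = self._fetch()
            self._body = body
            self._expires_at = now + max_age_seconds(headers.get("Cache-Control"))
        return self._body

    def invalidate(self):
        """Drop the cached copy, e.g. after a 404 on a link taken from it."""
        self._body = None
```

A client would call get() before each interaction and invalidate() when a link taken from the response no longer resolves, so that the next get() obtains a fresh copy.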
506 As a result, clients ought to cache the health check response (as per 507 [RFC7234]), to avoid fetching it before every interaction (which 508 would otherwise be required). 510 Likewise, a client encountering a 404 (Not Found) on a link is 511 encouraged to obtain a fresh copy of the health check response, to 512 assure that it is up-to-date. 514 11. References 516 11.1. Normative References 518 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 519 Requirement Levels", BCP 14, RFC 2119, 520 DOI 10.17487/RFC2119, March 1997, 521 . 523 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 524 Resource Identifier (URI): Generic Syntax", STD 66, 525 RFC 3986, DOI 10.17487/RFC3986, January 2005, 526 . 528 [RFC5988] Nottingham, M., "Web Linking", RFC 5988, 529 DOI 10.17487/RFC5988, October 2010, 530 . 532 [RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, 533 Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching", 534 RFC 7234, DOI 10.17487/RFC7234, June 2014, 535 . 537 [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 538 Interchange Format", STD 90, RFC 8259, 539 DOI 10.17487/RFC8259, December 2017, 540 . 542 11.2. Informative References 544 [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: 545 Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002, 546 . 548 [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type 549 Specifications and Registration Procedures", BCP 13, 550 RFC 6838, DOI 10.17487/RFC6838, January 2013, 551 . 553 [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 554 Protocol (HTTP/1.1): Message Syntax and Routing", 555 RFC 7230, DOI 10.17487/RFC7230, June 2014, 556 . 558 [RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 559 Protocol (HTTP/1.1): Semantics and Content", RFC 7231, 560 DOI 10.17487/RFC7231, June 2014, 561 . 563 11.3. 
URIs 565 [1] https://github.com/inadarei/rfc-healthcheck/issues 567 [2] https://inadarei.github.io/rfc-healthcheck/ 569 [3] https://github.com/inadarei/rfc-healthcheck/commits/master 571 [4] https://datatracker.ietf.org/doc/draft-inadarei-api-health-check/ 573 Author's Address 574 Irakli Nadareishvili 575 114 5th Avenue 576 New York 577 United States 579 Email: irakli@gmail.com 580 URI: http://www.freshblurbs.com
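As an illustration of Sections 3 and 4, the following Python sketch checks a health check response against the status-to-HTTP-code rule and splits a details key into its two parts. It is a minimal sketch under stated assumptions, not a reference implementation. All function names are hypothetical. The draft does not say how to interpret a details key that has no colon, so the sketch assumes such a key is a metric name when it matches a pre-defined metric and a component name otherwise. It also flags status values other than "pass", "fail" and "warn", although the draft only recommends those values with SHOULD.

```python
# Pre-defined metric names listed in Section 4.
PREDEFINED_METRICS = {"utilization", "responseTime", "connections", "uptime"}


def expected_http_range(status):
    """Return the (low, high) HTTP status code range for a health status."""
    if status in ("pass", "warn"):
        return (200, 399)
    if status == "fail":
        return (400, 599)
    raise ValueError(f"unrecognized status value: {status!r}")


def split_details_key(key):
    """Split a details key into (componentName, metricName)."""
    component, sep, metric = key.partition(":")
    if not sep:
        # Assumption (not stated in the draft): a bare key is a metric name
        # if it is pre-defined, otherwise a component name.
        return (None, key) if key in PREDEFINED_METRICS else (key, None)
    # Split on the first colon only, so a URI used as metricName stays whole.
    return (component or None, metric or None)


def check_health_response(http_code, body):
    """Return a list of problems found; an empty list means none were found."""
    if "status" not in body:
        return ["missing required field: status"]
    try:
        low, high = expected_http_range(body["status"])
    except ValueError as exc:
        return [str(exc)]
    problems = []
    if not low <= http_code <= high:
        problems.append(
            f"status {body['status']!r} expects HTTP {low}-{high}, got {http_code}"
        )
    # Each details key is expected to point to an array of objects.
    for key, entries in body.get("details", {}).items():
        if not isinstance(entries, list):
            problems.append(f"details[{key!r}] should be an array of objects")
    return problems
```

For instance, a body with "status": "fail" delivered with HTTP 200 would be reported as inconsistent, while the same body delivered with HTTP 503 would not.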