Network Working Group                                   I. Nadareishvili
Internet-Draft                                          October 14, 2018
Intended status: Informational
Expires: April 17, 2019


               Health Check Response Format for HTTP APIs
                   draft-inadarei-api-health-check-02

Abstract

   This document proposes a service health check response format for
   HTTP APIs.

Note to Readers

   *RFC EDITOR: please remove this section before publication*

   The issues list for this draft can be found at
   https://github.com/inadarei/rfc-healthcheck/issues [1].

   The most recent draft is at
   https://inadarei.github.io/rfc-healthcheck/ [2].

   Recent changes are listed at
   https://github.com/inadarei/rfc-healthcheck/commits/master [3].

   See also the draft's current status in the IETF datatracker, at
   https://datatracker.ietf.org/doc/draft-inadarei-api-health-check/
   [4].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Notational Conventions
   3.  API Health Response
     3.1.  status
     3.2.  version
     3.3.  releaseId
     3.4.  notes
     3.5.  output
     3.6.  details
     3.7.  links
     3.8.  serviceId
     3.9.  description
   4.  The Details Object
     4.1.  componentId
     4.2.  componentType
     4.3.  observedValue
     4.4.  observedUnit
     4.5.  status
     4.6.  time
     4.7.  output
     4.8.  links
   5.  Example Output
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  Creating and Serving Health Responses
   10. Consuming Health Check Responses
   11. References
     11.1.  Normative References
     11.2.  Informative References
     11.3.  URIs
   Author's Address

1.  Introduction

   The vast majority of modern APIs driving data to web and mobile
   applications use HTTP [RFC7230] as their protocol.  The health and
   uptime of these APIs determine availability of the applications
   themselves.
   In distributed systems built with a number of APIs, understanding
   the health status of the APIs and making corresponding decisions
   (for caching, failover, or circuit-breaking) are essential to
   providing highly available solutions.

   There exists a wide variety of operational software that relies on
   the ability to read the health check responses of APIs.  However,
   there is currently no standard for the health check response, so
   most applications either rely on the basic level of information
   included in HTTP status codes [RFC7231] or use task-specific formats.

   Usage of task-specific or application-specific formats creates
   significant challenges, preventing meaningful interoperability
   across different implementations and between different tooling.

   Standardizing a format for health checks can provide a number of
   benefits, including:

   o  Flexible deployment - since operational tooling and API clients
      can rely on a rich, uniform format, they can be safely combined
      and substituted as needed.

   o  Evolvability - new APIs, conforming to the standard, can safely
      be introduced in any environment and ecosystem that also conforms
      to the same standard, without costly coordination and testing
      requirements.

   This document defines a "health check" format using the JSON format
   [RFC8259] for APIs to use as a standard point for the health
   information they offer.  Having a well-defined format for this
   purpose promotes good practice and tooling.

2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  API Health Response

   The Health Check Response Format for HTTP APIs uses the JSON format
   described in [RFC8259] and has the media type
   "application/health+json".
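
   The following is a minimal, non-normative sketch in Python (standard
   library only) of how a server might serialize a health response with
   this media type.  It applies the status rules of Section 3.1
   (case-insensitive values and the listed aliases), the "output"
   guidance of Section 3.5, and the freshness advice of Section 9.  The
   function names are hypothetical, and returning 200 for "pass"/"warn"
   and 503 for "fail" is an assumption: the draft only requires the
   2xx-3xx and 4xx-5xx ranges respectively.

```python
import json

# Status values and aliases listed in Section 3.1, mapped to canonical form.
STATUS_ALIASES = {
    "pass": "pass", "ok": "pass", "up": "pass",
    "fail": "fail", "error": "fail", "down": "fail",
    "warn": "warn",
}


def normalize_status(value):
    """Map a case-insensitive status value or alias to pass/warn/fail."""
    try:
        return STATUS_ALIASES[value.lower()]
    except KeyError:
        raise ValueError(f"unknown health status: {value!r}") from None


def build_health_response(body, max_age=3600):
    """Return (http_status, headers, payload) for an application/health+json reply.

    Hypothetical helper. Assumption: 200 is used for "pass"/"warn" and 503
    for "fail"; the draft only constrains the 2xx-3xx and 4xx-5xx ranges.
    """
    status = normalize_status(body["status"])  # "status" is the only required field
    http_status = 503 if status == "fail" else 200
    if status == "pass":
        # Section 3.5: "output" SHOULD be omitted for the "pass" state.
        body = {key: val for key, val in body.items() if key != "output"}
    headers = {
        "Content-Type": "application/health+json",
        # Section 9: assign a freshness lifetime to limit polling load.
        "Cache-Control": f"max-age={max_age}",
    }
    return http_status, headers, json.dumps(body)
```

   For example, build_health_response({"status": "DOWN"}) yields a 503,
   because "down" is an alias of "fail".  A service that prefers other
   codes within the permitted ranges would only change the line that
   computes http_status.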

   The response body consists of a single mandatory root field
   ("status") and several optional fields:

3.1.  status

   status: (required) indicates whether the service status is
   acceptable or not.  API publishers SHOULD use the following values
   for the field:

   o  "pass": healthy (acceptable aliases: "ok" to support Node's
      Terminus and "up" for Java's Spring Boot),

   o  "fail": unhealthy (acceptable aliases: "error" to support Node's
      Terminus and "down" for Java's Spring Boot), and

   o  "warn": healthy, with some concerns.

   The value of the status field is case-insensitive and is tightly
   related to the HTTP response code returned by the health endpoint.
   For the "pass" and "warn" statuses, an HTTP response code in the
   2xx-3xx range MUST be used.  For the "fail" status, an HTTP response
   code in the 4xx-5xx range MUST be used.  In the case of the "warn"
   status, additional information SHOULD be provided, utilizing
   optional fields of the response.

   A health endpoint is only meaningful in the context of the component
   whose health it indicates.  It has no other meaning or purpose.  As
   such, its health is a conduit to the health of the component.
   Clients SHOULD assume that the HTTP response code returned by the
   health endpoint is applicable to the entire component (e.g., a
   larger API or a microservice).  This is compatible with the behavior
   that current infrastructural tooling utilizing health checks
   expects: load balancers, service discovery systems, and others.

3.2.  version

   version: (optional) public version of the service.

3.3.  releaseId

   releaseId: (optional) in well-designed APIs, backwards-compatible
   changes in the service should not update a version number.  APIs
   usually change their version number as infrequently as possible, to
   preserve a stable interface.
   However, the implementation of an API may change much more
   frequently, which makes it important to have a separate "release
   number" or "releaseId" that is different from the public version of
   the API.

3.4.  notes

   notes: (optional) array of notes relevant to the current state of
   health.

3.5.  output

   output: (optional) raw error output, in case of "fail" or "warn"
   states.  This field SHOULD be omitted for the "pass" state.

3.6.  details

   details: (optional) an object that provides more details about the
   status of the service as it pertains to the information about the
   downstream dependencies of the service in question.  See Section 4
   ("The Details Object") for more information.

3.7.  links

   links: (optional) an object containing link relations and URIs
   [RFC3986] for external links that MAY contain more information about
   the health of the endpoint.  Per web-linking standards [RFC8288], a
   link relation type SHOULD either be a common/registered one or be
   indicated as a URI, to avoid name clashes.  If a "self" link is
   provided, it MAY be used by clients to check health via the HTTP
   response code, as described in Section 3.1.

3.8.  serviceId

   serviceId: (optional) a unique identifier of the service, in the
   application scope.

3.9.  description

   description: (optional) a human-friendly description of the service.

4.  The Details Object

   The "details" object MAY have a number of unique keys, one for each
   logical downstream dependency or sub-component.  Since each
   sub-component may be backed by several nodes with varying health
   statuses, these keys point to arrays of objects.  In case of a
   single-node sub-component (or if presence of nodes is not relevant),
   a single-element array should be used as the value, for consistency.

   The key identifying an element in the object should be a unique
   string within the details section.  It MAY have two parts:
   "{componentName}:{measurementName}", in which case the meaning of the
   parts SHOULD be as follows:

   o  componentName: (optional) human-readable name for the component.
      MUST NOT contain a colon, since the colon is used as a separator.

   o  measurementName: (optional) name of the measurement type (a data
      point type) that the status is reported for.  MUST NOT contain a
      colon, since the colon is used as a separator.  The observation's
      name can be one of:

      *  A pre-defined value from this spec.  Pre-defined values
         include:

         +  utilization

         +  responseTime

         +  connections

         +  uptime

      *  A common and standard term from a well-known source such as
         schema.org, IANA or microformats.

      *  A URI that indicates extra semantics and processing rules that
         MAY be provided by a resource at the other end of the URI.
         URIs do not have to be dereferenceable, however.  They are just
         a namespace, and the meaning of a namespace can be provided by
         any convenient means (e.g., publishing an RFC, Swagger document
         or a nicely printed book).

   On the value side of the equation, each "component details" object in
   the array MAY have any of the following object keys:

4.1.  componentId

   componentId: (optional) a unique identifier of an instance of a
   specific sub-component/dependency of a service.  Multiple objects
   with the same componentId MAY appear in the details, if they are from
   different nodes.

4.2.  componentType

   componentType: (optional) SHOULD be present if componentName is
   present.  It is the type of the component and could be one of:

   o  A pre-defined value from this spec.  Pre-defined values include:

      *  component

      *  datastore

      *  system

   o  A common and standard term from a well-known source such as
      schema.org, IANA or microformats.

   o  A URI that indicates extra semantics and processing rules that MAY
      be provided by a resource at the other end of the URI.  URIs do
      not have to be dereferenceable, however.  They are just a
      namespace, and the meaning of a namespace can be provided by any
      convenient means (e.g., publishing an RFC, Swagger document or a
      nicely printed book).

4.3.  observedValue

   observedValue: (optional) could be any valid JSON value, such as a
   string, number, object, array, or literal.

4.4.  observedUnit

   observedUnit: (optional) SHOULD be present if observedValue is
   present.  Clarifies the unit of measurement in which observedValue is
   reported; e.g., for a time-based value it is important to know
   whether the time is reported in seconds, minutes, hours or something
   else.  To make sure the unit is denoted by a well-understood name or
   an abbreviation, it should be one of:

   o  A common and standard term from a well-known source such as
      schema.org, IANA, microformats, or a standards document such as
      [RFC3339].

   o  A URI that indicates extra semantics and processing rules that MAY
      be provided by a resource at the other end of the URI.  URIs do
      not have to be dereferenceable, however.  They are just a
      namespace, and the meaning of a namespace can be provided by any
      convenient means (e.g., publishing an RFC, Swagger document or a
      nicely printed book).

4.5.  status

   status: (optional) has the exact same meaning as the top-level
   "status" element, but for the sub-component/downstream dependency
   represented by the details object.

4.6.  time

   time: (optional) the date-time, in ISO 8601 format, at which the
   reading of the observedValue was recorded.  This assumes that the
   value can be cached and the reading typically doesn't happen in real
   time, for performance and scalability purposes.

4.7.  output

   output: (optional) has the exact same meaning as the top-level
   "output" element, but for the sub-component/downstream dependency
   represented by the details object.

4.8.  links

   links: (optional) has the exact same meaning as the top-level "links"
   element, but for the sub-component/downstream dependency represented
   by the details object.

5.  Example Output

     GET /health HTTP/1.1
     Host: example.org
     Accept: application/health+json

     HTTP/1.1 200 OK
     Content-Type: application/health+json
     Cache-Control: max-age=3600
     Connection: close

     {
       "status": "pass",
       "version": "1",
       "releaseId": "1.2.2",
       "notes": [""],
       "output": "",
       "serviceId": "f03e522f-1f44-4062-9b55-9587f91c9c41",
       "description": "health of authz service",
       "details": {
         "cassandra:responseTime": [
           {
             "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2",
             "componentType": "datastore",
             "observedValue": 250,
             "observedUnit": "ms",
             "status": "pass",
             "time": "2018-01-17T03:36:48Z",
             "output": ""
           }
         ],
         "cassandra:connections": [
           {
             "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2",
             "componentType": "datastore",
             "observedValue": 75,
             "status": "warn",
             "time": "2018-01-17T03:36:48Z",
             "output": "",
             "links": {
               "self": "http://api.example.com/dbnode/dfd6cf2b/health"
             }
           }
         ],
         "uptime": [
           {
             "componentType": "system",
             "observedValue": 1209600.245,
             "observedUnit": "s",
             "status": "pass",
             "time": "2018-01-17T03:36:48Z"
           }
         ],
         "cpu:utilization": [
           {
             "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
             "node": 1,
             "componentType": "system",
             "observedValue": 85,
             "observedUnit": "percent",
             "status": "warn",
             "time": "2018-01-17T03:36:48Z",
             "output": ""
           },
           {
             "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
             "node": 2,
             "componentType": "system",
             "observedValue": 85,
             "observedUnit": "percent",
             "status": "warn",
             "time": "2018-01-17T03:36:48Z",
             "output": ""
           }
         ],
         "memory:utilization": [
           {
             "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
             "node": 1,
             "componentType": "system",
             "observedValue": 8.5,
             "observedUnit": "GiB",
             "status": "warn",
             "time": "2018-01-17T03:36:48Z",
             "output": ""
           },
           {
             "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
             "node": 2,
             "componentType": "system",
             "observedValue": 5500,
             "observedUnit": "MiB",
             "status": "pass",
             "time": "2018-01-17T03:36:48Z",
             "output": ""
           }
         ]
       },
       "links": {
         "about": "http://api.example.com/about/authz",
         "http://api.x.io/rel/thresholds":
           "http://api.x.io/about/authz/thresholds"
       }
     }

6.  Security Considerations

   Clients need to exercise care when reporting health information.
   Malicious actors could use this information for orchestrating
   attacks.  In some cases the health check endpoints may need to be
   authenticated and institute role-based access control.

7.  IANA Considerations

   The media type for health check responses is application/health+json.
   Its registration template, following [RFC6838], is:

   o  Media type name: application

   o  Media subtype name: health+json

   o  Required parameters: n/a

   o  Optional parameters: n/a

   o  Encoding considerations: binary

   o  Security considerations: Health+JSON shares security issues common
      to all JSON content types.  See [RFC8259], Section 12 for
      additional information.

      Health+JSON allows utilization of Uniform Resource Identifiers
      (URIs) and as such shares security issues common to URI usage.
      See [RFC3986], Section 7 for additional information.

      Since health+json can carry a wide variety of data, some data may
      require privacy or integrity services.  This specification does
This specification does 496 not prescribe any specific solution and assumes that concrete 497 implementations will utilize common, trusted approaches such as 498 TLS/HTTPS, OAuth2 etc. 500 o Interoperability considerations: None 502 o Published specification: this RFC draft 504 o Applications which use this media: Various 506 o Fragment identifier considerations: Health+JSON follows RFC6901 507 for implementing URI Fragment Identification standard to JSON 508 content types. 510 o Restrictions on usage: None 512 o Additional information: 514 1. Deprecated alias names for this type: n/a 516 2. Magic number(s): n/a 518 3. File extension(s): .json 519 4. Macintosh file type code: TEXT 521 5. Object Identifiers: n/a 523 o General Comments: 525 o Person to contact for further information: 527 1. Name: Irakli Nadareishvili 529 2. Email: irakli@gmail.com 531 o Intended usage: Common 533 o Author/Change controller: Irakli Nadareishvili 535 8. Acknowledgements 537 Thanks to Mike Amundsen, Erik Wilde, Justin Bachorik and Randall 538 Randall for their suggestions and feedback. And to Mark Nottingham 539 for blueprint for authoring RFCs easily. 541 9. Creating and Serving Health Responses 543 When making an health check endpoint available, there are a few 544 things to keep in mind: 546 o A health response endpoint is best located at a memorable and 547 commonly-used URI, such as "health" because it will help self- 548 discoverability by clients. 550 o Health check responses can be personalized. For example, you 551 could advertise different URIs, and/or different kinds of link 552 relations, to afford different clients access to additional health 553 check information. 555 o Health check responses SHOULD be assigned a freshness lifetime 556 (e.g., "Cache-Control: max-age=3600") so that clients can 557 determine how long they could cache them, to avoid overly frequent 558 fetching and unintended DDOS-ing of the service. 
Any method of 559 cach lifetime negotiation provided by HTTP spec is acceptable 560 (e.g. ETags are just fine). 562 o Custom link relation types, as well as the URIs for variables, 563 should lead to documentation for those constructs. 565 10. Consuming Health Check Responses 567 Clients might use health check responses in a variety of ways. 569 Note that the health check response is a "living" document; links 570 from the health check response MUST NOT be assumed to be valid beyond 571 the freshness lifetime of the health check response, as per HTTP's 572 caching model [RFC7234]. 574 As a result, clients ought to cache the health check response (as per 575 [RFC7234]), to avoid fetching it before every interaction (which 576 would otherwise be required). 578 Likewise, a client encountering a 404 (Not Found) on a link is 579 encouraged to obtain a fresh copy of the health check response, to 580 assure that it is up-to-date. 582 11. References 584 11.1. Normative References 586 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 587 Requirement Levels", BCP 14, RFC 2119, 588 DOI 10.17487/RFC2119, March 1997, 589 . 591 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 592 Resource Identifier (URI): Generic Syntax", STD 66, 593 RFC 3986, DOI 10.17487/RFC3986, January 2005, 594 . 596 [RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, 597 Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching", 598 RFC 7234, DOI 10.17487/RFC7234, June 2014, 599 . 601 [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 602 Interchange Format", STD 90, RFC 8259, 603 DOI 10.17487/RFC8259, December 2017, 604 . 606 [RFC8288] Nottingham, M., "Web Linking", RFC 8288, 607 DOI 10.17487/RFC8288, October 2017, 608 . 610 11.2. Informative References 612 [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: 613 Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002, 614 . 616 [RFC6838] Freed, N., Klensin, J., and T. 
              Hansen, "Media Type Specifications and Registration
              Procedures", BCP 13, RFC 6838, DOI 10.17487/RFC6838,
              January 2013, <https://www.rfc-editor.org/info/rfc6838>.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014,
              <https://www.rfc-editor.org/info/rfc7230>.

   [RFC7231]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Semantics and Content", RFC 7231,
              DOI 10.17487/RFC7231, June 2014,
              <https://www.rfc-editor.org/info/rfc7231>.

11.3.  URIs

   [1] https://github.com/inadarei/rfc-healthcheck/issues

   [2] https://inadarei.github.io/rfc-healthcheck/

   [3] https://github.com/inadarei/rfc-healthcheck/commits/master

   [4] https://datatracker.ietf.org/doc/draft-inadarei-api-health-check/

Author's Address

   Irakli Nadareishvili
   114 5th Avenue
   New York
   United States

   Email: irakli@gmail.com
   URI:   http://www.freshblurbs.com