Internet-Draft Candidate Selection May 2023
Hoffman Expires 30 November 2023 [Page]
Network Working Group
Intended Status:
P. Hoffman

Simple Random Candidate Selection


This document describes a process to randomly select a subset of candidates from a larger set of candidates. The process uses an unpredictable value can be trusted by all candidates. It uses randomizing based on a hash function to make the description of the process easy to understand.

This draft has a GitHub repository. Issues and pull requests can be made there.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 30 November 2023.

Table of Contents

1. Introduction

It is common to need to pick a subset of people from a larger group using a random selection method. This is often done on an ad hoc basis, but for some selections, a more formal process is needed, particularly if the people in the larger group don't all trust the administrator of the selection process to be unbiased.

This document gives a simple, understandable process that can be done for groups and subsets of arbitrary size. The process is purposely transparent and reproducible. It works with any group of entities that nave names: people, companies, locations, dates ("18 August 2022"), and so on.

As a simple example, a future leadership committee will have a fixed size. The members of the committee will be selected from a large pool of volunteers. Someone is in charge of collecting the names of the volunteers and making a randomized selection among them for the leadership committee. They can use the process in this document to make that selection in a way that is both provably random and understandable.

Due to the formatting used in this document, the reader is encouraged to read the HTML version, although the text version is still usable.

1.1. Earlier Work

RFC 3797 [NomCom] defines the process that the IETF uses to select the NomCom each year. It uses a random input to select from a list of candidate names in ASCII using modulo aritmetic the position of the names in alphabetical order. That process has some significant pitfalls described in the document updating RFC 3797, [NomCom-bis]. The process in this document uses a very different selection process to avoid those pitfalls.

After this document has passed IETF review, the IETF might consider using the process described here as a replacement for the process in RFC 3797, but the document is valuable even if the IETF does not want to change the NomCom process.

2. Overview of the Process

A few terms are used throughout this document:


The act of collecting names, making the random selection, and publishing the entire process.

ceremony administrator (CA):

The person who performs the steps of the ceremony.


The person, organization, or other namable entity that is possibly being selected during the ceremony.

candidate name:

The name selected by each candidate for the selection process. The candidate name is expressed as a string of Unicode characters [Unicode] in UTF-8 format [UTF-8].

difficult-to-predict number (D):

A publicly-visible number that is not known before the pool of candidates has been closed. Note that this is different from what is normally called a "random number". True random numbers are designed to be nearly impossible to predict, whereas D in this process has weak (but sufficient) randomness.

2.1. Basic Steps

The steps in a ceremony that follows this process is given here. See Section 3 for more detail on the steps.

  1. The CA starts the ceremony by performing the following steps at the same time:

    • Announces an end date for when the pool will be complete.
    • Announces a later date on which the difficult-to-predict number, D, will be selected.
    • Announces the source where D will be found.
    • Announces the number of candidates that will be selected (called S).
    • Opens up the pool of candidates for submission.
  2. Candidates submit their names to the pool until the closing date.
  3. On the closing date, the CA publishes the set of candidate names with the hexadecimal value of the UTF-8 encoding for each candidate name.
  4. On the date for selecting D, the CA gets D from the announced source.
  5. The CA calculates the hashes used to make the selection. They concatenate each candidate name with D (name first, then D), uses the SHA-256 hash function [SHA-2] on the new string, and records the value of the output as a UTF-8 string.
  6. The CA arranges the set of hash values in alphabetic order from highest to lowest. They then select the S candidates from the top of the list (that is, the names whose hash values are largest).

3. Specifics for the Process

3.1. Start of Ceremony

Much of the trust in the selection process is based on the CA not being able to influence the selection. If the CA can choose, or influence, the value of D, they can establish the ordering of the differences of the integers. Similarly, if one or more of the candidates can influence the value of D, they can increase their chance of being selected.

To make the process trustworthy, the value of D must be unrelated to the CA or the candidates, and it must be selected after the list of candidates is completed There are many sources of such values: stock market closing values, numbers chosen for large public lotteries, and so on. Section 3.1 of RFC 3797 lists many such sources.

The most important things for a ceremony is that the source is announced before the ceremony starts, that all participants and viewers of a ceremony can find the source on the date specified by the CA, and that everyone gets the same value when they go to the source for that day.

If the CA chooses to use stock market closing values, a common open source of those values is the Wall Street Journal. For example, the FTSE 100 Index is a long-established index based on 100 stocks; it is sometimes known by its stock ticker as "UXK". The daily closing for the FTSE 100 Index at the Wall Street Journal can currently be found here.

Note that the location for sources of daily closing values can change over time. The CA must check that the source they intende to use is still active, and still available when the ceremony starts.

3.2. Name Submission

The CA is the sole arbitrator for whether a candidate is allowed to enter the pool. The CA is also the sole arbitrator of what name string (in UTF-8) the candidate can use in the pool.

The order that the candidates join the pool does not affect the outcome of the selection process. Said another way, the pool is kept as an unordered set of candidates, not an ordered list of candidates.

3.3. Closing Submissions

At the closing of submissions, the CA verifies that the length of the set of candidates in the pool is larger than S. If the length is the same as S, the rest of the steps are unneeded (and could be confusing), because all candidates will automatically be selected. If the length is shorter than S, the ceremony stops because there are too few candidates.

The method for publishing the set of candidates is determined by the CA. Figure 3 gives an example of how a CA might publish this information.

3.4. Selecting D

On the day that the CA announced for the selection of D, the CA goes the the source they announced and gets D. They encode D as a UTF-8 string, which is fairly easy if the source is numeric. In the example of the FTSE 100 from Figure 4, a closing value for the day announced at the beginning of the ceremony might be "7623.10". This would be encoded in UTF-8 as the string of characters whose value is U+373632372e3130.

3.5. Calculating Hashes

Different programming libraries have different requirements for the input to hash functions. Section 5 uses the built-in hashlib library in Python, which requires that text strings come with a specified encoding.

3.6. Selecting S Candidates

The process of selecting is simply taking the S candidates whose hash value is. This can easily be determined by sorting the text representation of the hash values because in UTF-8 and ASCII, digits have lower codepoints than letters.

To complete the process, the CA should publish all known data for the ceremony. This includes S, D, the hexadecimal value of D, all of the information for each candidate, and the full list of selected candidates. Figure 5 shows an example of what this publication might look like.

4. Handling Process Issues

Ceremonies don't always go as planned. For example, after a ceremony completes, one or more of the selected candidates might be removed from the selected set due to voluntary withdrawal or established rules (such as no two candidates being from the same geographic region). In such cases, no new ceremony is needed: the CA simply selects the next candidate(s) on the list that is ordered by hash values.

(Note that the process in this document is quite different than that in [NomCom] and [NomCom-bis], where unselecting an already-selected candidate can cause a new random selection for all the remaining candidates. Such reestablishing of the candidate pool can cause confusion and hurt among the candidates who were first selected, then later unselected, due to no fault of the their own.)

Similarly, if after the selection process is completed, the size S of the selected set needs to increase, the CA simply selects the next candidate(s) on the list that is ordered by hash values.

5. Sample Code

The following is a list of figures for an implementation of the procedure shown in this document.

#!/usr/bin/env python3

# Program to randomly select some candidates from a group
#  See draft-hoffman-genarea-reandom-candidate-selection

import hashlib, sys
from pathlib import Path

# Helper function to turn a UTF-8 string into its hex representation
def hexify(in_str):
  return "".join([hex(c)[2:] for c in in_str.encode("utf8")])

# Santity check the input files given on the command line
if len(sys.argv) == 1:
  exit("Must give the name of the candidate file, and possibly " + \
    "the selection file, on the command line. Exiting.")
candidate_path = Path(sys.argv[1])
if not candidate_path.exists():
  exit(f"The file {str(candidate_path)} does not exist. Exiting.")
  file_contents ="rt", encoding="utf8")
  exit("The candidates file do not appear to be in UTF-8. Exiting.")
candidate_lines =
# See if there is a second file for selecting
if len(sys.argv) == 3:
  run_including_selection = True
  selection_path = Path(sys.argv[2])
  if not selection_path.exists():
    exit(f"The file {str(selection_path)} does not exist. Exiting.")
    file_contents ="rt", encoding="utf8")
    exit("The selection file does not appear to be UTF-8. Exiting.")
  selection_lines =
  # Extract D and S from the selection file
  S_str = selection_lines[0]
    S = int(S_str)
    print(f"The first line of the selection file, '{S_str}', " + \
      "is not an integer. Exiting.")
  D_str = selection_lines[1]
  D_hex = hexify(D_str)
  run_including_selection = False

# Get the candidates information
C_info = []
c_names = candidate_lines
diff_set = set()
for C_str in c_names:
  C_hex = "".join([hex(c)[2:] for c in C_str.encode("utf8")])
  if run_including_selection:
    C_with_D_str = C_str + D_str
    C_with_D_hex = hexify(C_with_D_str)
    C_with_D_hash = hashlib.sha256(C_with_D_hex.encode("utf-8"))
    C_info.append([C_str, C_hex, C_with_D_str, C_with_D_hex, \
    C_info.append([C_str, C_hex])

# Print the results
if run_including_selection:
  print(f"S is {S}")
  print(f"D is \"{D_str}\"")
  print(f" U+{D_hex}\n")
  print("Candidate information, sorted by hash of name including D")
  selected = []
  for this_info in sorted(C_info, key=lambda a: a[4], reverse=True):
    if S > 0:
      S -= 1
    print(f" U+{this_info[3]}")
    print(f" {this_info[4]}")
  print("\nSelected:\n    " + "\n    ".join(selected))
  for this_info in C_info:
    print(f" U+{this_info[1]}")
Figure 1: Example Python code for this procedure
Bob Dylan
石黒 一雄
Olga Tokarczuk
Peter Handke
Louise Glück
Abdulrazak Gurnah
Figure 2: Sample name list file
Bob Dylan
石黒 一雄
Olga Tokarczuk
Peter Handke
Louise Glück
Abdulrazak Gurnah
Figure 3: Full information for the names
S is 3
D is "7623.10"

Candidate information, sorted by hash of name including D
石黒 一雄7623.10
Louise Glück7623.10
Peter Handke7623.10
Olga Tokarczuk7623.10
Abdulrazak Gurnah7623.10
Bob Dylan7623.10

    石黒 一雄
    Louise Glück
    Peter Handke
Figure 5: Output of running the program on the list of names and selection information

6. IANA Considerations

This document has no IANA considerations.

7. Security Considerations

The value D used in this process is explicitly not cryptographically strong; in fact, it might provide only a few bits of randomness. For example, stock indexes that contain many stocks might be predictable after the third digit from the right, meaning that they only have randomness of about 10 bits. A candidate who has a lot of leeway in choosing their name can possibly increase their chance of being selected by as much as 0.1% with such source of randomness. If the CA feels that candidates have too much leeway in selecting their names and is concerned about candidates gaming the ceremony even to that tiny extent, that CA needs to choose a source for D with more randomness.

Such selection is outside the scope of this document because it would make the process more complicated or less understandable. Instead, this document relies on the CA to only allow sensible representation of candidate names and to accept a tiny chance that candidates can both predict D and be able to change the name they use to reflect that predicted value.

8. References

8.1. Normative References

Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)", RFC 6234, DOI 10.17487/RFC6234, , <>.
The Unicode Consortium, "The Unicode Standard (latest version)", n.d., <>.
Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, , <>.

8.2. Informative References

Eastlake 3rd, D., "Publicly Verifiable Nominations Committee (NomCom) Random Selection", RFC 3797, DOI 10.17487/RFC3797, , <>.
Eastlake, D. E., "Publicly Verifiable Nominations Committee (NomCom) Random Selection", Work in Progress, Internet-Draft, draft-eastlake-rfc3797bis-02, , <>.

Author's Address

Paul Hoffman