]>
PSI based on ECDHAnt Grouptianwu.wyc@antgroup.comAnt GroupPostfach 330440BeijingD-28359+49-421-218-63921+49-421-218-7000bainuan.cwt@antgroup.comAnt Groupyuwen.lyf@antgroup.comAnt Groupvince.hc@alibaba-inc.comAnt Groupjim.pj@antgroup.com
Security Area
Security DispatchInternet-Draft
This document describes Elliptic Curve Diffie-Hellman Private Set Intersection (ECDH-PSI).
It instantiates the classical Meadows match-making protocol with standard elliptic curves and hash-to-curve methods.
In ECDH-PSI, data items are encoded to points on an elliptic curve, and masked by the private keys of both parties.
After collecting the mutually masked datasets from both parties, a participant computes their intersection and outputs the corresponding original data items as result.
IntroductionPrivate Set Intersection (PSI) schemes enable the discovery of shared elements among different parties' datasets without revealing individual data.
They are widely used when there's a need to identify overlapping data elements between two or more parties
while preserving the confidentiality of each party's original data.
PSI is one of the most frequently used privacy preserving techniques in business, e.g. it enables a user to detect whether his/her password is leaked without giving away the password to the server , or multiple companies to find their common customers without giving each other their raw data.
The classical Diffie-Hellman PSI has been published for more than thirty years.
The academic has also proposed numerous PSI schemes with new functionalities and secure guarantees, such as , but DH-PSI is still preferable due to its simplicity and communication efficiency.
This document describes a widely deployed instance of DH-PSI denoted by Elliptic Curve Diffie-Hellman PSI (ECDH-PSI).
Compared with the finite field parameters used in the original version, elliptic curves are commonly considered to be more efficient and secure , and are extensively adopted by recent standards including .
In document describes 2-party ECDH-PSI as a two stage protocol, including a handshake phase and a data exchange phase.
In the handshake phase, the parties agree on the cipher suite and message flow to be used in data exchange phase.
Then, during the data exchange phase, original data elements are masked and transmitted in batches, allowing one or both parties to output the result.
Requirements Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Data Structures and Notations
In this document, data structures are defined and encoded according to the conventions laid out in Section 3 of .
BackgroundThis section gives brief introductions for elliptic curves and hash-to-curve methods. Elliptic Curves
An elliptic curve E is defined by a two-variable equation over a finite field F with prime characteristic p larger than three.
Except for a special point called point at infinity, a point on E is a (x,y)-coordinate pair that satisfies the curve equation, where x and y are elements of F.
All distinct points of curve E consists an algebraic group of order n, where the point at infinity is the identity element.
Applications and standards generally use a prime order (sub)group <G> generated by a point G on E.
The order of <G> is denoted by r.
This document uses term "addition" to refer to the group operation of a elliptic curve group, meaning two points P and Q can be added together to get a third point R.
Scalar multiplication refers to the operation of adding a point P with itself repeatedly.
This document uses scalar_mul to denote the scalar multiplication process.
It takes a integer x<r and a point P as inputs and outputs the multiplication result Q, which is written by Q=scalar_mul(P,x).
Many cryptographic applications require the participants to generate elliptic curve key pairs.
Usually, a key pair on refers to a (PK,sk) tuple, where sk is a integer generated uniformly between 1 and r-1, and PK=scalar_mul(G,sk).
In the context of ECDH-PSI, only sk, namely the private key, is used.
Thus, this document uses the notation sk=keygen() to refer to the process of generating a private key on the elliptic curve and omits PK.
Hashing to Elliptic CurvesA hash_to_curve function encodes data of arbitrary length to a point on a elliptic curve.
The encoding process first hashes the byte string to one or more elements of fixed length, and then calculates a point on the curve with the elements using so-called map_to_curve and clear_cofactor functions.
describes a series of hash_to_curve methods (which are also called "suites") for standard curves such as NIST P-256 and curve25519.
It specifies two encoding types denoted by "encode_to_curve" and "hash_to_curve".
Both encodings can be proven indifferentiable from random oracles (ROs), but have different output distributions.
This document only uses "hash_to_curve" encodings which output uniformly random points.
It also uses the notation P=hash_to_curve(data) to refer to the process of encoding data of arbitrary length to a point P on the curve given in the context.
requires all uses of the hash_to_curve suites must enforce domain separation with domain separation tags (DSTs).
In particular, DSTs are distinct strings that enable different applications to simulate distinct RO instances with the same hash function.
That is, each application concatenates the DST with its message as the input to hash function.
This document allocates each suit with a unique DST.
For simplicity, DST strings are omitted when writing the process of calculating the hash value of a string.
The notation tag=hash(str) means that the hash function actually takes "dst||str" as its input rather than str alone, where dst is the DST string defined for the corresponding hash_to_curve suite.
Similarly, this document also omits DST strings when describing hash_to_curve functions, assuming that the corresponding DST has been taken as a part of the input.
A suite defined by includes a series of parameters such as the curve, hash function, and encoding type.
In applications, these suites are very efficient to be referred to, as the included parameters can also be used beyond the scenario of mapping data to curves.
For example, the participants can negotiate a hash_to_curve suite, and then use the curve deduced from the suite for other cryptographic functions such as encryption and key establishment.
To extend the usage of hash_to_curve suites, also describes methods and naming conventions for defining new suites, which enables developers to define suites with new curves and hash functions.
Protocol
We begin with a very brief description on how ECDH-PSI works.
Firstly, the participants, A and B, agree on a EC group <G> along with the hash_to_curve suites, and generate private keys over <G>.
Then, both A and B convert their data to points over <G> and multiply the points with both parties' private keys.
Next, they exchange the sets of masked elements as sets of EC points, and compute the intersection over these sets.
Finally, they output the original data items corresponding to the intersection of masked sets.
Assume that the parties have agreed on the suite, and generated private keys of sk_A and sk_B,
an simplified protocol flow of ECDH-PSI is shown by , and explained step-by-step as follows:

A participant maps its data items to EC points with hash_to_curve,
and masks the points locally with its own private key by scalar_mul.

A and B exchange their locally masked elements.

Upon receiving the masked data from its partner, a participant doubly masks the received points with its private key.

A participant sends the doubly-masked elements back to its partner.

The participant calculates intersection of the set calculated in Step 3 and the set received in Step 4,
and finally outputs the corresponding original elements.

In order to clarify the statement, this document uses "the first round" to refer to the data exchange process of Step 2, and "the second round" for the process of Step 4.
The classical description of ECDH-PSI enables both parties to output PSI results and thus requires two complete rounds (i.e. Step 2 and Step 4) to exchange masked data.
However, many application scenarios require that should only one participant output the intersection and the other one get nothing.
In such scenarios, Step 4 and Step 5 are executed unidirectionally,
where only one party sends doubly-masked data in Step 4 and only the receiver of that message can output the result.
To output PSI result, the participants should also match the original data items with their corresponding EC points.
In particular, the relationship can be established with unique indexes,
where the original data item and the masked EC point(s) are linked to the same index.
Overview
The specification of ECDH-PSI consists of two phases as shown in ,
including a handshake phase which negotiates the protocol parameters
and a data exchange phase that performs the operations described by .

In the handshake phase, a participant, which is denoted by requester, sends a HandshakeRequest message to initiate the ECDH-PSI protocol flow.
This message carries lists of parameters supported by the requester and information about the requester's data.
Then, the other participant, denoted by responder, selects parameters from the requester's lists, and sends the parameters with its data information in a HandshakeResponse message.

Next, in the data exchange phase, both parties perform ECDH-PSI operations with parameters selected in the handshake phase,
where EcdhPsiBatch messages are exchanged.
The receiver of an EcdhPsiBatch replies with a EcdhPsiBatchResponse message,
which tells sender the handling status of last EcdhPsiBatch message.

This specification enables masked data to be transferred by many batches, as the entire data set may be extremely large and requires numerous resources to be handled with properly.
For example, the size of a data set may exceed the limit of a single package of the underlying network protocol.
The participants can divide the data set to multiple batches and use multiple EcdhPsiBatch messages to send them so as to meet the limits of network, storage or computation resources.
Besides hash_to_curve suites and data formats, the handshake phase also determines the interaction pattern for data exchange phase. The pattern is decided by the following options:

The participant that outputs the result.
If only one participant (for example, A) has the privilege of obtaining the result, then, in the second round, EcdhPsiBatch messages will be only sent from B to A.

The maximum size of a single EcdhPsiBatch message.
Upon the determination of maximum batch size, the participants can deduce the number of batch messages sent or received in each round by dividing the size of entire data set with the maximum size.

The order of sending EcdhPsiBatch messages.
When both participants need to send batch messages in the same round, the requester will always send the first one.
Then, they can send EcdhPsiBatch messages in turn, or let the requester send all batches in sequence before the responder sending its data.

Handshake
The handshake phase includes a HandshakeRequest and a HandshakeResponse message.
The structures of these messages are documented as follows.
HandshakeRequestStructure of HandshakeRequest:
struct {
uint8 version = 1;
ProtocolProposal protocol_param;
DataProposal data_param;
} HandshakeRequest;
version: The version of ECDH-PSI protocol.protocol_param: ECDH-PSI protocol configurations supported by the requester.data_param: The requester's data information, including its data set size and proposals for data exchange pattern.Protocol Proposal
The ProtocolProposal message describes ECDH-PSI protocol parameters supported by the requester.
Structure of this message:
enum {
null (0),
P256_XMD:SHA-256_SSWU_NU_ (1),
P384_XMD:SHA-384_SSWU_NU_ (2),
P521_XMD:SHA-512_SSWU_NU_ (3),
curve25519_XMD:SHA-512_ELL2_NU_ (4),
(255)
} HashToCurveSuite
enum { compressed (0), uncompressed (1), (255) } PointOctetFormat
enum {
no_truncation (0),
128_bit_truncation (1),
(255)
} TruncationOption
struct {
HashToCurveSuite suites<1..255>;
PointOctetFormat point_octet_formats<1..255>;
TruncationOption truncation_options<1..255>;
} ProtocolProposal
suites: The list of supported HashToCurveSuite in order of the requester's preference.
.
Different suites use different DST strings as required by .
In particular, ECDH-PSI uses "ECDH-PSI-V<xx>-<suites>" as its DST, where <xx> is a two-digit number indicating the current protocol version as specified by the version field of HandshakeRequest, and <suites> is the ASCII string representation of the currently adopted suite as defined by .
As an example, when both participants agree to use the suite for P256 curve, the corresponding DST is "ECDH-PSI-V01-P256_XMD:SHA-256_SSWU_NU_".
point_octet_formats: The list of octet string formats representing EC points, in the order of requester's preference.
This field enables the participants to weigh the tradeoffs between the costs of communication and computation as discussed by .

For the curves defined in , the PointOctetFormat values refer to the corresponding compressed/uncompressed formats documented by .

Points on Curve25519 are always encoded as the input and output of X25519 function defined by Section 5 of as 32 byte strings.
Unlike the other curves, the scalar multiplication of curve25519 only relies on one 32-byte coordinate without the cost of decompressing to the (x,y)-coordinate representation, which eliminates the need of making a tradeoff.

truncation_options:
Whether the requester supports truncation for data transferred in the second round.
The options are listed in the requester's preference.
The truncation could decrease the costs of data transmission and storage, and even accelerate the computation process of intersection, at the expense of a (negligible) false-positive rate.
In this document, only 128-bit truncation is allowed, with a restriction that the total number of data items owned by both participants SHOULD be no more than 2^40, ensuring that the probability of collision is smaller than 2^-48. See for a detailed discussion.
The requester MUST set this field by a single no_truncation option when its data set size has already been equal or larger than 2^40.
The truncation process utilizes Key Derivation Functions (KDFs) as many key-exchange protocols that use shared secrets to derive shorter strings which can be seen as uniformly distributed cryptographic keys (e.g., ).
In this document, the KDFs are instantiated by Hashed Message Authentication Code (HMAC) , along with HMAC-based KDF (HKDF), and the hash function included in hash_to_curve suite.
Following , a KDF function kdf() takes a input keying material ikm, a salt value salt, an application specific information info and output length len as its inputs.
Let mask_data be the string representation of a doubly-masked element, a participant SHOULD truncate mask_data with:
truncated_mask_data = kdf(mask_data, nil, "ECDH-PSI", 128)
In particular, the KDF takes mask_data as the input keying material.
It does not take a salt value, uses ASCII string "ECDH-PSI" as the application specification information, and outputs a 128-bit string as the truncation result.
Data Proposal
The data_param field describes requester's data size and provides supported options on how to exchange EcdhPsiBatch messages.
Structure of this message:
enum { continuous (0), interactive (1), (255) } BatchMode
enum { both (0), requester (1), responder (2), (255) } OutputMode
struct{
BatchMode supported_batch_modes<1..255>;
OutputMode supported_output_modes<1..255>;
uint64 max_batch_size;
uint64 item_num;
} DataProposal
supported_batch_modes: The list of interaction patterns supported by the requester in the order of preference.
BatchMode describes the message interaction pattern when both parties need to send their data sets with multiple EcdhPsiBatch messages in the same round.
continuous: A participant sends its entire data set with continuous EcdhPsiBatch messages.
interactive: The participants send EcdhPsiBatch messages in turn.
If one party has sent all its batch messages before its partner, the partner SHOULD send its messages continuously.
When both parties need to send data batches in the same round, the requester SHOULD always send the first EcdhPsiBatch message.
supported_output_modes: The list of output modes supported by the requester in the order of its preference,
where OutputMode describes which party has the privilege of obtaining the result.
The meaning of OutputMode is explained as follows:
both: Both parties output the intersection result, which means EcdhPsiBatch are sent mutually in the second round.
requester: Only the requester outputs PSI result, which means only the responder sends message batches in the second round.
responder: Only the responder outputs PSI result, which means only the requester sends message batches in the second round.
max_batch_size: The maximum byte size of a single EcdhPsiBatch message allowed by the requester.
That is, the requester will not send a batch message beyond this limit, and it cannot receive a batch message having larger size.
The requester SHOULD decide the value of this field comprehensively according to its hardware/software configuration and environment.
item_num: The total number of requester's data items.
HandshakeResponse
The responder sends HandshakeResponse to reply to a HandshakeRequest message.
It includes selected parameters and the responder's data information.
Structure of this message:
enum {
success(0),
generic_error (0x01),
unsupported_version (0x02)
invalid_request (0x03),
out_of_resource (0x04),
unsupported_parameter (0x05),
(0xFF)
} HandshakeStatus
struct {
HandshakeStatus status;
ProtocolResult protocol_param;
DataResult data_param;
} HandshakeResponse;
status: Indicating the status of handshake. The meanings of error statuses are listed as follows:
generic_error: The error cannot be categorized to the rest error statuses defined by HandshakeStatus.unsupported_version: The responder does not support the ECDH-PSI protocol version given by version.invalid_request: The responder cannot parse the request correctly.
The message may be in wrong format and cannot be parsed as a normal HandshakeRequest message.
out_of_resource: The responder does not have enough resource to handle the ECDH-PSI process following to the options provided by the requester.unsupported_parameter: The responder rejects all options proposed by one of the suggested option lists included in HandshakeRequest.
The responder MUST ignore all options or parameters that it cannot recognize when parsing the HandshakeRequest message.
If one of the suggested option lists is filled with unrecognized parameters, it SHOULD reply with a HandshakeResponse carrying unsupported_parameter.
Upon receiving a HandshakeResponse message without success status, the requester MUST ignore the other fields included in this message.
protocol_param: The protocol parameter selected by the responder, including the hash_to_curve suite, EC point string format and truncation option.
data_param: This message includes the data exchange pattern selected by the responder and the size of responder's dataset.Protocol Result
The structure of ProtocolResult is defined as follows:
struct {
HashToCurveSuite suite;
PointOctetFormat point_octet_format;
TruncationOption truncation_option;
} ProtocolResult;
suite: The hash_to_curve suite selected by the responder.point_octet_format: The format of EC point octet string chosen by the responder.
truncation_option: The truncation option selected by the responder.
When deciding the field, the responder SHOULD consider the sum of both participants' dataset sizes.
If the sum is larger than 2^40, truncation MUST no be used, which means the responder MUST set this field by no_truncation.
Data Result
The structure of DataInfoResult is defined as follows:
struct {
BatchMode batch_mode;
OutputMode output_mode;
uint64 max_batch_size;
uint64 item_num;
} DataInfoResult;
batch_mode: The BatchMode selected by responder.output_mode: The OutputMode selected by responder.max_batch_size: The maximum size of a EcdhPsiBatch message, which SHOULD NOT be larger than the max_batch_size given by the requester.item_num: The size of responder's dataset.Data Exchange
The data exchange phase consists of two rounds.
In each round, a participant may send or receive multiple EcdhPsiBatch messages.
Upon receiving a EcdhPsiBatch and handling the data properly, the participant SHOULD respond with a EcdhPsiBatchResponse message,
indicating its partner that it is ready to receive the next EcdhPsiBatch message.
EcdhPsiBatch
The structure of EcdhPsiBatch is defined as follows:
enum { transmit(0), retransmit(1), fatal_error(2), (255)} BatchStatus
struct {
uint64 index;
opaque octet;
} ECPointOctet;
struct {
BatchStatus status;
uint32 batch_type;
uint64 batch_index;
uint64 batch_count;
uint32 is_last_batch;
ECPointOctet data<1..2^64-1>;
} EcdhPsiBatch;
status: Status of current batch message. Values of this field are explained as follows:
transmit: It is a normal batch message which has not been transmitted before.
retransmit: This message has been transmitted, but it is re-transmitted due to the receiver's request.
A batch message may be retransmitted several times before it is properly handled by the receiver.
For example, the receiver may need a relative long time to prepare the storage resource, and require the sender to re-send the batch until it has enough space.
fatal_error: The sender of this batch message cannot continue the ECDH-PSI protocol procedure due to some fatal errors.
The receiver MUST ignore the other fields included in the current EcdhPsiBatch message and terminate the PSI process.
Such fatal error may occur in the handshake phase or data exchange phase.
For example, if the responder sends a HandshakeRespond message with parameters unsupported by the requester, the requester should construct and send a EcdhPsiBatch message carrying fatal_error status so as to terminate the process.
batch_type: This field indicates the mask status of current batch.
That is, value "1" stands for that the data included in the current batch are only masked by the owner's private key, and "2" means the data has been doubly-masked with both participants' private keys.
batch_index: The index of current EcdhPsiBatch message, which is an incremental counter starting with "1".
The participant SHOULD use independent counters for different rounds and protocol runs.
For example, if participant A needs to send dataset p_A in the first round by 10 batches, and sends p_AB in the second round by 5 batches, it should use batch_index from 1 to 10 for the first round, and 1 to 5 for the second.
batch_count: Number of EC points included in the current EcdhPsiBatch message.
is_last_batch: Indicating whether this batch is the last one, where "1" means the batch is the last one from the sender in current round, and "0" for there exists subsequent batches.
data:
This field carries multiple EC points with ECPointOctet, which number is specified by the batch_count field.
Each ECPointOctet structure includes an unique index allocated by the owner of the corresponding original data, and the encoded octet string.
The purpose of index is to associate the original data item with the (doubly)-masked EC points, such that the participant can match its dataset with the intersection of masked datasets.
The values of index are generated by the owner of data so as to identify each data item uniquely.
When masking the data from its partner, a participant MUST reserve the index for every record.
gives more discussions on the implementation and maintenance of such indexes.
The content of each octet string is determined by the selected hash_to_curve suite, octet format and truncation option.
In the first round, the receiver SHOULD decode the octet strings with the configurations negotiated in the handshake phase in order to mask the data provided by the sender.
However, in the second round where masking is not required, the receiver SHOULD treat the contents as octet strings rather than decode them to EC points, as such strings are enough for computing the intersection.
EcdhPsiBatchResponse
The structure of EcdhPsiBatchResponse is defined as follows:
enum {
success (0),
retransmit (1),
fatal_error (2),
(255)
} BatchResponseStatus;
struct {
BatchResponseStatus status;
uint64 batch_index;
} EcdhPsiBatchResponse;
status: Indicating whether the EcdhPsiBatch message identified by batch_index has been handled properly.
Values of this field are explained as follows:
success: The EcdhPsiBatch message has been handled in a proper way.
Upon receiving a response with this status, the sender of EcdhPsiBatch can send the subsequence batch
or prepare to receive the EcdhPsiBatch message from its partner.retransmit: Upon receiving retransmit, the participant MUST re-send the EcdhPsiBatch message identified by batch_index.fatal_error: The ECDH-PSI process meets some fatal error such that both the participants MUST terminate the current session, and discard all data received.
Implementation ConsiderationsThe Representation of EC Point
When deciding the point encoding format for the elliptic curves expect curve25519, the participants should consider the aspects of bandwidth, storage and computation comprehensively.
Compressed point format could decrease the costs of transmission and storage,
but it also needs more computation resources to "decompress" the point.
The "decompress" process may be as costly as scalar multiplication .
Uncompressed format requires more storage spaces and needs more time to transmit,
but the participant can perform scalar multiplications without extra effort to recover the points.
For example, when both parties are deployed in the same data center and linked with a high-bandwidth local area network (LAN), they can choose to use the uncompressed format to achieve better performance.
As another example, the parties may be deployed in geographically separated data centers connected with low bandwidth wide area network (WAN), and equipped with high-end computation acceleration devices such as Graphics Processing Units (GPUs). In this case, the computation resource may not be the bottleneck, and the participants can choose the compress format to minimize the costs of data transmission.
The Management of Index
A RECOMMENDED method of maintaining indexes is storing the records with a database table and use the row numbers or prime keys as indexes.
The intersection result can be matched with original data items by a simple join operation.
The participant could also design a different indexing mechanism on its own, as long as guaranteeing its uniqueness.
This document uses explicit indexes to identify data items rather than treats the order of records as "implicit" indexes.
Compare with the implicit counterpart, explicit indexes can be efficiently created and maintained by modern databases, and do not require the participants to implement extra logic to preserve the order of records.
When masking the EC points from its partner, a participant MUST send the doubly-masked data with correct indexes.
To achieve this, it keeps the indexes of each data items received in the first round, masks the records with its private key, and associates the masked records with the same indexes.
These operations can also be implemented easily with database operations by viewing the received records and masked records as columns in the same table, which enables them to be linked naturally via the primary key.
Support More hash_to_curve Suites
Participants can negotiate hash_to_curve suites besides the ones listed by .
They can use other suites defined by , or use a self-defined suite following the syntax and convention given by Section 8.9 and 8.10 of .
As an example, this document next defines a hash_to_curve suite for the ShangMi(SM) curve and cryptography algorithms.
The SM algorithms are becoming mandatory in China, and have been accepted by international standards such as , and .
The new suite, denoted by curveSM2_XMD:SM3_SSWU_RO, encodes data to points on the so-called curveSM2 with SM3 hash algorithm, and reuses the expand_message_xmd and Simplified Shallue-van de Woestijne-Ulas (SSWU) methods for message expansion and mapping.
In particular, curveSM2_XMD:SM3_SSWU_RO_ is defined as follows:

encoding type: hash_to_curve

E: y^2 = x^3 + A * x + B, where

A = -3

B = 0x28e9fa9e9d9f5e344d5a9e4bcf6509a7f39789f515ab8f92ddbcbd414d940e93

p: 2^256-2^224-2^96+2^64-1

m: 1

k: 128

expand_message: expand_message_xmd (Section 5.3.1 of )

H: SM3

L: 48

f: SSWU method (Section 6.6.2 of of )

Z: -9

h_eff: 1

The value Z is calculated by the sage script given by Appendix H.2 of with the p, A and B parameters of curveSM2.
The participants SHOULD agree on the meaning of HashToCurveSuite values.
In this document, suite curveSM2_XMD:SM3_SSWU_RO_ is identified as follows:
enum {
null (0),
... // (1) to (4) follow section 3.2.1.1
curveSM2_XMD:SM3_SSWU_RO_ (5),
... // other hash_to_curve suites
(255)
} HashToCurveSuite;
Security ConsiderationsData Truncation
This section provides a detailed discussion on the truncation mechanism presented in this document.
The truncation, undoubtedly, will raise the probability of message collision.
That is, two different data item may be mapped to the same string after the procedures of masking and truncation by coincidence.
The collision may happen across the datasets owned by different participants, which causes a false-positive case that two different records are matched by PSI, or happen in the same dataset.
The later situation may also lead to a false-positive case.
To be more specific, consider a participant A has two different data items data_X and data_Y which are mapped to the same record mask_data_XY.
Its partner, say B, also has record data_X which is mapped to mask_data_XY.
Finally, A outputs both data_X and data_Y as the result of PSI, as it cannot distinguish which one matches the record of mask_data_XY.
To avoid such false-positive case, we have to consider the collision probability with respect to the sum number of data items owned by A and B.
Such probability can be computed with the well-known birthday paradox bound.
Let n be the number of sampling and d be the output space, the probability of collision can be approximately computed with:
p(n,d)=1-e^(-n(n-1)/2d)
This document uses a truncated length of 128 bit, which means d=2^128.
For different n, which is the sum size of records, the probability are calculated as follows:
If n=2^50, p(2^50, 2^128) = 2^-29If n=2^45, p(2^45, 2^128) = 2^-33If n=2^41, p(2^41, 2^128) = 2^-47If n=2^40, p(2^40, 2^128) = 2^-49If n=2^39, p(2^39, 2^128) = 2^-51
That is, if the number of records is less than 2^40, the probability of false-positive will be smaller than 2^-48.
The participant can also decide the truncation option by calculating the collision probability, and only use truncation when they both agree that the probability is acceptable.
IANA ConsiderationsThis document has no IANA actions.Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA)American National Standards InstituteHMAC: Keyed-Hashing for Message AuthenticationThis document describes HMAC, a mechanism for message authentication using cryptographic hash functions. HMAC can be used with any iterative cryptographic hash function, e.g., MD5, SHA-1, in combination with a secret shared key. The cryptographic strength of HMAC depends on the properties of the underlying hash function. This memo provides information for the Internet community. This memo does not specify an Internet standard of any kindHMAC-based Extract-and-Expand Key Derivation Function (HKDF)This document specifies a simple Hashed Message Authentication Code (HMAC)-based key derivation function (HKDF), which can be used as a building block in various protocols and applications. The key derivation function (KDF) is intended to support a wide range of applications and requirements, and is conservative in its use of cryptographic hash functions. This document is not an Internet Standards Track specification; it is published for informational purposes.Hashing to Elliptic CurvesThis document specifies a number of algorithms for encoding or
hashing an arbitrary string to a point on an elliptic curve. This
document is a product of the Crypto Forum Research Group (CFRG) in
the IRTF.Information security technology --- SM3 cryptographic hash algorithmStandardization Administration of ChinaInformation security technology --- Public key cryptographic algorithm SM2 based on elliptic curves --- Part 5: Parameter definitionStandardization Administration of the People's Republic of ChinaDigital Signature Standard (DSS)National Institute of Standards and Technology (NIST)Elliptic Curves for SecurityThis memo specifies two elliptic curves over prime fields that offer a high level of practical security in cryptographic applications, including Transport Layer Security (TLS). These curves are intended to operate at the ~128-bit and ~224-bit security level, respectively, and are generated deterministically based on a list of required properties.Key words for use in RFCs to Indicate Requirement LevelsIn many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.Ambiguity of Uppercase vs Lowercase in RFC 2119 Key WordsRFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.A More Efficient Cryptographic Matchmaking Protocol for Use in the Absence of a Continuously Available Third PartyNaval Research Laboratory
1986 IEEE Symposium on Security and Privacy
Introduction to Modern Cryptography, Third EditionUniversity of MarylandBar-Ilan UniversityIndifferentiability, Impossibility Results on Reductions, and Applications to the Random Oracle MethodologyETH ZurichETH ZurichETH ZurichIn TCC 2004: Theory of Cryptography, pages 21-39Imperfect Forward Secrecy: How Diffie-Hellman Fails in PracticeUniversity of MichiganINRIA Paris-RocquencourtUniversity of MichiganINRIA Nancy-Grand Est, CNRS, and Université de LorraineJohns HopkinsUniversity of MichiganUniversity of PennsylvaniaUniversity of MichiganINRIA Nancy-Grand Est, CNRS, and Université de LorraineUniversity of PennsylvaniaUniversity of MichiganUniversity of MichiganMicrosoft ResearchUniversity of Pennsylvania
Proceedings of the 22nd {ACM} {SIGSAC} Conference on Computer and
Communications Security, Denver, CO, USA, October 12-16, 2015
Efficient Batched Oblivious {PRF} with Applications to Private Set IntersectionBell LabsMITOregon State UniversityOregon State University
Proceedings of the 2016 {ACM} {SIGSAC} Conference on Computer and
Communications Security, Vienna, Austria, October 24-28, 2016
Labeled PSI from Fully Homomorphic Encryption with Malicious SecurityMicrosoft ResearchÉcole Polytechnique Fédérale de LausanneMicrosoft ResearchOregon State University
Proceedings of the 2018 {ACM} {SIGSAC} Conference on Computer and
Communications Security, {CCS} 2018, Toronto, ON, Canada, October
15-19, 2018
Password Monitor: Safeguarding passwords in Microsoft EdgeMicrosoftMicrosoftMicrosoftMicrosoftBlazing Fast PSI from Improved OKVS and Subfield VOLEVisa ResearchVisa Research
Proceedings of the 2022 {ACM} {SIGSAC} Conference on Computer and
Communications Security, {CCS} 2022, Los Angeles, CA, USA, November
7-11, 2022
Private Set Operations from Multi-query Reverse Private Membership TestShandong UniversityShandong UniversitySKLOIS, IIE, Chinese Academy of SciencesShandong UniversityAlibaba Group
Public-Key Cryptography - {PKC} 2024 - 27th {IACR} International Conference on Practice and Theory of Public-Key Cryptography, 2024, Proceedings, Part {III}
IT Security techniques -- Digital signatures with appendix -- Part 3: Discrete logarithm based mechanismsInternational Organization for StandardizationIT Security techniques -- Hash-functions -- Part 3: Dedicated hash-functionsInternational Organization for StandardizationCurve25519 and Curve448 for the Internet Key Exchange Protocol Version 2 (IKEv2) Key AgreementThis document describes the use of Curve25519 and Curve448 for ephemeral key exchange in the Internet Key Exchange Protocol Version 2 (IKEv2).The Transport Layer Security (TLS) Protocol Version 1.3This document specifies version 1.3 of the Transport Layer Security (TLS) protocol. TLS allows client/server applications to communicate over the Internet in a way that is designed to prevent eavesdropping, tampering, and message forgery.This document updates RFCs 5705 and 6066, and obsoletes RFCs 5077, 5246, and 6961. This document also specifies new requirements for TLS 1.2 implementations.ShangMi (SM) Cipher Suites for TLS 1.3This document specifies how to use the ShangMi (SM) cryptographic algorithms with Transport Layer Security (TLS) protocol version 1.3.The use of these algorithms with TLS 1.3 is not endorsed by the IETF. The SM algorithms are becoming mandatory in China, so this document provides a description of how to use the SM algorithms with TLS 1.3 and specifies a profile of TLS 1.3 so that implementers can produce interworking implementations.Oblivious Pseudorandom Functions (OPRFs) Using Prime-Order GroupsAn Oblivious Pseudorandom Function (OPRF) is a two-party protocol between a client and a server for computing the output of a Pseudorandom Function (PRF). The server provides the PRF private key, and the client provides the PRF input. At the end of the protocol, the client learns the PRF output without learning anything about the PRF private key, and the server learns neither the PRF input nor output. An OPRF can also satisfy a notion of 'verifiability', called a VOPRF. A VOPRF ensures clients can verify that the server used a specific private key during the execution of the protocol. A VOPRF can also be partially oblivious, called a POPRF. A POPRF allows clients and servers to provide public input to the PRF computation. This document specifies an OPRF, VOPRF, and POPRF instantiated within standard prime-order groups, including elliptic curves. This document is a product of the Crypto Forum Research Group (CFRG) in the IRTF.SPAKE2, a Password-Authenticated Key ExchangeThis document describes SPAKE2, which is a protocol for two parties that share a password to derive a strong shared key without disclosing the password. This method is compatible with any group, is computationally efficient, and has a security proof. This document predated the Crypto Forum Research Group (CFRG) password-authenticated key exchange (PAKE) competition, and it was not selected; however, given existing use of variants in Kerberos and other applications, it was felt that publication was beneficial. Applications that need a symmetric PAKE, but are unable to hash onto an elliptic curve at execution time, can use SPAKE2. This document is a product of the Crypto Forum Research Group in the Internet Research Task Force (IRTF).