2012-07-27 SIDR Interim Meeting

1 Links to these notes in various forms:
2 Introductions
3 Deployment considerations - Tim Bruijnzeels
- 3.1 Questions
- 3.2 Diagram from chalk board during discussion
4 Measuring RPKI Repositories - Randy Bush
- 4.1 Questions
5 RPKI propagation emulation Measurement an early report
- 5.1 Questions
6 RPKI Repository Fetch Protocols - Rob Austein
- 6.1 Questions?
7 Deployment Considerations in RPKI - Tim Bruijnzeels
8 Document advancement
9 BGPSEC protocol document
10 MultiPath Drafts
11 Stable Signature
12 Route Leaks and Brian Dickson
13 Repository system requirements

1 Links to these notes in various forms:


org mode:	http://hardakers.net/temp/2012-07-27-interim.org
txt:	http://hardakers.net/temp/2012-07-27-interim.txt
html:	http://hardakers.net/temp/2012-07-27-interim.html

2 Introductions

local resources and agenda overview

3 Deployment considerations - Tim Bruijnzeels

rsync is useful
current repository isn't robust enough yet
repositories are flat, no proxying, no CDNs yet
Steve Kent asked a question about changing while syncing, answer tabled for later.
rsync libraries don't really exist. Need more documentation or standardization of rsync.
In particular, you end up needing to re-auth the entire crypto set which is inefficient when you really just need a list of things that have changed to only re-validate what is needed.
measuring results in the wild using rpki-validator
- on average 15 different instances
- preconfigured with trust anchors and prefetching enabled
- Rob and Randy disagree about whether prefetching is a wise choice (Steve K. asked what prefetching is, and it was tabled until a later slide)
- retrieval failures noted over IPv6 (slide 10)
  - Sandy: why is apnic IPv6 marked as N/A
  - Insert jokes about v6 deployment here
  - Randy proxies for someone that questions whether the v6 issue isn't on the client side and the client may not be able to reach any of the repos
- Prefetching: (slide 11)
  - takes the URI, chops off the end and then prefetches a larger number of things.
  - Tim gives a bit of description of his architecture
    - Two CAs, one is for the offline TAs and the other is for online work.
    - production CA and member CA are at the same level in the repo
    - So the code figures out what the base directory of the production CA is, so you can get the production CA and the member CAs.
    - Rob A notes that if (and only if) you understand that the repo you're using is structured this way, it may make sense.
    - Thus it lets you have a local copy of everything first, and then speeds up the verification process.
    - Rob: one thing to note is that the current hack Tim is describing only works with an RIR with your own TA. It only works when you're the root. If IANA were the TA, then you'd still need to do the tons of fetching.
    - Randy: the question is: how does this affect measurement results?
    - Tim: we do a prefetch of a known rsync, and we think that the time would be similar if the repositories are hierarchical in the future.
    - Rob: the important point is that our measurements are in disagreement because he's prefetching and [Rob's tool] is not.
    - Tim: Rob needs to do 1000+ fetches to complete validation
    - Rob: and if Tim needed to use IANA as the TA the prefetching will no longer work. It only works for TAs. APNIC and RIPE NCC are the big ones.
    - Tim: agrees it's a hack, but you could still prefetch the big ones (Rob: if you have magic knowledge of those; Tim: yes, but that's ok).
    - Steve: rsync only fetches things that have changed
    - Warren: what magic is needed?
    - Rob: "the following repositories are badly organized, please use this magic URI instead to help".
    - Wes: it's an implementation hack, and he could do that for his implementation
    - Randy: yes, but it still affects deployment measurements
  - big differences between clients and between runs result from different machine types running the tool
  - slide 12+ shows timing seen in graphs
    - most graphs have a high-short-time spike followed by a long tail out in the 100+ second range
    - the pre-fetch hack helps it validate within a reasonable amount of time
  - problem seen: a CRL for the new MFT wasn't published until later.
- Conclusions
  - big differences between clients (CPUs, etc)
  - prefetching is done outside of the standard
- hardware description
- rsyncd performance testing
  - full recursive fetch of just data (no validation)
  - client had all the data, on a local network: just rsync overhead
  - There is a boundary for clients/sec that can be handled that is constrained by the CPU.
  - There is a boundary for the number of concurrent clients that is constrained by the memory available
  - smallest test is 70k objects, and the real world isn't near that yet. In the long term, when we have 100k-400k ROAs, these numbers will become important.
  - Steve K: server was mac mini, how much memory?
    - Tim: 2GB
    - Steve: we should try this again with a real server
    - Tim: I think the bigger issue is the CPU size, and there are much bigger CPU sizes out there of course
  - Rob: cranking down the number of simultaneous connections may help
  - Steve K: we originally thought of using TLS with a certificate that was only available via RPKI so that you could limit who could actually connect
  - Tim: My bottom line is that I think the system would be better if it could take much higher loads.
- slide 26: latency has a huge impact
  - Randy: so what you've traded is the CPU/memory cost for network cost?
  - Tim: Yes
  - we're pushing work from the repo server to the clients because I think it will scale better
- slide 27:
  - It's fairly easy for a large repository to get DoSed with rsync. The risk with http is lower (but the impact is the same).
  - There are a lot more mitigations available for http, however, because the industry already has some in place.
  - possible fixes with rsync are limited
- slide 28: rsync vs http
  - major rsync benefit: has support for built in delta
  - major http benefit: widely deployed and many implementations
  - rsync negative: building deltas can be expensive on the server
    - buying more CPUs helps but I don't know if it's a battle you can win
  - http negative: without deltas it'll be slow
- next steps:
  - may need to turn off recursive fetches when repository gets large
  - deltas: need to make http work, or rsync with recursion disabled
  - updates to RP in minutes not hours
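
The prefetching described for slide 11 above (take the URI, chop off the end, fetch the larger tree first) can be sketched roughly as follows. This is only an illustration, not the rpki-validator code: the function name `prefetch_base`, the `levels` parameter, and the example host and paths are all assumptions.

```python
from urllib.parse import urlsplit


def prefetch_base(uri: str, levels: int = 1) -> str:
    """Chop `levels` trailing path segments off an rsync URI.

    The first path segment (the rsync module) is always kept, so the
    result is still something a recursive fetch could be pointed at.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    keep = max(len(segments) - levels, 1)
    base = f"{parts.scheme}://{parts.netloc}/"
    kept = segments[:keep]
    if not kept:
        return base
    return base + "/".join(kept) + "/"


# Hypothetical layout: production CA and member CAs are siblings under /repo/.
print(prefetch_base("rsync://rpki.example.net/repo/production-ca/"))
```

As Rob noted, this only helps if you already know the repository is laid out this way; nothing in the URI itself says how many segments are safe to drop.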

3.1 Questions

Steve K: if you use http with manifest, then you'll need to redo the manifest RFC because the RFC really is designed around rsync not http and you'd want to reuse it.
- True, but we shouldn't dive into that now
Sandy M: on slide 27, you don't include bandwidth required
- every relying party needs to pull down everything
- the impact on them will be the same, since every RP needs to pull down everything no matter what their size is
Rob A: current validation for me is small (50 seconds-ish), but the universe is small

3.2 Diagram from chalk board during discussion

4 Measuring RPKI Repositories - Randy Bush

flat fetches are the same family and will act about the same
where are we today?
this talk is about the problems, not the things going well
I care a lot about the performance and availability as an RP
The majority of what we get now is failures
Using Rob's RP software
9: sync time on RIR (lacnic) varies a lot, not sure why
- black line is object
apnic: no monitoring, no NOC, …
ripe did fairly well, but had some serious problems with rsync at points
- cause: NFS
Issues/Conclusions:
- current publishers (RIRs) aren't good operators
  - don't work weekends
  - don't do monitoring
  - …
- need repository structures to be fixed (add hierarchy)

4.1 Questions

Russ H: slide 20, why does it look like that?
- Randy: the nice ones at the bottom are the ones that have a flat structure
- Rob: these tests are being done on [well connected] machines in Seattle. Latency isn't the problem. Some of the effects on the high end are connectivity problems (lacnic or afrinic), but most aren't.
Rob: there are three reasons why repo structures aren't fixed:
1. haven't flipped the switch yet
2. conscious (political) reasons not to do
3. [editor missed it]
Randy: we are happy that the RIRs have deployed
Ruediger: Do we have statistics for activity for current servers and usage?
- Randy adds: I want to know the condition of many servers. We don't have a way of determining whether a service is up and what its status is. There is no simple test for "is this service up?".
Sandy: has anyone ever looked at problems with the IRR servers?
- (no-ish)

5 RPKI propagation emulation Measurement an early report

there are serious issues with some of the measurements, but we'll be making more-better results soon
[slides being read word-for-word, not entirely reiterated here]
Sandy: slide 8: why are there two rows labeled tier-3?
- Randy: because you don't have a wide enough projector [i.e., line 3 is wrapped]
Test bed used based on Starbed:
- a cluster in Japan with 1000 KVMs
- each with 12 CPUs and 8GB.
- we run 15 virtual machines on each one, using 50 machines or so.
17: when we want to induce delay, we just add a router in the middle
wanted to create the current routing output and then create a certificate structure with it
- multiple prefixes belong to the same entity. You can't know that.
- no way to aggregate the common areas
- thus no way to go bottom-up
- Randy promises a dinner for anyone that can do it.
21: initially after creating the autonetkit diagram, the initial machine starts it all to create all the certs and CAs before pushing it to the full simulator.
- Steve K: when we tried to do measurements we did a top down approach.
21: ROAs get created and distributed as it runs
- Any CRLs?
  - No
- We don't create CAs, we just create more ROAs for the same prefixes with new AS numbers.
22: the black dots on the right hand run are inducing delay, so we can measure the effect of delay
The graphs you're about to see are complete lies.
- the ROA publish time is every 10 minutes, so even though the gathers are twice an hour, the ROAs that get pulled are only 10 minutes old
delay to the top level gatherers does not look significant
25: don't know why the blue line is to the right of the black, it shouldn't be.
26: looks about like what we'd expect: 1800, or half an hour.
Sandy:
- how was flat vs hierarchical done?
- Rob: both types can be generated based on a flag
28: blue line to the right again
- could be an inter-router delay in Dallas that we didn't think would matter, but maybe it does

5.1 Questions

Sandy: what are your conclusions?
- Randy: we must do hierarchy, and hammer on the RIRs to do things right.
- bittorrent will likely be helpful.
- Rob A: there are too many rsyncs to many small little directories. 10k little files being pushed over anything is a problem.
- Randy: the real problem is that we have 50k CAs [eventually]
- Is there a radically different protocol?
- discussion about whether it should be a protocol covered by an RFC
- There is only one implementation of rsync, there are at least a few of bittorrent.
- Randy: we can write specs, we can write code, but we have an architectural problem.
- Randy: we have deployment issues
  1. RIRs, ARIN in particular, are a problem
  2. docs aren't getting pushed out the door
  3. we don't specify at the ops meetings that we're looking for next steps and solutions and we're pushing FUD.
- Randy: We need to tell them "this is valid stuff" but "we're researchers so we're still going to work on it further, but don't take that as a bad [unstable] thing"
  - we need a deployment group of people/comm?
Sandy: what's the next step? we have certificates being published, and we have numbers that have a lot of people publishing but we have no reports of people actually doing things with the data.
- Ruediger: I'm trying to figure out what the next report should be
- Randy: there is one ISP that has said that people who have walked away with some of their address space may be surprised at the end of the year.
Randy: This presentation shows that things are actually pretty good. We've shown that the latency doesn't matter too much, and that things are working. We want to make sure it'll scale forever, but for the foreseeable future it looks good, so get out and deploy it.
from webex: does posix_fadvise() help?
- Rob: don't know; will need to check
Randy: the result I see is that the real propagation problem is the RIRs, and we need to hammer on them to make this work. The technology pieces are not the slow part.
Russ H: my view is that this is normal IETF process, and the sooner we get things done the more people will implement it
- Rob: There is no way we got this right the first time, and there is no way this won't change as time goes on. Get over it.
What can we do within the IETF (i.e., within the WG)?
- Ruediger: I'm happy to see that there are so many documents that describe operational concerns. But we don't have documents describing the concerns on the repo side. What concerns do I need to know that my customers may have.

6 RPKI Repository Fetch Protocols - Rob Austein

All of our future research is really about v2; we should deploy now.
really need more of a database protocol instead of rsync
- which has issues with filesystems, and changing, and …
- problems with rsync and error codes and dealing with them
- quote from the rsync author: "it would be difficult to write a spec for it, because it just kind of does stuff"
- starting a new rsync process is expensive
- Russ: is this a client or a server problem?
  - Rob: both. Clients have issues too with trying to compare everything on disk.
TTL doesn't exist
- Steve K: can't we express that in the manifest?
- Rob: will think about it
it turns out the hierarchy we really care about is actually the certificate hierarchy. If I could fetch the certificate hierarchy more exactly, I would care less about the URI hierarchy.
it might be worth keeping the rsync uri even if we don't use rsync, as we need unique uri's for all objects
Rob discusses what DNS is
- Rob and Warren walk down a hypothetical other solution and will discuss things later
Data freshness: "is the stuff the RP is looking at close enough to what the server thinks it's put out, or is there significant skew between what the server has published and what the client has got?"
Two approaches, in my mind:
1. DNS transfer like protocols (AXFR, IXFR)
2. use ATOM (eg, RSS) to publish new stuff
DNS transfer like:
- more work for the CA
- must keep versions back in order to support the incremental transfer (IXFR)
ATOM-like approach:
- still need to ask "what is current?"
- ATOM vs RSS: ATOM is IETF-document and has some other cool features.
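
To make the ATOM-like approach above a little more concrete, here is a rough sketch of a server listing newly published objects in a feed and a client asking "what is new since I last looked?". It is an illustration only: the field choices (object URI as the entry id, a serial in the title), the function names, and the example URIs are assumptions, and the feed is not meant to be a complete Atom (RFC 4287) document.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_feed(changes):
    """Build a minimal feed from (serial, object_uri, updated) tuples.

    `updated` is an ISO 8601 UTC timestamp string such as
    "2012-07-27T10:00:00Z" (hypothetical data).
    """
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = "repository changes"
    for serial, uri, updated in changes:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = uri
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"serial {serial}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    return ET.tostring(feed, encoding="unicode")


def new_uris(feed_xml, last_seen):
    """Return URIs of entries updated after `last_seen`.

    Timestamps in the same UTC format compare correctly as strings.
    """
    root = ET.fromstring(feed_xml)
    uris = []
    for entry in root.findall(f"{{{ATOM_NS}}}entry"):
        if entry.findtext(f"{{{ATOM_NS}}}updated") > last_seen:
            uris.append(entry.findtext(f"{{{ATOM_NS}}}id"))
    return uris
```

The client still has to poll the feed, which is the "what is current?" question noted above; the feed only tells it what to fetch once it has asked.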

6.1 Questions?

steve b: how much data are we talking about in total?
- Rob: something on the order of 1M objects
- Sandy: at least that because 400k routes, EE certs, ROAs, …
- Rob: say 1M to 10M at most. 6 figures.
- Chris M notes on jabber:
  - you can do, I think binary-stuff in rss. see itunes for example… but 'go for ietf std' seems fine.
  - look at the notes of the last interim in Reston for sizing data/guesses.
  - we worked this out previously.
- Steve B: Journal files seem to be the best. We really just want to know what is new.
- Rob A: functionally ATOM is like a journal, it's a list of what has changed.
Warren: one of the concerns with bittorrent is that people assume that what's in bittorrent is evil.
Randy's concern is narrowing the attack avenue toward the new service I'm standing up

7 Deployment Considerations in RPKI - Tim Bruijnzeels

rsync gives us deltas
Use an update notification file?
- an update notification file shows everything that has changed since X
- The server is not involved at all and the client figures out for themselves what they need.
- could continue using rsync as an identifier, as Rob suggested earlier
- A publication server can recreate messages to create delta files
roughly 10 times faster than rsync
- Doug M: is this an unchanging directory in rsync?
- Tim: This is for fetching a specific file
- Sandy: is the file changing to tickle rsync?
- Tim: rsync already has a copy, and the http pull is pulling it every time. [So the discrepancy in the graph is likely even worse for rsync]
Tim: this really shows apache is better at handling loads than rsync
Because http is flexible, it might be good to create headers
- but http is also untrusted
fetch objects by hash instead of by name would be helpful
- because the manifest contains hashes, you'll get the right certificates for that manifest rather than a potentially different set of certificates that don't perfectly match
- Sandy: does that make it critical for when the manifest is published?
- Tim: ideally you want to do this as an atomic operation. You want to publish the objects first and then publish the manifest.
Steve K: I don't understand when I can't get a current manifest. Why should I keep old stuff around to reward people that fail to pull the new one?
- Tim: you may be right, this may not be worth the effort.
- Steve: you have to have a way for a RP to do a cold-start.
- Steve: We can't keep stuff forever
- Steve: we really need a requirements list for what the new deployment technology should be
- Tim: I agree
The resources are already verifiable, so likely don't need https
- bittorrent, eg, already redistributes unverified pieces you must verify
key rollovers are going to be a pain
- especially emergency ones
publication/subscription protocols would help
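
The "fetch objects by hash" idea above can be sketched as follows: because the manifest lists a hash for each object, a client that asks for objects by hash and re-checks the hash gets exactly the set that manifest describes, even over an untrusted transport. This is a sketch under assumptions: the manifest is modelled as a plain name-to-SHA-256 mapping, and `fetch_by_hash` is a hypothetical callable standing in for whatever transport is used.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_manifest_set(manifest: dict, fetch_by_hash) -> dict:
    """Fetch every object named in `manifest` by its hash and verify it.

    manifest: maps object name -> expected SHA-256 hex digest.
    fetch_by_hash: callable taking a hex digest and returning bytes.
    Raises ValueError if the transport returns bytes that do not match.
    """
    objects = {}
    for name, expected in manifest.items():
        data = fetch_by_hash(expected)
        if sha256_hex(data) != expected:
            raise ValueError(f"hash mismatch for {name}")
        objects[name] = data
    return objects


# Stand-in for a content-addressed store on the server (made-up contents).
blobs = [b"roa-one", b"roa-two"]
store = {sha256_hex(b): b for b in blobs}
manifest = {"one.roa": sha256_hex(b"roa-one"), "two.roa": sha256_hex(b"roa-two")}
fetched = fetch_manifest_set(manifest, store.__getitem__)
```

Old objects can stay in such a store under their hashes for as long as the server chooses, which is where Steve K's question about how long to keep old material comes in.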

7.1 Questions?

Sandy: how does an RP know which publication servers to subscribe to?
- Tim: that is missing in this architecture
- Rob: ATOM is also poll based. How fast do you need an update? Instant, or is 5 minutes good enough?

7.2 Conclusions

reduce the load of the server
clients do the calculation work for what they need
can be delegated to CDNs
can just write to disk once
can give you transactionality

7.3 Discussions

taking publications to a publication point means you can do consistency checks first before publishing them at publication points. Bad guys can't upload junk.

7.4 More conclusions

likely that secure transfer protocols to avoid man-in-the-middle aren't needed because the objects are signed themselves
Rob: not sure if there are native libraries for fetching lots of things, and you don't want synchronous http only. You need to fetch many things at once.
- Wes H: there are clients that have good clients for pulling lots of data
Tim: want to keep a graph in memory that lets you figure out what parts of the tree need to revalidate after data changes. Having good information on what changed would help here.
Sandy: the server load varies a lot among the CA servers because of their sizes. The RPs have to all retrieve everything. This helps the server, right?
- more work is needed to study what works, but we don't have code that supports it yet so we can't test and measure it well yet.
Randy: my worry is the significant increase in smarts. Keeping a graph sounds harder than just doing the crypto.
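
As a rough illustration of the in-memory graph Tim describes, the sketch below records which CA issued which object and, given a list of changed objects, returns everything underneath them that would need revalidation. The data model (a child-to-parent mapping of made-up names) and the function names are assumptions; a real relying party would also have to cope with objects being added and removed.

```python
from collections import defaultdict, deque


def build_children(parent_of: dict) -> dict:
    """Invert a child -> parent mapping into parent -> [children]."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def needs_revalidation(parent_of: dict, changed) -> set:
    """Return the changed objects plus everything issued beneath them."""
    children = build_children(parent_of)
    dirty = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        if node in dirty:
            continue
        dirty.add(node)
        queue.extend(children.get(node, []))
    return dirty


# Hypothetical certificate hierarchy: TA -> RIR -> two ISPs -> their ROAs.
parent_of = {
    "rir": "ta",
    "isp-a": "rir",
    "isp-b": "rir",
    "roa-a1": "isp-a",
    "roa-b1": "isp-b",
}
print(sorted(needs_revalidation(parent_of, ["isp-a"])))
```

The walk itself is small; Randy's concern is more about keeping such a graph correct over time than about computing the affected subtree.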
Doug: can the server say how far back it has to go.
- Rob: it has to do something like full/incremental backups. When you get to a certain point, you need to transfer the entire backup again.
- Wes H: you'll need two serial numbers, not one. You need one to indicate where the last full store is and then incremental numbers beyond that for retrieving the deltas from the last full.
- Rob: DNS IXFR was designed carefully to make it simple for the client and server, but may require more frequent full transfers
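
Wes H's two-serial idea can be sketched from the client side as below. The assumptions are mine, not from the discussion: the server advertises `full_serial` (the serial of its last full store) and `latest_serial`, delta number n turns state n-1 into state n, and deltas are only kept back to `full_serial`.

```python
def plan_fetch(local_serial, full_serial, latest_serial):
    """Decide what a client must fetch to reach `latest_serial`.

    local_serial: the client's current serial, or None on a cold start.
    Returns a list of ("full", n) / ("delta", n) steps in order.
    """
    if local_serial is not None and local_serial >= latest_serial:
        return []  # already current
    steps = []
    if local_serial is None or local_serial < full_serial:
        # Too far behind (or cold start): deltas that old are gone.
        steps.append(("full", full_serial))
        start = full_serial
    else:
        start = local_serial
    steps.extend(("delta", s) for s in range(start + 1, latest_serial + 1))
    return steps


print(plan_fetch(None, 40, 43))  # cold start
print(plan_fetch(41, 40, 43))    # slightly behind
```

This also covers the cold-start case Steve K raised earlier: a client with no state simply starts from the last full store.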

8 Document advancement

Room: how do we advance the documents out? What can we do to help?
- Sandy: the chairs have divided up duties and will try to get out some of the documents this week.
- …
- The chairs need to ensure that all comments have been adequately addressed.
  - Russ H: instead, post the new document and say we believe all comments have been reflected, please speak now if you think something was not addressed.
- Sandy created a chair action page: http://trac.tools.ietf.org/wg/sidr/trac/wiki/ChairActions
- Sandy: the issue tracker hasn't worked well
- We need a WG secretary?
- … lots of discussion about particular documents and where it is and why …

9 BGPSEC protocol document

Matt: draft-04 version published to reflect the items discussed in the June interop meeting
- In particular it has a section discussing the confederation issue
- Don't think there are any new open technical issues. If there are any open issues, they need to be raised at this point.
Target AS isn't on the wire, but is included in the hash. I know what AS number I said I was for that session, but it doesn't go over the wire.
- When the ASN needs to change, how does the other side of the connection know when to change its notion of that number?
- one hop is easy because you can try both AS numbers, but 2 hops down is impossible
- WesG: it's a well known use case so I don't think you can say it's verboten.
- Sandy: doesn't it let you treat the middle as a pcount=0 with a route server in the middle?
  - complex discussion about whether it would work
- Randy: provider wants to change the AS number without coordinating with the sender
  - this is upstream from that
- the solution is to set the pcount=0 in the inside of the isp changing
  Diagram:

              _________________________________
             /                                 \
+------+     |  +------+            +------+   |
| cust | --> |  | ISP1 |    -->     | ISP2 |   |
| AS7  |     |  | AS1  |  pcount=0  | AS3  |   |
+------+     |  +------+            +------+   |
             \_________________________________/

ISP1 uses original ASN (1) until later. Internally it could be IGP instead.
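
A small sketch of why pcount=0 is useful in the diagram above: if path length is taken as the sum of the pcount values, a hop signed with pcount=0 stays in the signed path but adds nothing to its length. The `Segment` type and the choice of which hop carries pcount=0 are assumptions made for illustration, not taken from the draft text.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    asn: int     # AS number that signed this hop
    pcount: int  # how many times this AS counts toward path length


def effective_path_length(path) -> int:
    """Path length as the sum of pcount over all signed hops."""
    return sum(seg.pcount for seg in path)


# Customer AS7, then the provider's two AS numbers (AS1 and AS3);
# the hop inside the provider is assumed to be signed with pcount=0.
path = [Segment(7, 1), Segment(1, 0), Segment(3, 1)]
print(effective_path_length(path))
```

With pcount=1 on every hop the same path would count as three ASes, so the provider's internal renumbering would be visible as a longer path to everyone downstream.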

do we need a new document to describe changing AS scenarios?
- How much do we need to describe and why?
- Warren and WesG will sit down and document it
Matt drawing on the board:
- do we look forward or backward for the special entry

|--------|
| origin |
| AS2 *  |  | look ^
| AS3    |  |  or  |
| AS4    |  v      |
| AS5    |
|--------|

            _________________________________
           /    *                            \
+-----+   /  +-----+     +-----+     +-----+  \   +-----+
| AS1 | -->  | AS2 | --> | AS3 | --> | AS4 | ---> | AS5 |
+-----+   \  +-----+     +-----+     +-----+  /   +-----+
           \_________________________________/
                            |
                            v
                        set AFLAG

10 MultiPath Drafts

Sandy: Does BGPSEC help with any of the end-path things?
- Ruediger: those documents don't mangle the ASPATH so they shouldn't conflict with BGPSEC, they just add new attributes

11 Stable Signature

Doug M: ECDSA produces different signatures on each call. Multiple updates will look different.
- Matt and Rob: the only thing that changes is the signature; to avoid duplicates, leave the signatures out of the comparison in duplicate detection.
- and the SKI would change if keys change too
- Should a sentence be added to clarify this and specifically suggest that you may want to drop duplicates if the path and SKIs all match, regardless of whether the signatures match? Potentially in the operational section.
- Steve K and Sandy propose certain wording
- Defer to Matt's wording
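
The duplicate-detection suggestion above might look roughly like this: build the comparison key from the prefix, the AS path, and the SKIs, and leave the signature bytes out, since ECDSA gives different bytes each time the same data is signed. The `SignedHop` type, the function names, and the sample values are assumptions for illustration only.

```python
from typing import NamedTuple


class SignedHop(NamedTuple):
    asn: int
    ski: bytes        # identifies the key that signed this hop
    signature: bytes  # differs on every signing, so not compared


def dedup_key(prefix: str, hops) -> tuple:
    """Key for duplicate detection: everything except the signatures."""
    return (prefix, tuple((h.asn, h.ski) for h in hops))


def is_duplicate(seen: set, prefix: str, hops) -> bool:
    """Return True if an update with the same path and SKIs was seen."""
    key = dedup_key(prefix, hops)
    if key in seen:
        return True
    seen.add(key)
    return False


seen = set()
first = [SignedHop(64496, b"ski-1", b"sig-aaa")]
resend = [SignedHop(64496, b"ski-1", b"sig-bbb")]   # same key, new signature
rekeyed = [SignedHop(64496, b"ski-2", b"sig-ccc")]  # key changed

results = [
    is_duplicate(seen, "192.0.2.0/24", first),
    is_duplicate(seen, "192.0.2.0/24", resend),
    is_duplicate(seen, "192.0.2.0/24", rekeyed),
]
```

Because the SKI is part of the key, an update signed after a key change is not treated as a duplicate, matching the note above that the SKI would change if keys change.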

12 Route Leaks and Brian Dickson

Sandy received a question about whether they're going to be passing on the 3 drafts. Suggestion was to give it to grow, then to idr, then to sidr.

13 Repository system requirements

Ruediger: Arguments about what should be done about the current repository.
Steve K: Let's put it on the RIPE (Sept.) agenda?
Ruediger: I'd rather have it done by then.
Misc: the WG should make the statement that the RPKI is designed to be hierarchical and research shows there are real numbers that show this to be consistent.
How do we make a statement without writing a draft?
- Russ: you can't write a consensus statement without writing a draft
- WesG: you could specify a bunch of specs they should meet and not care how they meet it, whether it's via more hardware or repository optimization.
- Randy: origin-ops with one new section, which is near end-game?
  - room seems fine with this choice
- Tim: if you specify requirements that can't be met, then RIRs may state they won't do them.
Going to hierarchical at least saves us time, but it doesn't change the fact we'll need a v2 in the future.
What does it take to convert to hierarchical?
- it's about equal to a rekey because all the URIs have to change