Re: [EAI] Body parts



From: Arnt Gulbrandsen <arnt at oryx.com>
To: ima at ietf.org
Date: Sat, 12 Jul 2008 17:55:44 +0200

Claus Färber writes:
> Shawn Steele wrote:
>> The presumed savings isn't actually very much. Add an attachment, jpg, html encoding or whatnot and the "savings" will be lost in the noise. I concede that it might be interesting for a satellite text message or something where the bandwidth is amazingly poor, however that could be handled by the device's portal/gateway.
>
> If bandwidth really is a problem, use compression. I guess the charset does not matter so much with deflated text. (However, that's an educated guess; data would be nice.)

I didn't keep the data, but I did play around and measure things when I wrote RFC 4978, and my experience is that it makes a difference.

Deflate (like almost all other compression algorithms) wants to see the atom. If it's compressing graphics, it wants to see the pixels or whatnot. If it's compressing text, it wants to see the characters. In deflate's case, it assumes that a file-format atom has some relation to octets, and works with octets.

If you send big5 with c-t-e 8bit, then deflate will see a series of half-characters, which is less good than seeing a series of characters, but still, there's a plain enough relationship that deflate's assumption works well. If you send the same text as utf-8 with c-t-e 8bit, it sees thirds of characters, which is almost as good as seeing half-characters.

But people don't send c-t-e 8bit. They send a lot of q-p or base64, both of which obfuscate the data. If deflate is told to compress big5 with c-t-e base64, its first input octet encodes 37.5% of a character (6 of its 16 bits), the second 37.5% more, and the third the last 25% of the character and 12.5% of the next character. It goes on like that. A character is encoded in three different ways depending on where it is in the stream. Q-p obfuscates differently, but the effect is comparable.
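
The alignment argument can be checked with a few lines of Python. This is only a sketch: the sample character and the helper names `alignments` and `encode_repeated` are my own choices for illustration, not anything from RFC 4978 or from the measurements mentioned above.

```python
import base64

def alignments(octets_per_char, n_chars=12):
    """Bit offsets, within a 24-bit base64 group, at which characters start."""
    return {(k * octets_per_char * 8) % 24 for k in range(n_chars)}

def encode_repeated(ch, charset, repeat=3):
    """Base64 text for `ch` repeated `repeat` times in the given charset."""
    return base64.b64encode((ch * repeat).encode(charset)).decode("ascii")

# Assumed sample character: 2 octets in Big5, 3 octets in UTF-8.
ch = "文"
big5_b64 = encode_repeated(ch, "big5")
utf8_b64 = encode_repeated(ch, "utf-8")

# Two-octet characters start at three different offsets in a base64 group,
# three-octet characters always start on a group boundary.
print("big5 :", big5_b64, sorted(alignments(2)))
print("utf-8:", utf8_b64, sorted(alignments(3)))
```

With a two-octet charset the same character repeated three times does not produce a repeating four-letter base64 group, which is the "three different ways" effect deflate has to cope with.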

UTF-8+base64 happens to be better than big5, for the poor reason that its three-byte characters match one four-byte base64 group. There's less obfuscation than in the case of big5.
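
Anyone who wants data rather than my recollection could measure it roughly like this. The word list and the sample generator are hypothetical and synthetic, so the numbers it prints say little about real mail; substitute a real corpus before drawing conclusions. It uses Python's `zlib` module as the deflate implementation.

```python
import base64
import random
import zlib

# Hypothetical word list, used only to build a synthetic sample text.
WORDS = ["電子", "郵件", "地址", "國際", "文字", "編碼",
         "傳輸", "壓縮", "資料", "字元", "問題", "工作"]

def sample_text(n_words=2000, seed=0):
    """Build a reproducible pseudo-random text from WORDS."""
    rng = random.Random(seed)
    return "".join(rng.choices(WORDS, k=n_words))

def deflated_size(payload):
    """Number of octets after zlib compression at the highest level."""
    return len(zlib.compress(payload, 9))

def measure(text):
    """Deflated size for each charset, sent as 8bit and as MIME-style base64."""
    sizes = {}
    for charset in ("big5", "utf-8"):
        raw = text.encode(charset)
        sizes[charset, "8bit"] = deflated_size(raw)
        sizes[charset, "base64"] = deflated_size(base64.encodebytes(raw))
    return sizes

for (charset, cte), size in sorted(measure(sample_text()).items()):
    print(f"{charset:6} {cte:7} {size} octets after deflate")
```

Comparing the four numbers for a given text shows how much the charset and the transfer encoding each cost once deflate has run.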

HOWEVER. I didn't bother doing anything like 4978 for SMTP or Submission, because I think that wouldn't be worth the bother. IMO there isn't any simple change with a large benefit.

Arnt
_______________________________________________
IMA mailing list
IMA at ietf.org
https://www.ietf.org/mailman/listinfo/ima


Note: Messages sent to this list are the opinions of the senders and do not imply endorsement by the IETF.