Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters

Glossary

The identifier is a name for the issue (and is unique within this document).

The type of issue is one of:

The status of the issue is one of:

The reference is an indication of where the issue was first raised.

The description is a succinct overview of the issue.

The resolution describes the specification change that resolves the issue.

Open Issues

Identifier Type / Status Reference and Description Proposed Resolution / Latest Change

Closed/Editor Issues

Identifier Type / Status Reference and Description Resolution / Latest Change
attrcharvstoken
change
closed
julian.reschke@greenbytes.de, 2010-02-04: For some reason, attr-char fails to be token - 2231specials; it includes ":", but fails to include a few other characters from token. (reported by Benjamin Carlyle)
in revision 09:
Revise attr-char so it really is token \ ( "*" / "%" / "'" )
auth48
edit
closed
julian.reschke@greenbytes.de, 2010-08-10: Umbrella issue for changes made during the RFC Editor's AUTH48 period.
in revision latest:
badseq
edit
closed
julian.reschke@greenbytes.de, 2009-10-06: Mention recipient handling for broken encoded sequences.
in revision 05:
Done.
charset
edit
closed
julian.reschke@greenbytes.de, 2009-07-23: Need to revisit "character set" terminology.
in revision 03:
Use "character set" consistently.
charset-registered
change
closed
julian.reschke@greenbytes.de, 2010-02-20: Mention to use only registered charset names? (reported by Alexey Melnikov).
in revision 11:
State this in the ABNF.
charsetmatch
change
closed
julian.reschke@greenbytes.de, 2009-10-03: Is the character set name matched case-sensitively?
in revision 04:
Be consistent with http://www.iana.org/assignments/character-sets and match case-insensitively.
edit
edit
closed
julian.reschke@greenbytes.de, 2009-04-17: Umbrella issue for editorial fixes/enhancements.
in revision 12:
handling-multiple
change
closed
Reference: <http://www.ietf.org/mail-archive/web/apps-discuss/current/msg01344.html>

roessler@gmail.com, 2010-02-24: Leaving the choice of precedence to the header specification implies that parsers need to special-case. It would seem reasonable to make a choice in this specification that for properties which can only occur once, the traditional syntax takes precedence.

julian.reschke@greenbytes.de, 2010-02-26: That would rule out the use case where the traditional syntax is used as a fallback for clients that do not support the new syntax, as discussed in that section: ... http://greenbytes.de/tech/tc2231/#attfnboth2 is a test case that shows that using this technique, both variants can be served to clients, and those that understand the ext-parameter encoding will indeed pick the "better" parameter. Unfortunately, this appears to depend on parameter ordering, which I didn't want to mention in this spec. Maybe I should?
in revision 11:
Just state that when repetitions are not allowed, the extended form should take precedence.
i18n-spoofing
change
closed
Reference: <http://www.ietf.org/mail-archive/web/apps-discuss/current/msg01329.html>

GK@ninebynine.org, 2010-02-20: I note that the security considerations section says nothing about possible character "spoofing" - i.e. making a displayed prompt or value appear to be something other than it is. E.g. Non-ASCII characters have been used to set up exploits involving dodgy URIs that may appear to a user to be legitimate.
in revision 11:
Mention the problem, and point to RFC 3629's security considerations which mention this as well. While at it, also mention the other UTF-8 related attack scenario.
impl
edit
closed
julian.reschke@greenbytes.de, 2010-01-16: Report on current implementations.
in revision 08:
iso8859
change
closed
julian.reschke@greenbytes.de, 2010-02-20: The protocol could be further simplified by mandating UTF-8 only (reported by Alexey Melnikov). On the other hand, and not surprisingly, testing shows that ISO-8859-1 support is widely implemented. The author is looking for community feedback on this choice.
in revision 11:
Further feedback was requested during IETF LC; but none was received. Thus defaulting to no change; keeping the support for ISO-8859-1.
multiple-inst-spoofing
change
closed
kivinen@iki.fi, 2010-03-01: Yes, but the impact of them is different. For example it does not really matter if the filename parameters having different languages differ, but there might be parameters where this really matters.
As this document does not define any exact parameters, it might be enough to comment something like that "This document specifies way to transport multiple language variants for parameters, and such use might allow spoofing attacks, where different language versions of the same parameters do not match. Whether this attack is useful as an attack depends on the parameter specified."
in revision 11:
Add text based on the recommendation.
nonorm2231
edit
closed
julian.reschke@greenbytes.de, 2010-04-23: It's not totally clear that the mentions of RFC 2231 really are all informative.
in revision 12:
Clarify title of the spec, plus text talking about RFC 2231. Avoid saying "profile" in general.
parameter-abnf
change
closed
julian.reschke@greenbytes.de, 2010-02-20: The ABNF for reg-parameter and ext-parameter is ambiguous, as "*" is a valid token character; furthermore, RFC 2616's "attribute" production allows "*" while RFC 2231's does not. (reported by Alexey Melnikov).

julian.reschke@greenbytes.de, 2010-02-21: Proposal: restrict the allowable character set in parameter names to exclude "*" (and maybe even more non-name characters?). Also, consider extending the set of value characters (for the right hand side) to allow more characters that can be unambiguously parsed outside quoted strings, such as "/".
in revision 11:
Introduced parmname, disallowing "*" / "'" / "%". Moving the value ABNF discussion into a separate issue ("value-abnf").
rel-2388
edit
closed
julian.reschke@greenbytes.de, 2010-01-07: Note the non-applicability to the use of RFC 2231 encoding in multipart/form-data.
in revision 08:
Done.
repeated-param
change
closed
Chris.Newman@Sun.COM, 2010-03-22: RFC 2231 did not allow two parameters with the same name but different languages, at least in the context of continuations that was impossible. Absent continuations, RFC 2231 was otherwise silent on that topic.
So section 4.3 adds a new feature over and above what RFC 2231 did. It's a feature that will make implementations significantly more complex and is likely to cause interoperability problems.
Much of the experience with deployment of both language tagging and language variants in the IETF seems to result in unnecessary complexity. While there are good abstract arguments for language tagging in theory, it seems more often than not that the parties in the exchange are unable to put anything useful in the field in which case it falls into the realm of unnecessary complexity. In addition, we have experience where we attempted to allow language variants (multipart/alternative) and not only did that usage not deploy, it is actively broken despite being an explicit example in RFC 1766.
The one place where I've seen language variants mostly work is when the language tag is actually included in the attribute name (LDAP does this) and the "search" mechanism allows wildcarding of languages. But having two attributes with the same name seems dangerous.
My recommendation is to remove this feature as I believe it will not be used in practice and will add unnecessary complexity that is likely to create interoperability problems.
in revision 11:
State the issue. Remove section 4.3. Rephrase 4.2 accordingly.
repeats
edit
closed
julian.reschke@greenbytes.de, 2009-08-20: Talk about parameters that are repeated for the sake of I18N.
in revision 03:
Discuss this use case and add an example.
rfc2978-normative
change
closed
julian.reschke@greenbytes.de, 2010-02-20: The reference to RFC2978 needs to be normative (reported by Alexey Melnikov).
in revision 10:
Done.
rfc3986-normative
change
closed
julian.reschke@greenbytes.de, 2010-02-20: The reference to percent-encoding (RFC3986) needs to be normative (reported by Alexey Melnikov).
in revision 10:
Done.
rfc4646
edit
closed
julian.reschke@greenbytes.de, 2009-10-03: Update RFC 4646 reference.
in revision 03:
Done.
tokengrammar
edit
closed
julian.reschke@greenbytes.de, 2010-02-04: Benjamin Carlyle noticed (off-list) that token in RFC 2231 / RFC 2045 allows "{" and "}", while HTTP does not. Minimally, we need to point out the difference.
in revision 09:
Add a note pointing out (and explaining) the difference.
tokenquotcharset
change
closed
Reference: <http://lists.w3.org/Archives/Public/ietf-http-wg/2009OctDec/0013.html>

julian.reschke@greenbytes.de, 2009-10-05: token can contain single quotes. ABNF is ambiguous for charset in ext-value.

duerst@it.aoyama.ac.jp, 2009-10-06: (points out that Section 2.3 of RFC 2978 doesn't allow the single quote) - <http://lists.w3.org/Archives/Public/ietf-http-wg/2009OctDec/0014.html>.
in revision 05:
usascii-normative
change
closed
julian.reschke@greenbytes.de, 2010-02-20: The reference to USASCII needs to be normative.
in revision 10:
Done.
value-abnf
change
closed
julian.reschke@greenbytes.de, 2010-02-26: Consider extending the right-hand side ABNF - both for regular and extended parameters - to include more characters that can be unambiguously parsed outside quoted strings, such as "/".
in revision 11:
No change due to lack of feedback. Potentially defer to future versions of HTTP/1.1 (defining guidelines for header definitions), or a revision of this spec.
when-ext-value
change
closed
julian.reschke@greenbytes.de, 2010-02-18: There's no point in using ext-value when the language is unknown and no "special" characters are present.
in revision 11:
Fixed.
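
Illustration

Several of the resolutions above describe parsing behaviour that is easier to follow in code. The Python sketch below is not taken from the specification. It only illustrates three of them: attr-char as token minus "*", "%" and "'" (attrcharvstoken), case-insensitive matching of the character set name (charsetmatch), and precedence of the extended form when a parameter may occur only once (handling-multiple). The separator list assumes the RFC 2616 definition of token. The names decode_ext_value and select_parameter, the (name, value) pair representation, and the restriction to UTF-8 and ISO-8859-1 are assumptions made for the example.

```python
from urllib.parse import unquote_to_bytes

# HTTP/1.1 "token" characters: printable US-ASCII minus the separators
# (assumed here to follow the RFC 2616 definition).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')
TOKEN_CHARS = {chr(c) for c in range(33, 127)} - SEPARATORS

# Issue "attrcharvstoken": attr-char is token minus "*", "%" and "'".
ATTR_CHARS = TOKEN_CHARS - set("*%'")

# Assumption for this sketch: only these two character sets are accepted.
SUPPORTED_CHARSETS = {"utf-8", "iso-8859-1"}


def decode_ext_value(ext_value):
    """Decode charset'language'pct-encoded into a (text, language) pair."""
    charset, language, encoded = ext_value.split("'", 2)
    charset = charset.lower()  # issue "charsetmatch": match case-insensitively
    if charset not in SUPPORTED_CHARSETS:
        raise ValueError(f"unsupported character set: {charset}")
    # Strict decoding: octets that are not valid UTF-8 raise UnicodeDecodeError.
    text = unquote_to_bytes(encoded).decode(charset)
    return text, language or None


def select_parameter(params, name):
    """Return the value for name, preferring the extended form (name*).

    params is a hypothetical list of (name, raw_value) pairs in header order.
    """
    wanted = name.lower()
    regular = None
    for pname, value in params:
        pname = pname.lower()
        if pname == wanted + "*":
            return decode_ext_value(value)[0]
        if pname == wanted and regular is None:
            regular = value
    return regular
```

With these definitions, a header carrying both filename and filename* yields the decoded extended value, while a recipient that only sees the traditional parameter still gets a usable fallback. The result does not depend on parameter ordering, which is the concern raised in the handling-multiple discussion.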

Progress

Version Issues
latest |||||||||||||||||||||||||
12 ||||||||||||||||||||||||
11 |||||||||||||||||||||||
10 ||||||||||||||||||
09 |||||||||||
08 |||||||||
07 |||||||
06 |||||||
05 |||||||
04 |||||
03 ||||
02 |
01
00

Last change: 2010-08-10