X-Git-Url: http://git.vanrenterghem.biz/git.ikiwiki.info.git/blobdiff_plain/2e9fae5c11d9fabf6270de18d0c26bc251750b09..3f4914b95ba239489bae48f154abed9892e0cbf5:/doc/plugins/contrib/field/discussion.mdwn?ds=inline

diff --git a/doc/plugins/contrib/field/discussion.mdwn b/doc/plugins/contrib/field/discussion.mdwn
index dd9342224..6161f80df 100644
--- a/doc/plugins/contrib/field/discussion.mdwn
+++ b/doc/plugins/contrib/field/discussion.mdwn
@@ -239,13 +239,84 @@ smcv's discuission of field author vs meta author above. --[[Joey]]
 >>> set values in pagetemplate which are prefixed with *field_*.  I don't think
 >>> this is quite satisfactory, since that would still mean that people could
 >>> put un-scrubbed values into a pagetemplate, albeit they would be values
->>> named field_foo, etc.
+>>> named field_foo, etc. --[[KathrynAndersen]]
+
+>>>> They can already do similar; `PERMALINK` is pre-sanitized to
+>>>> ensure that it's a "safe" URL, but if an extremely confused wiki admin were
+>>>> to put `COPYRIGHT` in their RSS/Atom feed's `<link>`, a malicious user
+>>>> could put an unsafe (e.g. Javascript) URL in there (`COPYRIGHT` *is*
+>>>> HTML-scrubbed, but "javascript:alert('pwned!')" is just text as far as an
+>>>> HTML sanitizer is concerned, so it passes straight through). The solution
+>>>> is to not use variables in situations where that variable would be
+>>>> inappropriate. Because `field` is so generic, the definition of what's
+>>>> appropriate is difficult. --[[smcv]]
 
 >>> An alternative solution would be to classify field registration as "secure"
 >>> and "insecure".  Sources such as ymlfront would be insecure, sources such
 >>> as concon (or the $config hash) would be secure, since they can't be edited
 >>> as pages.  Then, when doing pagetemplate substitution (but not ftemplate
 >>> substitution) the insecure sources could be HTML-escaped.
+>>> --[[KathrynAndersen]]
+
+>>>> Whether you trust the supplier of data seems orthogonal to whether its value
+>>>> is (meant to be) interpreted as plain text, HTML, a URL or what?
+>>>>
+>>>> Even in cases where you trust the supplier, you need to escape things
+>>>> suitably for the context, not for security but for correctness. The
+>>>> definition of the value, and the context it's being used in, changes the
+>>>> processing you need to do. An incomplete list:
+>>>>
+>>>> * HTML used as HTML needs to be html-scrubbed if and only if untrusted
+>>>> * URLs used as URLs need to be put through `safeurl()` if and only if
+>>>>   untrusted
+>>>> * HTML used as plain text needs tags removed regardless
+>>>> * URLs used as plain text are safe
+>>>> * URLs or plain text used in HTML need HTML-escaping (and URLs also need
+>>>>   `safeurl()` if untrusted)
+>>>> * HTML or plain text used in URLs need URL-escaping (and the resulting
+>>>>   URL might need sanitizing too?)
+>>>>
+>>>> I can't immediately think of other data types we'd be interested in beyond
+>>>> text, HTML and URL, but I'm sure there are plenty.
+
+>>>>> But isn't this a problem with anything that uses pagetemplates?  Or is
+>>>>> the point that, with plugins other than `field`, they all know,
+>>>>> beforehand, the names of all the fields that they are dealing with, and
+>>>>> thus the writer of the plugin knows which treatment each particular field
+>>>>> needs?  For example, that `meta` knows that `title` needs to be
+>>>>> HTML-escaped, and that `baseurl` doesn't.  In that case, yes, I see the problem.
+>>>>> It's a tricky one.  It isn't as if there's only ever going to be a fixed set of fields that need different treatment, either, because the site admin is free to add whatever fields they like to the page template (if they aren't using the default one, that is; I'm not using the default one myself).
+>>>>> Mind you, for trusted sources, since the person writing the page template and the person providing the variable are the same, they themselves would know whether the value will be treated as HTML, plain text, or a URL, and thus could do the needed escaping themselves when writing down the value.
+
+>>>>> Looking at the content of the default `page.tmpl` let's see what variables fall into which categories:
+
+>>>>> * **Used as URL:** BASEURL, EDITURL, PARENTLINKS->URL, RECENTCHANGESURL, HISTORYURL, GETSOURCEURL, PREFSURL, OTHERLANGUAGES->URL, ADDCOMMENTURL, BACKLINKS->URL, MORE_BACKLINKS->URL
+>>>>> * **Used as part of a URL:** FAVICON, LOCAL_CSS
+>>>>> * **Needs to be HTML-escaped:** TITLE
+>>>>> * **Used as-is (as HTML):** FEEDLINKS, RELVCS, META, PERCENTTRANSLATED, SEARCHFORM, COMMENTSLINK, DISCUSSIONLINK, OTHERLANGUAGES->PERCENT, SIDEBAR, CONTENT, COMMENTS, TAGS->LINK, COPYRIGHT, LICENSE, MTIME, EXTRAFOOTER
+
+>>>>> This looks as if only TITLE needs HTML-escaping all the time, and that the URLs all end with "URL" in their name.  Unfortunately the FAVICON and LOCAL_CSS which are part of URLs don't have "URL" in their name, though that's fair enough, since they aren't full URLs.
+
+>>>>>  --K.A.
+
+>>>> One reasonable option would be to declare that `field` takes text-valued
+>>>> fields, in which case either consumers need to escape
+>>>> it with `<TMPL_VAR FIELD_FOO ESCAPE=HTML>` (and not interpret it as a URL
+>>>> without first checking `safeurl`), or the pagetemplate hook needs to
+>>>> pre-escape.
+
+>>>>> Since HTML::Template does have the ability to do ESCAPE=HTML/URL/JS, why not take advantage of that? Some things, like TITLE, probably should have ESCAPE=HTML all the time; that would solve the "to escape or not to escape" problem that `meta` has with titles. After all, when one *sorts* by title, one doesn't really want HTML-escaping in it; only when one uses it in a template. -- K.A.
+
+>>>> Another reasonable option would be to declare that `field` takes raw HTML,
+>>>> in which case consumers need to only use it in contexts that will be
+>>>> HTML-scrubbed (but it becomes unsuitable for using as text - problematic
+>>>> for text-based things like sorting or URLs, and not ideal for searching).
+>>>>
+>>>> You could even let each consumer choose how it's going to use the field,
+>>>> by having the `foo` field generate `TEXT_FOO` and `HTML_FOO` variables?
+>>>> --[[smcv]]
+
+>>>>> Something similar is already done in `template` and `ftemplate` with the `raw_` prefix, which determines whether the variable should have `htmlize` run over it first before the value is applied to the template.  Of course, that isn't scrubbing or escaping, because with those templates, the scrubbing is done afterwards as part of the normal processing.
 
 >>> Another problem, as you point out, is special-case fields, such as a number of
 >>> those defined by `meta`, which have side-effects associated with them, more
@@ -253,3 +324,84 @@ smcv's discuission of field author vs meta author above. --[[Joey]]
 >>> the side-effects, but use `field` as an interface to get the values of those special fields.
 
 >>> --[[KathrynAndersen]]
+
+-----
+
+I think the main question is: what is (or should be) the main point of the
+field plugin? If it's essentially a way to present a consistent
+interface to access page-related structured information, then it makes
+sense to have it very general. Plugins registering with fields would
+then present ways for recovering the structured information from the page
+(`ymlfront`, `meta`, etc),  ways to manipulate it (like `meta` does),
+etc.
+
+In this sense, security should be entirely up to the plugins, although
+the fields plugin could provide some auxiliary infrastructure (like
+determining where the data comes from and raising or lowering the security
+level accordingly).
+
+Namespacing is important, and it should be considered at the field
+plugin interface level. A plugin should be able to register as
+responsible for the processing of all data belonging to a given
+namespace, but plugins should be able to set data in any namespace. So
+for example, `meta` registers as the processor for `meta` fields, and whatever
+method is used to set the data (`meta` directive, `ymlfront`, etc) it
+gets a say on what to do with data in its namespace.
+
+What I'm thinking of is something you could call fieldsets. The nice
+thing about them is that, aside from the ones defined by plugins (like
+`meta`), it would be possible to define custom ones (with a generic,
+default processor) in an appropriate file (like smileys and shortcuts)
+with a syntax like:
+
+    [[!fieldset book namespace=book
+       fields="author title isbn"
+       fieldtype="text text text"]]
+
+after which, you could use
+
+    [[!book author="A. U. Thor"
+            title="Fields of Iki"]]
+
+and the data would be available under the book namespace, and thus
+as BOOK_AUTHOR, BOOK_TITLE etc in templates.
+
+Security, in this sense, would be up to the plugin responsible for the
+namespace processing (the default handler would HTML-escape text fields,
+scrub html fields, safeurl()ify url fields, etc.)
+
+> So, are you saying that getting a field value is sort of a two-stage process?  Get the value from anywhere, and then call the "security processor" for that namespace to "secure" the value?  I think "namespaces" are really orthogonal to this issue.  What the issue seems to be is:
+
+   * what form do we expect the raw field to be in? (text, URL, HTML)
+   * what form do we expect the "secured" output to be in? (raw HTML, scrubbed HTML, escaped HTML, URL)
+
+> Only if we know both these things will we know what sort of security processing needs to be done.
+
+>> Fieldsets are orthogonal to the security issue in the sense that you can use
+>> them without worrying about the field security issue, but they happen to be
+>> a rather clean way of answering those two questions, by allowing you to
+>> attach preprocessing attributes to a field in a way that the user
+>> (supposedly) cannot meddle with.
+
+> There is also a difference between field values that are used inside pagetemplate, and field values which are used as part of a page's content (e.g. with ftemplate).  If you have a TITLE, you want it to be HTML-escaped if you're using it inside pagetemplate, but you don't want it to be HTML-escaped if you're using it inside a page's content.  On the other hand, if you have, say, FEEDLINKS used inside pagetemplate, you don't wish it to be HTML-escaped at all, or your page content will be completely stuffed.
+
+>> Not to mention the many different ways date-like fields might need
+>> processing. It has already been proposed to solve this problem by exposing
+>> the field values under different names depending on the kind or amount of
+>> postprocessing they had (e.g. RAW_SOMEFIELD, SOMEFIELD, to which we could add
+>> HTML_SOMEFIELD, URL_SOMEFIELD or whatever). Again, fieldsets offer a simple way
+>> of letting Ikiwiki know what kind of postprocessing should be offered for
+>> that particular field.
+
+> So, somehow, we have to know the meaning of a field before we can use it properly, which kind of goes against the idea of having something generic.
+
+>> We could have a default field type (text, for example), and a way to set a
+>> different field type (which is what my fieldset proposal was about).
+
+> --[[KathrynAndersen]]
+
+-----
+
+I was just looking at HTML5 and wondered if the field plugin should generate the new Microdata tags (as well as the internal structures)? <http://slides.html5rocks.com/#slide19> -- [[Will]]
+
+> This could just as easily be done as a separate plugin.  Feel free to do so. --[[KathrynAndersen]]