X-Git-Url: http://git.vanrenterghem.biz/git.ikiwiki.info.git/blobdiff_plain/bd958f91a2b71761e9aa20fa25a6702dab3b4b8d..3948b422381a0f05225aaf8b973d0cd54f089348:/doc/todo/dependency_types.mdwn diff --git a/doc/todo/dependency_types.mdwn b/doc/todo/dependency_types.mdwn index 6218222f7..56153239f 100644 --- a/doc/todo/dependency_types.mdwn +++ b/doc/todo/dependency_types.mdwn @@ -10,7 +10,7 @@ whenever a matching dependency is added, removed, or *modified*. But a great many things don't care about the modification case, and often cause unnecessary page rebuilds: -* meta only cares if the pages are added or removed. Content change does +* map only cares if the pages are added or removed. Content change does not matter (unless show=title is used). * brokenlinks, orphans, pagecount, ditto (generally) * inline in archive mode cares about page title, author changing, but @@ -105,11 +105,11 @@ unnecessary page rebuilds: I propose the following. --[[Joey]] -* Add a second type of dependency, call it an "contentless dependency". +* Add a second type of dependency, call it an "presence dependency". * `add_depends` defaults to adding a regular ("full") dependency, as before. (So nothing breaks.) -* `add_depends($page, $spec, content => 0)` adds an contentless dependency. -* `refresh` only looks at added/removed pages when resolving contentless +* `add_depends($page, $spec, presence => 0)` adds an presence dependency. +* `refresh` only looks at added/removed pages when resolving presence dependencies. This seems straightforwardly doable. I'd like [[Will]]'s feedback on it, if @@ -128,25 +128,468 @@ I implemented the above in a branch. Then I found some problems: -* pagestats is often used with a pagespec that uses `tagged()`. - A pure contentless dependency does not work for that, it needs to look - at link info. -* orphans and brokenlinks cannot use contentless dependencies because they - need to update when links change. * Something simple like pagecount, that seems like it could use a - contentless dependency, can have a pagespec that uses metadata, like + presence dependency, can have a pagespec that uses metadata, like `author()` or `copyright()`. +* pagestats, orphans and brokenlinks cannot use presence dependencies + because they need to update when links change. -Now I'm thinking about having a contentless dependency look at page +Now I'm thinking about having a special dependency look at page metadata, and fire if the metadata changes. And it seems links should either be included in that, or there should be a way to make a dependency that fires when a page's links change. (And what about backlinks?) It's easy to see when a page's links change, since there is `%oldlinks`. To see when metadata is changed is harder, since it's stored in the -pagestate by the meta plugin. +pagestate by the meta plugin. Also, there are many different types of +metadata, that would need to be matched with the pagespecs somehow. -(Alternative: Make add_depends look at the pagespec. Ie, if it is a simple -page name, or a glob, we know a contentless dependency can be valid. -If's more complex, convert the dependency from contentless to full. Finding -a non-ad-hoc, non-sucky way to do that could be hard.) +Quick alternative: Make add_depends look at the pagespec. Ie, if it +is a simple page name, or a glob, we know a presence dependency +can be valid. If's more complex, convert the dependency from +presence to full. + +There is a lot to dislike about this method. Its parsing of the pagespec, +as currently implemented, does not let plugins add new types of pagespecs +that only care about presence. Its pagespec parsing is also subject to +false negatives (though these should be somewhat rare, and no false +positives). Still, it does work, and it makes things like simple maps and +pagecounts much more efficient. + +---- + +#### Will's first pass feedback. + +If the API is going to be updated, then it would be good to make it forward compatible. +I'd like for the API to be extendible to what is useful for complex pagespecs, even if we +that is a little redundant at the moment. + +My attempt to play with this is in my git repo. [[!template id=gitbranch branch=origin/depends-spec author="[[will]]"]] +That branch is a little out of date, but if you just look at the changes in IkiWiki.pm you'll see the concept I was looking at. +I added an "add_depends_spec()" function that adds a dependency on the pagespec passed to it. If the set of matched pages +changes, then the dependent page is rebuilt. At the moment the implementation uses the same hack used by map and inline - +just add all the pages that currently exist as traditional content dependencies. + +> As I note below, a problem with this approach is that it has to try +> matching the pagespec against every page, redundantly with the work done +> by the plugin. (But there are ways to avoid that redundant matching.) +> --[[Joey]] + +Getting back to commenting on your proposal: + +Just talking about the definition of a "presence dependency" for the moment, and ignoring implementation. Is a +"presence dependency" supposed to cause an update when a page disappears? I assume so. Is a presence dependency +supposed to cause an update when a pages existence hasn't changed, but it no longer matches the pagespec. +(e.g. you use `created_before(test_page)` in a pagespec, and there was a page, `new_page`, that was created +after `test_page`. `new_page` will not match the spec. Now we'll delete and then re-create `test_page`. Now +`new_page` will match the spec, and yet `new_page` itself hasn't changed. Nor has its 'presence' - it was present +before and it is present now. Should this cause a re-build of any page that has a 'presence' dependency on the spec? + +> Yes, a presence dep will trigger when a page is added, or removed. + +> Your example is valid.. but it's also not handled right by normal, +> (content) dependencies, for the same reasons. Still, I think I've +> addressed it with the pagespec influence stuff below. --[[Joey]] + +I think that is another version of the problem you encountered with meta-data. + +In the longer term I was thinking we'd have to introduce a concept of 'internal pagespec dependencies'. Note that I'm +defining 'internal' pagespec dependencies differently to the pagespec dependencies I defined above. Perhaps an example: +If you had a pagespec that was `created_before(test_page)`, then you could list all pages created before `test_page` +with a `map` directive. The map directive would add a pagespec dependency on `created_before(test_page)`. +Internally, there would be a second page-spec parsing function that discovers which pages a given pagespec +depends on. As well as the function `match_created_before()`, we'd have to add a new function `depend_created_before()`. +This new function would return a list of pages, which when any of them change, the output of `match_created_before()` +would change. In this example, it would just return `test_page`. + +These lists of dependent pages could just be concatenated for every `match_...()` function in a pagespec - you can ignore +the boolean formula aspects of the pagespec for this. If a content dependency were added on these pages, then I think +the correct rebuilds would occur. + +In all, this is a surprisingly difficult problem to solve perfectly. Consider the following case: + +PageA.mdwn: + +> [ShavesSelf] + +PageB.mdwn + +> Doesn't shave self. + +ShavedByBob.mdwn: + +> [!include pages="!link(ShavesSelf)"] + +Does ShavedByBob.mdwn include itself? + +(Yeah - in IkiWiki currently links are *not* included by include, but the idea holds. I had a good example a while back, but I can't think of it right now.) + +sigh. + +-- [[Will]] + +> I have also been thinking about some sort of analysis pass over pagespecs +> to determine what metadata, pages, etc they depend on. It is indeed +> tricky to do. More thoughts on influence lists a bit below. --[[Joey]] + +>> The big part of what makes this tricky is that there may be cycles in the +>> dependency graph. This can lead to situations where the result is just not +>> well defined. This is what I was trying to get at above. -- [[Will]] + +>>> Hmm, I'm not seeing cycles be a problem, at least with the current +>>> pagespec terms. --[[Joey]] + +>>>> Oh, they're not with current pagespec terms. But this is really close to extending to handle +>>>> functional pagespecs, etc. And I think I'd like to think about that now. +>>>> +>>>> Having said that, I don't want to hold you up - you seem to be making progress. The best is +>>>> the enemy of the good, etc. etc. +>>>> +>>>> For my part, I'm imagining we have two more constructs in IkiWiki: +>>>> +>>>> * A map directive that actually wikilinks to the pages it links to, and +>>>> * A `match_sharedLink(pageX)` matching function that matches pageY if both pageX and pageY each have links to any same third page, pageZ. +>>>> +>>>> With those two constructs, one page changing might change the set of pages included in a map somewhere, which might then change the set of pages matched by some other pagespec, which might then... +>>>> +>>>> --[[Will]] + +>>>>> I think that should be supported by [[bugs/transitive_dependencies]]. +>>>>> At least in the current implementation, which considers each page +>>>>> that is rendered to be changed, and rebuilds pages that are dependent +>>>>> on it, in a loop. An alternate implementation, which could be faster, +>>>>> is to construct a directed graph and traverse it just once. Sounds +>>>>> like that would probably not support what you want to do. +>>>>> --[[Joey]] + +---- + +### Link dependencies + +* `add_depends($page, $spec, links => 1, presence => 1)` + adds a links + presence dependency. +* Use backlinks change code to detect changes to link dependencies too. +* So, brokenlinks can fire whenever any links in any of the + pages it's tracking change, or when pages are added or + removed. +* To determine if a pagespec is valid to be used with a links dependency, + use the same set that are valid for presence dependencies. But also + allow `backlinks()` to be used in it, since that matches pages + that the page links to, which is just what link dependencies are + triggered on. + +---- + +### the removal problem + +So far I have not addressed fixing the removal problem (which Will +discusses above). + +Summary of problem: A has a dependency on a pagespec such as +"bugs/* and !link(done)". B currently matches. Then B is updated, +in a way that makes A's dependency not match it (ie, it links to done). +Now A is not updated, because ikiwiki does not realize that it +depended on B before. + +This was worked around to fix [[bugs/inline_page_not_updated_on_removal]] +by inline and map adding explicit dependencies on each page that appears +on them. Then a change to B triggers the explicit dep. While this works, +it's 1) ugly 2) probably not implemented by all plugins that could +be affected by this problem (ie, linkmap) and 3) is most of the reason why +we grew the complication of `depends_simple`. + +One way to fix this is to include with each dependency, a list of pages +that currently match it. If the list changes, the dependency is triggered. + +Should be doable, but may involve more work than +currently. Consider that a dependency on `bugs/*` currently +is triggered by just checking until *one* page is found to match it. +But to store the list, *every* page would have to be tried against it. +Unless the list can somehow be intelligently updated, looking at only the +changed pages. + +---- + +Found a further complication in presence dependencies. Map now uses +presence dependencies when adding its explicit dependencies on pages. But +this defeats the purpose of the explicit dependencies! Because, now, +when B is changed to not match a pagespec, the A's presence dep does +not fire. + +I didn't think things through when switching it to use presence +dependencies there. But, if I change it to use full dependencies, then all +the work that was done to allow map to use presence dependencies for its +main pagespec is for naught. The map will once again have to update +whenever *any* content of the page changes. + +This points toward the conclusion that explicit dependencies, however they +are added, are not the right solution at all. Some other approach, such as +maintaining the list of pages that match a dependency, and noticing when it +changes, is needed. + +---- + +### pagespec influence lists + +I'm using this term for the concept of a list of pages whose modification +can indirectly influence what pages a pagespec matches. + +> Trying to make a formal definition of this: (Note, I'm using the term sets rather than lists, but they're roughly equivalent) +> +> * Let the *matching set* for a pagespec be the set of existing pages that the pagespec matches. +> * Let a *influence set* for a pagespec be the set of all pages, *p*, whose alteration might: +> * cause the pagespec to include or exclude a page other than *p*, or +> * cause the pagespec to exclude *p*. +> +>> \[Will snipped some stuff and edited the formal definition] +> +> --[[Will]] + +>> I appreciate the formalism! +>> +>> Only existing pages need to be in these sets, because if a page is added +>> in the future, the existing dependency code will always test to see +>> if it matches. So it will be in the maching set (or not) at that point. +>> +>>> Hrm, I agree with you in general, but I think I can come up with nasty counter-examples. What about a pagespec +>>> of "!backlink(bogus)" where the page bogus doesn't exist? In this case, the page 'bogus' needs to be in the influence +>>> set even though it doesn't exist. +>>> +>>>> I think you're right, this is a case that the current code is not +>>>> handling. Actually, I made all the pagespecs return influences +>>>> even if the influence was not present or did not match. But, it +>>>> currently only records influences as dependencies when a pagespec +>>>> successfully matches. Now I'm sure that is wrong, and I've removed +>>>> that false optimisation. I've updated some of the below. --[[Joey]] +>>> +>>> Also, I would really like the formalism to include the whole dependency system, not just any additions to it. That will make +>>> the whole thing much easier to reason about. +>> +>> The problem with your definition of direct influence set seems to be +>> that it doesn't allow `link()` and `title()` to have as an indirect +>> influence, the page that matches. But I'm quite sure we need those. +>> --[[Joey]] + +>>> I see what you mean. Does the revised definition capture this effectively? +>>> The problem with this revised definition is that it still doesn't match your examples below. +>>> My revised definition will include pretty much all currently matching pages to be in the influence list +>>> because deletion of any of them would cause a change in which pages are matched - the removal problem. +>>> -- [[Will]] + +#### Examples + +* The pagespec "created_before(foo)" has an influence list that contains foo. + The removal or (re)creation of foo changes what pages match it. Note that + this is true even if the pagespec currently fails to match. + +* The pagespec "foo" has an empty influence list. This is because a + modification/creation/removal of foo directly changes what the pagespec + matches. + +* The pagespec "*" has an empty influence list, for the same reason. + Avoiding including every page in the wiki into its influence list is + very important! + +>>> So, why don't the above influence lists contain the currently matched pages? +>>> Don't you need this to handle the removal problem? -- [[Will]] + +>>>> The removal problem is slightly confusingly named, since it does not +>>>> affect pages that were matched by a glob and have been removed. Such +>>>> pages can be handled without being influences, because ikiwiki knows +>>>> they have been removed, and so can still match them against the +>>>> pagespec, and see they used to match; and thus knows that the +>>>> dependency has triggered. +>>>> +>>>> Maybe the thing to do is consider this an optimisation, where such +>>>> pages are influences, but ikiwiki is able to implicitly find them, +>>>> so they do not need to be explicitly stored. --[[Joey]] + +* The pagespec "title(foo)" has an influence list that contains every page + that currently matches it. A change to any matching page can change its + title, making it not match any more, and so the list is needed due to the + removal problem. A page that does not have a matching title is not an + influence, because modifying the page to change its title directly + changes what the pagespec matches. + +* The pagespec "backlink(index)" has an influence list + that contains index (because a change to index changes the backlinks). + Note that this is true even if the backlink currently fails. + +* The pagespec "link(done)" has an influence list that + contains every page that it matches. A change to any matching page can + remove a link and make it not match any more, and so the list is needed + due to the removal problem. + +>> Why doesn't this include every page? If I change a page that doesn't have a link to +>> 'done' to include a link to 'done', then it will now match... or is that considered a +>> 'direct match'? -- [[Will]] + +>>> The regular dependency calculation code will check if every changed +>>> page matches every dependency. So it will notice the link was added. +>>> --[[Joey]] + +#### Low-level Calculation + +One way to calculate a pagespec's influence would be to +expand the SuccessReason and FailReason objects used and returned +by `pagespec_match`. Make the objects be created with an +influence list included, and when the objects are ANDed or ORed +together, combine the influence lists. + +That would have the benefit of allowing just using the existing `match_*` +functions, with minor changes to a few of them to gather influence info. + +But does it work? Let's try some examples: + +Consider "bugs/* and link(done) and backlink(index)". + +Its influence list contains index, and it contains all pages that the whole +pagespec matches. It should, ideally, not contain all pages that link +to done. There are a lot of such pages, and only a subset influence this +pagespec. + +When matching this pagespec against a page, the `link` will put the page +on the list. The `backlink` will put index on the list, and they will be +anded together and combined. If we combine the influences from each +successful match, we get the right result. + +Now consider "bugs/* and link(done) and !backlink(index)". + +It influence list is the same as the previous one, even though a term has +been negated. Because a change to index still influences it, though in a +different way. + +If negation of a SuccessReason preserves the influence list, the right +influence list will be calculated. + +Consider "bugs/* and (link(done) or backlink(index))" +and "bugs/* and (backlink(index) or link(done))' + +Its clear that the influence lists for these are identical. And they +contain index, plus all matching pages. + +When matching the first against page P, the `link` will put P on the list. +The OR needs to be a non-short-circuiting type. (In perl, `or`, not `||` -- +so, `pagespec_translate` will need to be changed to not use `||`.) +Given that, the `backlink` will always be evalulated, and will put index +onto the influence list. If we combine the influences from each +successful match, we get the right result. + +> This is implemented, seems to work ok. --[[Joey]] + +> `or` short-circuits too, but the implementation correctly uses `|`, +> which I assume is what you meant. --[[smcv]] + +>> Er, yeah. --[[Joey]] + +---- + +What about: "!link(done)" + +Specifically, I want to make sure it works now that I've changed +`match_link` to only return a page as an influence if it *does* +link to done. + +So, when matching against page P, that does not link to done, +there are no influences, and the pagespec matches. If P is later +changed to add a link to done, then the dependency resolver will directly +notice that. + +When matching against page P, that does link to done, P +is an influence, and the pagespec does not match. If P is later changed +to not link to done, the influence will do its job. + +Looks good! + +---- + +Here is a case where this approach has some false positives. + +"bugs/* and link(patch)" + +This finds as influences all pages that link to patch, even +if they are not under bugs/, and so can never match. + +To fix this, the influence calculation would need to consider boolean +operators. Currently, this turns into roughly: + +`FailReason() & SuccessReason(patch)` + +Let's say that the glob instead returns a HardFailReason, which when +ANDed with another object, drops their influences. (But when ORed, combines +them.) Fixes the above, but does it always work? + +"(bugs/* or link(patch)) and backlink(index)" => +`( HardFailReason() | SuccessReason(patch) ) & SuccessReason(index)`` => +`SuccessReason(patch) & SuccessReason(index)` => +SuccessReason(patch, index) => right + +"(bugs/* and link(patch)) or backlink(index)" => +`( HardFailReason() & SuccessReason(patch) ) | SuccessReason(index)`` => +`HardFailReason() | SuccessReason(index)` => +`SuccessReason(index)` => right + +#### High-level Calculation and Storage + +Naively calculating the full influence list for a pagespec requires trying +to match it against every page in the wiki. I'd like to avoid doing such +expensive matching redundantly. + +It may be possible, for some types of pagespecs, to just try matching a +single, arbitrary page against it, and know the full influence list has +been obtained. It seems to be that case that if a pagespec has any +influences, matching any page will return at least one. So if none are +returned, we can skip trying other pages. + +If the influence list does not include the page that was tried, we know +that the pagespec does not things like `link()` and `title()`, that are +influenced by the page's own content. So it *might* be safe to not try +matching any more pages in this case too. I think it would work for all +current pagespec terms. There might be a hypothetical term where this +optimisation doesn't work. We could add a special case to ensure it can +work: If a term declares it is unfluenced by "", then it means it is +always influenced by the matching page. + +Anyway, this seems worth doing: Add a `pagespec_match_all`, which returns a +list of all pages in the whole wiki that match the pagespec, and also adds +the pagespec as a dependency, and while it's at it, calculates and stores +the influence list. + +It could have an optional sort parameter, and limit parameter, to control +how many items to return and the sort order. So when inline wants to +display the 10 newest, only the influence lists for those ten are added. + +If `pagespec_match_depends` can be used by all plugins, then great, +influences are automatically calculated, no extra work needs to be done. + +If not, and some plugins still need to use `pagespec_match_list` or +`pagespec_match`, and `add_depends`, then I guess that `add_depends` can do +a slightly more expensive influence calculation. + +Bonus: If `add_depends` is doing an influence calculation, then I can remove +the nasty hack it currently uses to decide if a given pagespec is safe to use +with an existence or links dependency. + +Where to store the influence list? Well, it appears that we can just add +(content) dependencies for each item on the list, to the page's +regular list of simple dependencies. So, the data stored ends up looking +just like what is stored today by the explicit dependency hacks. Except, +it's calculated more smartly, and is added automatically. + +> I've implemented influence calculation in `add_depends`. As expected, +> it means rather a lot more work, and makes some things much slower. +> Optimisations next.. --[[Joey]] + +#### Influence types + +Note that influences can also have types, same as dependency types. +For example, "backlink(foo)" has an influence of foo, of type links. +"created_before(foo)" also is influenced by foo, but it's a presence +type. Etc. + +> This is an interesting concept that I hadn't considered. It might +> allow significant computational savings, but I suspect will be tricky +> to implement. -- [[Will]] + +>> It was actually really easy to implement it, assuming I picked the right +>> dependency types of course. --[[Joey]]