X-Git-Url: http://git.vanrenterghem.biz/git.ikiwiki.info.git/blobdiff_plain/d2b1264546fa412db8c1591bc8bb14aba04b960d..43db8269a002daf00ac6195d4bd70018b4b68b75:/doc/todo/should_optimise_pagespecs.mdwn diff --git a/doc/todo/should_optimise_pagespecs.mdwn b/doc/todo/should_optimise_pagespecs.mdwn index 61458d972..1594dcee7 100644 --- a/doc/todo/should_optimise_pagespecs.mdwn +++ b/doc/todo/should_optimise_pagespecs.mdwn @@ -79,6 +79,8 @@ I can think about reducung the size of my wiki source and making it available on > > --[[Joey]] +[[!template id=gitbranch branch=smcv/ready/optimize-depends author="[[smcv]]"]] + >> I've been looking at optimizing ikiwiki for a site using >> [[plugins/contrib/album]] (which produces a lot of pages) and it seems >> that checking which pages depend on which pages does take a significant @@ -96,4 +98,89 @@ I can think about reducung the size of my wiki source and making it available on >>>> I haven't actually deleted it), because the "or" operation is now done in >>>> the Perl code, rather than by merging pagespecs and translating. --[[smcv]] +[[!template id=gitbranch branch=smcv/ready/remove-pagespec-merge author="[[smcv]]"]] + +>>>>> I've now added a patch to the end of that branch that deletes +>>>>> `pagespec_merge` almost entirely (we do need to keep a copy around, in +>>>>> ikiwiki-transition, but that copy doesn't have to be optimal or support +>>>>> future features like [[tracking_bugs_with_dependencies]]). --[[smcv]] + +--- + +Some questions on your optimize-depends branch. --[[Joey]] + +In saveindex it still or'd together the depends list, but the `{depends}` +field seems only useful for backwards compatability (ie, ikiwiki-transition +uses it still), and otherwise just bloats the index. + +> If it's acceptable to declare that downgrading IkiWiki requires a complete +> rebuild, I'm happy with that. I'd prefer to keep the (simple form of the) +> transition done automatically during a load/save cycle, rather than +> requiring ikiwiki-transition to be run; we should probably say in NEWS +> that the performance increase won't fully apply until the next +> rebuild. --[[smcv]] + +>> It is acceptable not to support downgrades. +>> I don't think we need a NEWS file update since any sort of refresh, +>> not just a full rebuild, will cause the indexdb to be loaded and saved, +>> enabling the optimisation. --[[Joey]] + +Is an array the right data structure? `add_depends` has to loop through the +array to avoid dups, it would be better if a hash were used there. Since +inline (and other plugins) explicitly add all linked pages, each as a +separate item, the list can get rather long, and that single add_depends +loop has suddenly become O(N^2) to the number of pages, which is something +to avoid.. + +> I was also thinking about this (I've been playing with some stuff based on the +> `remove-pagespec-merge` branch). A hash, by itself, is not optimal because +> the dependency list holds two things: page names and page specs. The hash would +> work well for the page names, but you'll still need to iterate through the page specs. +> I was thinking of keeping a list and a hash. You use the list for pagespecs +> and the hash for individual page names. To make this work you need to adjust the +> API so it knows which you're adding. -- [[Will]] + +> I wasn't thinking about a lookup hash, just a dedup hash, FWIW. +> --[[Joey]] + +>> I was under the impression from previous code review that you preferred +>> to represent unordered sets as lists, rather than hashes with dummy +>> values. If I was wrong, great, I'll fix that and it'll probably go +>> a bit faster. --[[smcv]] + +>>> It depends, really. And it'd certianly make sense to benchmark such a +>>> change. --[[Joey]] + +Also, since a lot of places are calling add_depends in a loop, it probably +makes sense to just make it accept a list of dependencies to add. It'll be +marginally faster, probably, and should allow for better optimisation +when adding a lot of depends at once. + +> That'd be an API change; perhaps marginally faster, but I don't +> see how it would allow better optimisation if we're de-duplicating +> anyway? --[[smcv]] + +>> Well, I was thinking that it might be sufficient to build a `%seen` +>> hash of dependencies inside `add_depends`, if the places that call +>> it lots were changed to just call it once. Of course the only way to +>> tell is benchmarking. --[[Joey]] + +In Render.pm, we now have a triply nested loop, which is a bit +scary for efficiency. It seems there should be a way to +rework this code so it can use the optimised `pagespec_match_list`, +and/or hoist some of the inner loop calculations (like the `pagename`) +out. + +> I don't think the complexity is any greater than it was: I've just +> moved one level of "loop" out of the generated Perl, to be +> in visible code. I'll see whether some of it can be hoisted, though. +> --[[smcv]] + +>> The call to `pagename` is the only part I can see that's clearly +>> run more often than before. That function is pretty inexpensive, but.. +>> --[[Joey]] + +Very good catch on img/meta using the wrong dependency; verified in the wild! +(I've cherry-picked those bug fixes.) + [[!tag wishlist patch patch/core]]