Help Standardisation issues
The conversion of the entire help system to the new xml specification was a huge and doubtless heroic task. In the process a number of convenience measures had to be taken, with the result that in some cases the new specification is not fully obeyed.
This is a page to record variations and discuss solutions.
Simplified background notes
Help sources are written in xhp, which is an xml format specifically for OOo Help. The xhp files are processed by main_transform.xsl to produce html, which is in essence displayed by the OOo Help Viewer (although in practice it is more complicated than this). The Help system is very well defined in OOo2HelpAuthoring.pdf "Understanding, authoring and editing OpenOffice.org Help".
The variation of help sources from specification becomes more important as help authoring is opened up for user contributions - it is another 'entry barrier'.
A major consideration in changing help sources to bring them into specification is that any change may trigger a request for translation into other languages - which might cause much unnecessary work.
- Paragraphs within tables
- Currently main_transform.xsl changes any paragraph role inside a table by adding "intable" to it. The original reason was probably to do with layout. Thus the system does not deliver the paragraph role specified by the help author.
- Solution: the only styles ending in 'intable' that are defined in default.css - namely .listitemintable, .tablecontentintable, .codeintable, .exampleintable, .literalintable, .pathintable, .tableheadintable - are the same as those styles defined without 'intable'. It should therefore be safe to edit main_transform.xsl so that it leaves paragraph roles unchanged. Translation requests should not be triggered by this.
- It appears from running a Perl script that no paragraph directly specifies a role ending in 'intable' - thus once main_transform.xsl has been edited, styles .listitemintable, .tablecontentintable, .codeintable, .exampleintable, .literalintable, .pathintable, .tableheadintable can be removed from the .css files. Oh joy oh rapture.
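The check described above can be sketched in Python rather than Perl. This is only an illustration, not the script that was actually run: the sample document and its ids are made up, and it assumes paragraphs are <paragraph> elements carrying a role attribute.

```python
import xml.etree.ElementTree as ET

def roles_ending_in_intable(xhp_text):
    """Return the role of every paragraph whose role ends in 'intable'."""
    root = ET.fromstring(xhp_text)
    return [
        p.get("role")
        for p in root.iter("paragraph")
        if p.get("role", "").endswith("intable")
    ]

# Hypothetical sample: one normal table paragraph, one that names an 'intable' role directly.
sample = """<helpdocument version="1.0">
  <body>
    <table id="tbl_example">
      <tablerow>
        <tablecell>
          <paragraph role="tablecontent" id="par_example1">Some text</paragraph>
        </tablecell>
      </tablerow>
    </table>
    <paragraph role="codeintable" id="par_example2">x = 1</paragraph>
  </body>
</helpdocument>"""

print(roles_ending_in_intable(sample))
```

An empty list for every help file would support removing the 'intable' styles from the .css files once main_transform.xsl has been edited.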
- Embedded variables
- The <embed> element should embed only sections and paragraphs according to the specification. However main_transform.xsl also allows it to embed variables, enclosed in <p class="embedded"> </p> tags.
- A script reveals that there are 2618 embed elements which incorrectly call a variable. Of those, only 9 call a variable which does not have a paragraph as direct parent. Of the 2609 which call a variable whose direct parent is a paragraph, 2492 call a variable which is the sole element inside its paragraph, and 117 call a variable which is not alone inside its paragraph. There are 53 files containing the variables which are called by these 117.
- Possible solution: The 9, 53 and 117 may need individual editing. The 2492 could be corrected by script, firstly to change the embed to call the paragraph not the variable, then to check that no embedvar element calls the variable, then to remove the variable. Embedding a paragraph rather than a variable may mean that the style changes, so this needs checking as well; most likely the paragraph will be in the default style which is how variables appear currently. If the script that triggers re-translation is set to look at textual changes to a paragraph (not just subnode changes) then no retranslation should be triggered. With this number of corrections to be made, it might be safest to start with just some of the files.
- An alternative solution is to replace any embed element which calls a variable with embedvar encased in a paragraph with role="embedded". That's simple, but leaves a lot of redundant elements. Possibly this could be used on those embeds that can't be corrected by script as above.
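The three groups counted above (no paragraph parent, sole element in a paragraph, sharing a paragraph) could be told apart along the following lines. This is a rough sketch under assumptions: the function name, the sample markup and its ids are hypothetical, and a real script would also have to resolve each embed's href to the right file before looking up the id.

```python
import xml.etree.ElementTree as ET

def classify_variable(root, var_id):
    """Classify how the variable with the given id sits in its parent.

    Returns None when the id does not belong to a variable element.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for var in root.iter("variable"):
        if var.get("id") != var_id:
            continue
        parent = parent_of.get(var)
        if parent is None or parent.tag != "paragraph":
            return "no paragraph parent"
        # Alone means: only child, no text before it and no text after it.
        alone = (
            len(parent) == 1
            and not (parent.text or "").strip()
            and not (var.tail or "").strip()
        )
        return "sole element" if alone else "shares paragraph"
    return None

# Hypothetical sample covering the three cases.
sample = """<body>
  <paragraph role="paragraph" id="par_a"><variable id="v_alone">Only content</variable></paragraph>
  <paragraph role="paragraph" id="par_b">Choose <variable id="v_shared">Tools</variable> first.</paragraph>
  <section id="sec_c"><variable id="v_loose">Loose</variable></section>
</body>"""

root = ET.fromstring(sample)
for var_id in ("v_alone", "v_shared", "v_loose", "par_a"):
    print(var_id, "->", classify_variable(root, var_id))
```

Only the "sole element" group is a candidate for correction by script; the other two groups are the ones that may need individual editing.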
- Help sources that refer to '/00/00000004.xhp#wie' are intercepted by main_transform.xsl, to provide a specially formatted section which starts (in English) "To access this command..."
- Solution: main_transform.xsl appears to implement an alternative method using sections, although it looks unused at present. There are 462 instances of the old non-conforming references, so a script might be the only practical way to solve this.
- Demoted embedded headings
- Any level 1 heading is demoted to level 2 on embed, which is contrary to the xhp spec.
- Solution: There are 3 embed elements that call a level 1 heading directly - these could be hand edited. There are no embed elements that call a variable that contains a level 1 heading. There are 1809 embed elements that call a section that contains a level 1 heading - typically embedding 2 paragraphs. This is the difficult case - the 'correct' solution is probably to embed the section elements separately. With great care this would be possible with a script, but it is not straightforward.
- Would solving this one issue allow main_transform.xsl to be (considerably) simplified with the removal of 'mode="embedded"'? That would be a prize worth having.
- Heading linked to itself
- Headings which are linked to themselves are intercepted by main_transform.xsl to remove the link. This doesn't conform to the xhp spec - and is confusing to read in the help sources.
- It appears (from running a script) that there are 1920 link elements in the help texts that refer to their own file. All of these are removed by main_transform.xsl. However in quite a few cases the help author seemingly intended the link as written.
- This is tied in with the 'demoted headings' issue, and could probably be solved at the same time, by script. This is not straightforward.
- All self references are removed at present, even if not in the heading. A decision is needed whether to keep this behaviour or not. If keeping it, then another script would be needed to remove those links. If allowing those links, then a review of their (individual) usefulness would be good, because at least some of them point to the wrong place.
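Finding the self references in one file might look something like the sketch below. It is not the script mentioned above. The file path, sample markup and ids are hypothetical, and it assumes that a leading slash in a reference is not significant and that a bare "#anchor" href counts as a self reference.

```python
import xml.etree.ElementTree as ET

def self_links(file_path, xhp_text):
    """Return the href of each link element that points back into file_path."""
    root = ET.fromstring(xhp_text)
    found = []
    for link in root.iter("link"):
        href = link.get("href", "")
        target_file = href.split("#", 1)[0]
        # Assumption: an empty file part ("#anchor") refers to the current file,
        # and leading slashes make no difference to which file is meant.
        if target_file == "" or target_file.strip("/") == file_path.strip("/"):
            found.append(href)
    return found

# Hypothetical sample: a heading linked to its own file, plus two body links.
sample = """<body>
  <paragraph role="heading" id="hd_1" level="1"><link href="text/shared/example.xhp" name="Example">Example</link></paragraph>
  <paragraph role="paragraph" id="par_2">See <link href="text/shared/other.xhp#top" name="Other">Other</link> and <link href="text/shared/example.xhp#details" name="Details">details</link>.</paragraph>
</body>"""

print(self_links("text/shared/example.xhp", sample))
```

Listing the hrefs rather than just counting them makes it easier to review which links the author seemingly intended and which point to the wrong place.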
- Table classes
- main_transform.xsl intercepts single cell tables and forces them to be class="onecell"; it also intercepts tables where the first cell is an image and forces them to be class="icontable". Thus the author has lost control (eg can't make an image table with a border).
- Possible solution: table class could be specified in xhp - eg <table class="onecell" .. >, as happens already with <table class="wide" .. >. This would hopefully not trigger re-translation.
- html tags
- main_transform.xsl outputs html tags containing attributes that might be better set in the stylesheet. eg the default table style <table border="1" class="border" cellpadding="0" cellspacing="0" >. This may well be because the Help Viewer does not fully implement stylesheets.
- Solution: await improved Help Viewer?
- default.css does not define some classes output by main_transform.xsl: .paragraph, .paragraphintable, .embedded, .border, .icontable, .onecell, .avis.
- Also <ol> and <ul> have no style attached. This works OK in the OOo HelpViewer, but when viewing the html with a browser the default style looks wrong.
- Solution: define as necessary. Not that important, and possibly the Help Viewer cannot handle some of these styles yet anyway. Some styles will become redundant when other issues have been fixed - and it might be better to wait until then.
- Redundant attributes
- The paragraph element has two attributes that were used in the changeover from the old Help system. Is there any reason they are still needed?
- 'createlink' template in main_transform.xsl
- The template tests for # to identify an internal link, but an external link might be http://.......#anchor - is this a potential bug?
- Possible solution: test for not starting with "http://" as well as for "#"
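The logic of the proposed test can be illustrated in Python; the real change would of course be made in the XSL template. This is a sketch under assumptions: the function name and example URLs are made up, the "#" test is taken to mean "contains #", and "https://" is checked in addition to the "http://" mentioned above.

```python
def is_internal_link(href):
    """Return True for a reference with an anchor that is not an external URL."""
    # Assumption: anything starting with http:// or https:// is external,
    # even when it carries a #anchor of its own.
    external = href.startswith(("http://", "https://"))
    return "#" in href and not external

for href in ("#anchor", "http://www.example.org/page#anchor", "http://www.example.org/page"):
    print(href, is_internal_link(href))
```

With the "#" test alone, the second example would be treated as internal, which is the potential bug raised above.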
- 'insertembed' template in main_transform.xsl
- currently inserts <p class="embedded">..</p> around an embedded paragraph, which itself inserts <p...> </p> - so the <p>s are nested in html eg <p class="embedded"><p class="paragraph">some text</p></p>.
- there are only 5 instances where a paragraph is embedded. Two examples are in /text/shared/00/00000005.xhp.
- the Help Viewer seems to handle this OK, but different browsers might well see 2 attempts at a paragraph and perhaps double the line spacing.
- Solution: simply remove <p class="embedded">..</p> from 'insertembed' template
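Nested paragraphs in the generated html can be spotted with a small check like the one below, for instance before and after editing the template. This is only a sketch: the class name is hypothetical, and it relies on Python's html.parser reporting tags as written, without closing an open <p> implicitly the way a browser would.

```python
from html.parser import HTMLParser

class NestedParagraphFinder(HTMLParser):
    """Record the class of every <p> that is opened inside another <p>."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <p> elements currently open
        self.nested = []  # class attribute of each nested <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.depth > 0:
                self.nested.append(dict(attrs).get("class"))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1

finder = NestedParagraphFinder()
finder.feed('<p class="embedded"><p class="paragraph">some text</p></p>')
print(finder.nested)  # ['paragraph']
```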
Bringing the help source files into line with the xhp definition is quite a task. It needs a plan of action.
Suggestion: tackle the easy things first ('createlink' template, 'insertembed' template, Paragraphs within tables, Stylesheets), then check out what triggers re-translation and modify if necessary, then tackle the more difficult issues.