Bug 158127

Summary: INDEX should use en dash (not hyphen) for number ranges
Product: LibreOffice
Reporter: R. Green <greenandpleasant2000-support>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW ---
Severity: normal
CC: dgp-mail, heiko.tietze, mentoring, rb.henschel, vsfoote
Priority: medium
Keywords: difficultyMedium, easyHack, skillCpp
Version: unspecified   
Hardware: All   
OS: All   
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=158119
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 89606, 129434    

Description R. Green 2023-11-09 10:16:02 UTC
Version: 7.5.4.2 (X86_64) / LibreOffice Community
Build ID: 36ccfdc35048b057fd9854c757a8b67ec53977b6
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded

Number ranges are ALWAYS written by using dashes, e.g. 23–29, NOT hyphens (i.e. NOT 23-29).

Unfortunately, indexes are generated in LO Writer using hyphens rather than the correct em dashes.

So, hyphens in index number ranges need to be replaced with em dashes.
Comment 1 R. Green 2023-11-24 09:51:52 UTC
Big oops! That should have been EN (repeat EN) dashes NOT em dashes.
Comment 2 Dieter 2023-11-26 12:32:52 UTC
As far as I can see, "ALWAYS" is not true. Wikipedia says, for example, that APA style uses an en dash, while AMA style uses a hyphen: https://en.wikipedia.org/wiki/Dash

So perhaps there should be an option in the index dialog. The option "Combine with -" is too vague. Having the options "Combine with hyphen" and "Combine with en dash" would be an enhancement.

cc: Design-Team
Comment 3 Heiko Tietze 2023-11-27 11:08:57 UTC
A quick and dirty solution would be to change the line aNumStr += "-"; in sw/source/core/doc/doctxm.cxx. But I like the idea of an option, which should be offered in the ToC dialog as a dropdown list (I cannot think of a list separator other than dashes) instead of "Combine with -".

I wonder if the file format has any restriction and what MSO makes out of those documents. If I manually replace the dash it's read in both Writer and MSO correctly (of course replaced on update).
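
For illustration only, the idea of a selectable range separator could be sketched as below. This is not the code in doctxm.cxx: combinePages and its rangeSep parameter are hypothetical names, and the sketch assumes the page numbers are already sorted and that output is UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not LibreOffice code: joins sorted page numbers and
// collapses each run of consecutive pages into "first<rangeSep>last".
// The default separator is U+2013 EN DASH, written as its UTF-8 bytes.
std::string combinePages(const std::vector<int>& pages,
                         const std::string& rangeSep = "\xE2\x80\x93")
{
    std::string out;
    std::size_t i = 0;
    while (i < pages.size())
    {
        std::size_t j = i;
        // Extend the run while the next page directly follows the current one.
        while (j + 1 < pages.size() && pages[j + 1] == pages[j] + 1)
            ++j;

        if (!out.empty())
            out += ", ";
        out += std::to_string(pages[i]);
        if (j > i)
        {
            // Only this line depends on the chosen separator.
            out += rangeSep;
            out += std::to_string(pages[j]);
        }
        i = j + 1;
    }
    return out;
}
```

Calling combinePages({1, 3, 4, 5, 9}, "-") reproduces the current hyphen style, while omitting the second argument yields the en dash form. Keeping the separator as a parameter means a dialog option would only need to pick the string.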
Comment 4 V Stuart Foote 2023-11-27 13:16:01 UTC
Another facet is localization. The TOC/Index generator (core/tox and header) seems to have additional TOC/Index structure for CJK and CTL nodes.

Rather than just the appended U+002D HYPHEN-MINUS as U+2013 EN DASH what could other locales require?
Comment 5 Heiko Tietze 2023-11-27 13:22:15 UTC
(In reply to V Stuart Foote from comment #4)
> Rather than just the appended U+002D HYPHEN-MINUS as U+2013 EN DASH what
> could other locales require?

Wikipedia lists four types: en dash, em dash, horizontal bar and figure dash; adding the U+002D hyphen makes it five. I can also imagine running text: <1> "to" <2> (localized, of course).
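
As a rough sketch of what such a dropdown might map to, the five characters plus the word form can be listed with their code points. The enum RangeSeparator and the function separatorText are hypothetical names for illustration, not existing LibreOffice identifiers, and the strings are UTF-8 byte sequences.

```cpp
#include <string>

// Hypothetical list of separator choices; names are illustrative only.
enum class RangeSeparator
{
    HyphenMinus,   // U+002D
    FigureDash,    // U+2012
    EnDash,        // U+2013
    EmDash,        // U+2014
    HorizontalBar, // U+2015
    Word           // localized running text such as "to"
};

// Returns the UTF-8 text placed between the first and last page number.
// localizedWord is only used for RangeSeparator::Word.
std::string separatorText(RangeSeparator kind,
                          const std::string& localizedWord = "to")
{
    switch (kind)
    {
        case RangeSeparator::HyphenMinus:   return "-";
        case RangeSeparator::FigureDash:    return "\xE2\x80\x92";
        case RangeSeparator::EnDash:        return "\xE2\x80\x93";
        case RangeSeparator::EmDash:        return "\xE2\x80\x94";
        case RangeSeparator::HorizontalBar: return "\xE2\x80\x95";
        case RangeSeparator::Word:          return " " + localizedWord + " ";
    }
    return "-"; // fallback for out-of-range values
}
```

Where the localized word would come from is left open here; in this sketch the caller simply passes it in.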
Comment 6 Heiko Tietze 2024-01-31 09:09:20 UTC
No further input, let's implement.