Here I jot down thoughts, roadmaps, to-do's and other things related to PDCLib. Newest entry first.
The ePub conversion was a dead end; I should have spent the time reading instead of working on “conversion to better readable format”. So now I am looking at wasted time, a reading backlog, and lots of things I neglected while working on the now-abandoned conversion.
Since I was asked, I thought I could just as well give the answer here:
Why are you doing <locale.h> first? I would think floating point support would be more important.
Three reasons, really. The first is just a minor snag – FP I/O is locale-dependent (decimal point vs. decimal comma).
The second is that, to do the FP logic right (instead of naïve 80-20 solutions), you need to take lots of platform specifics into account. This will blow up <_PDCLIB_config.h> significantly, and result in lots of rather ugly conditional code.
Third, it is quite simply the area I have the least expertise in. I want to save the hardest part for last.
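To illustrate the first reason, here is a minimal sketch using only standard C. The helper names `radix_of_current_locale` and `format_one_decimal` are made up for this example and are not part of PDCLib. It shows that the character printed between the integer and fractional digits comes from the current locale, so FP output cannot be finished before locale support exists.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the radix character string of the
 * current LC_NUMERIC locale ("." in the "C" locale, "," in many
 * European locales).
 */
const char *radix_of_current_locale(void)
{
    return localeconv()->decimal_point;
}

/* Hypothetical helper: formats value with one fractional digit.
 * The separator written by snprintf() is whatever the current
 * locale defines. Returns the character count snprintf() reports.
 */
int format_one_decimal(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.1f", value);
}
```

After `setlocale(LC_ALL, "C")`, formatting `1.5` gives `1.5`. Under a locale with a decimal comma (if one is installed on the system) the same call would print a comma instead.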
Sometimes we find ourselves approaching new technologies from rather unexpected angles. Right now I am working on an ePub conversion of The Unicode Standard for easier reading, as PDFs display poorly on my tolino ebook reader.
I would probably never have bothered with looking into the ePub format if it had not been for PDCLib… we live and learn.
There is no way around it. Too much of the whole ctype, wctype, uchar, locale issue is pointing to Unicode all over again. And I have been cursing at getting tangled by lots of cross-references and internal dependencies, so now I made myself sit down and tackle the monster that is The Unicode Standard. From cover to cover, as there seems to be no real shortcut to “just what I need right now”.
So… yeah. Stay put.
In these past two days, I learned a lot about the Unicode Collation Algorithm. Yes, I can do this, I can make this part of PDCLib.
But no, not in the immediate future. That will have to “make do” with the “C” locale.
I have added _PDCLIB_load_lc_*() functions for all the locale categories mandated by C99, plus LC_MESSAGES, a C99-compliant POSIX extension that is required anyway for perror() to be locale-aware.
The one thing left is LC_COLLATE. Collation in the C locale is comparatively simple, but Unicode-aware collation?
Let's just say that the corresponding Unicode document, converted to PDF for easier offline reading, amounts to 61 pages. I will have to dig through that at some point, so why not now.
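For contrast, the "comparatively simple" case can be sketched in a few lines of standard C. This assumes the common (and POSIX-specified) behaviour that strcoll() orders strings like strcmp() while the "C" locale is active. The helper names `sign_of` and `collation_matches_strcmp` are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Reduces a comparison result to -1, 0 or 1, since only the sign
 * of strcoll() / strcmp() results is meaningful.
 */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical check: returns 1 if strcoll() and strcmp() order
 * a and b the same way under the current locale, 0 otherwise.
 */
int collation_matches_strcmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return sign_of(strcoll(a, b)) == sign_of(strcmp(a, b));
}
```

In the "C" locale on an ASCII system, "Zebra" sorts before "apple" because 'Z' has a smaller code than 'a'. A Unicode-aware collation would typically put "apple" first, which is exactly where the extra complexity comes from.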
Bah. Think first. There already is a function to load contents for the various locale-data structures from file, and its name is
Also, while loading from the filesystem is rather “raw”, any other mechanism will be even more “raw”, and less standard (as in,
So stop dithering and make setlocale() do more than
Looking at what I already had in <locale.h>, I decided some reworking was required. Stuffing everything into struct lconv was not the smartest idea I had, so I split things up into separate struct _PDCLIB_lc_*. I also moved the extern declaration of the actual data instances from <_PDCLIB_int.h> where they are less confusing to the casual observer.
I am currently thinking in terms of _PDCLIB_load_lc() to load contents for the various locale-data structures from file. I do not like the idea of having raw filesystem access inside PDCLib, though… this needs some pondering.
With get-uctypes (the source of which is in the repo at auxiliary/uctype/), I now have a program to get character classification information (as required by <ctype.h> and, more importantly, <wctype.h>) directly from data files available from unicode.org.
The shepherd branch already had this functionality, but it a) was written in Python (which IMHO has no place in the source tree of a C library); b) included the raw data files, which made them prone to getting outdated and required additional legalese due to Unicode licensing; c) did not give correct results and, more importantly, did not offer an easy way to test against the system library's results.
Now I have to provide a way to actually use the derived information in PDCLib proper.
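As a rough idea of what "derived information" means here, the following sketch reads one record in the semicolon-separated format of UnicodeData.txt (code point in hex, character name, general category, further fields) and extracts the code point and category. It is not the get-uctypes source. The struct and function names are hypothetical, and treating every "L*" category as a letter is a simplification for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: code point plus two-letter general category. */
struct ucd_entry
{
    unsigned long codepoint;
    char category[3];
};

/* Parses one UnicodeData.txt-style line, e.g.
 * "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;".
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_ucd_line(const char *line, struct ucd_entry *out)
{
    const char *name_sep;
    const char *cat_start;
    const char *cat_end;
    char *end;

    assert(line != NULL && out != NULL);

    /* Field 0: code point in hexadecimal, terminated by ';'. */
    out->codepoint = strtoul(line, &end, 16);
    if (end == line || *end != ';')
    {
        return -1;
    }

    /* Field 1: character name, skipped. */
    name_sep = strchr(end + 1, ';');
    if (name_sep == NULL)
    {
        return -1;
    }

    /* Field 2: general category, exactly two characters. */
    cat_start = name_sep + 1;
    cat_end = strchr(cat_start, ';');
    if (cat_end == NULL || cat_end - cat_start != 2)
    {
        return -1;
    }

    memcpy(out->category, cat_start, 2);
    out->category[2] = '\0';
    return 0;
}

/* Simplified classification: categories Lu, Ll, Lt, Lm, Lo. */
int is_letter_category(const struct ucd_entry *entry)
{
    return entry->category[0] == 'L';
}
```

A real classification table needs more than this (ranges, derived properties, and a compact in-memory form), but the parsing step itself stays about this small.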