Yes, I understand your point of view, and I will use a custom regex.
But I guess you probably don't want to implement the new type "unicode-nmtoken" using:
|
<regex>[^\p{Punct}]*</regex>
|
either, because it wouldn't be consistent with "nmtoken", which accepts hyphens:
http://www.w3.org/TR/REC-xml/#NT-Nmtoken
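
To illustrate the inconsistency, here is a minimal Python sketch. It assumes that `\p{Punct}` in the regex above means the ASCII punctuation characters (as in Java's regex syntax), which is the same set that Python's `string.punctuation` lists. The helper name `matches_no_punct` is made up for this example and is not part of NetKernel.

```python
import re
import string

# Assumption: \p{Punct} is the ASCII punctuation set, i.e. the same
# characters as string.punctuation. re.escape makes them safe to place
# inside a character class.
NO_PUNCT = re.compile("[^" + re.escape(string.punctuation) + "]*")


def matches_no_punct(value: str) -> bool:
    """Return True if the whole value contains no ASCII punctuation."""
    return NO_PUNCT.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["Grégoire", "foo-bar", "a.b"]:
        print(candidate, matches_no_punct(candidate))
```

Under that assumption the pattern accepts "Grégoire" but rejects "foo-bar" and "a.b", whereas "nmtoken" allows both the hyphen and the dot.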
According to the URL above, an nmtoken is:
|
(":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
[#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | "-" | "." |
[0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040])+
|
Unfortunately, the # encoding is not made explicit, and I don't know whether these ranges cover the UTF-8 characters or not (I guess they don't; otherwise it would be a bug in the current NetKernel implementation).
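
One way to check this is to build a regex directly from the ranges quoted above and try a few values. The sketch below is in Python and rests on one assumption: that `#xN` stands for the Unicode code point with hexadecimal value N, which is how the notation section of the XML Recommendation describes it. The names `NMTOKEN_RANGES`, `build_nmtoken_pattern` and `is_nmtoken` are hypothetical, and this says nothing about what NetKernel does internally.

```python
import re

# Ranges copied from the Nmtoken production quoted above.
# Assumption: "#xN" is the Unicode code point with hex value N.
NMTOKEN_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def build_nmtoken_pattern(ranges):
    """Build one character class out of (low, high) code point pairs."""
    parts = ["\\U%08X-\\U%08X" % (low, high) for low, high in ranges]
    return re.compile("[" + "".join(parts) + "]+")


NMTOKEN = build_nmtoken_pattern(NMTOKEN_RANGES)


def is_nmtoken(value: str) -> bool:
    """Return True if every character falls in one of the ranges."""
    return NMTOKEN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["Grégoire", "foo-bar", "日本語", "foo bar"]:
        print(candidate, is_nmtoken(candidate))
```

If that reading of `#xN` is right, the ranges are stated in code points rather than bytes, so accented and CJK letters such as those in "Grégoire" and "日本語" fall inside them, while a space does not.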
Thanks for your help!
Gregoire