libxml2
encoding.h File Reference

Character encoding conversion functions. More...

Data Structures

struct  _xmlCharEncodingHandler
 A character encoding conversion handler for non UTF-8 encodings. More...

Typedefs

typedef int(* xmlCharEncodingInputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Convert characters to UTF-8.
typedef int(* xmlCharEncodingOutputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Convert characters from UTF-8.
typedef xmlCharEncError(* xmlCharEncConvFunc) (void *vctxt, unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int flush)
 Convert between character encodings.
typedef void(* xmlCharEncConvCtxtDtor) (void *vctxt)
 Free a conversion context.
typedef struct _xmlCharEncodingHandler xmlCharEncodingHandler
 Character encoding converter.
typedef xmlParserErrors(* xmlCharEncConvImpl) (void *vctxt, const char *name, xmlCharEncFlags flags, xmlCharEncodingHandler **out)
 If this function returns XML_ERR_OK, it must fill the out pointer with an encoding handler.

Enumerations

enum  xmlCharEncError
 Encoding conversion errors. More...
enum  xmlCharEncoding
 Predefined values for some standard encodings. More...
enum  xmlCharEncFlags
 Encoding conversion flags. More...

Functions

void xmlInitCharEncodingHandlers (void)
void xmlCleanupCharEncodingHandlers (void)
 Cleans up the memory allocated for the char encoding support; it unregisters all the encoding handlers and the aliases.
void xmlRegisterCharEncodingHandler (xmlCharEncodingHandler *handler)
 Register the char encoding handler.
xmlParserErrors xmlLookupCharEncodingHandler (xmlCharEncoding enc, xmlCharEncodingHandler **out)
 Find or create a handler matching the encoding.
xmlParserErrors xmlOpenCharEncodingHandler (const char *name, int output, xmlCharEncodingHandler **out)
 Find or create a handler matching the encoding.
xmlParserErrors xmlCreateCharEncodingHandler (const char *name, xmlCharEncFlags flags, xmlCharEncConvImpl impl, void *implCtxt, xmlCharEncodingHandler **out)
 Find or create a handler matching the encoding.
xmlCharEncodingHandler * xmlGetCharEncodingHandler (xmlCharEncoding enc)
xmlCharEncodingHandler * xmlFindCharEncodingHandler (const char *name)
 If the encoding is UTF-8, this will return a no-op handler that shouldn't be used.
xmlCharEncodingHandler * xmlNewCharEncodingHandler (const char *name, xmlCharEncodingInputFunc input, xmlCharEncodingOutputFunc output)
 Creates and registers an xmlCharEncodingHandler.
xmlParserErrors xmlCharEncNewCustomHandler (const char *name, xmlCharEncConvFunc input, xmlCharEncConvFunc output, xmlCharEncConvCtxtDtor ctxtDtor, void *inputCtxt, void *outputCtxt, xmlCharEncodingHandler **out)
 Create a custom xmlCharEncodingHandler.
int xmlAddEncodingAlias (const char *name, const char *alias)
 Registers an alias alias for an encoding named name.
int xmlDelEncodingAlias (const char *alias)
 Unregisters an encoding alias.
const char * xmlGetEncodingAlias (const char *alias)
 Lookup an encoding name for the given alias.
void xmlCleanupEncodingAliases (void)
 Unregisters all aliases.
xmlCharEncoding xmlParseCharEncoding (const char *name)
 Compare the string to the encoding schemes already known.
const char * xmlGetCharEncodingName (xmlCharEncoding enc)
 The "canonical" name for XML encoding.
xmlCharEncoding xmlDetectCharEncoding (const unsigned char *in, int len)
 Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.
int xmlCharEncOutFunc (xmlCharEncodingHandler *handler, struct _xmlBuffer *out, struct _xmlBuffer *in)
 Generic front-end for output encoding conversion.
int xmlCharEncInFunc (xmlCharEncodingHandler *handler, struct _xmlBuffer *out, struct _xmlBuffer *in)
 Generic front-end for input encoding conversion.
int xmlCharEncFirstLine (xmlCharEncodingHandler *handler, struct _xmlBuffer *out, struct _xmlBuffer *in)
 DEPRECATED: Don't use.
int xmlCharEncCloseFunc (xmlCharEncodingHandler *handler)
 Releases an xmlCharEncodingHandler.
int xmlUTF8ToIsolat1 (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1 block of chars out.
int xmlIsolat1ToUTF8 (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8 block of chars out.

Detailed Description

Character encoding conversion functions.

Author
Daniel Veillard

Typedef Documentation

◆ xmlCharEncConvCtxtDtor

typedef void(* xmlCharEncConvCtxtDtor) (void *vctxt)

Free a conversion context.

Parameters
vctxt  conversion context

◆ xmlCharEncConvFunc

typedef xmlCharEncError(* xmlCharEncConvFunc) (void *vctxt, unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int flush)

Convert between character encodings.

The value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

If the converter can consume partial multi-byte sequences, the flush flag can be used to detect truncated sequences at EOF. Otherwise, the flag can be ignored.

Parameters
vctxt  conversion context
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of input bytes
inlen  the length of in
flush  end of input
Returns
an xmlCharEncError code.
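
The inlen/outlen contract and the role of flush are easier to see in code. The following is a minimal sketch, not the library's implementation: a stateless UTF-8 to ISO Latin 1 converter with the same parameter list as xmlCharEncConvFunc. It is deliberately written without libxml2 headers, so the names sketch_utf8_to_latin1 and SKETCH_CONV_* are hypothetical stand-ins, and the numeric error values are placeholders rather than the real xmlCharEncError values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for xmlCharEncError codes (placeholder values). */
enum sketch_conv_err {
    SKETCH_CONV_OK = 0,
    SKETCH_CONV_ERR_INPUT = -2,
    SKETCH_CONV_ERR_SPACE = -3
};

/* Same parameter list as xmlCharEncConvFunc; returns a local error code. */
int
sketch_utf8_to_latin1(void *vctxt, unsigned char *out, int *outlen,
                      const unsigned char *in, int *inlen, int flush)
{
    int inpos = 0, outpos = 0;
    int ret = SKETCH_CONV_OK;

    (void) vctxt; /* this converter keeps no state */
    assert(out != NULL && outlen != NULL && in != NULL && inlen != NULL);

    while (inpos < *inlen) {
        unsigned char c = in[inpos];
        unsigned char result;
        int used;

        if (c < 0x80) {
            result = c;
            used = 1;
        } else if (c == 0xC2 || c == 0xC3) {
            /* Two-byte sequence encoding U+0080..U+00FF. */
            if (inpos + 1 >= *inlen) {
                /* Truncated sequence: only an error at end of input. */
                if (flush)
                    ret = SKETCH_CONV_ERR_INPUT;
                break;
            }
            if ((in[inpos + 1] & 0xC0) != 0x80) {
                ret = SKETCH_CONV_ERR_INPUT;
                break;
            }
            result = (unsigned char) (((c & 0x03) << 6) |
                                      (in[inpos + 1] & 0x3F));
            used = 2;
        } else {
            /* Invalid lead byte or not representable in Latin 1. */
            ret = SKETCH_CONV_ERR_INPUT;
            break;
        }

        if (outpos >= *outlen) {
            ret = SKETCH_CONV_ERR_SPACE;
            break;
        }
        out[outpos++] = result;
        inpos += used;
    }

    *inlen = inpos;   /* bytes consumed */
    *outlen = outpos; /* bytes produced */
    return ret;
}
```

With flush set to 0, a trailing lead byte is left unconsumed so the caller can resubmit it together with the next chunk. With flush set to 1, the same input is reported as an input error.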

◆ xmlCharEncConvImpl

typedef xmlParserErrors(* xmlCharEncConvImpl) (void *vctxt, const char *name, xmlCharEncFlags flags, xmlCharEncodingHandler **out)

If this function returns XML_ERR_OK, it must fill the out pointer with an encoding handler.

The handler can be obtained from xmlCharEncNewCustomHandler.

flags can contain XML_ENC_INPUT, XML_ENC_OUTPUT or both.

Parameters
vctxt  user data
name  encoding name
flags  bit mask of flags
out  pointer to resulting handler
Returns
an xmlParserErrors code.

◆ xmlCharEncodingInputFunc

typedef int(* xmlCharEncodingInputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)

Convert characters to UTF-8.

On success, the value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the UTF-8 result
outlen  the length of out
in  a pointer to an array of chars in the original encoding
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.
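
As an illustration of this calling convention, here is a small sketch of a function with the same signature as xmlCharEncodingInputFunc, converting ISO Latin 1 to UTF-8. It is not the library's own converter: sketch_latin1_to_utf8 and LEGACY_SKETCH_ERR_SPACE are hypothetical names, the error value is a placeholder, and treating a full output buffer as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an xmlCharEncError code (placeholder value). */
#define LEGACY_SKETCH_ERR_SPACE (-3)

/* Same signature as xmlCharEncodingInputFunc. */
int
sketch_latin1_to_utf8(unsigned char *out, int *outlen,
                      const unsigned char *in, int *inlen)
{
    int inpos = 0, outpos = 0;
    int err = 0;

    assert(out != NULL && outlen != NULL && in != NULL && inlen != NULL);

    while (inpos < *inlen) {
        unsigned char c = in[inpos];
        int need = (c < 0x80) ? 1 : 2;

        /* Never write a partial UTF-8 sequence. */
        if (*outlen - outpos < need) {
            err = LEGACY_SKETCH_ERR_SPACE;
            break;
        }
        if (c < 0x80) {
            out[outpos++] = c;
        } else {
            out[outpos++] = (unsigned char) (0xC0 | (c >> 6));
            out[outpos++] = (unsigned char) (0x80 | (c & 0x3F));
        }
        inpos++;
    }

    *inlen = inpos;   /* bytes consumed */
    *outlen = outpos; /* bytes produced */
    return (err < 0) ? err : outpos;
}
```

A caller passes the buffer sizes in through inlen and outlen and reads the consumed and produced byte counts back from the same variables after the call.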

◆ xmlCharEncodingOutputFunc

typedef int(* xmlCharEncodingOutputFunc) (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)

Convert characters from UTF-8.

On success, the value of inlen after return is the number of bytes consumed and outlen is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of UTF-8 chars
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.

Enumeration Type Documentation

◆ xmlCharEncError

Encoding conversion errors.

Enumerator
XML_ENC_ERR_SUCCESS 

Success.

XML_ENC_ERR_INTERNAL 

Internal or unclassified error.

XML_ENC_ERR_INPUT 

Invalid or untranslatable input sequence.

XML_ENC_ERR_SPACE 

Not enough space in output buffer.

XML_ENC_ERR_MEMORY 

Out-of-memory error.

◆ xmlCharEncFlags

Encoding conversion flags.

Enumerator
XML_ENC_INPUT 

Create converter for input (conversion to UTF-8)

XML_ENC_OUTPUT 

Create converter for output (conversion from UTF-8)

XML_ENC_HTML 

Use HTML5 mappings.

◆ xmlCharEncoding

Predefined values for some standard encodings.

Enumerator
XML_CHAR_ENCODING_ERROR 

No char encoding detected.

XML_CHAR_ENCODING_NONE 

No char encoding detected.

XML_CHAR_ENCODING_UTF8 

UTF-8.

XML_CHAR_ENCODING_UTF16LE 

UTF-16 little endian.

XML_CHAR_ENCODING_UTF16BE 

UTF-16 big endian.

XML_CHAR_ENCODING_UCS4LE 

UCS-4 little endian.

XML_CHAR_ENCODING_UCS4BE 

UCS-4 big endian.

XML_CHAR_ENCODING_EBCDIC 

EBCDIC.

XML_CHAR_ENCODING_UCS4_2143 

UCS-4 unusual ordering.

XML_CHAR_ENCODING_UCS4_3412 

UCS-4 unusual ordering.

XML_CHAR_ENCODING_UCS2 

UCS-2.

XML_CHAR_ENCODING_8859_1 

ISO-8859-1 ISO Latin 1.

XML_CHAR_ENCODING_8859_2 

ISO-8859-2 ISO Latin 2.

XML_CHAR_ENCODING_8859_3 

ISO-8859-3.

XML_CHAR_ENCODING_8859_4 

ISO-8859-4.

XML_CHAR_ENCODING_8859_5 

ISO-8859-5.

XML_CHAR_ENCODING_8859_6 

ISO-8859-6.

XML_CHAR_ENCODING_8859_7 

ISO-8859-7.

XML_CHAR_ENCODING_8859_8 

ISO-8859-8.

XML_CHAR_ENCODING_8859_9 

ISO-8859-9.

XML_CHAR_ENCODING_2022_JP 

ISO-2022-JP.

XML_CHAR_ENCODING_SHIFT_JIS 

Shift_JIS.

XML_CHAR_ENCODING_EUC_JP 

EUC-JP.

XML_CHAR_ENCODING_ASCII 

Pure ASCII.

XML_CHAR_ENCODING_UTF16 

UTF-16 native, available since 2.14.

XML_CHAR_ENCODING_HTML 

HTML (output only), available since 2.14.

XML_CHAR_ENCODING_8859_10 

ISO-8859-10, available since 2.14.

XML_CHAR_ENCODING_8859_11 

ISO-8859-11, available since 2.14.

XML_CHAR_ENCODING_8859_13 

ISO-8859-13, available since 2.14.

XML_CHAR_ENCODING_8859_14 

ISO-8859-14, available since 2.14.

XML_CHAR_ENCODING_8859_15 

ISO-8859-15, available since 2.14.

XML_CHAR_ENCODING_8859_16 

ISO-8859-16, available since 2.14.

XML_CHAR_ENCODING_WINDOWS_1252 

windows-1252, available since 2.15.

Function Documentation

◆ xmlAddEncodingAlias()

int xmlAddEncodingAlias ( const char * name,
const char * alias )

Registers an alias alias for an encoding named name.

Existing aliases will be overwritten.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
name  the encoding name as parsed, in UTF-8 format (ASCII actually)
alias  the alias name as parsed, in UTF-8 format (ASCII actually)
Returns
0 in case of success, -1 in case of error.

◆ xmlCharEncCloseFunc()

int xmlCharEncCloseFunc ( xmlCharEncodingHandler * handler)

Releases an xmlCharEncodingHandler.

Must be called after a handler is no longer in use.

Parameters
handler  encoding handler
Returns
0.

◆ xmlCharEncFirstLine()

int xmlCharEncFirstLine ( xmlCharEncodingHandler * handler,
struct _xmlBuffer * out,
struct _xmlBuffer * in )

DEPRECATED: Don't use.

Parameters
handler  encoding handler
out  an xmlBuffer for the output
in  an xmlBuffer for the input
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlCharEncInFunc()

int xmlCharEncInFunc ( xmlCharEncodingHandler * handler,
struct _xmlBuffer * out,
struct _xmlBuffer * in )

Generic front-end for input encoding conversion.

Parameters
handler  encoding handler
out  an xmlBuffer for the output
in  an xmlBuffer for the input
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlCharEncNewCustomHandler()

xmlParserErrors xmlCharEncNewCustomHandler ( const char * name,
xmlCharEncConvFunc input,
xmlCharEncConvFunc output,
xmlCharEncConvCtxtDtor ctxtDtor,
void * inputCtxt,
void * outputCtxt,
xmlCharEncodingHandler ** out )

Create a custom xmlCharEncodingHandler.

Parameters
name  the encoding name
input  input callback which converts to UTF-8
output  output callback which converts from UTF-8
ctxtDtor  context destructor
inputCtxt  context for input callback
outputCtxt  context for output callback
out  pointer to resulting handler
Returns
an xmlParserErrors code.

◆ xmlCharEncOutFunc()

int xmlCharEncOutFunc ( xmlCharEncodingHandler * handler,
struct _xmlBuffer * out,
struct _xmlBuffer * in )

Generic front-end for output encoding conversion.

A first call with in set to NULL has to be made to write a BOM.

When using GNU libiconv, unsupported characters in the output encoding will be automatically replaced with a numeric character reference.

Parameters
handler  encoding handler
out  an xmlBuffer for the output
in  an xmlBuffer for the input
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlCleanupCharEncodingHandlers()

void xmlCleanupCharEncodingHandlers ( void )

Cleans up the memory allocated for the char encoding support; it unregisters all the encoding handlers and the aliases.

Deprecated
This function will be made private. Call xmlCleanupParser to free global state but see the warnings there. xmlCleanupParser should be called only once at program exit. In most cases, you don't have to call cleanup functions at all.

◆ xmlCleanupEncodingAliases()

void xmlCleanupEncodingAliases ( void )

Unregisters all aliases.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.

◆ xmlCreateCharEncodingHandler()

xmlParserErrors xmlCreateCharEncodingHandler ( const char * name,
xmlCharEncFlags flags,
xmlCharEncConvImpl impl,
void * implCtxt,
xmlCharEncodingHandler ** out )

Find or create a handler matching the encoding.

The following converters are looked up in order:

  • Built-in handler (UTF-8, UTF-16, ISO-8859-1, ASCII)
  • Custom implementation if provided
  • User-registered global handler (deprecated)
  • iconv if enabled
  • ICU if enabled

The handler must be closed with xmlCharEncCloseFunc.

If the encoding is UTF-8, a NULL handler and no error code will be returned.

flags can contain XML_ENC_INPUT, XML_ENC_OUTPUT or both.

Since
2.14.0
Parameters
name  a string describing the char encoding.
flags  bit mask of flags
impl  a conversion implementation (optional)
implCtxt  user data for conversion implementation (optional)
out  pointer to result
Returns
XML_ERR_OK, XML_ERR_UNSUPPORTED_ENCODING or another xmlParserErrors error code.
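
The lookup order and the UTF-8 special case can be modelled in a few lines. The sketch below does not call libxml2 and is not how the library is implemented; it only simulates a chain of providers that are tried in order, where the first match wins. All names (sketch_create, sketch_handler, the provider functions, the encoding X-DEMO) are hypothetical, the chain is shortened to three providers, and the name comparison is simplified to an exact match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_status { SKETCH_STATUS_OK = 0, SKETCH_STATUS_UNSUPPORTED = 1 };

/* Hypothetical handler: only records which provider produced it. */
typedef struct { const char *source; } sketch_handler;
typedef const sketch_handler *(*sketch_provider)(const char *name);

static const sketch_handler sketch_h_builtin = { "built-in" };
static const sketch_handler sketch_h_custom = { "custom" };
static const sketch_handler sketch_h_iconv = { "iconv" };

const sketch_handler *sketch_builtin(const char *name)
{
    return strcmp(name, "ISO-8859-1") == 0 ? &sketch_h_builtin : NULL;
}

/* Also claims ISO-8859-1, but is consulted after the built-ins. */
const sketch_handler *sketch_custom(const char *name)
{
    if (strcmp(name, "X-DEMO") == 0 || strcmp(name, "ISO-8859-1") == 0)
        return &sketch_h_custom;
    return NULL;
}

const sketch_handler *sketch_iconv(const char *name)
{
    if (strcmp(name, "X-DEMO") == 0 || strcmp(name, "KOI8-R") == 0)
        return &sketch_h_iconv;
    return NULL;
}

/* Try each provider in order; the first one that matches wins. */
enum sketch_status sketch_create(const char *name, const sketch_handler **out)
{
    static const sketch_provider chain[] = {
        sketch_builtin, sketch_custom, sketch_iconv
    };
    size_t i;

    assert(name != NULL && out != NULL);
    *out = NULL;

    /* UTF-8 needs no conversion: success with a NULL handler. */
    if (strcmp(name, "UTF-8") == 0)
        return SKETCH_STATUS_OK;

    for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
        const sketch_handler *h = chain[i](name);
        if (h != NULL) {
            *out = h;
            return SKETCH_STATUS_OK;
        }
    }
    return SKETCH_STATUS_UNSUPPORTED;
}
```

The point for callers is to test the return code rather than the handler pointer: a NULL handler together with a success code means that no conversion is required, not that the lookup failed.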

◆ xmlDelEncodingAlias()

int xmlDelEncodingAlias ( const char * alias)

Unregisters an encoding alias.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
alias  the alias name as parsed, in UTF-8 format (ASCII actually)
Returns
0 in case of success, -1 in case of error.

◆ xmlDetectCharEncoding()

xmlCharEncoding xmlDetectCharEncoding ( const unsigned char * in,
int len )

Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.

Parameters
in  a pointer to the first bytes of the XML entity; must be at least 2 bytes long (at least 4 if the encoding is a UCS-4 variant).
len  the length of the buffer
Returns
an xmlCharEncoding value.
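
To make the idea of guessing from the first bytes concrete, here is a minimal sketch of signature matching in the spirit of Appendix F. It is not the library's implementation and covers only a subset of the signatures (the unusual UCS-4 byte orders are omitted). The names sketch_detect and SKETCH_ENC_* are hypothetical, and treating "<?xm" as UTF-8 is a simplification, since any ASCII-compatible encoding starts the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_encoding {
    SKETCH_ENC_NONE = 0,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_UTF16LE,
    SKETCH_ENC_UTF16BE,
    SKETCH_ENC_UCS4LE,
    SKETCH_ENC_UCS4BE,
    SKETCH_ENC_EBCDIC
};

struct sketch_signature {
    unsigned char bytes[4];
    int len;
    enum sketch_encoding enc;
};

/* Longer signatures come first so that the UTF-16LE BOM (FF FE)
 * does not shadow the UCS-4 little-endian BOM (FF FE 00 00). */
static const struct sketch_signature sketch_signatures[] = {
    { { 0x00, 0x00, 0xFE, 0xFF }, 4, SKETCH_ENC_UCS4BE },  /* BOM */
    { { 0xFF, 0xFE, 0x00, 0x00 }, 4, SKETCH_ENC_UCS4LE },  /* BOM */
    { { 0x00, 0x00, 0x00, 0x3C }, 4, SKETCH_ENC_UCS4BE },  /* "<" */
    { { 0x3C, 0x00, 0x00, 0x00 }, 4, SKETCH_ENC_UCS4LE },  /* "<" */
    { { 0x00, 0x3C, 0x00, 0x3F }, 4, SKETCH_ENC_UTF16BE }, /* "<?" */
    { { 0x3C, 0x00, 0x3F, 0x00 }, 4, SKETCH_ENC_UTF16LE }, /* "<?" */
    { { 0x3C, 0x3F, 0x78, 0x6D }, 4, SKETCH_ENC_UTF8 },    /* "<?xm" */
    { { 0x4C, 0x6F, 0xA7, 0x94 }, 4, SKETCH_ENC_EBCDIC },  /* "<?xm" */
    { { 0xEF, 0xBB, 0xBF, 0x00 }, 3, SKETCH_ENC_UTF8 },    /* BOM */
    { { 0xFE, 0xFF, 0x00, 0x00 }, 2, SKETCH_ENC_UTF16BE }, /* BOM */
    { { 0xFF, 0xFE, 0x00, 0x00 }, 2, SKETCH_ENC_UTF16LE }  /* BOM */
};

enum sketch_encoding sketch_detect(const unsigned char *in, int len)
{
    size_t i;
    size_t count = sizeof(sketch_signatures) / sizeof(sketch_signatures[0]);

    assert(in != NULL);

    for (i = 0; i < count; i++) {
        const struct sketch_signature *sig = &sketch_signatures[i];

        /* Skip signatures longer than the available input. */
        if (len >= sig->len &&
            memcmp(in, sig->bytes, (size_t) sig->len) == 0)
            return sig->enc;
    }
    return SKETCH_ENC_NONE;
}
```

The length check explains why the buffer should hold at least four bytes when a UCS-4 variant is possible: with only two bytes available, FF FE can only be reported as UTF-16 little endian.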

◆ xmlFindCharEncodingHandler()

xmlCharEncodingHandler * xmlFindCharEncodingHandler ( const char * name)

If the encoding is UTF-8, this will return a no-op handler that shouldn't be used.

Deprecated
Use xmlOpenCharEncodingHandler which has better error reporting.
Parameters
name  a string describing the char encoding.
Returns
the handler or NULL if no handler was found or an error occurred.

◆ xmlGetCharEncodingHandler()

xmlCharEncodingHandler * xmlGetCharEncodingHandler ( xmlCharEncoding enc)
Deprecated
Use xmlLookupCharEncodingHandler which has better error reporting.
Parameters
enc  an xmlCharEncoding value.
Returns
the handler or NULL if no handler was found or an error occurred.

◆ xmlGetCharEncodingName()

const char * xmlGetCharEncodingName ( xmlCharEncoding enc)

The "canonical" name for XML encoding.

Cf. http://www.w3.org/TR/REC-xml#charencoding Section 4.3.3 Character Encoding in Entities

Parameters
enc  the encoding
Returns
the canonical name for the given encoding.

◆ xmlGetEncodingAlias()

const char * xmlGetEncodingAlias ( const char * alias)

Lookup an encoding name for the given alias.

Deprecated
This function is not thread-safe.
Parameters
alias  the alias name as parsed, in UTF-8 format (ASCII actually)
Returns
NULL if not found, otherwise the original name.

◆ xmlInitCharEncodingHandlers()

void xmlInitCharEncodingHandlers ( void )

◆ xmlIsolat1ToUTF8()

int xmlIsolat1ToUTF8 ( unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen )

Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8 block of chars out.

The value of inlen after return is the number of bytes consumed. The value of outlen after return is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of ISO Latin 1 chars
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.

◆ xmlLookupCharEncodingHandler()

xmlParserErrors xmlLookupCharEncodingHandler ( xmlCharEncoding enc,
xmlCharEncodingHandler ** out )

Find or create a handler matching the encoding.

The following converters are looked up in order:

  • Built-in handler (UTF-8, UTF-16, ISO-8859-1, ASCII)
  • User-registered global handler (deprecated)
  • iconv if enabled
  • ICU if enabled

The handler must be closed with xmlCharEncCloseFunc.

If the encoding is UTF-8, a NULL handler and no error code will be returned.

Since
2.13.0
Parameters
enc  an xmlCharEncoding value.
out  pointer to result
Returns
XML_ERR_OK, XML_ERR_UNSUPPORTED_ENCODING or another xmlParserErrors error code.

◆ xmlNewCharEncodingHandler()

xmlCharEncodingHandler * xmlNewCharEncodingHandler ( const char * name,
xmlCharEncodingInputFunc input,
xmlCharEncodingOutputFunc output )

Creates and registers an xmlCharEncodingHandler.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
name  the encoding name, in UTF-8 format (ASCII actually)
input  the xmlCharEncodingInputFunc to read that encoding
output  the xmlCharEncodingOutputFunc to write that encoding
Returns
the xmlCharEncodingHandler created (or NULL in case of error).

◆ xmlOpenCharEncodingHandler()

xmlParserErrors xmlOpenCharEncodingHandler ( const char * name,
int output,
xmlCharEncodingHandler ** out )

Find or create a handler matching the encoding.

The following converters are looked up in order:

  • Built-in handler (UTF-8, UTF-16, ISO-8859-1, ASCII)
  • User-registered global handler (deprecated)
  • iconv if enabled
  • ICU if enabled

The handler must be closed with xmlCharEncCloseFunc.

If the encoding is UTF-8, a NULL handler and no error code will be returned.

Since
2.13.0
Parameters
name  a string describing the char encoding.
output  boolean, use handler for output
out  pointer to result
Returns
XML_ERR_OK, XML_ERR_UNSUPPORTED_ENCODING or another xmlParserErrors error code.

◆ xmlParseCharEncoding()

xmlCharEncoding xmlParseCharEncoding ( const char * name)

Compare the string to the encoding schemes already known.

Note that the comparison is case-insensitive, in accordance with section [XML] 4.3.3 Character Encoding in Entities.

Parameters
name  the encoding name as parsed, in UTF-8 format (ASCII actually)
Returns
one of the xmlCharEncoding values or XML_CHAR_ENCODING_NONE if not recognized.
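
A case-insensitive lookup of this kind can be sketched as follows. This is an illustration only, not the library's code: the enum, the function sketch_parse_encoding and the small table of names are hypothetical, and the real function recognizes many more names. The comparison folds ASCII letters only, which is sufficient because encoding names are ASCII.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_enc_id {
    SKETCH_ID_NONE = 0,
    SKETCH_ID_UTF8,
    SKETCH_ID_LATIN1,
    SKETCH_ID_ASCII
};

/* Locale-independent lower-casing of ASCII letters. */
static int sketch_ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Returns 1 if both strings are equal ignoring ASCII case. */
static int sketch_name_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0') {
        if (sketch_ascii_lower((unsigned char) *a) !=
            sketch_ascii_lower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* both must end at the same position */
}

enum sketch_enc_id sketch_parse_encoding(const char *name)
{
    /* Hypothetical name table, for illustration only. */
    static const struct {
        const char *name;
        enum sketch_enc_id id;
    } table[] = {
        { "UTF-8", SKETCH_ID_UTF8 },
        { "UTF8", SKETCH_ID_UTF8 },
        { "ISO-8859-1", SKETCH_ID_LATIN1 },
        { "ASCII", SKETCH_ID_ASCII },
        { "US-ASCII", SKETCH_ID_ASCII }
    };
    size_t i;

    if (name == NULL)
        return SKETCH_ID_NONE;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (sketch_name_equal(name, table[i].name))
            return table[i].id;
    }
    return SKETCH_ID_NONE;
}
```

Unrecognized names and NULL both map to the "none" value in this sketch, mirroring the documented return value for names that are not recognized.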

◆ xmlRegisterCharEncodingHandler()

void xmlRegisterCharEncodingHandler ( xmlCharEncodingHandler * handler)

Register the char encoding handler.

Deprecated
This function modifies global state and is not thread-safe. See xmlCtxtSetCharEncConvImpl for an alternative.
Parameters
handler  the xmlCharEncodingHandler handler block

◆ xmlUTF8ToIsolat1()

int xmlUTF8ToIsolat1 ( unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen )

Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1 block of chars out.

The value of inlen after return is the number of bytes consumed. The value of outlen after return is the number of bytes produced.

Parameters
out  a pointer to an array of bytes to store the result
outlen  the length of out
in  a pointer to an array of UTF-8 chars
inlen  the length of in
Returns
the number of bytes written or an xmlCharEncError code.