

Group of binary-to-text encoding schemes using 64 symbols (plus padding)

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

Common to all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web[1] where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.[2]

Base64 is also widely used for sending e-mail attachments. This is required because SMTP, in its original form, was designed to transport 7-bit ASCII characters only. This encoding causes an overhead of 33–36% (33% by the encoding itself; up to 3% more by the inserted line breaks).

Design

The particular set of 64 characters chosen to represent the 64 digit values for the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean.[3] For example, MIME's Base64 implementation uses A-Z, a-z, and 0-9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.

The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS, for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh), and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.[4][5][6][3]

Base64 table from RFC 4648

This is the Base64 alphabet defined in RFC 4648 §4. See also Variants summary (below).

Index | Binary | Char | Index | Binary | Char | Index | Binary | Char | Index | Binary | Char
0 | 000000 | A | 16 | 010000 | Q | 32 | 100000 | g | 48 | 110000 | w
1 | 000001 | B | 17 | 010001 | R | 33 | 100001 | h | 49 | 110001 | x
2 | 000010 | C | 18 | 010010 | S | 34 | 100010 | i | 50 | 110010 | y
3 | 000011 | D | 19 | 010011 | T | 35 | 100011 | j | 51 | 110011 | z
4 | 000100 | E | 20 | 010100 | U | 36 | 100100 | k | 52 | 110100 | 0
5 | 000101 | F | 21 | 010101 | V | 37 | 100101 | l | 53 | 110101 | 1
6 | 000110 | G | 22 | 010110 | W | 38 | 100110 | m | 54 | 110110 | 2
7 | 000111 | H | 23 | 010111 | X | 39 | 100111 | n | 55 | 110111 | 3
8 | 001000 | I | 24 | 011000 | Y | 40 | 101000 | o | 56 | 111000 | 4
9 | 001001 | J | 25 | 011001 | Z | 41 | 101001 | p | 57 | 111001 | 5
10 | 001010 | K | 26 | 011010 | a | 42 | 101010 | q | 58 | 111010 | 6
11 | 001011 | L | 27 | 011011 | b | 43 | 101011 | r | 59 | 111011 | 7
12 | 001100 | M | 28 | 011100 | c | 44 | 101100 | s | 60 | 111100 | 8
13 | 001101 | N | 29 | 011101 | d | 45 | 101101 | t | 61 | 111101 | 9
14 | 001110 | O | 30 | 011110 | e | 46 | 101110 | u | 62 | 111110 | +
15 | 001111 | P | 31 | 011111 | f | 47 | 101111 | v | 63 | 111111 | /
Padding | =
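
The alphabet in the table follows a regular pattern, so it can be rebuilt rather than typed out. A minimal Python sketch; the names ALPHABET and PAD are illustrative and do not come from RFC 4648:

```python
import string

# RFC 4648 §4 order: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # padding character, not one of the 64 digits

for index in (0, 19, 26, 52, 62, 63):
    # Print the index, its 6-bit binary form, and the matching character.
    print(index, format(index, "06b"), ALPHABET[index])
```

Looking up index 19 gives T and index 46 gives u, matching the table rows.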

Examples

The example below uses ASCII text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.

Here is a well-known idiom from distributed computing:

Many hands make light work.

When the quote is encoded into Base64, it is represented as a byte sequence of 8-bit-padded ASCII characters encoded in MIME's Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):

TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu

In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.

As this example illustrates, Base64 encoding converts three octets into four encoded characters.

Source, text (ASCII) | M | a | n
Source, octets | 77 (0x4d) | 97 (0x61) | 110 (0x6e)
Bits | 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 0
Base64 encoded, sextets | 19 | 22 | 5 | 46
Base64 encoded, character | T | W | F | u
Base64 encoded, octets | 84 (0x54) | 87 (0x57) | 70 (0x46) | 117 (0x75)

= padding characters might be added to make the last encoded block contain four Base64 characters.
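
The conversion of three octets into four characters can be sketched with integer shifts. This is an illustrative Python sketch, not a reference implementation; the function name encode_group is hypothetical:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_octets: bytes) -> str:
    """Encode exactly three octets as four Base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("expected exactly three octets")
    # Join the octets into one 24-bit integer, first octet most significant.
    buffer = int.from_bytes(three_octets, "big")
    # Read the buffer six bits at a time, most significant sextet first.
    sextets = [(buffer >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)

print(encode_group(b"Man"))  # TWFu
```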

Hexadecimal to octal transformation is useful to convert between binary and Base64. Such conversion is available in both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs (23 26 05 56), and each pair converted to decimal to yield 19 22 05 46. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are TWFu.
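
The same hexadecimal-to-octal route can be followed step by step in Python. The variable names are illustrative only:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

value = 0x4D616E                      # the 24 bits of "Man" in hexadecimal
octal_digits = format(value, "08o")   # eight octal digits, zero-padded
# Each pair of octal digits covers six bits, i.e. one Base64 digit.
pairs = [octal_digits[i:i + 2] for i in range(0, 8, 2)]
indices = [int(pair, 8) for pair in pairs]
encoded = "".join(ALPHABET[i] for i in indices)

print(octal_digits, pairs, indices, encoded)
```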

If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding = padding character):

Source, text (ASCII) | M | a
Source, octets | 77 (0x4d) | 97 (0x61)
Bits | 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 0
Base64 encoded, sextets | 19 | 22 | 4 | Padding
Base64 encoded, character | T | W | E | =
Base64 encoded, octets | 84 (0x54) | 87 (0x57) | 69 (0x45) | 61 (0x3D)

If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding two = padding characters):

Source, text (ASCII) | M
Source, octets | 77 (0x4d)
Bits | 0 1 0 0 1 1 0 1 0 0 0 0
Base64 encoded, sextets | 19 | 16 | Padding | Padding
Base64 encoded, character | T | Q | = | =
Base64 encoded, octets | 84 (0x54) | 81 (0x51) | 61 (0x3D) | 61 (0x3D)
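
The handling of a short final group can be sketched as follows. This is a minimal Python illustration under the assumption that the missing octets are filled with zero bits; encode_tail is a hypothetical helper name:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_tail(tail: bytes) -> str:
    """Encode a final group of one or two octets, adding '=' padding."""
    if len(tail) not in (1, 2):
        raise ValueError("expected one or two octets")
    # Left-align the octets in a 24-bit buffer; the missing octets are zero bits.
    buffer = int.from_bytes(tail, "big") << (8 * (3 - len(tail)))
    sextets = [(buffer >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # n octets need n + 1 content-bearing characters; the rest is padding.
    kept = len(tail) + 1
    return "".join(ALPHABET[s] for s in sextets[:kept]) + "=" * (4 - kept)

print(encode_tail(b"Ma"))  # TWE=
print(encode_tail(b"M"))   # TQ==
```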

Output padding

Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets on a modern computer, every four characters of Base64-encoded text (4 sextets = 4 × 6 = 24 bits) represents three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input. (This is different from A, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:

Input text | Input length | Output text | Output length | Padding
light work. | 11 | bGlnaHQgd29yay4= | 16 | 1
light work | 10 | bGlnaHQgd29yaw== | 16 | 2
light wor | 9 | bGlnaHQgd29y | 12 | 0
light wo | 8 | bGlnaHQgd28= | 12 | 1
light w | 7 | bGlnaHQgdw== | 12 | 2
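
The rows of this table can be reproduced with Python's standard base64 module. The helper padding_for is a hypothetical name used only in this sketch:

```python
import base64

def padding_for(input_length: int) -> int:
    """Number of '=' characters needed for an input of the given length."""
    return (3 - input_length % 3) % 3

for text in ("light work.", "light work", "light wor", "light wo", "light w"):
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    # Columns: input text, input length, output text, output length, padding.
    print(text, len(text), encoded, len(encoded), padding_for(len(text)))
```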

The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded | Padding | Final group length (bytes) | Decoded
bGlnaHQgdw== | == | 1 | light w
bGlnaHQgd28= | = | 2 | light wo
bGlnaHQgd29y | None | 3 | light wor
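
The relationship between trailing = characters and the size of the last decoded group can be checked with the standard base64 module. The helper final_group_bytes is a hypothetical name for this sketch:

```python
import base64

def final_group_bytes(encoded: str) -> int:
    """Bytes carried by the last four-character group of padded Base64."""
    trailing_pad = len(encoded) - len(encoded.rstrip("="))
    return 3 - trailing_pad

for encoded in ("bGlnaHQgdw==", "bGlnaHQgd28=", "bGlnaHQgd29y"):
    decoded = base64.b64decode(encoded)
    print(encoded, final_group_bytes(encoded), decoded)
```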

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: the first character contributes 6 bits, and the second character contributes its first 2 bits. For example:

Final group length (characters) | Encoded | Final group length (bytes) | Decoded
2 | bGlnaHQgdw | 1 | light w
3 | bGlnaHQgd28 | 2 | light wo
4 | bGlnaHQgd29y | 3 | light wor

Implementations and history

Variants summary table

Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.

Encoding | 62nd | 63rd | Pad | Separate encoding of lines (separators; length; checksum) | Decoding non-encoding characters
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated) | + | / | = mandatory | CR+LF; 64, or lower for the last line; No | No
RFC 2045: Base64 transfer encoding for MIME | + | / | = mandatory | CR+LF; At most 76; No | Discarded
RFC 2152: Base64 for UTF-7 | + | / | No | No | No
RFC 3501: Base64 encoding for IMAP mailbox names | + | , | No | No | No
RFC 4648 §4: base64 (standard)[a] | + | / | = optional | No | No
RFC 4648 §5: base64url (URL- and filename-safe standard)[a] | - | _ | = optional | No | No
RFC 4880: Radix-64 for OpenPGP | + | / | = mandatory | CR+LF; At most 76; Radix-64 encoded 24-bit CRC | No
Other variations: see Applications not compatible with RFC-4648 Base64 (below)
  [a] It is important to note that this variant is intended to provide common features where they are not desired to be specialised by implementations, ensuring robust engineering. This is particularly in light of separate line encodings and restrictions, which have not been considered when previous standards have been co-opted for use elsewhere. Thus, the features indicated here may be over-ridden.

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[7]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A-Z, a-z), the numerals (0-9), and the + and / symbols. The = symbol is also used as a padding suffix.[4] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.

The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
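
The 64-character line rule can be illustrated by splitting an encoded string into fixed-width lines. This Python sketch covers only the line splitting, not the full PEM message format; pem_printable_lines is a hypothetical helper name:

```python
import base64

def pem_printable_lines(data: bytes, width: int = 64) -> list:
    """Base64-encode data and split it into lines of `width` characters."""
    encoded = base64.b64encode(data).decode("ascii")
    # Every line is exactly `width` characters except possibly the last one.
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]

lines = pem_printable_lines(bytes(range(100)))
print([len(line) for line in lines])  # [64, 64, 8]
```

The 100 input bytes form 34 groups of up to three octets, hence 136 characters, which fill two full lines and one short last line.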

MIME

The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[5] MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM, and uses the = symbol for output padding in the same way, as described in RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally it specifies that any extra-alphabetic characters must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4/3 × 78/76), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:

bytes = (string_length(encoded_string) - 814) / 1.37        
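
The 4/3 × 78/76 factor can be checked empirically. The sketch below uses Python's base64.encodebytes, which breaks the output into lines of at most 76 characters ending in a line feed; swapping in CR+LF is an assumption made here to model the separators described above, and estimate_decoded_size is a hypothetical helper that simply applies the approximation quoted above:

```python
import base64

data = bytes(57_000)  # arbitrary binary payload (all zero bytes)

# encodebytes() ends each line of up to 76 characters with "\n";
# replacing it with "\r\n" models CR+LF line separators.
encoded = base64.encodebytes(data).replace(b"\n", b"\r\n")

ratio = len(encoded) / len(data)
print(ratio, 4 / 3 * 78 / 76)

def estimate_decoded_size(encoded_length: int, header_bytes: int = 814) -> float:
    """Rough decoded-size estimate from the approximation in the text."""
    return (encoded_length - header_bytes) / 1.37
```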

UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[8][9]

The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits, leaving up to three unused bits in the last Base64 digit.
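
Python ships a utf-7 codec, which makes the relationship to Base64 easy to inspect. The sketch below assumes the pound sign (U+00A3) as sample input; it compares the Base64 run inside the UTF-7 output with ordinary Base64 of the UTF-16 big-endian bytes, minus padding:

```python
import base64

pound = "\u00a3"  # the pound sign, outside the 7-bit ASCII range

# Ordinary Base64 of the UTF-16BE bytes, with the '=' padding stripped,
# gives the "modified Base64" run.
modified = base64.b64encode(pound.encode("utf-16-be")).rstrip(b"=")
print(modified)

# In UTF-7 the run is bracketed by '+' and '-'.
print((pound + "1").encode("utf-7"))
print(b"+AKM-1".decode("utf-7") == pound + "1")
```

The two input bytes occupy 16 of the 18 bits in the three digits, so two bits of the last digit are unused.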

OpenPGP

OpenPGP, described in RFC 4880, describes Radix-64 encoding, also known as "ASCII armor". Radix-64 is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "=" symbol as separator, appended to the encoded output data.[10]
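
The checksum step can be sketched in Python. The initial value and generator constant below are the ones given in RFC 4880 for its CRC-24; the function names crc24 and armor_checksum_line are hypothetical, and the sketch covers only the checksum line, not the armor headers:

```python
import base64

CRC24_INIT = 0xB704CE   # initial register value from RFC 4880
CRC24_POLY = 0x1864CFB  # generator polynomial from RFC 4880

def crc24(data: bytes) -> int:
    """Compute the 24-bit CRC over the input data (before encoding)."""
    crc = CRC24_INIT
    for octet in data:
        crc ^= octet << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor_checksum_line(data: bytes) -> str:
    """Base64-encode the three CRC octets and prefix the '=' separator."""
    return "=" + base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")

print(armor_checksum_line(b""))  # =twTO
```

Because the CRC is exactly three octets, its encoding is always four characters with no padding.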

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.

Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.[6]

RFC 4648

This RFC obsoletes RFC 3548 and focuses on Base64/32/16:

This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.

URL applications

Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in URLs requires encoding of '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.

For this reason, modified Base64 for URL variants exist (such as base64url in RFC 4648), where the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. A popular site to make use of such is YouTube.[11] Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries[which?] will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
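
The difference between the standard and URL-safe alphabets shows up with any input whose encoding uses digits 62 and 63. A short Python sketch with an arbitrarily chosen two-byte input:

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF])  # chosen so the output contains digits 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

# Standard Base64 grows when percent-encoded for use in a URL.
percent_encoded = quote(standard, safe="")

# Some variants also drop the '=' padding.
unpadded = url_safe.rstrip("=")

print(standard, percent_encoded, url_safe, unpadded)
```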

HTML

The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[12] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.

Other applications

Example of an SVG containing embedded JPEG images encoded in Base64[13]

Base64 can be used in a variety of contexts:

  • Base64 can be used to transmit and store text that might otherwise cause delimiter collision.
  • Spammers use Base64 to evade basic anti-spamming tools, which frequently do not decode Base64 and therefore cannot detect keywords in encoded messages.
  • Base64 is used to encode character strings in LDIF files.
  • Base64 is often used to embed binary data in an XML file, using a syntax similar to <data encoding="base64">…</data>, e.g. favicons in Firefox's exported bookmarks.html.
  • Base64 is used to encode binary files such as images inside scripts, to avoid depending on external files.
  • The data URI scheme can use Base64 to represent file contents. For example, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
  • The FreeSWAN IPSec implementation precedes Base64 strings with 0s, so they can be distinguished from text or hexadecimal strings.[citation needed]
  • Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.[14]
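
As an illustration of the data URI item in the list above, the sketch below builds such a URI in Python. The helper to_data_uri is hypothetical, and the payload and media type are placeholders rather than a real image:

```python
import base64

def to_data_uri(payload: bytes, media_type: str) -> str:
    """Wrap binary content in a Base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

print(to_data_uri(b"Man", "text/plain"))  # data:text/plain;base64,TWFu
```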

Applications not compatible with RFC-4648 Base64

Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see Variants summary table above).

One issue with the RFC 4648 alphabet is that, when a sorted list of ASCII-encoded strings is base64-transformed and sorted again, the order of elements changes. This is because the padding character and the characters in the substitution alphabet are not ordered by ASCII character value (as sorting the columns of the following sample table shows). Alphabets like (unpadded) B64 address this.

ASCII | base64 | base64, no padding | B64
light w | bGlnaHQgdw== | bGlnaHQgdw | P4ZbO5EURk
light wo | bGlnaHQgd28= | bGlnaHQgd28 | P4ZbO5EURqw
light wor | bGlnaHQgd29y | bGlnaHQgd29y | P4ZbO5EURqxm
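
The change in sort order can be reproduced for the standard alphabet with a few lines of Python; the variable names are illustrative:

```python
import base64

texts = ["light w", "light wo", "light wor"]  # already in sorted order

encoded = [base64.b64encode(t.encode("ascii")).decode("ascii") for t in texts]

# Sort the encoded strings by ASCII value, then decode them again.
resorted = [base64.b64decode(e).decode("ascii") for e in sorted(encoded)]

print(sorted(texts))
print(resorted)  # no longer in the original order
```

Here the digit '2' sorts before the letter 'w' in ASCII, although it stands for a larger Base64 value, so "light w" moves to the end.
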
  • The Uuencoding alphabet includes no lower case characters, instead using ASCII codes 32 (" " (space)) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.[citation needed]
  • BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like '7', 'O', 'g' and 'o'. Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
  • Several other applications use alphabets similar to the common variations, but in a different order:
    • Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz". Padding is not used.
    • The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".[15]
    • bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".[16]
    • Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
    • 6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.[17]
    • Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".[18]
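
To illustrate how a differently ordered alphabet changes the meaning of each digit, the sketch below evaluates the digits of a Bash base-64 numeric literal (the part after 64#) using the Bash alphabet listed above. The function parse_bash_base64 is a hypothetical helper, not part of Bash or Python:

```python
BASH_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_"

def parse_bash_base64(digits: str) -> int:
    """Evaluate the digits of a Bash 64#... literal as an integer."""
    value = 0
    for ch in digits:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 64 + BASH_DIGITS.index(ch)
    return value

print(parse_bash_base64("a"), parse_bash_base64("A"), parse_bash_base64("_"))
```

In this alphabet the digit A has the value 36, whereas in the RFC 4648 alphabet A is digit 0.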

See also

  • 8BITMIME
  • Ascii85 (also called Base85)
  • Base16
  • Base32
  • Base36
  • Base62
  • Binary-to-text encoding for a comparison of various encoding algorithms
  • Binary number
  • URL

References

  1. "Base64 encoding and decoding - Web APIs | MDN".
  2. "When to base64 encode images (and when not to)". 28 August 2011.
  3. The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
  4. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
  5. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
  6. The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
  7. Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
  8. UTF-7: A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
  9. UTF-7: A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
  10. OpenPGP Message Format. IETF. November 2007. doi:10.17487/RFC4880. RFC 4880. Retrieved March 18, 2010.
  11. "Here's Why YouTube Will Practically Never Run Out of Unique Video IDs". www.mentalfloss.com. 23 March 2016. Retrieved 27 December 2021.
  12. "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2018. Introduced by changeset 5814, 2021-02-01.
  13. <image xlink:href="data:image/jpeg;base64,JPEG contents encoded in Base64" ... />
  14. "Edit fiddle". jsfiddle.net.
  15. "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
  16. Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". Retrieved 2018-05-18.
  17. "6PACK a "real time" PC to TNC protocol". Retrieved 2013-05-19.
  18. "Shell Arithmetic". Bash Reference Manual. Retrieved 8 April 2020. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.


Source: https://en.wikipedia.org/wiki/Base64