More Than Just Characters

A common misconception is that the char data type in C or C++ is only for storing a single character like ‘A’ or ‘b’. In reality, char is simply a 1-byte data type. While it’s commonly used to represent characters, it can also hold small numerical values, essentially acting like a tiny integer.

In this post, we’ll explore:

What char actually is.
The relationship between char and ASCII.
Why signedness matters.
Some insights on how escape sequences are represented.

What is a `char`?

In C and C++, char is a data type that is 1 byte in size. While both char and int are primitive data types, they differ in size: int is typically 4 bytes on modern systems, whereas char is always 1 byte (sizeof(char) is 1 by definition). Both can store integral (whole number) values, with their ranges determined by their sizes.

Just like other integers, chars come in signed and unsigned variants, and the signed variant can represent both positive and negative values:

unsigned char: 0 to 255
signed char: typically –128 to 127

At first glance, you might wonder, why does a char even need to be signed? Why would anyone need to map a character to a negative number?

As mentioned earlier, char isn’t solely for storing characters. The signedness of char isn’t an issue when used for characters, since ASCII values are non-negative. But because char is fundamentally an integer type, it supports the full range of integer operations, including signedness, making it versatile for other uses. Computers simply use a lookup table of characters, where each numeric value maps to a character. Internally, they read, process, and store these values in numerical form, but when presenting them to the user, they consult the lookup table to display the corresponding, human-readable character.

The best-known example of such a lookup table is the ASCII table, which we will look at shortly.

First, let’s compare char with a few other types to see what they have in common.

Like Other Integer Types

char behaves much like int, short, long:

It can be signed or unsigned (depending on the implementation).
It supports arithmetic: +, -, *, /, %.
It supports bitwise operators: &, |, ^, ~, <<, >>.

Like Boolean (In Size)

On most implementations, bool is also 1 byte in size, the same as char. A bool only needs 1 bit to represent true or false, but a byte is the smallest unit of memory that can be addressed directly, so the remaining 7 bits are effectively padding.

A char is essentially a small integer type. However, integers are typically larger than 1 byte, while the entire ASCII table (including the extended set of characters) fits within a single byte. Using a larger data type would therefore be unnecessary: a char is all we need to store individual characters efficiently.

The ASCII Table

The ASCII standard maps integers to characters. For example:

Character	Decimal	Hex
`A`	65	0x41
`B`	66	0x42
`a`	97	0x61
`0`	48	0x30
`1`	49	0x31

Standard ASCII characters use values from 0 to 127, which fit perfectly within the 7 bits they require. This range also fits within both signed char (which goes up to 127) and unsigned char (which goes up to 255), so signedness doesn’t affect standard ASCII.

There are also extended versions of the ASCII table that add 128 more characters. Storing these requires more than 7 bits, so extended ASCII uses the eighth bit as well. It then occupies the full 0 to 255 range of a byte, with the base ASCII characters keeping their original values.

Even 256 values are not enough to cover every language and symbol, though. Over time, people came up with various solutions to address this, but none of them fit entirely within a single char. Covering those solutions is beyond the scope of this post.

Escape Characters and Their Representations

Now, let’s look at something different: escape sequences.

\n → newline (ASCII 10)
\r → carriage return (ASCII 13)
\t → horizontal tab (ASCII 9)

You might have already seen in some applications that \n is used to represent a new line. But have you ever wondered why these characters, which are invisible on screen, are represented by a backslash followed by a letter?

Many tools have a way to make these characters visible. In Burp Suite, for example, clicking the button that shows non-printable characters displays them in the intercepted requests.

In Vim, running :set list will reveal invisible characters. To check your listchars configuration, run :set listchars?, which prints something similar to the following:
```
listchars=tab:> ,trail:+,eol:$
```
With this configuration:
- End-of-line (\n) will appear as $.
- Tabs will appear as > .
- Trailing spaces as +.

Similarly, when you open a file that was edited on Windows in a Linux environment, you might see ^M at the end of each line. Windows ends lines with \r\n, whereas Linux uses only \n to denote a new line, so the leftover \r (carriage return) is displayed as ^M. This is called caret notation, a way of representing non-printable control characters in a readable form by prefixing a corresponding letter with ^ (e.g., ^M represents carriage return, ASCII 13, because M is the 13th letter of the alphabet).

This is simply how different tools choose to display these characters. There is no mandated notation: developers are free to render them however they see fit, and that is not a problem as long as everyone uses the same lookup table to interpret the underlying values. Still, conventions such as \n are common and widely used.

A Practical Example - 1: Writing Text to a File

Suppose we create a file containing just:

```
AAA
```

That is, three capital As followed by a newline.

Performing a hex dump might show something like this:

```
$ hexdump AAA.txt
0000000 4141 0a41
0000004
```

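Breaking it down:

0x41 → 'A' (decimal 65)
0x0A → newline (decimal 10)

The bytes in the file are 41 41 41 0a. By default, hexdump groups them into 16-bit words, and on a little-endian machine the second byte of each pair is shown first. That is why the last two bytes (41 0a) appear as 0a41.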

```
$ hexdump AAA.txt -c
0000000   A   A   A  \n
0000004
```

If you pass the -c flag to the same hexdump command, it shows each byte interpreted as a character.

This example shows how data is stored in a file. Systems do not deal with characters directly; they store them as numerical values. When presenting them to the user, the responsible application looks each value up in the relevant table (e.g., the ASCII table) and displays the corresponding character.

A Practical Example - 2: Arithmetic Operations on Chars

Here is a C code example demonstrating how char can be used in arithmetic to get the next character in sequence.

```c
#include <stdio.h>

int main(void) {
    char my_char = 'A';
    // 'A' has the ASCII value 65, so 65 + 1 = 66, which is 'B'
    char next_char = my_char + 1;

    printf("The character is: %c\n", my_char);
    printf("The next character is: %c\n", next_char);

    return 0;
}
```

```
$ gcc chars-arithmetic.cpp
$ ./a.out
The character is: A
The next character is: B
```

Wrapping Up

So, what’s the main point here?

The key is to remember that a char is fundamentally a 1-byte integer type. While its primary role is to store characters by mapping them to numerical values from a lookup table like ASCII, it’s not limited to this function.

Because char is a numerical type, it can:
Be used for arithmetic operations.
Hold small positive and negative integers.

Resources

Source: https://github.com/n3tw0rth/blog-resources/tree/master/chars
