A C99-compliant C compiler with many extensions and additional features, as well as arbitrary-precision integer arithmetic.
The main feature that differentiates this compiler from others is its ability to read, preprocess, tokenize, parse, assemble and link C source code all at the same time, which allows you to execute C code in an environment similar to an interactive command line. If you are interested in how this is achieved, take a look at /include/drt/drt.h
DCC can currently only target i386 and above; support for x86-64 is planned and already partially implemented.
Supported output formats are ELF and Windows PE, as well as direct execution of generated code.
DCC supports AT&T inline assembly syntax, emulating GCC's `__asm__` statement and the GNU assembler, as well as direct parsing of assembly sources.
Using TPP to provide a fully featured preprocessor, DCC implements many GCC extensions such as `__asm__`, `__builtin_constant_p`, many `__attribute__`-s, `__typeof__`, `__auto_type`, and many more, including my own twist on awesome C extensions.
Development on DCC started on 17.04.2017, and it has been the usual one-person project ever since.
Note that DCC is still fairly early in its development, meaning that anything can still change and that more features will be added eventually.
- Link against Windows PE binaries/libraries (*.dll).
- Statically link against PE binaries (as in: clone everything from a *.dll)
- Dynamically/Statically link against ELF binaries/libraries/object files (*, *.so, *.o)
- Output Windows PE binary/library (*.exe, *.dll).
- Output Linux ELF binary/library (*, *.so).
- Output ELF relocatable object files (*.o)
- Process and merge (link) multiple source-/object files/static libraries.
- Compiling DCC is mainly tested and working on Windows using Visual C or DCC itself. GCC and Linux support is present, but may occasionally be broken.
- Full STD-C compliance up to C99.
- Full AT&T assembly support with many GNU assembler extensions (see below).
- Full ELF binary target support.
- Fully working live execution of C source code.
- DCC can fully compile itself (And the result can compile itself again!)
- Support for X86-64/AMD64 CPU architectures.
- Compiling DCC on Linux (most of the work's already there, but nothing's tested yet).
- Compiling DCC with DCC (because every C compiler must be able to do that!).
- Generation of debug information (recognizable by gdb).
- Finish many partially implemented features (see below).
- Support for true thread-local storage (aka. segment-based)
- DCC as host compiler can easily be detected with `defined(__DCC_VERSION__)`.
- Using TPP as preprocessor, every existing preprocessor extension is supported, as well as all that are exclusive to mine.
- Live-compilation-mode directly generates assembly.
- C-conforming symbol forward/backward declaration.
- K&R-C compatible
- Full STD-C89/90 compliance
- Full STD-C95 compliance
- Full STD-C99 compliance
- Supports all C standard types.
- Supports 64-bit `long long` integrals (using double-register storage).
- Supports all C control statements.
- Supports C11 `_Generic`.
- Supports C11 `_Atomic` (Not fully implemented).
- Supports C99 `_Bool`.
- Supports C99 `__func__` builtin identifier.
- Supports variable declaration in if-expressions and for-initializers.
- Supports nested function declaration, as well as access to variables from surrounding scopes.
- Supports C++ lvalue types (`int y = 10; int &x = y;`).
- Supports C structure bitfields
- Support for GCC statement-expressions: `int x = ({ int z = 10; z+20; }); // x == 30`.
- Support for `__FUNCTION__` and `__PRETTY_FUNCTION__`, including use by concat with other strings: `char *s = "Function " __FUNCTION__ " was called"; printf("%s\n",s);`.
- Support for GCC `__sync_*` builtin functions (`__sync_val_compare_and_swap(&x,10,20)`).
- Supports all compiler-specific spellings of alignof: `_Alignof`, `__alignof`, `__alignof__` and `__builtin_alignof`.
- Support for compile-time type deduction from expressions: `typeof`, `__typeof`, `__typeof__`.
- Support for GCC scoped labels: `__label__`.
- Support for GCC-style inline assembly: `__asm__("ret")`.
- Support for MSVC fixed-length integer types: `__int(8|16|32|64)`.
- Support for GCC `__auto_type`, as well as special interpretation of `auto` when not used as storage class: in `auto int x = 42`, auto is the storage class; in `auto y = 10;`, auto denotes automatic type deduction.
- Support for C99 variable-length arrays: `int x = 10; int y[x*2]; assert(sizeof(y) == 80);`.
- Support for old (pre-STDC: K&R-C) function declarations/implementations.
- Support for new (post-STDC: C90+) function declarations/implementations.
- Support for floating-point types (Assembly generator is not implemented yet).
- Support for GCC x86 segment address space (`__seg_fs`/`__seg_gs`)
- Debugging aids for pre-initializing local variables with `0xCC` bytes and memory allocated using `alloca` with `0xAC`.
- Inherited from assembly: Named register identifiers.
  - `int x = %eax;` (CPU-specific, on i386 compiles to `mov %eax, x`).
  - `int x = *(int *)%fs:0x18;` (Can also be used to access segment register, on i386 compiles to `movl %fs:(0x18), x`).
- Inherited from assembly: Get current text address.
  - `void *p = .;` (Evaluates to the current text address with `void *` typing).
- Use label names in expressions:
  - `void *p = &&my_label; my_label: printf("p = %p\n",p);`
- Support for new & old GCC structure/array initializer:
  - dot-field: `struct { int x,y; } p = { .x = 10, .y = 20 };`
  - field-colon: `struct point { int x,y; } p = { x: 10, y: 20 };`
  - array-subscript: `int alpha[256] = { ['a' ... 'z'] = 1, ['A' ... 'Z'] = 1, ['_'] = 1 };`
- Support for runtime brace-initializers:
struct point p = { .x = get_x(), .y = get_y() };
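The initializer forms above can be tried together in a small sketch. It sticks to syntax that GCC and Clang also accept. The helper names (`is_ident_start`, `make_point`, `origin`) and the values returned by `get_x()`/`get_y()` are made up for illustration.

```c
/* Lookup table built with the array-subscript range form:
 * every ASCII letter and '_' maps to 1, everything else stays 0. */
static const unsigned char ident_start[256] = {
    ['a' ... 'z'] = 1, ['A' ... 'Z'] = 1, ['_'] = 1
};

struct point { int x, y; };

/* Hypothetical stand-ins for the get_x()/get_y() calls used in the text. */
static int get_x(void) { return 3; }
static int get_y(void) { return 4; }

/* Returns 1 if 'c' may start a C identifier, 0 otherwise. */
int is_ident_start(int c) {
    return (c >= 0 && c < 256) ? ident_start[c] : 0;
}

/* dot-field form with values that are only known at runtime. */
struct point make_point(void) {
    struct point p = { .x = get_x(), .y = get_y() };
    return p;
}

/* Old field-colon form; it has the same meaning as the dot-field form. */
struct point origin(void) {
    struct point p = { x: 0, y: 0 };
    return p;
}
```

`is_ident_start('_')` yields 1 and `is_ident_start('7')` yields 0, while `make_point()` returns the pair produced by the two helper calls.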
- Split between struct/union/enum, declaration and label namespaces:
foo: struct foo foo; // Valid code and 3 different 'foo'
- Support for unnamed struct/union inlining:
  - `union foo { __int32 x; struct { __int16 a,b; }; };`
  - `offsetof(union foo,x) == 0`, `offsetof(union foo,a) == 0`, `offsetof(union foo,b) == 2`
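The offsets quoted above can be checked with a short sketch. It uses `int32_t`/`int16_t` from `<stdint.h>` in place of `__int32`/`__int16`, on the assumption that the two spellings name types of the same size. The `offset_of_*` function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* int32_t/int16_t stand in for the MSVC-style __int32/__int16 used in the text. */
union foo {
    int32_t x;
    struct { int16_t a, b; };  /* unnamed struct: 'a' and 'b' become members of 'union foo' */
};

/* Each helper reports where the member lives inside the union. */
size_t offset_of_x(void) { return offsetof(union foo, x); }
size_t offset_of_a(void) { return offsetof(union foo, a); }
size_t offset_of_b(void) { return offsetof(union foo, b); }
```

Because the unnamed struct is inlined, `a` and `b` are addressed as if they had been declared directly in `union foo`.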
- Support for builtin functions offering special compile-time optimizations, or functionality (Every builtin can be queried with `__has_builtin(...)`):
char const (&__builtin_typestr(type_or_expr t))[];
- Accepting arguments just like 'sizeof', return a human-readable representation of the [expression's] type as a compile-time array of characters allocated in the '.string' section.
_Bool __builtin_constant_p(expr x);
expr __builtin_choose_expr(constexpr _Bool c, expr tt, expr ff);
_Bool __builtin_types_compatible_p(type t1, type t2);
void __builtin_unreachable(void) __attribute__((noreturn));
void __builtin_trap(void) __attribute__((noreturn));
void __builtin_breakpoint(void);
- Emit a CPU-specific instruction to break into a debugging environment, or do nothing if the target CPU doesn't allow for such an instruction
void *__builtin_alloca(size_t s);
void *__builtin_alloca_with_align(size_t s, size_t a);
void __builtin_assume(expr x),__assume(expr x);
long __builtin_expect(long x, long e);
const char (&__builtin_FILE(void))[];
int __builtin_LINE(void);
const char (&__builtin_FUNCTION(void))[];
void *__builtin_assume_aligned(void *p, size_t align, ...);
size_t __builtin_offsetof(typename T, members...);
T (__builtin_bitfield(T expr, constexpr int const_index, constexpr int const_size)) : const_size;
- Access a given sub-range of bits of any integral expression, the same way access is performed for structure bit-fields.
typedef ... __builtin_va_list;
void __builtin_va_start(__builtin_va_list &ap, T &start);
void __builtin_va_end(__builtin_va_list &ap);
void __builtin_va_copy(__builtin_va_list &dstap, __builtin_va_list &srcap);
T __builtin_va_arg(__builtin_va_list &ap, typename T);
- Compiler-provided var-args helpers for generating smallest-possible code
int __builtin_setjmp(T &buf);
void __builtin_longjmp(T &buf, int sig) __attribute__((noreturn));
- Requires: `sizeof(T) == __SIZEOF_JMP_BUF__`
- Compile-time best-result code generation for register save to 'buf'
- Optimizations for 'sig' known to never be '0'
void *__builtin_malloc(size_t s);
void *__builtin_calloc(size_t c, size_t s);
void *__builtin_realloc(void *p, size_t c, size_t s);
void __builtin_free(void *p);
void __builtin_cfree(void *p);
void *__builtin_return_address(unsigned int level);
void *__builtin_frame_address(unsigned int level);
void *__builtin_extract_return_addr(void *p);
void *__builtin_frob_return_address(void *p);
void *__builtin_isxxx(void *p);
- ctype-style builtin functions
void *__builtin_memchr(void *p, int c, size_t s);
void *__builtin_memrchr(void *p, int c, size_t s);
- Additional functions are available for `mem(r)len`/`mem(r)end`/`rawmem(r)chr`/`rawmem(r)len`
T __builtin_min(T args...);
T __builtin_max(T args...);
void __builtin_cpu_init(void);
int __builtin_cpu_is(char const *cpuname);
int __builtin_cpu_supports(char const *feature);
char (&__builtin_cpu_vendor(char *buf = __builtin_alloca(sizeof(__builtin_cpu_vendor()))))[?];
char (&__builtin_cpu_brand(char *buf = __builtin_alloca(sizeof(__builtin_cpu_brand()))))[?];
- Returns a target-specific `'\0'`-terminated string describing the brand/vendor name of the host CPU. The length of the returned string is always constant and known at compile-time. `__builtin_cpu_init` is required to be called first, and if the string cannot be determined at runtime, the returned string is filled with all `'\0'`-characters.
uint16_t __builtin_bswap16(uint16_t x);
uint32_t __builtin_bswap32(uint32_t x);
uint64_t __builtin_bswap64(uint64_t x);
int __builtin_ffs(int x);
int __builtin_ffsl(long x);
int __builtin_ffsll(long long x);
int __builtin_clz(int x);
int __builtin_clzl(long x);
int __builtin_clzll(long long x);
- Generate inline code with per-case optimizations for best results
T __builtin_bswapcc(T x, size_t s = sizeof(T));
int __builtin_ffscc(T x, size_t s = sizeof(T));
int __builtin_clzcc(T x, size_t s = sizeof(T));
- General purpose functions that work for any size
void *__builtin_memcpy(void *dst, void const *src, size_t s);
- Replace with inlined code for sizes known at compile-time
- Warn about dst/src known to overlap
void *__builtin_memmove(void *dst, void const *src, size_t s);
- Optimize away dst == src cases
- Hint about dst/src never overlapping
void *__builtin_memset(void *dst, int byte, size_t s);
- Replace with inlined code for sizes known at compile-time
int __builtin_memcmp(void const *a, void const *b, size_t s);
- Replace with compile-time constant for constant
- Replace with inline code for sizes known at compile-time
size_t __builtin_strlen(char const *s);
- Resolve length of static strings at compile-time
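A few of the builtins listed above are also provided by GCC and Clang, so they can be exercised in a portable sketch. The wrappers and macros (`swap32`, `lowest_bit`, `leading_zeros`, `IS_INT`, `DESCRIBE`, `checked_div`) are illustrative names, not part of DCC.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x) { return __builtin_bswap32(x); }

/* 1-based index of the lowest set bit, or 0 if x == 0. */
int lowest_bit(int x) { return __builtin_ffs(x); }

/* Number of leading zero bits; the builtin is undefined for 0, so handle that here. */
int leading_zeros(unsigned int x) {
    return x ? __builtin_clz(x) : (int)(sizeof(x) * 8);
}

/* Compile-time selection between two expressions based on a type check. */
#define IS_INT(expr)   __builtin_types_compatible_p(__typeof__(expr), int)
#define DESCRIBE(expr) __builtin_choose_expr(IS_INT(expr), "int", "not int")

const char *describe_int(void)    { return DESCRIBE(42); }
const char *describe_double(void) { return DESCRIBE(4.2); }

/* Branch hint: the division-by-zero path is marked as unlikely. */
int checked_div(int a, int b) {
    if (__builtin_expect(b == 0, 0))
        return 0;
    return a / b;
}
```

`__builtin_choose_expr` is resolved during compilation, so only the selected string literal ends up in each `describe_*` function.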
- Split between declaration and assembly name (aka. `__asm__("foo")` suffix in declarations)
- Arbitrary size arithmetic operations (The sky's the limit; as well as your binary size bloated with hundreds of add-instructions for one line of source code).
- Support for deemon's 'pack' keyword (now called `__pack`):
  - Can be used to emit parenthesis almost everywhere (except in the preprocessor, or when calling macros)
- Explicit alignment of code, data, or entire sections in-source
- Support for `#pragma comment(lib,"foo")` to link against a given library "foo"
- Support for `#pragma pack(...)`
- Supports GCC builtin macros for fixed-length integral constants (`__(U)INT(8|16|32|64|MAX)_C(...)`).
- GCC-compatible predefined CPU macros, such as `__i386__` or `__LP64__`.
- Support for GCC builtin macros, such as `__SIZEOF_POINTER__`, `__SIZE_TYPE__`, etc.
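Several of the language features listed above are shared with GCC and Clang, so a single sketch can combine them. All function and macro names here (`MAX_OF`, `stmt_expr_value`, `vla_bytes`, `deduced_sum`, `cas_demo`, `TYPE_ID`) are invented for illustration.

```c
#include <stddef.h>

/* Type-generic maximum: the statement-expression evaluates each argument exactly once. */
#define MAX_OF(a, b) ({ __typeof__(a) a_ = (a); __typeof__(b) b_ = (b); a_ > b_ ? a_ : b_; })

/* Statement-expression from the text: the last expression is the result. */
int stmt_expr_value(void) {
    int x = ({ int z = 10; z + 20; });
    return x;
}

/* C99 variable-length array: sizeof is computed at runtime (n must be positive). */
size_t vla_bytes(int n) {
    int y[n * 2];
    return sizeof(y);
}

/* __auto_type deduces the variable's type from its initializer. */
long deduced_sum(long a, long b) {
    __auto_type total = a + b;
    return total;
}

/* Atomic compare-and-swap: stores 20 if *slot was 10, returns the previous value. */
int cas_demo(int *slot) {
    return __sync_val_compare_and_swap(slot, 10, 20);
}

/* C11 _Generic dispatches on the static type of its operand. */
#define TYPE_ID(x) _Generic((x), int: 1, long: 2, double: 3, default: 0)
int type_id_of_long(void) { return TYPE_ID(0L); }
```

With a 4-byte `int`, `vla_bytes(10)` gives the 80 bytes mentioned in the list; the sketch itself makes no assumption about the size of `int`.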
- Every attribute can be written in one of three ways:
  - GCC attribute syntax (e.g.: `__attribute__((noreturn))`)
  - cxx-11 attributes syntax (e.g.: `[[noreturn]]`)
  - MSVC declspec syntax (e.g.: `__declspec(noreturn)`)
- The name of an attribute (in the above examples `noreturn`) can be written with any number of leading, or terminating underscores to prevent ambiguity with user-defined macros: `__attribute__((____noreturn_))` is the same as `__attribute__((noreturn))`
- The following attributes (as supported by other compilers) are recognized:
__attribute__((noreturn*))
__attribute__((warn_unused_result*))
__attribute__((weak*))
__attribute__((dllexport*))
__attribute__((dllimport*))
__attribute__((visibility("default")))
__attribute__((alias("my_alias")))
__attribute__((weakref("my_alias")))
__attribute__((used*))
__attribute__((unused*))
__attribute__((cdecl*))
__attribute__((stdcall*))
__attribute__((thiscall*))
__attribute__((fastcall*))
__attribute__((section(".text")))
__attribute__((regparm(x)))
__attribute__((naked*))
__attribute__((deprecated))
__attribute__((deprecated(msg)))
__attribute__((aligned(x)))
__attribute__((packed*))
__attribute__((transparent_union*))
__attribute__((mode(x))) (Underscores surrounding `x` are ignored)
- All attribute names marked with '*' accept an optional suffix that adds an enabled-dependency on a compile-time expression. (e.g.: `__attribute__((noreturn(sizeof(int) == 4)))` - Mark as noreturn, if `int` is `4` bytes wide)
- Attributes not currently implemented (But planned to be):
__attribute__((constructor))
__attribute__((constructor(priority)))
__attribute__((destructor))
__attribute__((destructor(priority)))
__attribute__((ms_struct))
__attribute__((gcc_struct))
- Attributes ignored without warning:
__attribute__((noinline...))
__attribute__((returns_twice...))
__attribute__((force_align_arg_pointer...))
__attribute__((cold...))
__attribute__((hot...))
__attribute__((pure...))
__attribute__((nothrow...))
__attribute__((noclone...))
__attribute__((nonnull...))
__attribute__((malloc...))
__attribute__((leaf...))
__attribute__((format_arg...))
__attribute__((format...))
__attribute__((externally_visible...))
__attribute__((alloc_size...))
__attribute__((always_inline...))
__attribute__((gnu_inline...))
__attribute__((artificial...))
- New attributes added by DCC:
__attribute__((lib("foo")))
- Most effective for PE targets: 'foo' is the name of the DLL file that the associated declaration should be linked against.
- Using this attribute, one can link against DLL files that don't exist at compile-time, or create artificial dependencies on ELF targets.
__attribute__((arithmetic*))
- Used on struct types of arbitrary size to enable arithmetic operations with said structure. Using this attribute you could easily create e.g.: a 512-bit integer type.
- Most operators are implemented through inline-code, but some (mul,div,mod,shl,shr,sar) generate calls to external symbols.
- When this attribute is present, the associated structure type can be modified with 'signed'/'unsigned' to control the sign-behavior.
- In addition, the following keywords can be used anywhere attributes are allowed.
  - `{_}_cdecl`: Same as `__attribute__((cdecl))`
  - `{_}_stdcall`: Same as `__attribute__((stdcall))`
  - `{_}_fastcall`: Same as `__attribute__((fastcall))`
  - `__thiscall`: Same as `__attribute__((thiscall))`
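As a rough illustration of the recognized attributes, the sketch below uses only the GCC spelling so that it also builds with GCC and Clang. The `[[...]]` and `__declspec(...)` spellings and the conditional form `noreturn(<expr>)` are described in the list above but left out here. All type and function names are made up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Plain GCC attribute syntax. */
__attribute__((noreturn)) void die(void) { abort(); }

/* Same attribute, with surrounding underscores to avoid clashes with user macros. */
__attribute__((__noreturn__)) void die_again(void) { abort(); }

/* 'packed' removes padding between members. */
struct __attribute__((packed)) packed_pair { char tag; int value; };

/* 'aligned' raises the alignment requirement of the type. */
struct __attribute__((aligned(16))) aligned_block { char bytes[4]; };

/* Callers that drop the result get a diagnostic. */
__attribute__((warn_unused_result)) int must_check(int v) { return v * 2; }

size_t packed_size(void) { return sizeof(struct packed_pair); }
size_t block_align(void) { return __alignof__(struct aligned_block); }
```

Without `packed`, `struct packed_pair` would usually be padded to twice the size of `int`; with it, the size is exactly one byte plus `sizeof(int)`.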
- DCC features an enormous number of warnings covering everything from code quality, to value truncation, to syntax errors, to unresolved references during linkage, etc...
- Any warning can be configured as
  - Disabled: (Compilation is continued, but based on severity, generated assembly/binary may be wrong)
  - Enabled: Emit a warning, but continue compilation as if it was disabled
  - Error: Emit an error message and halt compilation at the next convenient location
  - Suppress: Works recursively: the warning is handled as Disabled once for every time it was suppressed, before reverting to its previous state.
- Warnings are sorted into named groups that can be disabled as a whole. The main group of a warning is always displayed when it is emitted. (e.g.: `W1401("-WSyntax"): Expected ']', but got ...`)
- The global warning state can be pushed/popped from usercode:
- Push:
#pragma warning(push)
#pragma GCC diagnostic push
- Pop:
#pragma warning(pop)
#pragma GCC diagnostic pop
- Individual warnings/warning group states can be explicitly defined from usercode:
- Disabled:
#pragma warning("[-][W]no-<name>")
#pragma warning(disable: <IDS>)
#pragma warning(disable: "[-][W]<name>")
#pragma GCC diagnostic ignored "[-][W]<name>"
- Enabled:
#pragma warning(enable: <IDS>)
#pragma warning(enable: "[-][W]<name>")
#pragma GCC diagnostic warning "[-][W]<name>"
- Error:
#pragma warning(error: <IDS>)
#pragma warning(error: "[-][W]<name>")
#pragma GCC diagnostic error "[-][W]<name>"
- Suppress (once for every time a warning/group is listed):
#pragma warning(suppress: <IDS>)
#pragma warning(suppress: "[-][W]<name>")
#pragma warning("[-][W]sup-<name>")
#pragma warning("[-][W]suppress-<name>")
- Revert to default state:
#pragma warning(default: <IDS>)
#pragma warning(default: "[-][W]<name>")
#pragma warning("[-][W]def-<name>")
- `IDS` is a space-separated list of individual warning IDs as integral constants
  - Besides belonging to any number of groups, each warning also has an ID
  - Use of these `IDS` should be refrained from, as they might change randomly
- Similar to the `extension`-pragma, `#pragma warning(...)` accepts a comma-separated list of commands. `#pragma warning(push,disable: "-Wsyntax")`
- All warnings can be enabled/disabled on-the-fly using pragmas:
  - `#pragma warning(push|pop)`: Push/pop currently enabled warnings
  - `#pragma warning("-W<name>")`: Enable warning 'name'
  - `#pragma warning("-Wno-<name>")`: Disable warning 'name'
- `#pragma GCC system_header` treats the current input file as though all warnings were disabled
  - Mainly meant for headers in /fixinclude which may re-define type declarations, but are not meant to cause any problems
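The push/disable/pop pattern described above might look like this when written with the GCC-compatible pragmas. The group name `-Wunused-variable` is GCC's; whether DCC uses the same group name is an assumption. The functions are illustrative only.

```c
/* Save the current warning state, silence one group for a short region, then restore it. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"
int answer(void) {
    int scratch = 0;  /* intentionally unused; no diagnostic inside this region */
    return 42;
}
#pragma GCC diagnostic pop

/* After the pop, the previous warning configuration applies again. */
int doubled(int v) {
    return v * 2;
}
```

According to the list above, `#pragma warning(push)` and `#pragma warning(pop)` are accepted as alternative spellings of the push and pop steps.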
- Extensions are implemented in two different ways:
  - Extensions that are always enabled, but emit a warning when used.
    - The warning can either be disabled individually (e.g.: `#pragma warning("-Wno-declaration-in-if")`).
    - Or all extension warnings can be disabled using `#pragma warning("-Wno-extensions")`.
    - Don't let yourself be fooled. Writing `"-Wno-extensions"` disables warnings about extensions, not the extensions themselves!
    - Some warnings are also emitted for deprecated or newer language features.
      - `"constant-case-expressions"`: Emit for old-style function declarations.
      - `"old-function-decl"`: Emit for old-style function declarations.
  - Extensions that may change semantics and can therefore be disabled.
    - All of these extensions can be enabled/disabled on-the-fly using pragmas:
      - As comma-separated list in `#pragma extension(...)`
        - `push`: Push currently enabled extensions (e.g.: `#pragma extension(push)`)
        - `pop`: Pop previously enabled extensions (e.g.: `#pragma extension(pop)`)
        - `"[-][f]<name>"`: Enable extension `name` (e.g.: `#pragma extension("-fmacro-recursion")`)
        - `"[-][f]no-<name>"`: Disable extension `name` (e.g.: `#pragma extension("-fno-macro-recursion")`)
    - `"expression-statements"`: Recognize GCC statement-expressions.
    - `"label-expressions"`: Allow use of labels in expression (prefixed by `&&`).
    - `"local-labels"`: Allow labels to be scoped (using GCC's `__label__` syntax).
    - `"gcc-attributes"`: Recognize GCC `__attribute__((...))` syntax.
    - `"msvc-attributes"`: Recognize MSVC `__declspec(...)` syntax.
    - `"cxx-11-attributes"`: Recognize c++11 `[[...]]` syntax.
    - `"attribute-conditions"`: Allow optional conditional expression to follow a switch-attribute.
    - `"calling-convention-attributes"`: Recognize MSVC stand-alone calling convention attributes (e.g.: `__cdecl`).
    - `"fixed-length-integer-types"`: Recognize fixed-length integer types (`__int(8|16|32|64)`).
    - `"asm-registers-in-expressions"`: Allow assembly registers to be used in expressions (e.g.: `int x = %eax;`).
    - `"asm-address-in-expressions"`: Allow the current assembly text address to be used in expressions (e.g.: `void *p = .;`).
    - `"void-arithmetic"`: `sizeof(void) == __has_extension("void-arithmetic") ? 1 : 0`.
    - `"struct-compatible"`: When enabled, same-layout structures are compatible, when disabled, only same-declaration structs are.
    - `"auto-in-type-expressions"`: Allow `auto` to be used either as storage class, or as alias for `__auto_type`.
    - `"variable-length-arrays"`: Allow declaration of C99 VLA variables.
    - `"function-string-literals"`: Treat `__FUNCTION__` and `__PRETTY_FUNCTION__` as language-level string literals.
    - `"if-else-optional-true"`: Recognize GCC if-else syntax `int x = (p ?: other_p)->x; // Same as '(p ? p : other_p)->x'`.
    - `"fixed-length-integrals"`: Recognize MSVC fixed-length integer suffix: `__int32 x = 42i32;`.
    - `"macro-recursion"`: Enable/Disable TCC recursive macro declaration.
    - Many more extensions are provided by TPP to control preprocessor syntax, such as `#include_next` directives. Their list is too long to be documented here.
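Three of the switchable extensions named above ("if-else-optional-true", "label-expressions" and "local-labels") are GCC syntax, so the following sketch also builds with GCC and Clang. The struct, macro and function names are invented for illustration. The DCC-specific `#pragma extension(...)` lines are left out because other compilers do not know them.

```c
struct node { int x; };

/* "if-else-optional-true": 'p ?: q' behaves like 'p ? p : q', with 'p' evaluated once. */
int pick_x(struct node *p, struct node *other_p) {
    return (p ?: other_p)->x;
}

/* "label-expressions": take the address of a label with && and jump to it. */
int classify(int v) {
    void *target = (v < 0) ? &&negative : &&non_negative;
    goto *target;
negative:
    return -1;
non_negative:
    return 1;
}

/* "local-labels": __label__ scopes 'done' to this block, so the macro
 * can be expanded several times inside one function. */
#define FIND_ZERO(arr, n) ({                              \
    __label__ done;                                       \
    int idx_ = -1;                                        \
    for (int i_ = 0; i_ < (n); ++i_)                      \
        if ((arr)[i_] == 0) { idx_ = i_; goto done; }     \
    done:                                                 \
    idx_; })

/* Index of the first zero in 'a', or else in 'b', or -1 if neither has one. */
int first_zero_in_either(const int *a, int na, const int *b, int nb) {
    int ia = FIND_ZERO(a, na);
    int ib = FIND_ZERO(b, nb);
    return ia >= 0 ? ia : ib;
}
```

Without `__label__`, the second expansion of `FIND_ZERO` would define the label `done` twice in the same function and fail to compile.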
- Dead code elimination
  - Correct deduction on merging branches, such as if-statement with two dead branches
  - Re-enable control flow when encountering a label
  - Correct interpretation of `__builtin_unreachable()`
  - Correct interpretation of `__{builtin_}assume(0)`
- Automatic constant propagation
  - Even capable of handling generic offsetof: `(size_t)&((struct foo *)0)->bar`
- Automatic removal of unused symbols/data
  - Recursively delete unused functions/data symbols from generated binary
  - Can be suppressed for any symbol using `__attribute__((used))`
- Automatic merging of data in sections marked with `M` (merge) (Not fully implemented, because of missing re-use counter; the rest already works)
  - Using the same string (or sub-string) more than once will only allocate a single data segment: `printf("foobar\n"); printf("bar\n");` Re-use `"bar\n\0"` as a sub-string of `"foobar\n\0"`
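The kind of source these optimizations apply to can be sketched as follows. The struct layout and function names are made up, and the comments only restate what the list above claims. Nothing here measures what a particular compiler actually emits.

```c
#include <stddef.h>

struct foo { char tag; long bar; };

/* Hand-written offsetof; constant propagation can reduce this to a plain number. */
size_t bar_offset(void) {
    return (size_t)&((struct foo *)0)->bar;
}

/* __builtin_unreachable() marks the default branch as dead code. */
int sign_name_length(int sign) {
    switch (sign) {
    case -1: return 8;  /* length of "negative" */
    case  0: return 4;  /* length of "zero" */
    case  1: return 8;  /* length of "positive" */
    default: __builtin_unreachable();
    }
}

/* Not referenced elsewhere in this file; 'used' keeps it from being removed. */
__attribute__((used)) static int keep_me(void) { return 7; }
```

`sign_name_length` must only be called with -1, 0 or 1, because reaching `__builtin_unreachable()` is undefined behavior.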
- Full AT&T Assembly support
- Extension for fixed-length
- Supported assembly directives are:
.align <N> [, <FILL>]
.skip <N> [, <FILL>]
.space <N> [, <FILL>]
.quad <I>
.short <I>
.byte <I>
.word <I>
.hword <I>
.octa <I>
.long <I>
.int <I>
.fill <REPEAT> [, <SIZE> [, <FILL>]]
. = <ORG>
.org <ORG>
.extern <SYM>
.global <SYM>
.globl <SYM>
.protected <SYM>
.hidden <SYM>
.internal <SYM>
.weak <SYM>
.local <SYM>
.used <SYM>
.unused <SYM>
.size <SYM>, <SIZE>
.string <STR>
.ascii <STR>
.asciz <STR>
.text
.data
.bss
.section
.previous
.set <SYM>, <VAL>
.include <NAME>
.incbin <NAME> [, <SKIP> [, <MAX>]]
- CPU-specific, recognized directives:
- I386+
.code16
.code32
- X86-64
.code64
- Directives ignored without warning:
.file ...
.ident ...
.type ...
.lflags ...
.line ...
.ln ...
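To show how some of these directives combine, here is a sketch that defines a small read-only table entirely in assembly and reads it from C. It is written as a GCC-style file-scope `__asm__` block for x86 ELF targets. Whether DCC accepts this exact form is an assumption, and the symbol `demo_table` and the function `demo_table_sum` are invented names.

```c
#if defined(__ELF__) && (defined(__i386__) || defined(__x86_64__))
/* Define a read-only table of three 32-bit values using only assembler directives. */
__asm__(
    ".section .rodata\n"
    ".globl demo_table\n"
    ".align 4\n"
    "demo_table:\n"
    ".long 1\n"
    ".long 2\n"
    ".long 3\n"
    ".size demo_table, 12\n"
    ".previous\n"
);
extern const int demo_table[3];
#else
/* Fallback for other targets so the example still builds. */
const int demo_table[3] = { 1, 2, 3 };
#endif

/* Adds up the three table entries. */
int demo_table_sum(void) {
    return demo_table[0] + demo_table[1] + demo_table[2];
}
```

`.previous` switches back to the section that was active before `.section .rodata`, so code following the block is not placed in the read-only data section by accident.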
- Integrated linker allows for direct (and very fast) creation of executables
- Merge multiple source files into a single compilation unit
- ELF-style visibility control/attributes (`__attribute__((visibility(...)))`)
- Directly link against already-generated PE binaries
- Add new library dependencies from source code (`#pragma comment(lib,...)`)
- Output to PE binary (*.exe/*.dll)