These are not C lessons at all :). I programmed in C a long time ago, and sometimes I want to pick something up again, but I cannot find the piece of code anywhere, or I cannot run code that was written on another machine.
Sadly, an old dog always needs to learn something new.
- Access the code from anywhere; GitHub is a good fit
- Write and run code anywhere: Linux, Darwin, Windows, or a Docker box
- Easy to try and learn
Now we have Nore: some things changed and some did not.
Let’s start …
# bootstrap Nore
curl https://raw.githubusercontent.com/junjiemars/nore/master/bootstrap.sh -sSfL | sh
# configure -> make -> test -> install
./configure --has-hi
make
make test
make install
Run the examples under src/lang:
./configure --has-lang
make clean test
The preprocessor runs first, as the name implies. It performs some text manipulations, such as:
- stripping comments
- resolving #include directives and replacing them with the contents of the included file; #include_next directives do not distinguish between <file> and "file" inclusion, they just look for the file in the search path
- evaluating #if and #ifdef directives
- evaluating #define
- expanding the macros found in the rest of the code according to those #define
./configure --has-lang
make clean lang_preprocessor_test
The #include directive instructs the preprocessor to paste the text of the given file into the current file. Generally, it is necessary to tell the preprocessor where to look for header files if they are not placed in the current directory or a standard system directory.
The #define directive takes two forms: defining a constant or creating a macro.
- Defining a constant
#define identifier [value]
When defining a constant, you may optionally elect not to provide a value for that constant. In this case, the identifier will be replaced with blank text, but will be “defined” for the purposes of #ifdef and #ifndef. If a value is provided, the given token will be replaced literally with the remainder of the text on the line. You should be careful when using #define in this way.
- Defining a parameterized macro
#define identifier(<arg> [, <arg> ...]) statement
#define max(a, b) ((a) > (b) ? (a) : (b))
#undef identifier
The #undef directive undefines a constant or macro that was defined previously using #define.
For example:
#define E 2.71828
double e_squared = E * E;
#ifdef E
# undef E
#endif
Usually, #undef is used to scope a preprocessor constant into a very limited region: this is done to avoid leaking the constant. #undef is the only way to create this scope since the preprocessor does not understand block scope.
#if checks the value of the symbol when the symbol has been defined; #ifdef just checks the existence of the symbol.
Prefer #if defined(...); it is more flexible:
#if defined(LINUX) || defined(DARWIN)
/* code: when on LINUX or DARWIN platform */
#endif
#if defined(CLANG) && (1 == NM_CPU_LITTLE_ENDIAN)
/* code: when using clang compiler and on a little endian machine */
#endif
#ifndef identifier
/* code: when the identifier has not been defined */
#endif
#ifndef checks whether the given identifier has been #defined earlier in the file or in an included file; if not, it includes the code between it and the closing #else or, if no #else is present, the #endif statement. #ifndef is often used to make header files idempotent by defining an identifier once the file has been included and checking that the identifier was not set at the top of that file.
#ifndef _LANG_H_
# define _LANG_H_
#endif
#if !defined(identifier) is equivalent to #ifndef identifier:
#if !defined(min)
# define min(a, b) ((a) < (b) ? (a) : (b))
#endif
#error "[description]"
The #error directive allows you to make compilation fail and issue a message that will appear in the list of compilation errors. It is most useful when combined with #if/#elif/#else to fail compilation if some condition is not true. For example:
#if (1 == _ERROR_)
# error "compile failed: because _ERROR_ == 1 is true"
#endif
The #pragma directive is used to access compiler-specific preprocessor extensions.
A common use of #pragma is the #pragma once directive, which asks the compiler to include a header file only a single time, no matter how many times it has been imported.
#pragma once
/* header file code */
/* #pragma once is equivalent to */
#ifndef _FILE_NAME_H_
# define _FILE_NAME_H_
/* header file code */
#endif
The #pragma directive can also be used for other compiler-specific purposes. #pragma is commonly used to suppress warnings.
#if (MSVC)
# pragma warning(disable:4706) /* assignment within conditional expression */
# pragma comment(lib, "Ws2_32.lib") /* link to Ws2_32.lib */
#elif (GCC)
# pragma GCC diagnostic ignored "-Wstrict-aliasing" /* (unsigned*) &x */
#elif (CLANG)
# pragma clang diagnostic ignored "-Wparentheses"
#endif
- __FILE__ expands to the name of the current source file as a string, usually the path that was passed to the compiler
- __LINE__ expands to the current line number in the source file, as an integer
- __DATE__ expands to the current date at compile time in the form Mmm dd yyyy as a string, such as “Oct 26 2021”
- __TIME__ expands to the current time at compile time in the form hh:mm:ss in 24 hour time as a string, such as “16:08:17”
- __TIMESTAMP__ is a common compiler extension (GCC, Clang, MSVC) rather than standard C; it expands to the last modification time of the current source file in the form Ddd Mmm Date hh:mm:ss yyyy as a string, where the time is in 24 hour time, Ddd is the abbreviated day, Mmm is the abbreviated month, Date is the day of the month (1-31), and yyyy is the four digit year, such as “Tue Oct 26 12:42:21 2021”
- __func__ expands to the name of the enclosing function; it was added in C99 and is a predefined identifier rather than a macro
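The predefined names above can be tried out with a few tiny helpers. This is only a sketch: the function names current_line, current_function and build_stamp are made up for this note and are not part of Nore or of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the line number on which it is written. */
int current_line(void) {
    return __LINE__;
}

/* Hypothetical helper: returns the name of the enclosing function (C99). */
const char *current_function(void) {
    return __func__;
}

/* Hypothetical helper: adjacent string literals are concatenated,
   giving "Mmm dd yyyy hh:mm:ss" fixed at compile time. */
const char *build_stamp(void) {
    return __DATE__ " " __TIME__;
}

/* Prints where the call happens, a common pattern in logging macros. */
#define TRACE() fprintf(stderr, "%s:%d (%s)\n", __FILE__, __LINE__, __func__)
```

Calling TRACE() inside any function prints the file, line and function name of the call site, because the macro is expanded where it is used.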
Most C programs call the library routine exit, which flushes buffers, closes streams, unlinks temporary files, etc., before calling _exit.
No, there’s nothing wrong with assert as long as you use it as intended.
- assert: a failure in the program’s logic itself.
- error: an erroneous input or system state not due to a bug in the program.
Assertions are primarily intended for use during debugging and are generally turned off before code is deployed by defining the NDEBUG macro.
# with assert
./configure --has-lang
make clean lang_assert_test
# erase assertions: simple way
./configure --has-lang --with-release=yes
make clean lang_assert_test
An assertion specifies that a program satisfies certain conditions at particular points in its execution. There are three types of assertion:
- preconditions: specify conditions at the start of a function.
- postconditions: specify conditions at the end of a function.
- invariants: specify conditions over a defined region of a program.
The static_assert macro, defined in <assert.h>, expands to _Static_assert, a keyword added in C11 to provide compile-time assertions.
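A minimal sketch of compile-time assertions, assuming a C11 (or later) compiler. The conditions and the helper int_bits are only examples chosen for this note.

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <limits.h>   /* CHAR_BIT */

/* Checked at compile time: the build fails if a condition is false. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(int) >= 2, "int must hold at least 16 bits");

/* Hypothetical helper: number of bits in an int on this platform. */
int int_bits(void) {
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Unlike assert, a failing static_assert costs nothing at run time and is not affected by NDEBUG.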
enum [identifier] { enumerator-list };
enumerator = constant-expression;
enumerator-list is a comma-separated list (a trailing comma is permitted since C99); identifier is optional. If an enumerator is followed by a constant expression, its value is the value of that constant expression. If an enumerator is not followed by a constant expression, its value is one greater than the value of the previous enumerator in the same enumeration. The value of the first enumerator, if it does not use a constant expression, is zero.
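The numbering rules can be checked with a small sketch. The enumeration level and the helper level_value are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical enumeration used only for illustration. */
enum level {
    LEVEL_LOW,        /* 0: first enumerator without a constant expression */
    LEVEL_MID,        /* 1: previous value plus one */
    LEVEL_HIGH = 10,  /* 10: explicit constant expression */
    LEVEL_MAX,        /* 11: previous value plus one */
};                    /* the trailing comma is permitted since C99 */

/* Converts an enumerator to its underlying integer value. */
int level_value(enum level l) {
    return (int)l;
}
```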
Unlike struct and union, there are no forward-declared enums in C.
- fail safe: pertaining to a system or component that automatically places itself in a safe operating mode in the event of a failure: a traffic light that reverts to blinking red in all directions when normal operation fails.
- fail soft: pertaining to a system or component that continues to provide partial operational capability in the event of certain failures: a traffic light that continues to alternate between red and green if the yellow light fails. A static variable errno indicates the error status of a function call or object; such indicators are fail soft.
- fail hard, also known as fail fast or fail stop: the reaction to a detected fault is to immediately halt the system. Termination is fail hard.
Before C11, errno was a global variable, with all the inherent disadvantages:
- later system calls overwrite the value set by earlier ones;
- a global map of values to error conditions (ENOMEM, ERANGE, etc.);
- behavior is underspecified in ISO C and POSIX;
- technically errno is a modifiable lvalue rather than a global variable, so expressions like &errno may not be well-defined;
- thread-unsafe.
In C11, errno is thread-local, so it is thread-safe.
Disadvantages of function return values:
- functions that return error indicators cannot use the return value for anything else;
- checking every function call for an error condition increases code size by 30%-40%;
- it is impossible for a library function to enforce that callers check for the error condition.
char * strerror(int errnum);
Interprets the value of errnum, generating a string with a message that describes the error condition as if set to errno by a function of the library. The returned pointer points to a statically allocated string, which shall not be modified by the program. Further calls to this function may overwrite its content (particular library implementations are not required to avoid data races). The error strings produced by strerror may be specific to each system and library implementation.
void perror(const char *str);
Interprets the value of errno as an error message, and prints it to stderr (the standard error output stream, usually the console), optionally preceding it with the custom message specified in str. If the parameter str is not a null pointer, str is printed followed by a colon : and a space. Then, whether str was a null pointer or not, the generated error description is printed followed by a newline character '\n'. perror should be called right after the error was produced, otherwise errno can be overwritten by calls to other functions.
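A small sketch of the usual errno pattern: clear it, make the call, and save the value right away before anything else can overwrite it. The helper name try_open is made up for this note. Note the assumption that fopen sets errno on failure: POSIX guarantees it, ISO C does not, so the sketch falls back to -1 when errno stays 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: tries to open path for reading.
   Returns 0 on success, otherwise a nonzero error code. */
int try_open(const char *path) {
    errno = 0;
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;  /* save before another call overwrites it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        return saved != 0 ? saved : -1;
    }
    fclose(fp);
    return 0;
}
```

Using perror(path) instead of the fprintf line would print much the same message, but it reads errno itself, so it has to be the very next call after the failure.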
C90 main() declarations:
int main(void);
int main(int argc, char **argv);
/* same as above */
int main(int argc, char *argv[]);
/* classically, Unix systems support a third variant */
int main(int argc, char **argv, char **envp);
C99 rules for the value returned from main():
- the int return type may not be omitted.
- the return statement may be omitted; if so and main() finishes, there is an implicit return 0.
About the arguments:
- argc >= 0 (the standard only requires it to be nonnegative; in practice it is usually at least 1)
- argv[argc] == 0
- argv[0] through argv[argc-1] are pointers to strings whose meaning is determined by the program.
- argv[0] is a string containing the program’s name, or an empty string if that is not available.
- envp is not specified by POSIX but is widely supported. getenv is the only one specified by the C standard; putenv and extern char **environ are POSIX-specific.
- call graph is cyclic
- cross more than one translation unit
Prefixing a macro parameter with # quotes it: the argument is turned into a string literal. This allows you to turn bare words in your source code into text. It is particularly useful for writing a macro that converts a member of an enum into a string.
enum COLOR { RED, GREEN, BLUE };
#define COLOR_STR(x) #x
The ## operator takes two separate tokens and pastes them together to form a single identifier. The resulting identifier could be a variable name, or any other identifier.
#define DEFVAR(type, var, val) type var_##var = val
DEFVAR(int, x, 1); /* expand to: int var_x = 1; */
DEFVAR(float, y, 2.718); /* expand to: float var_y = 2.718; */
An expression-type macro expands to an expression, such as the following macro definition:
#define double_v1(x) 2*x
But double_v1 has a drawback: the call double_v1(1+1)*8 expands to the wrong expression 2*1+1*8.
Use parentheses around the input arguments and the final expression:
#define double_v2(x) (2*(x))
Now it expands to (2*(1+1))*8.
But the max macro has a side effect: it evaluates one of its arguments twice
#define max(a, b) ((a) > (b) ? (a) : (b))
when called as max(a, b++).
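The double evaluation is easy to observe by counting how often the increment happens. A sketch, where max_int, macro_increments and function_increments are hypothetical helpers written for this note:

```c
#include <assert.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

/* Function version: each argument is evaluated exactly once. */
int max_int(int a, int b) {
    return a > b ? a : b;
}

/* Returns how many times b is incremented by max(a, b++). */
int macro_increments(int a, int b) {
    int before = b;
    (void)max(a, b++);  /* expands to ((a) > (b++) ? (a) : (b++)) */
    return b - before;
}

/* Returns how many times b is incremented by max_int(a, b++). */
int function_increments(int a, int b) {
    int before = b;
    (void)max_int(a, b++);
    return b - before;
}
```

When b is the larger value, the macro increments it once in the comparison and once more in the selected branch; the function version always increments it once.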
If the macro definition includes the statement terminator ;, we need to wrap it in a block.
#define incr(a, b) \
(a)++; \
(b)++;
Call it with
int a=2, b=3;
if (a > b) incr(a, b);
only b is incremented, because just the first statement is guarded by the if. We can wrap the body in braces and turn it into a block-type macro.
#define incr(a, b) { \
(a)++; (b)++; \
}
But the above block macro is not good enough: omitting the ; is not intuitive, and the trailing ; is wrong in some cases, such as
int a = 2, b = 3;
if (a < b)
incr(a, b); /* trailing ; */
else
a *= 10;
/* expanded code: this fails to compile */
if (a < b)
{ (a)++; (b)++; };
else
a *= 10;
do { ... } while (0) resolves those issues.
#define incr(a, b) do { \
(a)++; (b)++; \
} while (0) /* no trailing ; */
/* expanded code */
if (a < b)
do { (a)++; (b)++; } while (0); /* append ; */
else
a *= 10;
We can use the same mechanism as Lisp’s (gensym) to rebind the input arguments to new symbols.
Using a macro name within another macro is called nesting of macros.
#define SQUARE(x) ((x)*(x))
#define CUBE(x) (SQUARE(x)*(x))
cc -E <source-file>
The & operator takes the address of its operand.
The * has two distinct meanings within C in relation to pointers, depending on where it’s used. When used within a variable declaration, it declares a pointer, and the value on the right-hand side of the equals sign should be a pointer value: an address in memory. When used with an already declared variable, the * dereferences the pointer value, following it to the pointed-to place in memory and allowing the value stored there to be assigned or retrieved.
The size of a pointer depends on the compiler and the machine. On a given machine with a given compiler, pointers to object types generally have the same size, typically one machine word.
There is a technique known as the Clockwise/Spiral Rule that enables any C programmer to parse any C declaration in their head.
The first const can be on either side of the type.
const int * == int const *; /* pointer to const int */
const int * const == int const * const; /* const pointer to const int */
- pointer to const object: int v = 0x11223344; const int *p = &v;
- const pointer to object: int v1 = 0x11223344; int *const p1 = &v1;
- const pointer to const object: int v1 = 0x11223344; const int *const p = &v1;
- pointer to pointer to const object: const int **p;
- pointer to const pointer to object: int *const *p;
- const pointer to pointer to object: int **const p;
- pointer to const pointer to const object: const int *const *p;
- const pointer to pointer to const object: const int **const p;
- const pointer to const pointer to object: int *const *const p;
Run example:
./configure --has-lang
make clean lang_ptr_const_test
The volatile qualifier tells the compiler not to optimize away accesses to the object, so that every read or write performs a real memory access instead of using a value cached in a register.
volatile int v1;
int *p_v1 = &v1; /* bad */
volatile int *p_v1 = &v1; /* better */
- The restrict keyword was introduced in C99.
- It lets the programmer inform the compiler about an optimization it can make: the pointed-to object is not accessed through any other pointer.
return_type_of_fn (*fn)(type_of_arg1 arg1, type_of_arg2 arg2 ...);
void Pointer
The void* is a catch-all type for pointers to object types; via a void pointer we can get some polymorphic behavior. See qsort in stdlib.h.
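A short sketch of that polymorphism: qsort knows nothing about int, it only moves bytes around and calls back through a function pointer that receives const void * arguments. The names compare_int and sort_ints are hypothetical helpers for this note.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator with the signature qsort expects. */
int compare_int(const void *lhs, const void *rhs) {
    int a = *(const int *)lhs;  /* convert back from void* to the real type */
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow that a - b could cause */
}

/* Sorts n ints in ascending order. */
void sort_ints(int *values, size_t n) {
    /* Function pointer declared with the syntax shown earlier. */
    int (*cmp)(const void *, const void *) = compare_int;
    qsort(values, n, sizeof *values, cmp);
}
```

The same qsort call sorts doubles or structs when given a different element size and comparator; that is all the polymorphism void * provides.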
Pointers that point to invalid addresses are sometimes called dangling pointers.
Decay refers to the implicit conversion of an expression from an array type to a pointer type. In most contexts, when the compiler sees an array expression it converts the type of the expression from N-element array of T to pointer to T and sets the value of the expression to the address of the first element of the array. The exceptions to this rule are when an array is an operand of either the sizeof or & operators, or the array is a string literal being used as an initializer in a declaration. More importantly, the term decay signifies loss of type and dimension.
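The loss of dimension shows up as soon as sizeof is applied on both sides of a function call. A sketch with made-up helper names:

```c
#include <assert.h>
#include <stddef.h>

/* In the callee only a pointer arrives: sizeof yields the pointer size. */
size_t size_in_callee(const int *p) {
    return sizeof p;
}

/* In the caller the array is an operand of sizeof, so it does not decay. */
size_t size_in_caller(void) {
    int a[4] = { 0 };
    (void)a;
    return sizeof a;
}

/* In an ordinary expression the array decays to &a[0]. */
int decays_to_first_element(void) {
    int a[4] = { 1, 2, 3, 4 };
    const int *p = a;
    return p == &a[0];
}
```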
In computer programming, aliasing refers to the situation where the same memory location can be accessed using different names.
The storage class in C decides which part of storage is allocated for a variable; it also determines the scope of a variable. Memory and CPU registers are the kinds of locations where a variable’s value can be stored. There are four storage classes in C: automatic, register, static, and external.
Each declaration can have only one of five storage class specifiers: static, extern, auto, register, and typedef.
The typedef storage class specifier does not reserve storage and is called a storage class specifier only for syntactic convenience.
The general form of a declaration that uses a storage class is shown here:
<storage-class-specifier> <type> <identifier>
Living example:
./configure --has-lang
make clean lang_storage_test
The auto storage class specifier denotes that an identifier has automatic duration. This means once the scope in which the identifier was defined ends, the object denoted by the identifier is no longer valid.
Since all objects that do not live in global scope and are not declared static have automatic duration by default when defined, this keyword is mostly of historical interest and should not be used.
auto can’t be applied to parameter declarations. It is the default for variables declared inside a function body, and is in fact a historic leftover from C’s predecessor, B.
- scope: variables defined with the auto storage class specifier are local to the function scope or block scope in which they are defined.
- duration: automatic, until the end of the function scope or block scope where the variable is defined
- default initial value: indeterminate (garbage value)
The register storage class specifier hints to the compiler that access to an object should be as fast as possible. Whether the compiler actually uses the hint is implementation-defined; it may simply treat it as equivalent to auto.
The compiler does make sure that you do not take the address of a variable with the register storage class.
The only property that is definitively different for all objects that are declared with register is that they cannot have their address computed. Thereby register can be a good tool to ensure certain optimizations:
/* error: address of register variable requested */
register int i = 0x10;
int *p = &i;
Here i can never alias, because no code can pass its address to another function where it might be changed unexpectedly.
This property also implies that an array
void decay(char *a);
register char a[] = { 0x11, 0x22, 0x33, 0x44, };
decay(a);
cannot decay into a pointer to its first element (i.e. turning into &a[0]). This means that the elements of such an array cannot be accessed and the array itself cannot be passed to a function.
In fact, the only legal usage of an array declared with the register storage class is the sizeof operator; any other operator would require the address of the first element of the array. For that reason, arrays generally should not be declared with the register keyword, since it makes them useless for anything other than computing the size of the entire array, which can be done just as easily without the register keyword.
The register storage class is more appropriate for variables that are defined inside a block and are accessed with high frequency.
- scope: function scope or block scope
- duration: automatic, till the end of function scope or block scope in which the variable is defined
- default initial value: garbage value
The static storage class serves different purposes, depending on the location of the declaration in the file. Since C99, static can also be used in function parameters to denote that an array argument is non-null and has at least a constant minimum number of elements.
- scope: file scope (confine the identifier to that translation unit only) or function scope (save data for use with the next call of a function)
- duration: static
- default initial value: 0
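The three uses of static can be sketched in a few lines. All names here (calls, next_id, call_count, sum3) are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* File scope: internal linkage, visible only in this translation unit.
   Static duration, so it is zero-initialized. */
static int calls;

/* Block scope: id keeps its value between calls and starts at 0. */
int next_id(void) {
    static int id;
    calls++;
    return ++id;
}

/* Reports how many times next_id has been called so far. */
int call_count(void) {
    return calls;
}

/* C99: the caller promises a non-null array with at least 3 elements. */
int sum3(const int a[static 3]) {
    return a[0] + a[1] + a[2];
}
```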
The extern keyword is used to declare an object or function that is defined elsewhere (and that has external linkage). In general, it is used to declare an object or function to be used in a module that is not the one in which the corresponding object or function is defined.
- scope: global
- duration: static
- default initial value: 0
In C, all identifiers are lexically (or statically) scoped.
The scope of a declaration is the part of the code where the declaration is seen and can be used. Note that this says nothing about whether the object associated to the declaration can be accessed from some other part of the code via another declaration. We uniquely identify an object by its memory: the storage for a variable or the function code.
Finally, note that a declaration in a nested scope can hide a declaration in an outer scope; but only if one of two has no linkage.
If neither the extern keyword nor an initializer is present, the statement can be either a declaration or a definition. It is up to the compiler to analyse the modules of the program and decide.
- All declarations with no linkage are also definitions. Other declarations are definitions if they have an initializer.
- A file scope variable declaration without the external linkage storage class specifier or initializer is a tentative definition.
- All definitions are declarations but not vice-versa.
- A definition of an identifier is a declaration for that identifier that: for an object, causes storage to be reserved for that object.
A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that:
- for an object, causes storage to be reserved for that object;
- for a function, includes the function body;
- for an enumeration constant or typedef name, is the only declaration of the identifier.
In the following example we declare a function. Using the extern keyword is optional when declaring a function. If we don’t write the extern keyword, the declaration is treated as if extern had been written.
int add(int, int);
Every variable or function declaration that appears inside a block has block scope. The scope goes from the declaration to the end of the innermost block in which the declaration appears. Function parameter declarations in function definitions (but not in prototypes) also have block scope. The scope of a parameter declaration therefore includes the parameter declarations that appear after it.
Labels, the targets of goto <label>, are a bit special: they are implicitly declared at the place where they appear, but they are visible throughout the function, even if they appear inside a block.
Function prototype scope is the scope for function parameters that appear inside a function prototype. It extends until the end of the prototype. This scope exists to ensure that function parameters have distinct names.
All variables and functions defined outside functions have file scope. They are visible from their declaration until the end of the file. Here, the term file should be understood as the source file being compiled, after all includes have been resolved.
Duration indicates whether the object associated with the declaration persists throughout the program’s execution (static) or whether it is allocated when the declaration’s scope is entered (automatic).
There are two kinds of duration:
- automatic
- static
Within functions at block scope, declarations without extern or static have automatic duration. Any other declaration, and every declaration at file scope, has static duration.
Linkage describes how identifiers can or can not refer to the same entity throughout the whole program or one single translation unit.
Living example:
./configure --has-lang
make clean lang_linkage_test
A translation unit is the ultimate input to a C compiler from which an object file is generated. In casual usage it is sometimes referred to as a compilation unit. A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in #include directives are literally included, sections of code within #ifdef may be included, and macros have been expanded.
A declaration with no linkage is associated to an object that is not shared with any other declaration. All declarations with no linkage happen at block scope: all block scope declarations without the extern storage class specifier have no linkage.
Internal linkage means that the identifier is private to the current translation unit: its definition must appear in the same source file or in one of the headers it includes. Within the translation unit, all declarations with internal linkage for the same identifier refer to the same object.
External linkage means that the identifier may be defined outside the file you are working on, in any other translation unit. Within the whole program, all declarations with external linkage for the same identifier refer to the same object.
The C language specification includes the typedefs size_t and ptrdiff_t to represent memory-related quantities. Their size is defined according to the target processor’s arithmetic capabilities, not the memory capabilities, such as available address space. Both of these types are defined in the <stddef.h> header.
- size_t is an unsigned integral type used to represent the size of any object in the particular implementation. The sizeof operator yields a value of the type size_t. The maximum value of size_t is provided via SIZE_MAX, a macro constant defined in the <stdint.h> header.
- ptrdiff_t is a signed integral type used to represent the difference between pointers. It is only guaranteed to be valid for pointers of the same type that point into the same array object.
- ssize_t is specified by POSIX, not by the C standard.
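A sketch of both types in use, including the C99 printf length modifiers z (for size_t) and t (for ptrdiff_t). The helpers distance and format_sizes are hypothetical names for this note.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, ptrdiff_t */
#include <stdio.h>   /* snprintf */

/* Distance in elements between two pointers into the same array. */
ptrdiff_t distance(const int *first, const int *last) {
    return last - first;
}

/* Formats a size_t with %zu and a ptrdiff_t with %td.
   Returns the number of characters that were (or would be) written. */
int format_sizes(char *buf, size_t n, size_t size, ptrdiff_t diff) {
    return snprintf(buf, n, "%zu %td", size, diff);
}
```

Note that distance can be negative, which is exactly why ptrdiff_t is signed while size_t is not.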
- l or L for long, such as 123l, and for long double, such as 3.14L
- f for float, such as 2.718f
A struct is a type consisting of a sequence of members whose storage is allocated in the order in which the members were defined.
struct optional_name { declaration_list; };
struct name;
Initialization, sizeof, and the assignment operator = ignore the flexible array member.
Run example
./configure --has-lang
make clean lang_struct_test
There may be unnamed padding between any two members of a struct or after the last member, but not before the first member. The size of a struct is at least as large as the sum of the sizes of its members.
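The padding rules can be observed with sizeof and offsetof. This is a sketch with a hypothetical struct sample; the exact amount of padding depends on the compiler and target, so only the guarantees stated above are relied on.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical layout: a compiler will typically pad after tag and flag. */
struct sample {
    char tag;
    int  value;
    char flag;
};

/* Sum of the member sizes, ignoring any padding. */
size_t members_total(void) {
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Actual size of the struct, including padding. */
size_t struct_total(void) {
    return sizeof(struct sample);
}

/* Offset of the first member: always 0, no padding before it. */
size_t first_offset(void) {
    return offsetof(struct sample, tag);
}
```

Reordering the members so that the two char fields sit next to each other often shrinks the struct, though that is up to the implementation.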
extern char a[]; /* the type of a is incomplete */
char a[4]; /* the type of a is now complete */
struct node {
struct node *next; /* struct node is an incomplete type at this point */
}; /* struct node is now complete at this point */
A union is a type consisting of a sequence of members whose storage overlaps.
union optional_name { declaration_list; };
union name;
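Because the members overlap, a union can be used to look at the same bytes through two types. A sketch with a hypothetical union word that detects the byte order of the machine; it assumes the fixed-width types from <stdint.h> are available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union: both members start at the same address. */
union word {
    uint32_t u32;
    uint8_t  bytes[4];
};

/* Returns the byte stored at the lowest address of value. */
uint8_t lowest_byte(uint32_t value) {
    union word w;
    w.u32 = value;
    return w.bytes[0];
}

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void) {
    return lowest_byte(1u) == 1;
}
```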
All C types are represented as binary numbers in memory; the type determines how those numbers are interpreted.
C provides the four basic arithmetic type specifiers char, int, float and double, and the modifiers signed, unsigned, short and long.
long and long int are identical. So are long long and long long int. In both cases, the int is optional.
specifier | type |
---|---|
long long int | long long int |
long long | long long int |
long | long int |
An incomplete type is an object type that lacks sufficient information to determine the size of objects of that type. An incomplete type may be completed at some point in the translation unit.
- void cannot be completed.
- [] is an array type of unknown size; it can be completed by a later declaration that specifies the size.
typedef type_specifier declarator;
typedef type_specifier declarator1, *declarator2, (*declarator3)(void);
typedef is used to create an alias name for another type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, but it is just as common in providing specific descriptive type names for integer types of varying lengths. The C standard library and POSIX reserve the suffix _t, for example as in size_t and time_t.
#define is a C directive that can also be used to define aliases for data types, similar to typedef but with the following differences:
- typedef is limited to giving symbolic names to types only, whereas #define can be used to define aliases for values as well.
- typedef interpretation is performed by the compiler, whereas #define statements are processed by the preprocessor.
Using typedef to hide struct is considered a bad idea in the Linux kernel coding style.
Run the typedef example:
./configure --has-lang
make clean lang_typedef_test
The typeof operator has long been a compiler extension; it only became part of the C standard in C23.
Run the typeof example:
./configure --has-lang
make clean lang_typeof_test
A declaration can have exactly one basic type. The basic types are augmented with derived types, and C has three of them:
- function [(decl-list)] returning: ()
- array [number] of: []
- [const | volatile | restrict] pointer to: *
The array of [] and function returning () type operators have higher precedence than pointer to *.
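The precedence rule decides what a declaration means, and parentheses are needed to override it. A sketch with made-up names, where each comment reads the declaration aloud:

```c
#include <assert.h>
#include <stddef.h>

int *array_of_pointers[3];         /* array [3] of pointer to int */
int (*pointer_to_array)[3];        /* pointer to array [3] of int */
int (*pointer_to_function)(void);  /* pointer to function returning int */

static int storage = 7;

/* Function returning pointer to int: () binds tighter than *. */
int *function_returning_pointer(void) {
    return &storage;
}

/* A plain function that pointer_to_function can point to. */
int answer(void) {
    return 42;
}
```

Without the parentheses, int *pointer_to_function(void) would declare a function returning a pointer, exactly like function_returning_pointer.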
Don’t cast the result of malloc. It is unnecessary, as void * is automatically and safely converted to any other object pointer type in this case. It adds clutter to the code; casts are not very easy to read (especially if the pointer type is long). It makes you repeat yourself, which is generally bad. It can hide an error if you forgot to include <stdlib.h>. This can cause crashes (or, worse, not cause a crash until way later in some totally different part of the code). Consider what happens if pointers and integers are differently sized; then you’re hiding a warning by casting and might lose bits of your returned address. Note: as of C99 implicit function declarations are gone from C, and this point is no longer relevant since there’s no automatic assumption that undeclared functions return int.
To add further, your code needlessly repeats the type information (int), which can cause errors. It’s better to dereference the pointer being used to store the return value, to lock the two together:
int *x = malloc(length * sizeof *x);
This also moves the length to the front for increased visibility, and drops the redundant parentheses with sizeof(); they are only needed when the argument is a type name. Many people seem to not know or ignore this, which makes their code more verbose. Remember: sizeof is not a function!
While moving length to the front may increase visibility in some rare cases, one should also pay attention that in the general case, it should be better to write the expression as:
int *x = malloc(sizeof *x * length);
Compare malloc(sizeof *x * length * width) with malloc(length * width * sizeof *x): the second may overflow in length * width when length and width are of smaller types than size_t.
calloc zero-initializes the allocated memory. Calling calloc is not necessarily more expensive.
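Putting the advice together, here is a sketch of an allocation helper that uses sizeof *m instead of repeating the type, does not cast, and guards the multiplication against overflow before calling calloc. The function alloc_matrix is a hypothetical name for this note; the explicit overflow check is an extra precaution, since many calloc implementations perform a similar check themselves.

```c
#include <assert.h>
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* calloc, free */

/* Hypothetical helper: allocates a zeroed rows x cols int matrix.
   Returns NULL if rows * cols would overflow or the allocation fails. */
int *alloc_matrix(size_t rows, size_t cols) {
    if (cols != 0 && rows > SIZE_MAX / cols) {
        return NULL;  /* rows * cols does not fit in size_t */
    }
    int *m = calloc(rows * cols, sizeof *m);  /* no cast, no repeated type */
    return m;
}
```

The caller owns the returned memory and releases it with free.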
The C standard library is a standardized collection of header files and library routines used to implement common operations.
There is a good answer to the question “What is the difference between C, C99, ANSI C and GNU C”:
- Everything before standardization is generally called “K&R C”, after the famous book, with Dennis Ritchie, the inventor of the C language, as one of the authors. This was “the C language” from 1972-1989.
- The first C standard was released 1989 nationally in USA, by their national standard institute ANSI. This release is called C89 or ANSI-C. From 1989-1990 this was “the C language”.
- The year after, the American standard was accepted internationally and published by ISO (ISO 9899:1990). This release is called C90. Technically, it is the same standard as C89/ANSI-C. Formally, it replaced C89/ANSI-C, making them obsolete. From 1990-1999, C90 was “the C language”.
- Please note that since 1989, ANSI haven’t had anything to do with the C language. Programmers still speaking about “ANSI C” generally haven’t got a clue about what it means. ISO “owns” the C language, through the standard ISO 9899.
- In 1999, the C standard was revised, lots of things changed (ISO 9899:1999). This version of the standard is called C99. From 1999-2011, this was “the C language”. Most C compilers still follow this version.
- In 2011, the C standard was again changed (ISO 9899:2011). This version is called C11. It is currently the definition of “the C language”.
name | std | intro |
---|---|---|
assert.h | C90 | conditionally compiled macro that compare its argument to zero |
ctype.h | C90 | functions to determine the type contained in character data |
errno.h | C90 | macros reporting error conditions |
float.h | C90 | limits of float types |
limits.h | C90 | sizes of basic types |
locale.h | C90 | localization utilities |
math.h | C90 | common mathematics functions |
setjmp.h | C90 | nonlocal jumps |
signal.h | C90 | signal handling |
stdarg.h | C90 | variable arguments |
stddef.h | C90 | common macro definitions |
stdio.h | C90 | input/output |
stdlib.h | C90 | general utilities: memory, program, string, random, algorithms |
string.h | C90 | string handling |
time.h | C90 | time/date utilities |
iso646.h | C95 | alternative operator spellings |
wchar.h | C95 | extended multibyte and wide character |
wctype.h | C95 | functions to determine the type contained in wide character data |
complex.h | C99 | complex number arithmetic |
fenv.h | C99 | floating-point environment |
inttypes.h | C99 | format conversion of integer types |
stdbool.h | C99 | macros for boolean types |
stdint.h | C99 | Fixed-width integer types |
tgmath.h | C99 | type-generic math |
stdalign.h | C11 | alignas and alignof convenience macros |
stdatomic.h | C11 | atomic types |
stdnoreturn.h | C11 | noreturn convenience macros |
threads.h | C11 | thread library |
uchar.h | C11 | UTF-16/32 character utilities |
- History of C
- Basic concepts
- Preprocessor Output
- Clockwise/Spiral Rule
- C: Scope, Duration & Linkage
- Pointers
- What is array decaying?
- printf size_t
- Steve Friedl’s Unixwiz.net Tech Tips: Reading C type declarations
- cdecl
- wikibooks: C Programming/Standard libraries
- wikipedia: C11 (C standard revision)
- wikipedia: C99
- wikipedia: C data types
- wikipedia: Linkage
- wikipedia: Maximal munch
- wikipedia: Pointer aliasing
- wikipedia: Translation unit
- wikipedia: typedef
- http parser
- How to use assertions in C
- Beyond errno Error Handling in C
- What should main() return in C and C++?
- Why should we typedef a struct so often in C?
- What is the difference between ‘asm’, ‘__asm’ and ‘__asm__’?
- The Development of the C Language
- Linux kernel coding style
- Kenneth A.Reek: Pointers on C
While memory stores the program and data, the Central Processing Unit does all the work. The CPU has two parts: registers and an Arithmetic Logic Unit (ALU). The ALU performs the actual computations such as addition and multiplication along with comparison and other logical operations.
Load instructions read bytes into a register. The source may be a constant value, another register, or a location in memory.
;; load the constant 23 into register 4
R4 = 23
;; copy the contents of register 2 into register 3
R3 = R2
;; load char (one byte) starting at memory address 244 into register 6
R6 = .1 M[244]
;; load R5 with the word whose memory address is in R1
R5 = M[R1]
;; load the word that begins 8 bytes after the address in R1.
;; this is known as constant offset mode and is about the fanciest
;; addressing mode a RISC processor will support
R4 = M[R1+8]
Store instructions are basically the reverse of load instructions: they move values from registers back out to memory.
;; store the constant number 37 into the word beginning at 400
M[400] = 37
;; store the value in R6 into the word whose address is in R1
M[R1] = R6
;; store lower half-word from R2 into 2 bytes starting at address 1024
M[1024] = .2 R2
;; store R7 into the word whose address is 12 more than the address in R1
M[R1+12] = R7
;; add 6 to R3 and store the result in R1
R1 = 6 + R3
;; subtract R3 from R2 and store the result in R1
R1 = R2 - R3
By default, the CPU fetches and executes instructions from memory in
order, working from low memory to high. Branch instructions alter this
order. Branch instructions test a condition and possibly change which
instruction should be executed next by changing the value of the PC
register. The operands in the test of a branch statement must be in
registers or constant values. Branches are used to implement control
structures like if as well as loops like for and while.
;; begin executing at address 344 if R1 equals 0
BEQ R1, 0, 344
;; begin executing at address 8 past current instruction if R2 less than R3
BLT R2, R3, PC+8
;; The full set of branch variants:
BLT ... ;; branch if first argument is less than second
BLE ... ;; less than or equal
BGT ... ;; greater than
BGE ... ;; greater than or equal
BEQ ... ;; equal
BNE ... ;; not equal
;; unconditional jump that has no test, but just immediately
;; diverts execution to new address
;; begin executing at address 2000 unconditionally: like a goto
JMP 2000
;; begin executing at address 12 before current instruction
JMP PC-12
The types char, short, int, and long are all in the same
family, and use the same binary polynomial representation. C allows
you to freely assign between these types.
- broaden: When assigning from a smaller-sized type to a larger, there is no problem. All of the source bytes are copied and the remaining upper bytes in the destination are filled using what is called sign extension – the sign bit is extended across the extra bytes.
- narrow: only the lower bytes are copied and the upper bytes are ignored.
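The two cases above can be sketched in C as follows. The fixed-width types from stdint.h and the function names are assumptions chosen to make the byte counts explicit. Narrowing to an unsigned type is used here because narrowing an out-of-range value to a signed type is implementation-defined.

```c
#include <stdint.h>

/* Broaden: the 1-byte source is copied and the sign bit is
   replicated across the three extra bytes (sign extension). */
int32_t widen(int8_t small)
{
    return small;
}

/* Narrow: only the low-order byte survives, the upper bytes are dropped. */
uint8_t narrow(uint32_t big)
{
    return (uint8_t)big;
}
```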
Remember that a floating point 1.0 has a completely different arrangement of bits than the integer 1, and instructions are required to do those conversions.
;; take bits in R2 that represent integer, convert to float, store in R1
R1 = ItoF R2
;; take bits in R4, convert from float to int, and store back in same.
;; Note that converting in this direction loses information, the fractional
;; component is truncated and lost
R4 = FtoI R4
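A rough C counterpart of ItoF and FtoI is an ordinary cast between int and float. The sketch below also copies the raw bits of a float to show that they differ from the integer pattern. The function names are made up for illustration, and float is assumed to be a 32-bit IEEE-754 value.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the compiler emits an int-to-float instruction. */
float int_to_float(int32_t i)
{
    return (float)i;
}

/* Value conversion the other way: the fraction is truncated toward zero. */
int32_t float_to_int(float f)
{
    return (int32_t)f;
}

/* Reinterpretation: copy the raw bits of a float into an integer,
   with no conversion at all (assumes float is 32 bits wide). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```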
A typecast is a compile-time entity that instructs the compiler to treat an expression differently than its declared type when generating code for that expression.
- casting a pointer from one type to another could change how the offset is multiplied for pointer arithmetic or how many bytes are copied on a pointer dereference.
- some typecasts are actually type conversions. A type conversion is required when the data needs to be converted from one representation to another, such as when changing an integer to floating point representation or vice versa.
- most often, a cast does affect the generated code, since the compiler will be treating the expression as a different type.
int i;
((struct binky *)i)->b = 'A';
What does this code actually do at runtime? Why would you ever want to do such a thing? The typecast is one of the reasons C is a fundamentally unsafe language.
16-bits | Size (bytes) | Size (bits) |
---|---|---|
Word | 2 | 16 |
Doubleword | 4 | 32 |
Quadword | 8 | 64 |
Paragraph | 16 | 128 |
Kilobyte | 1024 | 8192 |
Megabyte | 1,048,576 | 8388608 |
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word is an important characteristic of any specific processor design or computer architecture.
pushq <address-of-after-callq>
jmp <address-of-$rsp>
cmp dst, src performs a subtraction, like sub dst, src, but does not store the result; it only sets the flags.
cmp dst, src | CF | PF | AF | ZF | SF | OF |
---|---|---|---|---|---|---|
unsigned src < unsigned dst | 1 | | | | | |
parity of LSB is even | | 1 | | | | |
carry in the low nibble of (src-dst) | | | 1 | | | |
0, (i.e src == dst) | | | | 1 | | |
if MSB of (src-dst) == 1 | | | | | 1 | |
sign bit of src != sign bit of (src-dst) | | | | | | 1 |
Jump | Description | signed-ness | Flags |
---|---|---|---|
je | jump if equal | | ZF = 1 |
jg | jump if greater | signed | ZF = 0 and SF = OF |
jge | jump if greater or equal | signed | SF = OF |
jl | jump if less | signed | SF != OF |
jle | jump if less or equal | signed | ZF = 1 or SF != OF |
RFLAGS Register
Bit(s) | Label | Description |
---|---|---|
0 | CF | Carry Flag |
1 | 1 | Reserved |
2 | PF | Parity Flag, set if the least significant byte of the result contains an even number of 1 bits |
3 | 0 | Reserved |
4 | AF | Auxiliary Carry Flag |
5 | 0 | Reserved |
6 | ZF | Zero Flag, set if result is zero |
7 | SF | Sign Flag, set to the MSB of the result |
8 | TF | Trap Flag |
9 | IF | Interrupt Enable Flag |
10 | DF | Direction Flag |
11 | OF | Overflow Flag |
12-13 | IOPL | I/O Privilege Level |
14 | NT | Nested Task |
15 | 0 | Reserved |
16 | RF | Resume Flag |
17 | VM | Virtual-8086 Mode |
18 | AC | Alignment Check / Access Control |
19 | VIF | Virtual Interrupt Flag |
20 | VIP | Virtual Interrupt Pending |
21 | ID | ID Flag |
22-63 | 0 | Reserved |
- OS Dev
- Using Assembly Language in Linux
- Yale: x86 Assembly Guide
- Virginia: x86 Assembly Guide
- CPU Registers x86-64
- Introduction to x64 Assembly
- AMD64 Architecture Programmer’s Manual
- C Function Call Conventions and the Stack
- Harvard CS 61-2019
- System V ABI Calling Conventions
Run the examples under src/memory.
./configure --has-memory
make clean test
The smallest unit of memory is the bit.
A bit can be in one of two states: on vs. off, or alternately, 1 vs. 0.
Most computers don’t work with bits individually, but instead group eight bits together to form a byte. Each byte maintains one eight-bit pattern. A group of N bits can be arranged in 2^N different patterns.
Strictly speaking, a program can interpret a bit pattern any way it chooses.
The byte is sometimes defined as the smallest addressable unit of memory. Most computers also support reading and writing larger units of memory: 2-byte half-words (sometimes known as short words) and 4-byte words.
Most computers restrict half-word and word accesses to be aligned: a half-word must start at an even address and a word must start at an address that is a multiple of 4.
A logical shift always fills the vacated bits with 0s, while an arithmetic shift fills them with 0s only for a left shift; for a right shift it copies the most significant bit, thereby preserving the sign of the operand.
Left shift on unsigned integers, x << y
- shift bit-vector x left by y positions
- throw away extra bits on left
- fill with 0s on right
Right shift on unsigned integers, x >> y
- shift bit-vector x right by y positions
- throw away extra bits on right
- fill with 0s on left
Left shift, x << y
- equivalent to multiplying by 2^y
- if resulting value fits, no 1s are lost
Right shift, x >> y
- logical shift for unsigned values, fill with 0s on left
- arithmetic shift for signed values
- replicate most significant bit on left
- maintains sign of x
- equivalent to floor(x / 2^y)
- correct rounding towards 0 requires some care with signed numbers.
(unsigned)x >> y | ~(~0u >> y)
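The expression above can be wrapped into a small function. This is a sketch, not a definitive implementation: the names lsr and asr are made up, y is assumed to be smaller than the bit width of int, and the sign bits are only filled in for negative x. Because right-shifting a negative signed int is implementation-defined in C, the arithmetic shift is built from unsigned operations.

```c
/* Logical right shift: vacated high bits are filled with 0s. */
unsigned lsr(unsigned x, unsigned y)
{
    return x >> y;
}

/* Arithmetic right shift built from unsigned operations only.
   Assumes 0 <= y < number of bits in int. */
int asr(int x, unsigned y)
{
    unsigned u = (unsigned)x >> y;
    if (x >= 0) {
        return (int)u;          /* non-negative: same as a logical shift */
    }
    u |= ~(~0u >> y);           /* replicate the sign bit into the top y bits */
    return -(int)~u - 1;        /* map the two's complement pattern back to int */
}
```

For negative inputs the result rounds toward negative infinity, matching floor(x / 2^y), so asr(-7, 1) gives -4 rather than -3.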
The ASCII code defines 128 characters and a mapping of those characters onto the numbers 0..127. The letter ‘A’ is assigned 65 in the ASCII table. Expressed in binary, that’s 2^6 + 2^0 (64 + 1). All standard ASCII characters have zero in the uppermost bit (the most significant bit) since they only span the range 0..127.
2 bytes or 16 bits. 16 bits provide 2^16 = 65536 patterns. This number is known as 64k, where 1k of something is 2^10 = 1024. For non-negative numbers these patterns map to the numbers 0..65535. Systems that are big-endian store the most-significant byte at the lower address. A little-endian (Intel x86) system arranges the bytes in the opposite order. This means when exchanging data through files or over a network between different endian machines, there is often a substantial amount of byte-swapping required to rearrange the data.
4 bytes or 32 bits. 32 bits provide 2^32 = 4294967296 patterns. 4 bytes is the contemporary default size for an integer. Also known as a word.
4, 8, or 16 bytes. Almost all computers use the standard IEEE-754 representation for floating point numbers that is a system much more complex than the scheme for integers. The important thing to note is that the bit pattern for the floating point number 1.0 is not the same as the pattern for integer 1. IEEE floats are in a form of scientific notation. A 4-byte float uses 23 bits for the mantissa, 8 bits for the exponent, and 1 bit for the sign. Some processors have a special hardware Floating Point Unit, FPU, that substantially speeds up floating point operations. With separate integer and floating point processing units, it is often possible that an integer and a floating point computation can proceed in parallel to an extent. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision. The first bit of the mantissa is typically assumed to be 1.f, where f is the field of fraction bits.
precision | sign | exponent | mantissa |
---|---|---|---|
single precision | 1 [31] | 8 [30-23] (base 2 + 127) | 23 [22-00] (base 2, 1/2, 1/4…) |
double precision | 1 [63] | 11 [62-52] (base 2 + 1023) | 52 [51-00] (base 2, 1/2, 1/4…) |
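The single-precision layout in the table can be checked with a short sketch that splits a float into its three fields. The struct and function names are hypothetical, and float is assumed to be a 32-bit IEEE-754 value.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30-23, biased by 127 */
    uint32_t mantissa;  /* bits 22-0, the fraction f in 1.f */
};

/* Split a single-precision float into sign, exponent and mantissa. */
struct float_fields decode_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);   /* copy the raw bit pattern */
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For example, -6.5 is -1.625 * 2^2, so the stored exponent is 2 + 127 = 129 and the fraction bits encode 0.625.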
The size of a record is at least equal to the sum of the sizes of its component fields. The record is laid out by allocating the components sequentially in a contiguous block, working from low memory to high. Sometimes a compiler will add invisible pad fields in a record to comply with processor alignment restrictions.
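A minimal sketch of how padding shows up, using a hypothetical struct record. The exact amount of padding depends on the compiler and target, so the code computes it instead of assuming a value.

```c
#include <stddef.h>

/* Hypothetical record: a 1-byte field followed by a wider one. */
struct record {
    char tag;
    int  value;
};

/* Number of pad bytes the compiler added to struct record. */
size_t record_padding(void)
{
    return sizeof(struct record) - (sizeof(char) + sizeof(int));
}
```

offsetof(struct record, value) reports where the second field actually starts, which on many targets is 4 rather than 1.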
The size of an array is at least equal to the size of each element multiplied by the number of components. The elements in the array are laid out consecutively starting with the first element and working from low memory to high. Given the base address of the array, the compiler can generate constant-time code to figure the address of any element. As with records, there may be pad bytes added to the size of each element to comply with alignment restrictions.
A pointer is an address. The size of the pointer depends on the range of addresses on the machine. A 32-bit machine uses 4 bytes to store an address, creating a 4GB addressable range; a 64-bit machine uses 8 bytes. There is actually very little distinction between a pointer and an unsigned integer of the same size. They both just store integers; the difference is in whether the number is interpreted as a number or as an address.
Machine instructions themselves are also encoded using bit patterns, most often using the same 4-byte native word size. The different bits in the instruction encoding indicate things such as what type of instruction it is (load, store, multiply, etc) and registers involved.
We use the term pointee for the thing that the pointer points to, and we stick to the basic properties of the pointer/pointee relationship which are true in all languages.
Allocating a pointer and allocating a pointee for it to point to are two separate steps. You can think of the pointer/pointee structure as operating at two levels. Both levels must be set up for things to work.
The dereference operation starts at the pointer and follows its arrow over to access its pointee. The goal may be to look at the pointee state or to change the state.
The dereference operation on a pointer only works if the pointer has a pointee: the pointee must be allocated and the pointer must be set to point to it.
Pointer assignment between two pointers makes them point to the same pointee. Pointer assignment does not touch the pointees. It just changes one pointer to have the same reference as another pointer. After pointer assignment, the two pointers are said to be sharing the pointee.
A C array is formed by laying out all the elements contiguously in memory from low to high. The array as a whole is referred to by the address of the first element.
The programmer can refer to elements in the array with the simple [] syntax such as intArray[1]. This scheme works by combining the base address of the array with simple arithmetic.
Each element takes up a fixed number of bytes known at compile-time. So the address of the nth element in the array (0-based indexing) will be at an offset of (n * element_size) bytes from the base address of the whole array.
The square bracket syntax [] deals with this address arithmetic for you, but it’s useful to know what it’s doing. The [] multiplies the integer index by the element size, adds the resulting offset to the array base address, and finally dereferences the resulting pointer to get to the desired element.
a[3] == *(a + 3);
a+3 == &a[3];
a[b] == b[a];
The C standard defines the [] operator as follows: a[b] => *(a+b), and b[a] => *(b+a) => *(a+b), so a[b] == b[a].
In a closely related piece of syntax, adding an integer to a pointer does the same offset computation, but leaves the result as a pointer. The square bracket syntax dereferences that pointer to access the nth element while the + syntax just computes the pointer to the nth element.
Any [] expression can be written with the + syntax instead. We just need to add in the pointer dereference. For most purposes, it’s easiest and most readable to use the [] syntax. Every once in a while the + is convenient if you need a pointer to the element instead of the element itself.
If p is a pointer to an element in an array, then (p+1) points to the next element in the array. Code can exploit this using the construct p++ to step a pointer over the elements in an array. It doesn’t help readability any.
Both [] and ++ implicitly use the compile time type of the pointer to compute the element size which affects the offset arithmetic.
int *p;
p = p + 12; /* p + (12 * sizeof(int)) */
p = (int*) ((char*)p + 12); /* add 12 sizeof(char) */
If each int takes 4 bytes, then at runtime the first assignment will effectively increment the address in p by 48. The compiler figures all this out based on the type of the pointer.
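The scaling can be observed directly by measuring the distance in bytes between p and p + n. This is a small sketch with a made-up helper name; it assumes p points into an array with at least n elements after it.

```c
#include <stddef.h>

/* Distance in bytes between p and p + n for an int pointer.
   The compiler scales n by sizeof(int) when computing p + n. */
ptrdiff_t byte_step(const int *p, size_t n)
{
    return (const char *)(p + n) - (const char *)p;
}
```

With a 4-byte int, byte_step(array, 12) yields 48, while the same step through a char pointer would yield 12.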
What is sizeof(void)? Unknown! Some compilers treat arithmetic on a (void*) as if it were a (char*), but if you were to depend on this you would be creating non-portable code.
Note that you do not need to cast the result back to (void*): a (void*) is the universal recipient of pointer types and can be freely assigned any type of pointer.
One effect of the C array scheme is that the compiler does not meaningfully distinguish between arrays and pointers.
One subtle distinction between an array and a pointer is that the pointer which represents the base address of an array cannot be changed in the code. Technically, the array base address is a const pointer. The constraint applies to the name of the array where it is declared in the code.
Since arrays are just contiguous areas of bytes, you can allocate your own arrays in the heap using malloc. And you can change the size of the malloc'ed array at will at run time using realloc.
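A minimal sketch of resizing a heap array with realloc follows. The helper name resize_ints is hypothetical. It returns NULL on failure and leaves the original block untouched, so the caller should keep the old pointer until the call succeeds.

```c
#include <stdlib.h>

/* Hypothetical helper: resize a heap array of ints to new_count elements.
   Returns the (possibly moved) array, or NULL if new_count is 0, the byte
   count would overflow, or realloc fails. */
int *resize_ints(int *array, size_t new_count)
{
    if (new_count == 0 || new_count > (size_t)-1 / sizeof *array) {
        return NULL;
    }
    return realloc(array, new_count * sizeof *array);
}
```

Existing elements are preserved up to the smaller of the old and new sizes; any new elements are uninitialized.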
C stores multi-dimensional arrays in row-major order, so loading a[0][0] would potentially also load a[0][1] into the cache, but loading a[1][0] would generate a second cache fault.
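The sketch below shows the row-major offset computation and a traversal that follows the memory layout. ROWS, COLS, and the function names are assumptions picked for illustration.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Offset, in elements, of a[i][j] from a[0][0] in row-major order. */
size_t row_major_index(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Sum in row order so that consecutive accesses touch adjacent memory. */
long sum_rows(int a[ROWS][COLS])
{
    long total = 0;
    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            total += a[i][j];
        }
    }
    return total;
}
```

Swapping the two loops visits the same elements but jumps COLS elements between accesses, which is the pattern that tends to cause extra cache misses.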
Writing a generic container in pure C is hard, and it’s hard for two reasons:
The language doesn’t offer any real support for encapsulation or information hiding. That means that the data structures expose information about internal representation right there in the interface file for everyone to see and manipulate. The best we can do is document that the data structure should be treated as an abstract data type, and the client shouldn’t directly manage the fields. Instead, the client should just rely on the functions provided to manage the internals.
C doesn’t allow data types to be passed as parameters. That means a generic container needs to manually manage memory in terms of the client element size, not client data type. This translates to a bunch of malloc, realloc, free, memcpy, and memmove calls involving void*.
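As a small illustration of working in terms of element size instead of element type, here is a sketch of a generic swap. The name generic_swap is hypothetical, and elem_size is assumed to be greater than zero.

```c
#include <stdlib.h>
#include <string.h>

/* Swap two elements of elem_size bytes each through void pointers.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int generic_swap(void *a, void *b, size_t elem_size)
{
    void *tmp = malloc(elem_size);
    if (tmp == NULL) {
        return -1;
    }
    memcpy(tmp, a, elem_size);  /* the function never learns the real type */
    memcpy(a, b, elem_size);
    memcpy(b, tmp, elem_size);
    free(tmp);
    return 0;
}
```

The caller supplies the size, for example generic_swap(&x, &y, sizeof x), which is the same contract qsort uses for its element size argument.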
Endianness refers to the sequential order used to numerically interpret a range of bytes in computer memory as a larger, composed word value. It also describes the order of byte transmission over a **digital link**.
However, if you have a 32-bit register storing a 32-bit value, it makes no sense to talk about endianness. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.
The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses. For example, a 32-bit memory location with content 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value.
Some CPU instruction sets provide native support for endian swapping, such as bswap (x86: 486 and later), and rev (ARMv6 and later).
Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32 for example, a big-endian file should start with 00 00 FE FF; a little-endian file should start with FF FE 00 00.
Endianness doesn’t apply to everything. If you do bitwise or bitshift operations on an int you don’t notice the endianness.
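Because shifts operate on values and not on memory layout, they can be used to write byte-order-independent serialization code. The following is a sketch with made-up function names; it produces the same bytes on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Write v to buf in big-endian order using only shifts. */
void store_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);  /* most significant byte first */
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value back from buf. */
uint32_t load_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```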
The TCP/IP protocols are defined to be big-endian. The multi-byte integer representation used by the TCP/IP protocols is sometimes called network byte order.
In <arpa/inet.h>:
- htons() reorders the bytes of a 16-bit unsigned value from processor order to network order; the macro name can be read as “host to network short.”
- htonl() reorders the bytes of a 32-bit unsigned value from processor order to network order; the macro name can be read as “host to network long.”
- ntohs() reorders the bytes of a 16-bit unsigned value from network order to processor order; the macro name can be read as “network to host short.”
- ntohl() reorders the bytes of a 32-bit unsigned value from network order to processor order; the macro name can be read as “network to host long.”
hexdump on Unix-like systems
The only thing that C must care about is the type of the object which a pointer addresses. Each pointer type is derived from another type, its base type, and each such derived type is a distinct new type.
- Pointer Basics
- How Endianness Effects Bitfield Packing
- Arrays
- IEEE Standard 754 Floating Point Numbers
- The Lost Art of C Structure Packing
- Understanding Big and Little Endian Byte Order
- Clang: Address Sanitizer
- Arithmetic shift
- Endianness
- Logical shift
- Programming Paradigms
- The Ins and Outs of C Arrays
- Structure padding and packing
- Do I cast the result of malloc
- Are the shift operators arithmetic or logical in C?
- Big and Little Endian
- Optimizing Memcpy improves speed
- Writing endian-independent code in C
- Linux
ll /sys/devices/system/cpu/cpu0/cache/
cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
- Windows
wmic cpu list
wmic cpu get
wmic cpu get L2CacheSize, L2CacheSpeed
time ls /tmp
# ...
# ls -G /tmp 0.00s user 0.00s system 73% cpu 0.003 total
real refers to actual elapsed time; user and sys refer to CPU time used only by the process.
- real is wall clock time.
- user is the amount of CPU time spent in user-mode code within the process.
- sys is the amount of CPU time spent in the kernel within the process.
user+sys is the total CPU time the process actually used.
- Executable and Linkable Format (ELF)
- The 101 of ELF files on Linux: Understanding and Analysis
- Apple: Overview of the Mach-O Executable Format
The asteroid to kill this dinosaur is still in orbit. – Lex Manual Page
- Linux Unicode programming
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
- Wikipedia: UTF-8
Streams are a portable way of reading and writing data. They provide a flexible and efficient means of I/O.
A Stream is a file or a physical device (e.g. printer or monitor) which is manipulated with a pointer to the stream.
There exists an internal C data structure, FILE, which represents all streams and is defined in stdio.h.
Stream I/O is buffered: That is to say a fixed chunk is read from or written to a file via some temporary storage area (the buffer).
There are stdin, stdout, and stderr predefined streams.
- >: redirect stdout to a file;
- <: redirect stdin from a file to a program;
- |: put stdout from one program to stdin of another.
All stdio.h functions for reading from FILE may exhibit either buffered or unbuffered behavior, and either echoing or non-echoing behavior.
The standard library function setvbuf can be used to enable or disable buffering of IO by the C library. There are three possible modes: block buffered, line buffered, and unbuffered.
Buffered output streams accumulate written data in an intermediate buffer, sending it to the OS file system only when enough data has accumulated (or flush() is requested).
C RTL buffers, OS buffers, Disk buffers.
The function fflush() forces a write of all buffered data for the given output or update stream via the stream’s underlying write function. The open status of the stream is unaffected.
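A short sketch of setvbuf and fflush working together: the helper below switches a stream to block buffering, writes a message into the buffer, and then forces it out. The name write_flushed is hypothetical, and the stream is assumed to be freshly opened, since setvbuf must be called before any other operation on it.

```c
#include <stdio.h>

/* Write msg to stream in fully buffered mode, then force it out.
   Returns 0 on success, -1 on failure. */
int write_flushed(FILE *stream, const char *msg)
{
    /* NULL lets the library allocate a buffer of BUFSIZ bytes */
    if (setvbuf(stream, NULL, _IOFBF, BUFSIZ) != 0) {
        return -1;
    }
    if (fputs(msg, stream) == EOF) {    /* lands in the stdio buffer */
        return -1;
    }
    return fflush(stream) == 0 ? 0 : -1; /* handed to the OS */
}
```

Passing _IOLBF or _IONBF instead of _IOFBF selects line buffered or unbuffered mode.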
The function fpurge() erases any input or output buffered in the given stream. For output streams this discards any unwritten output. For input streams this discards any input read from the underlying object but not yet obtained via getc(); this includes any text pushed back via ungetc().
Unbuffered output has nothing to do with ensuring your data reaches the disk; that functionality is provided by flush(), and works on both buffered and unbuffered streams. Unbuffered IO writes don’t guarantee the data has reached the physical disk.
close() will call flush().
The open system call is used for opening an unbuffered file.
Terminals, keyboards, and printers deal with character data. When you want to write a number like 1234 to the screen, it must be converted to four characters {'1', '2', '3', '4'} and written. Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. This is done by the sscanf routine.
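Both directions of the conversion can be sketched with snprintf and sscanf. The wrapper names below are made up for illustration.

```c
#include <stdio.h>

/* Number to characters: 1234 becomes '1', '2', '3', '4'.
   Returns the number of characters written, or -1 if buf is too small. */
int int_to_text(int value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%d", value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Characters to number. Returns 1 on success, 0 if text holds no integer. */
int text_to_int(const char *text, int *out)
{
    return sscanf(text, "%d", out) == 1;
}
```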
Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer.
- simple.c uses the getaddrinfo() API call to query a name.
- query.c uses the domain name protocol to query a name directly, without the -lresolv library.
getaddrinfo() is a POSIX.1g extension and is not available in pure C99, so on Linux we need -D_GNU_SOURCE if -std=c99 is specified (see c99 does not define getaddrinfo).
socklen_t represents the size of an address structure, see Linus Torvalds talk about socklen_t.
- RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES
- RFC 1035: DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION
- RFC 1536: Common DNS Implementation Errors and Suggested Fixes
- Sockets Tutorial
- RFC 2616: HTTP Response
In POSIX-Extended regular expressions, all characters match themselves
except for the following special characters: .[{}()\*+?|^$
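A minimal sketch of matching with POSIX extended regular expressions through regcomp and regexec from <regex.h>. The wrapper name ere_match is an assumption; the header is POSIX, not ISO C.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regex pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

A special character such as . has to be escaped to match itself, so the C string "a\\.c" matches a.c but not abc.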
Run example in browser:
// directly call, shorten version
Module._sum(10, 0);
// ccall
Module.ccall('sum', 'number', ['number', 'number'], [10, 0]);
OS | name | command line |
---|---|---|
MacOS | otool | otool -L <bin> |
Linux | objdump | objdump -p <bin> |
 | ldd | ldd <bin> |
Windows | dumpbin | dumpbin -dependents <bin> |
readelf displays information about one or more ELF format object files.
This readelf program performs a similar function to objdump but it goes into more detail and it exists independently of the BFD library, so if there is a bug in BFD then readelf will not be affected.
On Darwin there is no readelf, but otool can do the trick.
OS | name | command line |
---|---|---|
MacOS | otool | otool -l <bin> |
Linux | readelf | readelf <bin> |
Windows |
pkg-config
On Unix-like platforms, the nm program can view the symbol table in an executable.
OS | name | command line |
---|---|---|
MacOS | nm | nm <bin> |
 | | nm -m <bin> |
Linux | nm | nm <bin> |
OS | name | command line |
---|---|---|
MacOS | strip | strip <bin> |
Linux | strip | strip <bin> |
OS | name | command line |
---|---|---|
MacOS | otool | otool -tV <bin> |
Linux | objdump | objdump -d <bin> |
OS | name | command line |
---|---|---|
MacOS | hexdump | hexdump <file> |
Linux | hexdump | hexdump <file> |
Windows | | |
Emacs | hexl-mode | |
OS | name | command line |
---|---|---|
MacOS | dtruss | dtruss <bin> |
Linux | strace | strace -o <out-file> -C <bin> |
- MacOSX:
ktrace
example | command |
---|---|
set working directory | (lldb) platform settings -w <pwd> |
 | (gdb) cd <pwd> |
list env vars | (lldb) env |
 | (lldb) settings show target.env-vars |
 | (gdb) show env |
set env var | (lldb) env XXX=zzz |
 | (lldb) settings set target.env-vars XXX=aa YYY=bb |
 | (gdb) set env XXX=zzz |
unset env var | (lldb) settings remove target.env-vars XXX |
 | (gdb) unset env XXX |
set argv for main entry | (lldb) r arg1 arg2 arg3 |
 | (lldb) settings set target.run-args arg1 arg2 |
 | (gdb) r arg1 arg2 arg3 |
 | (gdb) set args arg1 arg2 |
 | 0:000> .kill; .create <target> arg1 arg2 |
 | 0:000> .exepath+ <path> |
example | command |
---|---|
run process | (lldb) process launch |
 | (gdb) r |
 | 0:000> g |
attach process with pid | (lldb) process attach --pid 123 |
 | (gdb) attach 123 |
attach process with name | (lldb) process attach --name a.out |
 | (lldb) attach a.out |
wait for process | (lldb) process attach --name a.out --wait-for |
 | (gdb) attach -waitfor a.out |
example | command |
---|---|
list dependents of executable | (lldb) image list |
 | (gdb) info sharedlibrary |
 | 0:000> lm |
lookup main entry address in the executable | (lldb) image lookup -a main -v |
 | (gdb) info symbol main |
lookup fn or symbol by regexp | (lldb) image lookup -r -n'[fsv]printf' |
lookup type | (lldb) image lookup -t'FILE' |
add module | (lldb) image add /opt/local/lib/libgeo.dyld |
 | 0:000> .reload -f -i libcffix.dll |
unload module | (lldb) |
 | 0:000> .reload -u libcffix.dll |
example | command |
---|---|
list breakpoint | (lldb) b |
 | (lldb) breakpoint list |
 | (gdb) info break |
 | 0:000> bl |
breakpoint at fn | (lldb) b main |
 | (lldb) b -nmain |
 | (gdb) b main |
 | 0:000> bu <module>!main |
breakpoint at line | (lldb) b -ftest.c -l32 |
 | (gdb) b test.c:32 |
breakpoint at fn by regexp | (lldb) b -rm[a-z]in |
breakpoint at source by regexp | (lldb) b -p'm[a-z]in' -ftest.c |
conditional breakpoint | (lldb) breakpoint set -fvar.c -l23 -c'2 == argc' |
delete breakpoint | (lldb) breakpoint delete 1.1 |
 | (lldb) breakpoint delete 2 |
 | 0:000> bc 1 2 |
example | command |
---|---|
print argv in main entry | (lldb) p -Z`argc` -- argv |
 | 0:000> |
 | (gdb) p -- argv[0]@argc |
examine argv in main entry | (lldb) x -t'char*' -c`argc` argv |
 | 0:000> dp @@(argv) |
 | (gdb) |
examine array of char* of argv | (lldb) x -s`sizeof(char*)` -c`argc` -fx argv |
examine &argc in main entry | (lldb) x -s`sizeof(int)` -fx -c1 &argc |
 | (gdb) x/1xw &argc |
memory read | (lldb) memory read -o/tmp/x.out -s1 -fu -c10 -- &argv[0] |
Frame
example | command |
---|---|
check stack frame | (lldb) frame info |
 | 0:000> k |
list frame variable | (lldb) frame variable |
 | 0:000> dv |
example | command |
---|---|
evaluate argc in main entry | (lldb) e -- argc |
 | (lldb) e -fx -- argc |
 | 0:000> ?? argc |
 | 0:000> .formats poi(argc) |
example | command |
---|---|
disassemble | 0:000> u |
disassemble function | 0:000> uf main |
disassemble | (lldb) d |
disassemble function | (lldb) d -nmain |
disassemble favor | (lldb) d -Fatt |
disassemble | (gdb) disassemble |
example | command |
---|---|
quit | (lldb) q |
 | (gdb) q |
 | 0:000> q |
continue | (lldb) c |
 | 0:000> g |
step over | (lldb) n |
 | 0:000> p |
step into | (lldb) s |
 | (gdb) s |
 | 0:000> t |
example | command |
---|---|
list threads | 0:000> ~ |
- Linux:
lscpu
- Darwin:
sysctl -a | grep machdep.cpu.features