FENIX_main/notes/notes on code standards and style
2022-10-29 09:57:17 -05:00

Before delving in here, do note this is largely guidelines. I'm known to not
follow my own code standards. This is largely just to help keep anyone who wants
to help out on the same page. Ultimately, if you need to break a "rule", break
it. Just make sure you know what you're breaking and why.
Intellectual property
=====================
Regarding proprietary programs:
Don't reference them. We don't need to get in trouble.
Regarding free programs:
Be careful referencing them, especially if they use a GPL-style license. I
don't need that fucking licensing nightmare getting dragged into this. If it
uses a 0BSD-style license, you can freely reference it. I just ask that you
properly credit the original. Also, try not to straight up copy it. I'm going
for original code here.
Trademarks:
Don't bother with acknowledgements. Just make sure it's written in a way to
not be mistaken for our own trademark.
Program design
==============
Which language:
C. We use C around here. It's honestly the only language I'm still comfortable
with. I'm out of practice with anything else. Though, for what we do here, C is
kinda the most suitable language we have anyways.
In particular, though, of what I'm even familiar with, I don't like
object-oriented programming, so no C++ or Java. Java's too bloated anyways.
Same for Clojure. FORTRAN's not really useful here, though I do want to have
a library for FORTRAN. And Prolog is really only useful as a proof language.
So, really, C is the only thing I want to use (except where we need asm).
Compatibility:
We do compatibility here. Currently, we're working off POSIX/the Single UNIX
Specification, the ISO C standard, and the Filesystem Hierarchy Standard.
Everything implemented should *not* conflict with these standards. We can add
things here and there, where needed or where it might be fun. But anything added
should either be something not specified by the standard (like the init system)
or that is otherwise optional, e.g. a game.
Non-standard features:
Don't use them. For instance, don't use any GNU C extensions. You should be
writing standard compliant code. Specifically, we use `-std=c99`. This said,
extensions may be unavoidable in places. The packing requirements for i386
interrupt structs, for instance, seemingly necessitate __attribute__((packed)),
but that may just be a lack of understanding of C structure packing on my part.
Maybe, just maybe, that could be done without. But, yeah. Unless absolutely
necessary, don't use extensions beyond the POSIX extensions.
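On the packing question, one way to do without the extension is to not use a
struct for the hardware layout at all and write the bytes out yourself. Here's
a minimal sketch, assuming the usual 8-byte i386 interrupt gate layout. The
names encode_idt_gate() and IDT_GATE_SIZE are made up for illustration; they
aren't from the FENIX source.

```c
#include <stdint.h>

/* Size of one i386 interrupt gate descriptor, in bytes. */
#define IDT_GATE_SIZE 8

/*
 * Hypothetical helper: writes one gate descriptor into out[] byte by byte.
 * An array of unsigned char has no padding, so no packing extension is
 * needed to get the layout right.
 */
void encode_idt_gate(unsigned char out[IDT_GATE_SIZE], uint32_t handler,
    uint16_t selector, uint8_t type_attributes) {
  out[0] = handler & 0xFF;         /* handler offset, bits 0-7 */
  out[1] = (handler >> 8) & 0xFF;  /* handler offset, bits 8-15 */
  out[2] = selector & 0xFF;        /* code segment selector, low byte */
  out[3] = (selector >> 8) & 0xFF; /* code segment selector, high byte */
  out[4] = 0;                      /* reserved, always zero */
  out[5] = type_attributes;        /* gate type, DPL, present bit */
  out[6] = (handler >> 16) & 0xFF; /* handler offset, bits 16-23 */
  out[7] = (handler >> 24) & 0xFF; /* handler offset, bits 24-31 */
}
```

For example, encode_idt_gate(gate, handler_address, 0x08, 0x8E) would describe
a present, ring 0, 32-bit interrupt gate in code segment 0x08. Whether that
reads better than __attribute__((packed)) is a judgment call.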
More on C standards:
We use C99. However, there are some exceptions. For instance, we never use
single-line comments (//). Instead, we always use block comments (/* */).
Other things not to do include using trigraphs (why do those even still exist?)
and writing anything in pre-standard style.
Conditional compilation:
I guess use if statements in functions where possible. Outside of functions,
of course, use #if/#ifdef/#ifndef. In general, that's what you should do.
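A minimal sketch of the "if statement inside a function" approach, assuming a
made-up build switch called TRACE_ENABLED and a made-up function add_numbers().
The #ifndef sits outside the function and the plain if sits inside it.

```c
#include <stdio.h>

/* Hypothetical build-time switch; pass -DTRACE_ENABLED=1 to turn it on. */
#ifndef TRACE_ENABLED
#define TRACE_ENABLED 0
#endif

/* Adds two numbers, tracing the call when tracing is compiled in. */
int add_numbers(int left, int right) {
  int sum = left + right;
  /*
   * A plain if: both branches are always parsed and type-checked, and the
   * compiler is free to drop the dead one when TRACE_ENABLED is constant.
   */
  if(TRACE_ENABLED) {
    fprintf(stderr, "add_numbers(%d, %d) = %d\n", left, right, sum);
  }
  return sum;
}
```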
Writing Programs
================
Standards:
We obey POSIX around here. We want to implement all stuff found in the standards
referenced above. We don't want to deviate where it matters. Where does it
matter? Well, if the standard has something to say about it, it matters. For
instance, the POSIX standard specifies that cat has the following form:
cat [-u] [file...]
Don't add shit to that. We don't need `cat -v`.
If the standard doesn't have something to say, then consider the following:
1. Are we considering implementation details?
2. Is this optional, insofar as it's just a program that a user might install?
In the first case, for instance, POSIX doesn't say how printf should do its
thing. It says what printf needs to support (format specifiers, etc.) and what
results it should put out, but it doesn't specify the internals. It's a black
box. Here's what goes in, here's what goes out. What happens in the middle?
That's up to us to decide. Anything like that is fair game. POSIX and SUS don't
specify the initialization process/sequence for a UNIX system. Thus, how we
actually start up the system, our init program, is completely up to us.
In the second case, just make sure it's optional. Don't make it a key part of
the system. Like, if we were to add a COBOL compiler, don't make it important.
You should be able to remove the COBOL compiler from the system without it
breaking things. It should basically function like a software package that the
user'd install themselves. (In fact, you may just want to make it an installable
software package.) Do keep in mind for this kind of stuff, if there's a
standard, you should probably follow it. Like, if we do a COBOL compiler, follow
the COBOL standards. Please.
Robustness:
In general, arbitrary limits should be avoided. Try to accommodate, e.g., long
file names. It's okay to use, say, limits.h to get a limit for something like
path name lengths, but try not to lean on that. Where you can, don't put a cap
on that kinda stuff at all.
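As a sketch of what "no arbitrary cap" can look like, here's a line reader that
grows its buffer as needed instead of using a fixed-size array. The name
read_whole_line() and the starting capacity of 64 are assumptions made for the
example, not anything from FENIX.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reads one line of any length from stream.
 * Returns a malloc'd string without the trailing newline, or NULL on
 * end of file or allocation failure. The caller frees the result.
 */
char *read_whole_line(FILE *stream) {
  size_t capacity = 64;
  size_t length = 0;
  int next_char = 0;
  char *line = malloc(capacity);
  if(line == NULL) {
    return NULL;
  }
  while((next_char = fgetc(stream)) != EOF && next_char != '\n') {
    if(length + 1 == capacity) {
      /* Grow through a temporary so line survives a failed realloc. */
      char *bigger = realloc(line, capacity * 2);
      if(bigger == NULL) {
        free(line);
        return NULL;
      }
      line = bigger;
      capacity *= 2;
    }
    line[length++] = (char)next_char;
  }
  if(next_char == EOF && length == 0) {
    free(line);
    return NULL;
  }
  line[length] = '\0';
  return line;
}
```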
Keep all non-printing characters found in files. Try to support different
locales and character encodings, like UTF-8. There should be functions to help
you deal with that stuff.
Check for errors when making calls, especially system calls. In cases of errors,
print a fucking error message. Nothing is more frustrating for an end user than
having something fail and not being told why. (This was frustrating for me
during my first attempt at moving FENIX's kernel to the higher half. grub-file
tells you whether it's a valid multiboot header but not why it fails if it
isn't. I still have no clue why it wasn't working.)
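A minimal sketch of checking a call and saying why it failed. The helper name
open_for_reading() and the exact message wording are made up for the example.
It leans on fopen() setting errno, which POSIX guarantees.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: opens path for reading. On failure it says which
 * program failed, on what, and why, then returns NULL.
 */
FILE *open_for_reading(const char *program, const char *path) {
  FILE *stream = fopen(path, "r");
  if(stream == NULL) {
    fprintf(stderr, "%s: cannot open %s: %s\n", program, path,
        strerror(errno));
  }
  return stream;
}
```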
Check memory allocation calls (malloc, calloc, realloc) for NULL return. Don't
just assume it allocated successfully. Be careful with realloc. Don't just
assume it's going to keep stuff in place.
If you call free, don't try to reference that memory again. Assume that the
moment you free'd it, something else immediately snapped that memory up.
If a malloc/calloc/realloc fails, that should be a fatal error. The only
exception should be in the login shell and related important system processes,
like TTY. If those have failed calls, that needs to be dealt with somehow.
After all, these are important bits. So, I don't know quite how to deal with
that, but definitely keep that in mind.
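For ordinary utilities, the "allocation failure is fatal" rule can be wrapped
up once. This is only a sketch: allocate_or_die() and resize_or_die() are
hypothetical names, and the login shell and TTY code would need a real recovery
path instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints the out-of-memory message and ends the program. */
static void die_out_of_memory(const char *program, size_t size) {
  fprintf(stderr, "%s: out of memory (wanted %lu bytes)\n", program,
      (unsigned long)size);
  exit(EXIT_FAILURE);
}

/* Hypothetical wrapper: returns size bytes or does not return at all. */
void *allocate_or_die(const char *program, size_t size) {
  void *memory = malloc(size);
  if(memory == NULL && size != 0) {
    die_out_of_memory(program, size);
  }
  return memory;
}

/*
 * Hypothetical wrapper around realloc(). The block may move, so the
 * caller must keep using the returned pointer, never the old one.
 */
void *resize_or_die(const char *program, void *memory, size_t size) {
  void *resized = realloc(memory, size);
  if(resized == NULL && size != 0) {
    die_out_of_memory(program, size);
  }
  return resized;
}
```

Typical use is values = resize_or_die("sort", values, new_size); the old value
of values is dead the moment the call returns.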
When declaring variables, most should be immediately initialized. For instance,
if you create a variable for a flag, it should be initialized to a sane default.
For example, if you want a flag for "hey, use this option", it should be
initialized to whatever the default should be (off or on). This is especially
important for static variables, which may be referenced by a different call
without it being initialized, which would be bad!
In the case of impossible conditions, print something to tell the user that
the program's hit a bug. Sure, someone's going to have to go in with the source
and a debugger and find that bug, but at least let the user know that a bug's
occurred. That way, the program doesn't just stop and they have no idea why.
You don't need to give a detailed explanation (though you may want to give some
info that might be useful to someone who wants to debug), but at least give
them something like "Oops, we've hit a bug! Killing <program>..." so they know
that what they tried to do led to a bug.
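A sketch of reporting an impossible condition. The enum, report_bug(), and the
program name "demo" are all made up for the example; the file and line are
there as the "some info for whoever debugs it" part.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical set of modes a utility might be in. */
enum output_mode { mode_plain, mode_numbered };

/* Tells the user a bug was hit, with a hint for whoever debugs it. */
void report_bug(const char *program, const char *file, int line) {
  fprintf(stderr, "Oops, we've hit a bug! Killing %s... (%s:%d)\n",
      program, file, line);
  abort();
}

/* Returns a printable name for mode. */
const char *mode_name(enum output_mode mode) {
  switch(mode) {
    case mode_plain:
      return "plain";
    case mode_numbered:
      return "numbered";
  }
  /* Every valid mode returned above, so getting here is a bug. */
  report_bug("demo", __FILE__, __LINE__);
  return NULL;
}
```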
For exit values, don't just use error counts. Different errors should have
different error values. In general, the POSIX standard will give you an idea
of what error values matter, and you can define others as they're needed.
For temporary files, you should generally put them in /tmp, but probably have
some way for the user to tell the program where they actually want temp files
to go. Check TMPDIR, maybe? (By the way, be careful with this stuff. You might
hit security issues.)
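A minimal sketch of the TMPDIR idea, assuming we fall back to /tmp when TMPDIR
is unset or empty. Both function names are hypothetical. The template it builds
is meant to be handed to mkstemp(), which creates and opens the file in one
step, rather than picking a name first and opening it later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Picks the directory for temporary files from a TMPDIR-style value. */
const char *choose_temp_directory(const char *tmpdir_value) {
  if(tmpdir_value != NULL && tmpdir_value[0] != '\0') {
    return tmpdir_value;
  }
  return "/tmp";
}

/*
 * Writes "<directory>/<prefix>XXXXXX" into out, ready for mkstemp().
 * Returns 0 on success, -1 if the name does not fit in size bytes.
 */
int build_temp_template(char *out, size_t size, const char *prefix) {
  const char *directory = choose_temp_directory(getenv("TMPDIR"));
  int written = snprintf(out, size, "%s/%sXXXXXX", directory, prefix);
  if(written < 0 || (size_t)written >= size) {
    return -1;
  }
  return 0;
}
```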
Library behaviour:
Try to make functions reentrant. In general, if a function needs to be
thread-safe, it'll be reentrant, but even if it doesn't, it's not the worst
idea to make it reentrant. That means avoiding even dynamic memory, though if
it can't be avoided, then so be it.
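A sketch of a reentrant helper: it keeps no static state and allocates nothing,
because the caller hands in the buffer. format_permissions() and
PERMISSION_TEXT_SIZE are hypothetical names picked for the example.

```c
/* Length of a permission string such as "rwxr-xr-x", plus the '\0'. */
#define PERMISSION_TEXT_SIZE 10

/*
 * Hypothetical helper: writes the "rwxr-xr-x" form of mode into out.
 * A version that returned a pointer to a static buffer would not be
 * reentrant, since a second call would overwrite the first result.
 */
void format_permissions(unsigned int mode, char out[PERMISSION_TEXT_SIZE]) {
  const char symbols[] = "rwxrwxrwx";
  for(int i = 0; i < 9; i++) {
    /* Bit 8 is owner read, bit 0 is other execute. */
    unsigned int bit = 1u << (8 - i);
    out[i] = (mode & bit) ? symbols[i] : '-';
  }
  out[9] = '\0';
}
```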
Also, some naming convention for you:
- If it's a POSIX function/macro/whatever, use that name
- Structs should be struct posix_name, not just posix_name
- If it's a function/whatever for making something work, start it with a _
  - For instance, _do_thing()
Basically, if it's not supposed to be user-facing, start it with _. Otherwise,
use a normal name, probably with one given in the standard.
Formatting error messages:
In compilers, give error messages like so:
<sourcefile>:<lineno>: <message>
Line numbers start from 1.
If it's a program, give an error message that tells the user the program
and the issue. You'll probably want to also include a PID. For example:
<program> (<PID>): <message>
You might be able to get away with omitting the program and PID.
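The two message shapes above can be sketched as a pair of formatters. The
function names are made up; the format strings just follow the two templates.
Both return what snprintf() returns, so a result of size or more means the
message got cut off.

```c
#include <stdio.h>

/* Builds "<sourcefile>:<lineno>: <message>". Line numbers start from 1. */
int format_compiler_error(char *out, size_t size, const char *source_file,
    unsigned long line_number, const char *message) {
  return snprintf(out, size, "%s:%lu: %s", source_file, line_number, message);
}

/* Builds "<program> (<PID>): <message>". */
int format_program_error(char *out, size_t size, const char *program,
    long pid, const char *message) {
  return snprintf(out, size, "%s (%ld): %s", program, pid, message);
}
```

In a real utility, the pid argument would likely be (long)getpid() and the
result would be written to stderr.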
Interface standards:
Don't make behaviour specific to the name used. Even if we were to add GNU awk
compatibility to our awk implementation, don't make whether it uses GNU awk
mode dependent on whether you call awk or gawk. Make it a flag. In general,
though, this probably won't be an issue.
Don't make behaviour dependent on device. For instance, don't make output
differ depending on whether we're using a VTTY or a proper physical TTY. Leave
the specifics to the low-level interfaces (unless, of course, you're working on
those low-level interfaces, in which case you should do what you need to do).
The only exception is checking whether a thing is running interactively before
printing out terminal control characters for things like color. Like, in cal,
someone may just want to redirect that into a file. It shouldn't include the
codes used to change the color of the current day. Also, if you're outputting
binary data, maybe don't just send it to stdout. Ask, probably.
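A minimal sketch of the interactive check, using isatty(). The helper names are
hypothetical, and reverse video is just an assumed choice of highlight.

```c
#include <unistd.h>

/*
 * Hypothetical helpers: return the control sequence to switch the
 * highlight on or off, or an empty string if fd is not a terminal,
 * so redirected output stays free of control characters.
 */
const char *highlight_start(int fd) {
  return isatty(fd) ? "\033[7m" : "";
}

const char *highlight_end(int fd) {
  return isatty(fd) ? "\033[0m" : "";
}
```

Something like cal could then print highlight_start(STDOUT_FILENO), the current
day, and highlight_end(STDOUT_FILENO), and a redirect to a file would contain
only the plain text.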
Finding the executable and stuff:
Start with argv[0]. If it's a path name, it should be basename(argv[0]).
That's all I have to say, really.
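As a sketch, assuming we want a fallback for a missing argv[0]: program_name()
and the "unknown" placeholder are made up here. Note that POSIX lets basename()
modify the string it's given, so don't count on argv[0] being unchanged
afterwards.

```c
#include <libgen.h>
#include <stddef.h>

/* Hypothetical helper: returns the name to show in messages. */
const char *program_name(char *argv0) {
  if(argv0 == NULL || argv0[0] == '\0') {
    return "unknown";
  }
  return basename(argv0);
}
```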
CLIs:
Follow POSIX guidelines for options. I've broken this next rule quite a bit,
but maybe make use of getopt() for parsing arguments, instead of what I've been
doing in manually searching for them.
We're not doing long-named options. The idea is nice, but for now, we're
sticking with the POSIX standard. Maybe we'll add 'em in one day. (I mean, I
guess we could use them in non-POSIX utilities, but still.)
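A minimal sketch of getopt() handling for the cat [-u] [file...] form. The
function name parse_cat_options() and its return convention are assumptions for
the example, not how the existing utilities do it.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/*
 * Hypothetical parser for "cat [-u] [file...]".
 * Sets *unbuffered for -u. Returns the index of the first file operand,
 * or -1 on an unknown option (getopt() prints the message itself).
 */
int parse_cat_options(int argc, char *argv[], int *unbuffered) {
  int option = 0;
  *unbuffered = 0;
  while((option = getopt(argc, argv, "u")) != -1) {
    switch(option) {
      case 'u':
        *unbuffered = 1;
        break;
      default:
        return -1;
    }
  }
  return optind;
}
```

main() would call this once and then loop over argv from the returned index up
to argc.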
Memory usage:
Try to keep it reasonably low. Obviously, we don't need to keep it that low, but
don't go using all the RAM just to print out a message.
Valgrind's not the worst tool to play with. You might not need to worry about
all the messages it gives, but in general, try to keep it quiet, unless that
would really fuck up things otherwise. In other words, if it bitches at you
about not freeing up memory before exiting, add in the necessary free()s.
File usage:
In general, /etc is where configuration for system-level stuff goes. Runtime
created files should go in /var/cache or /tmp. You can also use /var/lib.
Files may be stored in /usr, but be prepared for a read-only /usr. Don't assume
you can write to /usr. In general, for system scope files (i.e. everything
outside of /home), refer to the filesystem hierarchy.
Style and other important things about C
========================================
Formatting:
Keep lines to 80 characters or less (especially since FENIX currently only
supports 80 character lines).
The open brace for any code block should go on the line where it starts.
int main(void) {
  if(test) {
    do {
    } while(test);
  }
}
There should be a space before any open brace or after any close brace (if
something follows said close brace).
Keep any function definitions to one line. Like this:
int main(void)
not:
int
main(void)
If a function definition is longer than 80 characters, you are allowed to split
it.
For function bodies, use the following standards:
No space between function/keyword and paren. Like this:
int main(void)
if(test)
for(int i = 0; i < 10; i++)
printf("Hello, World!\n");
When splitting expressions, split after the operator:
if(condition1 && condition2 &&
    condition3)
Do-whiles, as hinted at above, should have the end brace and while on the same
line, like so:
do {
  thing();
} while(test);
For indentation, spaces, 2 spaces specifically. And, yeah. Use your brain on
how to actually use that. (The exception to this rule is makefiles, which
require tabs and a specific indentation style. Look up more on makefiles for
that information.) I know that tabs are technically better for accessibility,
but I've already written so much code with 2-space indents, and I really don't
feel like going back and fixing every single file, and I'd like everything to
remain consistent. If you want to replace every single space indent with tabs,
let me know. I'll give you my blessing and we'll use tabs for all new code.
Until then, sorry, I guess.
Comments:
Try to start programs and headers with a description of what it is. For example,
from stdlib.h:
/*
 * <stdlib.h> - standard library definitions
 *
 * This header is a part of the FENIX C Library and is free software.
 * You can redistribute and/or modify it subject to the terms of the
 * Clumsy Wolf Public License v4. For more details, see the file COPYING.
 *
 * The FENIX C Library is distributed WITH NO WARRANTY WHATSOEVER. See
 * The CWPL for more details.
 */
In general, it should say what the file/program is, what it does, and include
that free software, no warranty header.
Try to write comments in English, but if you can't then write it in your native
language. I'd prefer you use English, since that's my native language. I also
kinda understand Spanish and know a small bit of Danish. But, if you can't do
English or Spanish, write in what you know. Just romanize your comment. For
instance, if you're writing in Japanese, please use romaji instead of normal
Japanese script. Otherwise, things get weird.
Probably not the worst idea to have a comment on what functions do, but don't
feel like you *need* to add them, especially if the function has a sensible
name. Like, if the function is called do_x(), you don't need a comment that
says that it "Does X".
Please only use one space after a sentence. It'll annoy me otherwise. Otherwise,
I'm not your English teacher. You're [sic] grammar doesn't need to be perfect,
as long as i [sic] (and others) can tell what you mean.
When commenting code, try not to comment the obvious. In general, if you need
to have a comment on what a variable does, you need to re-name the variable.
If you need to comment on what a block of code is doing, consider whether it
needs to be that complicated or if you can simplify it. This isn't always true,
but in general, try to keep to that rule of thumb.
If a variable is supposed to be a command-line flag, maybe include a comment
for what option it's supposed to be (i.e. /* -b */) if the name doesn't make it
otherwise obvious (e.g. the using_unbuffered_output variable in cat.c
corresponding to the -u flag.) If your flags are those kinda octal things
(04 for -x, 02 for -y, 01 for -z), definitely include a comment on what
corresponds to what (/* 04: -x, 02: -y, 01: -z */).
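A small sketch of the octal flag style with its comment, assuming made-up
options -x, -y, and -z. The FLAG_ names and add_flag() are hypothetical.

```c
/* 04: -x, 02: -y, 01: -z */
#define FLAG_X 04
#define FLAG_Y 02
#define FLAG_Z 01

/* Hypothetical helper: returns flags with the bit for option turned on. */
unsigned int add_flag(unsigned int flags, int option) {
  switch(option) {
    case 'x':
      flags |= FLAG_X;
      break;
    case 'y':
      flags |= FLAG_Y;
      break;
    case 'z':
      flags |= FLAG_Z;
      break;
    default:
      break;
  }
  return flags;
}
```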
Using C constructs:
Always declare the type of objects. Explicitly declare all arguments to
functions and declare the return type of the function. So, it's `int var`, not
just `var`, and it's `int main(void)`, not `main()`.
When it comes to compiler options, use -Wall. Try to get rid of any warnings you
can, but if you can't, don't fret about it.
Be careful with linting tools. In general, don't bother with them, unless you
think it'll help with a bug. Don't just run linting tools, though. Like, you
generally don't need to bother casting malloc(). If your tool is telling you
that you need to, you can probably ignore it.
extern declarations should go at the start of a file or in a header. Don't put
externs inside function if you can avoid it.
Declare as many variables as you need. Sure, you can just carefully reuse the
same variable for different things, but it's probably better to just make
another variable. (The exception to this rule is counters like i, j, etc,
unless you specifically need to preserve the value of one of them.)
Avoid shadowing. If you declare a global variable, don't declare local variables
with the same name.
Declarations should either be on the same line or in multiple declarations.
So, do this:
int foo, bar;
Or this:
int foo;
int bar;
Not this:
int foo,
    bar;
In if-else statements, *always use brackets*. Please. It makes it much clearer
as to what belongs to what. It'll keep you from ending up in a situation where
you've accidentally got a function call outside of an if-statement that should
actually be inside it. Also, keep `else if` on a single line.
Typedef your structs in the declaration. Basically, your structure declarations
should probably look like this:
typedef struct _f_foo {
  /* Stuff goes here */
} foo;
Names:
Don't be terse in your naming. Give it a descriptive English name. Like, name
it `do_ending_newline`, not just `d`.
If it's only used shortly for an obvious purpose, you can ignore this. Like,
you can continue with for(int i = 0...). You don't need to name that variable
something else like counter. We're programmers. We know what i is used for.
Try not to use too many abbreviations in names. You can if you need to, but
you should try not to.
Use underscores to separate words in identifiers (snake_case), not CamelCase.
For names with constant values, you can probably decide on your own whether it's
better to use an enum or #define. If you use a #define, the name should be in
all uppercase ("DEFINE_CONST"). Enums should use lowercase ("enum_const").
For file names, I guess try to keep them short (14 characters), but don't feel
like you need to. It's not the worst idea for working with older UNIXes. Just
don't feel the need to be compatible with DOS 8.3 filenames. We don't care about
Microsoft's shit. Just other UNIXes.
Portability:
FENIX doesn't necessarily need to be portable, but should be portable purely
as a result of being completely to standard. So, our programs should be able to
run on any system that is standards compliant.
In general, you should use features in the standard. Don't try to write an
interface in a program if you can just use an interface in POSIX. Again, if
you're doing kernel or libc dev, you can kinda fudge this, but for util dev,
definitely use the standard interfaces.
Don't worry about supporting non-UNIX systems. Windows? Don't worry about it.
If you can do something using pure ISO C, then it doesn't hurt, but don't
overcomplicate it if you can just use a POSIX function instead.
Porting between CPUs:
For a start, we're not worried about anything not 32-bit or higher. 16-bit?
Not a concern. In general, though, anything architecture dependent should be
in an arch dir. So, the actual low-level kernel code? arch/i386 (or whatever
arch you're writing for).
In general, assume that your types will be as defined in limits.h.
Don't assume endianness. Be careful about that stuff.
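One way to not assume endianness is to build values out of bytes with shifts,
rather than casting a byte pointer to an integer pointer. A minimal sketch for
32-bit little-endian data, with hypothetical names read_le32() and
write_le32():

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value; works the same on any host. */
uint32_t read_le32(const unsigned char bytes[4]) {
  return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
      ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Writes value as 32-bit little-endian; works the same on any host. */
void write_le32(unsigned char bytes[4], uint32_t value) {
  bytes[0] = value & 0xFF;
  bytes[1] = (value >> 8) & 0xFF;
  bytes[2] = (value >> 16) & 0xFF;
  bytes[3] = (value >> 24) & 0xFF;
}
```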
Calling system functions:
Just call the POSIX functions. Use standard interfaces to stuff.
Internationalization:
Um. Not sure how to handle this right now. If you want to port into another
language, get in touch.
Character set:
Try to stick to ASCII. (This is why I asked for romaji earlier.)
If you need non-ASCII, use UTF-8. Try to avoid this if possible.
Quote characters:
Be careful. Quotes are 0x22 ('"') and 0x27 ('''). Don't let your computer fuck
that stuff up.
If you're internationalizing, though, use the right quotes for stuff. Like, in
French, you'd want «».
mmap:
Don't assume it works for everything or fails for everything.
Try it on a file you want to use. If it fails, use read() and write().
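A sketch of "try mmap, fall back to read()", covering the reading side only and
assuming a regular file whose size fstat() can report. The file_view type and
both function names are made up for the example, and a real version would
probably want to retry reads interrupted by signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical read-only view of a whole regular file. */
typedef struct _f_file_view {
  void *data;
  size_t length;
  int is_mapped; /* 1: data is from mmap(), 0: from malloc() and read() */
} file_view;

/* Fallback path: copies the file into memory with plain read() calls. */
static void *read_whole_file(int fd, size_t length) {
  size_t done = 0;
  char *buffer = malloc(length);
  if(buffer == NULL || lseek(fd, 0, SEEK_SET) == (off_t)-1) {
    free(buffer);
    return NULL;
  }
  while(done < length) {
    ssize_t count = read(fd, buffer + done, length - done);
    if(count <= 0) {
      free(buffer);
      return NULL;
    }
    done += (size_t)count;
  }
  return buffer;
}

/* Fills in view for fd. Returns 0 on success, -1 on failure. */
int open_file_view(int fd, file_view *view) {
  struct stat info;
  view->data = NULL;
  view->length = 0;
  view->is_mapped = 0;
  if(fstat(fd, &info) != 0 || info.st_size < 0) {
    return -1;
  }
  if(info.st_size == 0) {
    return 0;
  }
  view->length = (size_t)info.st_size;
  view->data = mmap(NULL, view->length, PROT_READ, MAP_PRIVATE, fd, 0);
  if(view->data != MAP_FAILED) {
    view->is_mapped = 1;
    return 0;
  }
  /* mmap() said no for this file, so do it the slow way. */
  view->data = read_whole_file(fd, view->length);
  return (view->data == NULL) ? -1 : 0;
}

/* Releases whatever open_file_view() set up. */
void close_file_view(file_view *view) {
  if(view->is_mapped) {
    munmap(view->data, view->length);
  } else {
    free(view->data);
  }
  view->data = NULL;
}
```

The caller never needs to know which path was taken; it just reads
view.data[0] through view.data[view.length - 1].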
Documentation
=============
Man pages:
The primary method of documenting stuff for FENIX is in the man pages. All else
is secondary. Man pages should have a fairly standard format of:
Title
Synopsis
Description
Options
Examples
Author
Copyright
The title is fairly basic. Name the thing and give a short description. For
instance: head - copy the first part of files.
The synopsis is just a listing of how the program works. So, for head, it's
head [-n number] file ...
Any optional arguments should be in square brackets. Mutually exclusive options
should be in the same brackets separated by pipes ([-s|-v]). Options that don't
take a further parameter (unlike -n in the above example) can be grouped
together if the software can take them that way ([-elm] means you can do -e,
-l, and/or -m; -el, -lm, or -em; or -elm). Variable type stuff, like file or
number, should be """italicized""" (\fIvariable\fR). If you want to note that
you can repeat the last argument, e.g. take multiple files, use ellipses.
Description should give a full description of how it works. Tell what it does,
note any oddities or behaviour to be noted, and give a quick rundown of how
options change things.
Options should list each option one by one and explain it in detail.
Examples should give at least one example of how to use it and what the example
would do.
The author should be whoever wrote the program. List the original author(s).
So, you may notice plenty of utils have the author as "Kat", since I, Kat, am
the one who wrote them.
Copyright should contain the following:
Copyright (C) 2019 The FENIX Project
This software is free software. Feel free to modify it
and/or pass it around.
A lot of older man pages will list the copyright as:
Copyright (C) 2019 Katlynn Richey
Not the worst idea to update that if you see it.
You might also want to include the package of a thing, if it's got one. For
example, utilities include:
This <util> implementation is a part of the fenutils package.
Physical manual:
The secondary documentation is the manual written in roff. Practically, it
documents the same kinda things. Name, synopsis, options, etc. It also includes
some other stuff, like See Also, for related stuff; Diagnostics, for what all
errors it can produce; and Bugs, for any bugs in a program. Additionally, the
author field is different, and the details for that, along with other bits,
can be found in the Intro.
License for the manuals:
The manuals are tentatively under CC-BY. Maybe I'll write my own license, like
with the CWPL. For now, though, it's CC-BY.
Credits:
In man pages (including physical), include the original author of the program,
not the author of the man page. (Generally, you should be writing the man
pages yourself anyways.)
On the title page of the physical manual, all the folx behind the project should
be named. For now, this is fairly small. If it gets too big, we'll give them
all credit within the first few pages and have the author on the manual as
"The FENIX Project Manual Team".
On other manuals:
Don't copy other people's manuals wholesale. Try to write it yourself. The
exception is in working with the POSIX standard. Don't copy that wholesale, but
you can base your man page on the relevant POSIX page. Just know that the
POSIX page probably won't give you a complete man page!
Releases
========
I'll worry about that later.