# Overview
## Packages
- `main`: primary package, has application-specific functionality
- `util`: utility packages
- `sql`: SQL library
- `http`: HTTP Server
## Code Standards
- We currently target Zig 0.10
  * This is not currently released!
  * You need a recent-ish build of Zig from git
  * When the stable release of 0.10 drops, we will pin to that and will
    only move up when a newer version comes out
- Follow the [style guide](https://ziglang.org/documentation/0.9.1/#Style-Guide)
  * Except where it doesn't make sense to.
- Line length: Try to keep it below 100 columns
  * Except where it would harm other aspects. Examples:
    - Keep full, copy-pastable URLs on a single line. Always. This lets
      you click on the URL in your editor (if it supports it)
    - Keep error messages on a single line when possible, to promote greppability
- Whenever possible, limit the scope of variables. Don't be afraid of subscopes
- Prefer to pass by value when you're not conceptually modifying anything.
  * The compiler is smart enough to turn it into a `*const T` when needed
- Also prefer to return by value (take advantage of [Result Location Semantics](https://github.com/ziglang/zig/issues/287)).
- **RUN `zig fmt` BEFORE SUBMITTING A PR, IDEALLY ON EVERY SAVE**
- Unit tests should go in the same directory, if not the same file, as the
code they are testing.
- Integration tests should go in `integration_tests/`
- When making changes to DB queries, please test them against both SQLite and Postgres
  * Queries should, whenever possible, run on *both* database engines.
  * If you absolutely need to do something different, switch on `db.engineType()`
- Document who owns memory/pointers/slices when they are created.

Read the Zig [zen](https://ziglang.org/documentation/0.9.1/#Zen).
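
As a minimal sketch of two of the points above (documenting ownership, and using a subscope to limit how long a variable lives), consider the following. `formatHandle` and `handleLen` are hypothetical names invented for this illustration; they are not part of the codebase.

```zig
const std = @import("std");

/// Builds "@username@host" (hypothetical helper, for illustration only).
/// Caller owns the returned slice and must free it with `alloc`.
fn formatHandle(alloc: std.mem.Allocator, username: []const u8, host: []const u8) ![]u8 {
    return std.fmt.allocPrint(alloc, "@{s}@{s}", .{ username, host });
}

/// Returns the length of the formatted handle.
/// Nothing allocated here outlives the call, so the caller owns nothing.
fn handleLen(alloc: std.mem.Allocator, username: []const u8, host: []const u8) !usize {
    var len: usize = 0;
    {
        // `handle` and its `defer` are confined to this subscope
        const handle = try formatHandle(alloc, username, host);
        defer alloc.free(handle);
        len = handle.len;
    }
    return len;
}
```

The doc comments state who frees what, so a reader does not have to open the function body to find out.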
### Package Standards
As much as possible, packages (aside from main) should avoid depending
on other packages aside from `std` (which is linked automatically) and `util`.
This makes `build.zig` much simpler and avoids the problem of tangled dependency graphs.

Within a package, you should also avoid importing using relative upwards paths
(so, avoid `@import("../../my_thing.zig")`). This is done both for simplicity's sake
and because it makes refactoring less annoying. It's the intra-package equivalent of
avoiding tangled dependency graphs.
## Package Overview
### `main` package
* TODO: consider moving controllers and api into different packages
* `controllers/**.zig`:
  - Transforms HTTP to/from API calls
  - Turns error codes into HTTP statuses
* `api.zig`:
  - Makes sure API call is allowed with the given user/host context
  - Transforms API models into display models
  - `api/**.zig`: Performs action associated with API call
    * Transforms DB models into API models
    * Data validation
  - TODO: the distinction between what goes in `api.zig` and in its submodules is gross. Refactor?
* `migrations.zig`:
  - Defines database migrations to apply
  - Should be run on startup
### `util` package
#### Components:
- `Uuid`: UUID utils (random UUID generation, equality, parsing, printing)
  * `Uuid.eql`
  * `Uuid.randV4`
  * UUIDs are serialized to their string representation for JSON, db
- `PathIter`: Path segment iterator
- `Url`: URL utils (parsing)
- `ciutf8`: case-insensitive UTF-8 (TODO: Scrap this, replace with ICU library)
- `DateTime`: Time utils
  * TODO add a TimeSpan or Duration type
- `deepClone(alloc, orig)`/`deepFree(alloc, to_free)`
  * Utils for cloning and freeing basic data structs
  * Clones/frees any strings/sub structs within the value
### `sql` package
Currently supports 2 SQL engines:
- Postgres (Meant for production use)
- SQLite (Meant for development use)

TODO: Allow compiling the program without support for one or the other engine using build options

There's nothing stopping you from using Postgres in dev and SQLite in prod,
but the SQLite backend is more prone to panicking when an error happens,
rather than returning a proper error.
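
The Code Standards section asks you to switch on `db.engineType()` when a query cannot be shared between engines. A rough sketch of what that can look like follows. The `Engine` enum and `searchQuery` are hypothetical stand-ins, since the actual return type of `db.engineType()` is not shown in this document. The difference being worked around is real: SQLite has no `ILIKE`, and its `LIKE` is already case-insensitive for ASCII text.

```zig
const std = @import("std");

/// Hypothetical stand-in for whatever `db.engineType()` returns.
const Engine = enum { postgres, sqlite };

/// Picks the query text for a case-insensitive username match.
/// Both variants take the same single argument, so call sites stay identical.
fn searchQuery(engine: Engine) []const u8 {
    return switch (engine) {
        .postgres => "SELECT id FROM account WHERE username ILIKE $1",
        .sqlite => "SELECT id FROM account WHERE username LIKE $1",
    };
}
```

Keeping the argument list the same across engines means only the query text branches, not the code that binds arguments or reads rows.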
#### Usage Example
```c
// open transaction
// only necessary if you need transactional semantics, otherwise
// just call query methods directly on the database and use the
// implied transaction
var tx = try db.begin();
errdefer tx.rollback();

var results = try tx.query(
    std.meta.Tuple(&.{[]const u8}),
    "SELECT username FROM account WHERE community_id = $1",
    .{community_id},
    allocator,
);
defer results.finish();

std.log.info("Listing users", .{});
while (try results.row(allocator)) |row| {
    // Don't forget to free values if applicable!
    defer allocator.free(row[0]);

    // do some stuff with the username idk
    std.log.info("- {s}", .{row[0]});
}
std.log.info("Done", .{});

// commit transaction
try tx.commit();
```
#### `queryRow`
If you need exactly one row, use `queryRow`. It calls `finish()` and returns the row for you.
```c
const row = try db.queryRow(
    std.meta.Tuple(&.{[]const u8}),
    "SELECT username FROM account WHERE id = $1",
    .{user_id},
    allocator,
);
defer allocator.free(row[0]);

std.log.info("Username: {s}", .{row[0]});
```
`queryRow` returns `error.NoRows` or `error.TooManyRows` if the query does not return
exactly one row. Consider adding a `LIMIT 1` clause on all queries you use with `queryRow`:
```c
const row = db.queryRow(
    std.meta.Tuple(&.{[]const u8}),
    "SELECT username FROM account WHERE id = $1 LIMIT 1",
    .{user_id},
    allocator,
) catch |err| switch (err) {
    // early return on error
    error.NoRows => return error.UserNotFound,
    else => return err,
};
defer allocator.free(row[0]);

std.log.info("Username: {s}", .{row[0]});
```
#### Avoiding SQL Injection
<b style="font-size: 32px">
DO *NOT*
<br />
UNDER *ANY* CIRCUMSTANCES
<br />
*EVER*
<br />
PUT USER-SUPPLIED TEXT DIRECTLY INTO THE QUERY TEXT
</b>

The SQL library *intentionally* does not provide an escape function,
because you should not be using one anyway.

**Instead, use query arguments.**
```c
const row = try db.queryRow(
    std.meta.Tuple(&.{[]const u8, []const u8}),
    \\SELECT account.username, community.host
    \\FROM account JOIN community
    \\ ON account.community_id = community.id
    \\WHERE account.id = $1 AND community.id = $2
    \\LIMIT 1
    ,
    // In SQL, parameters are referenced by `$N` where N refers to the
    // *one*-indexed argument number. Note that this means that `args[0]`
    // will be bound to `$1` and `args[1]` will be bound to `$2`.
    .{user_id, community_id},
    allocator,
);

std.log.info("user handle: @{s}@{s}", .{row[0], row[1]});
```
In this example, the db library performs any necessary string escaping on the
arguments before they are used in the query.

Additionally, while you cannot put *user supplied* text into the query,
you can, in exceptional cases, put *program supplied* text into the
query *conditionally* based on user input. For example, parsing the
user text into an `enum` and then putting `@tagName(val)` into the query.
This should be used judiciously, and only where necessary to avoid code
explosion (for example, in complicated query methods taking multiple
optional arguments).

In general, query text should be calculated at `comptime` whenever possible.
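
A minimal sketch of the `enum` plus `@tagName` approach, assuming a sort direction chosen by the user. `SortOrder`, `listQuery` and `listQueryFor` are hypothetical names made up for this example. The user's text only ever selects between query strings built at `comptime`; it never becomes part of the query itself.

```zig
const std = @import("std");

/// Hypothetical enum: the tag names double as (case-insensitive) SQL keywords.
const SortOrder = enum { asc, desc };

/// Builds the query text at comptime from a program-supplied tag name.
fn listQuery(comptime order: SortOrder) []const u8 {
    return "SELECT username FROM account ORDER BY username " ++ @tagName(order);
}

/// Maps untrusted user text onto one of the precomputed queries.
/// Returns null for anything that is not exactly a known tag.
fn listQueryFor(user_text: []const u8) ?[]const u8 {
    const order = std.meta.stringToEnum(SortOrder, user_text) orelse return null;
    return switch (order) {
        .asc => listQuery(.asc),
        .desc => listQuery(.desc),
    };
}
```

Input such as `desc; DROP TABLE account` does not match any tag, so `listQueryFor` returns `null` and the caller can reject the request before touching the database.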
#### Query Argument types
Query arguments *must* be one of:
- A tuple containing the values to pass (shown above)
- The monovalue `{}` (of type `void`), signifying zero parameters
  * Technically this has the same behavior as passing an empty tuple `.{}`.
    But because every tuple is a different type to the compiler, this
    could subvert memoization efforts, causing compilation to slow down
    and potentially bloating the binary. **Use your helper methods!**
#### Unused arguments
In general, arguments *should* be used in the query. While the Postgres backend
does not currently verify that all arguments are used in the query, the SQLite
backend does.

Some queries may benefit from being able to pass in arguments that are not
used in the actual query, for example to avoid branching code paths with
different argument tuple types. In this case, you can use the `queryWithOptions` method:
```c
const query_to_execute_that_probably_varies_at_runtime = "SELECT id FROM account WHERE username = $3";
var results = try db.queryWithOptions(
    std.meta.Tuple(&.{[]const u8}),
    query_to_execute_that_probably_varies_at_runtime,
    .{unused_arg_1, unused_arg_2, username},
    .{ .prep_allocator = allocator, .ignore_unused_arguments = true },
);
// use the query results
```
#### Row Result types
Query result types are somewhat more flexible than argument types. They can
be one of:
- A tuple type containing the types to retrieve (shown above)
- A struct type containing the names of the columns to retrieve:
```c
const handle = try db.queryRow(
    struct { username: []const u8, host: []const u8 },
    \\SELECT account.username AS username, community.host AS host
    \\FROM account JOIN community
    \\ ON account.community_id = community.id
    \\WHERE account.id = $1 AND community.id = $2
    \\LIMIT 1
    ,
    .{user_id, community_id},
    allocator,
);

// fields are accessed by column name on the returned struct
std.log.info("user handle: @{s}@{s}", .{handle.username, handle.host});
```
Note that this may require you to rename some of your result columns using `AS`.
#### Helper Methods
##### Executable on `db` or `tx`
- `query(RowType, sql, args, allocator)`
- `queryRow(RowType, sql, args, allocator)`
- `exec(sql, args, allocator)`
  * Used for queries that don't return any rows, calls `finish()` for you
- `insert("accounts", Account{ ... }, allocator)`
  * Equivalent to `exec("INSERT INTO accounts(.....) VALUES (......)", .{...}, allocator)`
- `queryWithOptions(RowType, sql, args, .{ advanced options and stuff... })`
##### `db` only
- `const tx = try db.begin();`
  * Will return an error if a transaction is open on this database connection
##### `tx` only
- `try tx.commit()`
- `tx.rollback()`
  * Note that this *does not* return an error union, and *does not* need
    to be prefixed with `try`. This is because rollback is *supposed*
    to be an operation that cannot fail (except in the case of connection
    losses, in which case the DB should have done a `rollback` for us anyway).
    If connection errors do occur, then `tx.rollback()` will log them
    to the console and swallow the error value.

In all cases, you can supply a null allocator if you know for certain that
the query engine does not need to allocate a buffer to bind the arguments
or save the return data, for example when executing a query that
does not have arguments.
### `http`
Implements a basic HTTP/1.1 server.

This module sucks and needs refactoring.

General organization:
- `request/parser.zig` - parses incoming requests
- `server/response.zig:ResponseStream`
  * Attempts to write response body in one go at the end if possible.
  * If the body is too big, it uses `Transfer-Encoding: chunked` and
    transfers the body piece-by-piece.
  * Create one using `server/response.zig:open(...)`
- `routing.zig` - handles route matching. See usage (currently in `main.zig`) in the `main` package
  * Can currently match based on HTTP method and on path.
  * Path format:
    - `/path/to/my/endpoint` - static path
    - `/posts/:post_id/reacts/:react_id` - path with 2 arguments (`post_id` and `react_id`)
  * Arguments get passed to handler functions through the `RouteArgs` param
  * There is not currently a way of typing args
  * This whole thing needs a refactor
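
To make the path format concrete, here is a small self-contained sketch of how a pattern such as `/posts/:post_id/reacts/:react_id` can be compared against a request path, segment by segment. This is *not* the code in `routing.zig`: `nextSegment` and `matches` are hypothetical helpers written for this illustration, and details such as trailing-slash handling are assumptions that may differ from the real router.

```zig
const std = @import("std");

/// Hypothetical helper: returns the next non-empty path segment at or after
/// `pos.*` and advances `pos.*` past it, or returns null at the end of the path.
fn nextSegment(path: []const u8, pos: *usize) ?[]const u8 {
    var start = pos.*;
    while (start < path.len and path[start] == '/') start += 1;
    if (start >= path.len) return null;
    var end = start;
    while (end < path.len and path[end] != '/') end += 1;
    pos.* = end;
    return path[start..end];
}

/// Returns true if `path` matches `pattern`.
/// Pattern segments starting with ':' are arguments and match any segment.
fn matches(pattern: []const u8, path: []const u8) bool {
    var pattern_pos: usize = 0;
    var path_pos: usize = 0;
    while (true) {
        // Pattern exhausted: it is a match only if the path is exhausted too
        const want = nextSegment(pattern, &pattern_pos) orelse
            return nextSegment(path, &path_pos) == null;
        // Path ran out before the pattern did
        const got = nextSegment(path, &path_pos) orelse return false;
        if (want[0] == ':') continue;
        if (!std.mem.eql(u8, want, got)) return false;
    }
}
```

A real router would also record the text matched by each `:name` segment so it can be handed to the handler, which is the role `RouteArgs` plays above.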