You have reached the objective obſerver. I hope you will have an informative journey.
Warning: This is a Chriſtian ſite, part of the Chriſtian Programmers League. Trannies, jews and homos are not welcome here.
If you are a white Chriſtian virgin woman 30 years old or younger and ſeek a Chriſtian marriage, contact me!
Regards,
Steffen "RmbRT" Rattay
Ambaſſador, Kingdom of Heaven
rmbrt@objective.observer Odysee GitHub
Update 09/16/24
I recently gained ſome inſight into the clock-baſed conſenſus mechaniſm I had been ſtuck on. Inſtead of trying to make a perfect objective deciſion in each node, I can ſimply accept everything locally, but add a global challenge duration to every tranſaction. If ſomeone challenges a tranſaction, the collective votes to ban either the accuſer or the accuſed (Proof of Capacity voting). Since node identities are tied to a proof of capacity, ſpam and malicious actions are time-expenſive; therefore, voting to ban ſomeone is a rare occaſion, but one that enſures a coherent view of the ſyſtem among honeſt nodes.
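As a rough ſketch of that rule (all names, types, and the challenge duration below are placeholders I made up, not a ſpecification):

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Illustration only: type names and the challenge duration are made up. */
#define CHALLENGE_DURATION_S (24 * 60 * 60) /* placeholder global challenge window */

typedef struct {
	uint64_t id;
	time_t accepted_at; /* accepted locally, without waiting for global agreement */
	bool challenged;
} tx_t;

/* A locally accepted transaction becomes final once its challenge window
 * elapses without anyone having challenged it. */
bool tx_is_final(const tx_t *tx, time_t now)
{
	return !tx->challenged && (now - tx->accepted_at) >= CHALLENGE_DURATION_S;
}

/* If a challenge is raised, the collective votes. Because identities are
 * tied to a proof of capacity, votes are weighted by proven capacity, and
 * the losing side (accuser or accused) gets banned. */
typedef enum { BAN_ACCUSER, BAN_ACCUSED } verdict_t;

verdict_t resolve_challenge(uint64_t capacity_backing_accused,
                            uint64_t capacity_backing_accuser)
{
	return capacity_backing_accused >= capacity_backing_accuser
		? BAN_ACCUSER : BAN_ACCUSED;
}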
Additionally, I have been working on a ſmall tool for creating animated pixel graphics, inſpired by this video. This technique is quite intereſting and will be uſed to make the characters and items in my game (3D terrain and buildings + 2D ſprite characters and items).
Update 03/28/24
I'm currently reworking the general appearance of the ſite. Recently, I have been quite buſy at work, and did not really get a chance to continue much on my perſonal projects.
Update 11/03/23
I have to throw away the C code generator and firſt implement a more low-level code generator. Directly printing the AST is meaningleſs, becauſe the AST does not contain code ſuch as exception handling, deſtructors, code-blocks-as-expreſſions, etc. For that, I will firſt create a detailed intermediate repreſentation, with the goal that it can alſo be interpreted or uſed for debugging, etc. All machine-agnoſtic optimiſation alſo has to take place in the intermediate repreſentation or the AST, not in the code generator. I am ſtill uncertain about how exactly I will implement that, given the many vague conſtraints I have to meet, ſo I will probably be doing ſome praying and thinking for a while before reſuming my work. In the end, I will probably generate baſically aſſembly-like C code, uſing C as a portable aſſembly language. Another thing that bugs me is that the type ſyſtem is no longer fully in line with what I want in the final language, and eſpecially things like laſt-uſe detection are pretty complicated, ſo I alſo have to come up with a concrete algorithm and ruleſet for that.
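Roughly, I imagine the intermediate repreſentation as a flat liſt of ſimple, explicit inſtructions that can be interpreted directly or lowered to aſſembly-like C. A minimal ſketch of what a ſingle inſtruction could look like (opcodes and operand layout are placeholders, nothing about this is final):

#include <stdint.h>

/* Sketch only: a flat, three-address-style IR that could be interpreted
 * directly or lowered to assembly-like C. */
typedef enum {
	IR_CONST,    /* dst = imm                   */
	IR_ADD,      /* dst = a + b                 */
	IR_CALL,     /* dst = call fn(args...)      */
	IR_DESTRUCT, /* run destructor of local a   */
	IR_THROW,    /* begin exception unwinding   */
	IR_RET       /* return the value in a       */
} ir_op_t;

typedef struct {
	ir_op_t op;
	uint32_t dst;  /* virtual register / local slot */
	uint32_t a, b; /* operands                      */
	uint64_t imm;  /* immediate payload             */
} ir_insn_t;

/* e.g. roughly the first branch of the example further down:
 * r0 = 0; r1 = 5; r2 = r0 + r1; return r2 */
static const ir_insn_t example[] = {
	{ IR_CONST, 0, 0, 0, 0 },
	{ IR_CONST, 1, 0, 0, 5 },
	{ IR_ADD,   2, 0, 1, 0 },
	{ IR_RET,   0, 2, 0, 0 },
};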
Update 10/13/23
Finally got around to outputting my firſt lines of analyſed and template-inſtantiated C code from my ſelf-hoſted compiler.
As a teaser, I preſent you below an example of how OR types work in my language (baſically tagged unions), together with automatic return types:
automatic_return_type() ?
{
	X: UINT := 0;
	= X + 5;
	= <INT>(5);
	= FALSE;
}
Obviously this is a contrived example. But it currently generates the following C code (ſlightly ſhortened):
struct or_0x56492915f1f0 {
	uint8_t kind;
	union {
		uint_t opt_0;
		int_t opt_1;
		_Bool opt_2;
	};
};

struct or_0x56492915f1f0 fn_0x56492915f070()
{{
	uint_t local_1;
	local_1 = ((uint_t)0);
	return (struct or_0x56492915f1f0){
		.kind = 0,
		.opt_0 = (local_1+5)
	};
	return (struct or_0x56492915f1f0){
		.kind = 1,
		.opt_1 = ((int_t)5)
	};
	return (struct or_0x56492915f1f0){
		.kind = 2,
		.opt_2 = false
	};
}
}
I ſtill have quite a long way to go until everything works properly, eſpecially ſtuff like exceptions, ſtatement expreſſions (like x + ({ Y: INT := 5, = Y + 5;}), etc.), and the whole OOP ſtuff.
These OR types truly begin to ſhine once I add the >> prefix operator, which implicitly branches the code it is ſurrounded by and inſtantiates it for each concrete type contained in the OR type.
So I could write x += >>automatic_return_type(), and it would look at the returned types and create a caſe that performs x += <actual value> for each one.
If ſome caſes would fail to compile, but at leaſt one compiles, the non-compiling caſes inſtead throw a runtime error when triggered.
This allows me to do ſtuff like >>(some_function()).member_function(), which either executes that function if it exiſts on the actual returned type, or throws an error.
The run-time error can be ſuppreſſed via a poſtfix ? (at leaſt, that's how I think I'll do it), and a fallback caſe can be added with OR: >>(some_function()).member_function()? OR fallback_action().
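To give an idea of what >> could lower to, here is a hand-written ſketch of the kind of C ſwitch that x += >>automatic_return_type() might expand to over the tagged union from above, aſſuming the UINT and INT caſes compile and the BOOL caſe does not (runtime_error is juſt a placeholder):

#include <stdint.h>
#include <stdlib.h>

typedef uint64_t uint_t;
typedef int64_t int_t;

/* The tagged union generated above. */
struct or_0x56492915f1f0 {
	uint8_t kind;
	union {
		uint_t opt_0;
		int_t opt_1;
		_Bool opt_2;
	};
};

/* Placeholder for whatever the real error mechanism ends up being. */
static void runtime_error(const char *msg) { (void)msg; abort(); }

/* Hand-written stand-in for the branches >> would generate for x += ... */
void dispatch_add(uint_t *x, struct or_0x56492915f1f0 v)
{
	switch (v.kind) {
	case 0: *x += v.opt_0; break;              /* UINT case: compiles, use it */
	case 1: *x += (uint_t)v.opt_1; break;      /* INT case: compiles, use it  */
	case 2: runtime_error("x += BOOL"); break; /* case that would not compile */
	}
}

With the poſtfix ? variant, the error branch would preſumably do nothing (or fall through to the OR fallback) inſtead of calling runtime_error.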
Update 08/11/23
I've been burned out recently and am slowly recovering.
I made a very primitive prototype implementation of the game, which I plan to extend over time. It is currently styled like an auto-battler card-game, but the next stage will be a bird's-eye view 2D RPG. The final stage will be a 3D game with first-person controls.
On the compiler front, I made some progress, and am now in the midst of the template instantiation and type checking / overload resolution code. I guess it will take me until the middle of 2024 or so to finish the self-hosted compiler version that outputs C code (if, Yahweh willing, I don't have another burnout). I hope that the self-hosted, self-compiled compiler will not take over two minutes to compile (as the bootstrap compiler currently outputs a monolithic C++ file).
I hope to finish the compiler as soon as possible so that I can write the game in my own language, and compile to WASM, targeting WebGL/WebGPU, to run it in the browser. While browsers are a bit gay, they are still useful as a simple-to-access cross-platform multimedia engine, without many discrepancies or quirks to look out for (compared to native applications). And since I will output C code in my compiler anyway, I might as well run that through a C-to-WASM compiler.
Update 04/06/23
Over the past months, I didn't really make that much progress on my code; however, I have been thinking a lot about the CPU ISA I want to design for my language, and made good progress there. I have settled on a lot of stuff already:
- Registers are merely a fast scratchpad storage, but they aren't required at all. There are special instructions dedicated to loading immediate values or register values to the ALU, reducing the complexity of the actual ALU instructions. The result of the previous calculation automatically becomes the first argument to the next ALU instruction and does not have to be written back to a register, allowing for very compact encodings of chained calculations, such as ~LOAD4(a + b) into ARG0 a; ARG1 b; ADD; LOAD4; NOT (see the sketch after this list). There are 16 registers. Since most operations are now operand-less, they are more compact and easier to decode, and the few that do have operands are also easier to decode because they are few in number.
- Registers and words are 5 bytes large, but all memory accesses from 1-5 bytes are supported. Because non-power-of-2 data types are supported, there is also no alignment requirement for data, meaning that no padding bytes are required within data structures, leading to better cache usage. 64-bit integers are too large for almost all practical use cases; 40-bit integers cover about a trillion values and can address a TiB of memory with byte resolution, which is more than any reasonable system should ever require.
- Special loop instructions allow for zero-overhead loop jumps: instead of decoding relative jump targets ahead of time and prefetching them, the CPU keeps track of a loop's start (and optionally its end) using a stack, and keeps the first instructions of the loop (and optionally the first instructions after the loop) decoded and available, enabling zero-friction loops without unrolling. Tracking the end of a loop is only required for loops that break from within. At least 2 loop nesting levels per stack frame are guaranteed to be supported this way.
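To make the chained-calculation encoding concrete, here is a tiny C model of the accumulator behaviour; the opcode set, instruction struct and interpreter are purely illustrative (ARG0/ARG1 take immediates here, while the real ISA could also load registers):

#include <stdint.h>
#include <stdio.h>

typedef enum { ARG0, ARG1, ADD, LOAD4, NOT, HALT } op_t;

typedef struct { op_t op; uint64_t imm; } insn_t;

/* Words are 5 bytes: keep only the low 40 bits. */
#define WORD_MASK ((1ULL << 40) - 1)

static uint64_t run(const insn_t *code, const uint8_t *mem)
{
	uint64_t acc = 0; /* result of the previous ALU operation, implicit first operand */
	uint64_t arg = 0; /* explicitly loaded second operand                             */
	for (;; ++code) {
		switch (code->op) {
		case ARG0: acc = code->imm & WORD_MASK; break; /* load first operand  */
		case ARG1: arg = code->imm & WORD_MASK; break; /* load second operand */
		case ADD:  acc = (acc + arg) & WORD_MASK; break;
		case LOAD4: { /* unaligned 4-byte load from the address in acc */
			uint64_t v = 0;
			for (int i = 0; i < 4; ++i)
				v |= (uint64_t)mem[acc + i] << (8 * i);
			acc = v;
			break;
		}
		case NOT:  acc = ~acc & WORD_MASK; break;
		case HALT: return acc;
		}
	}
}

int main(void)
{
	uint8_t mem[256] = {0};
	mem[12] = 42; /* value at address a + b = 5 + 7 */
	/* ~LOAD4(a + b) as ARG0 a; ARG1 b; ADD; LOAD4; NOT */
	const insn_t code[] = {
		{ARG0, 5}, {ARG1, 7}, {ADD, 0}, {LOAD4, 0}, {NOT, 0}, {HALT, 0}
	};
	printf("%llx\n", (unsigned long long)run(code, mem));
	return 0;
}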
Instead of the traditional multi-core paradigm, we leverage two single-core features for speed:
- There's a general SIMD mode, which works like a GPU shader: it works on separate register sets and executes arbitrary ALU instructions, with the constraint that all SIMD units execute the same instructions, meaning that they cannot branch heterogeneously. The SIMD mode is not fully specified yet, but it guarantees at least 4 SIMD units.
- There will be no true multi-threading; instead there will be short-lived micro-threads that are intended to speed up two independent computations within the critical path of a single procedure. This makes concurrency/atomics superfluous, and will speed up the critical path, which is the greatest bottleneck for most computations. Micro-threads are forked/joined efficiently, are intended to pay off for anything over 5 ALU instructions, and can be spawned without requiring any OS or library calls, or any elevated permissions. At least two micro-threads (a master and at least one slave) are available.
Parallel throughput is accelerated via SIMD mode on homogeneous data in tight loops. Linear throughput is accelerated via micro-threading, which replaces the need for dedicated out-of-order execution units. Of course, when stalling, future instructions can be executed ahead of time, but nothing fancy like branch prediction units is required, leading to simpler CPU circuitry. Compilers should direct out-of-order execution manually via micro-threads. Independent processes are implemented via pre-emptive scheduling, while independent threads are implemented via cooperatively scheduled green threads.
Why no multi-core support? Once you achieve full ALU pipeline utilisation using SIMD and micro-threads, you already have a massive performance boost on data and code that is highly localised, meaning all available cache capacity is put to good use, which reduces memory bandwidth consumption. Once you go multi-core, you effectively halve (or worse) the cache capacity available per core (for the same amount of silicon), and you require twice the previous memory bandwidth for instruction and data fetching in order not to slow down. Instead of running two cores that interfere with each other, we aim to make the single core we have as fast as possible, fully consuming all memory bandwidth and cache capacity for a single stream of execution. And once we have maxed out the memory bandwidth with a single thread, it makes no sense to add another core anyway. Instead, we would add more SIMD units, redundant ALUs for faster instruction dispatch, larger ALUs for faster processing of complex instructions, a bigger cache, deeper loop nesting support, etc. All of these scale better with respect to memory bandwidth than just adding more cores.
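To illustrate how a compiler might use micro-threads, here is a hypothetical sketch; mt_fork/mt_join do not exist anywhere and merely stand in for whatever fork/join instructions the ISA ends up exposing (the sequential fallback below only models the semantics, not the parallelism):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t (*mt_fn)(uint64_t);

/* Placeholder intrinsics: a real implementation would map these to the
 * fork/join instructions and run fn on the slave micro-thread. */
static uint64_t mt_fork(mt_fn fn, uint64_t arg) { return fn(arg); }
static uint64_t mt_join(uint64_t handle)        { return handle; }

static uint64_t hash_left(uint64_t x)  { return x * 0x9E3779B97F4A7C15ULL; }
static uint64_t hash_right(uint64_t x) { return (x ^ (x >> 17)) * 0xC2B2AE3D27D4EB4FULL; }

/* Two independent sub-computations of one expression: the compiler forks the
 * right half onto the slave micro-thread, computes the left half on the
 * master, then joins. No atomics are needed, since the halves share no state. */
static uint64_t combined_hash(uint64_t x)
{
	uint64_t h = mt_fork(hash_right, x); /* slave computes hash_right(x) */
	uint64_t l = hash_left(x);           /* master computes hash_left(x) */
	return l ^ mt_join(h);               /* join back onto one thread    */
}

int main(void)
{
	printf("%llx\n", (unsigned long long)combined_hash(123));
	return 0;
}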
Notice 09/02/22
The compiler rewrite is going well; I'm making good progress. Today, I seemingly finished the scoper stage of the compiler; next up are the symbol resolver, template instantiator, type checker/overload resolver, and code generation stages. Due to the better compiler architecture, adding new stages has very little overhead, and only the genuinely new parts have to be written.
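As a rough illustration of why a new stage is cheap to add (this is just the shape of the architecture, not the actual code): each stage is a single transform from the previous stage's representation to its own, and the driver simply chains them.

/* Shape only: the payloads are placeholders, not the real data structures. */
typedef struct { int n_tokens;  } tokens_t;
typedef struct { int n_nodes;   } ast_t;
typedef struct { int n_scopes;  } scoped_t;
typedef struct { int n_symbols; } resolved_t;

static ast_t      parse(tokens_t in)   { return (ast_t){ in.n_tokens }; }
static scoped_t   scope(ast_t in)      { return (scoped_t){ in.n_nodes }; }
static resolved_t resolve(scoped_t in) { return (resolved_t){ in.n_scopes }; }

/* Adding a stage means writing one new transform and inserting one call
 * here; nothing else in the pipeline has to change. */
static resolved_t run_pipeline(tokens_t in)
{
	return resolve(scope(parse(in)));
}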
Notice
The paper for a clock-based alternative to blockchains is on hold until further notice, because I hit a major roadblock with my design. I realised that while it works in clearly honest and clearly adversarial scenarios, it breaks down when someone is right on the edge of the allowed clock discrepancy: decisions can then no longer be clearly categorised as honest or malicious by all honest participants, and disputes cannot be solved without voting, or at least I didn't find a way to do so. I will still look into it from time to time and try out new approaches, since I really like the project, but for now I'll focus entirely on my compiler until I get some new inspiration. I'm looking into web-of-trust designs similar to PGP, which may also be viable. I'm also rethinking the transaction model, and whether I need a total global ordering of all transactions or whether something like a causally consistent ordering is sufficient, as well as the general system design that would arise from that.
P.S.: I'm probably also going to write a tool for publishing small blog posts like this and putting them into an RSS feed.