Some thoughts about language design · ideas

I'm sick of the C ABI being the only option for linking code written in different languages.
I'm sick of not being able to do something because the language's author made assumptions about how people would use it and supported only those use cases.
I'm sick of bridging between programming languages by hand. Only some language pairs have automatic binding generators, and often only in one direction. Code reuse should be easy and safe.
I want a programming language that can describe any expression in System Fω without "hacks" on my side.

I think the theoretical soundness of a programming language matters more than syntactic sugar in its textual form.

I think syntactic sugar should never add new constructs to a language. It's better for syntactic sugar to live in a user-defined library than to make the language more complicated to use.

Core ideas

Values are "data".
Effects are holes in a program.
Functions are code fragments, and code fragments are data.

Steps of program compilation: Text -> AST -> symbolic IR -> static analysis (type checking, etc.) -> code gen (handed to LLVM or MIR)
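To make the pipeline concrete, here is a toy C++ sketch for a hypothetical language whose programs are integer or boolean literals joined by "+". Every name in it (parse, lower, type_check, codegen) is made up for illustration. Each stage has its own input and output type, and the "code gen" stage only prints stack-machine text where a real compiler would hand its IR to LLVM or MIR.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy language (hypothetical): literals joined by '+', e.g. "1 + 2 + true".
enum class Type { Int, Bool };

// AST: for this flat grammar, just the list of operands joined by '+'.
struct Literal { Type type; long value; };
struct Ast { std::vector<Literal> operands; };

// Symbolic IR: instructions for a simple stack machine.
struct Instr { enum Op { Push, Add } op; Type type; long value; };
using Ir = std::vector<Instr>;

// Text -> AST
Ast parse(const std::string& text) {
    Ast ast;
    std::size_t i = 0;
    auto skip_ws = [&] {
        while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
    };
    for (;;) {
        skip_ws();
        if (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
            long v = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
                v = v * 10 + (text[i++] - '0');
            ast.operands.push_back({Type::Int, v});
        } else if (text.compare(i, 4, "true") == 0) {
            ast.operands.push_back({Type::Bool, 1});
            i += 4;
        } else if (text.compare(i, 5, "false") == 0) {
            ast.operands.push_back({Type::Bool, 0});
            i += 5;
        } else {
            throw std::runtime_error("expected a literal");
        }
        skip_ws();
        if (i == text.size()) return ast;
        if (text[i] != '+') throw std::runtime_error("expected '+'");
        ++i;
    }
}

// AST -> symbolic IR
Ir lower(const Ast& ast) {
    Ir ir;
    for (std::size_t k = 0; k < ast.operands.size(); ++k) {
        ir.push_back({Instr::Push, ast.operands[k].type, ast.operands[k].value});
        if (k > 0) ir.push_back({Instr::Add, Type::Int, 0});
    }
    return ir;
}

// Static analysis: '+' is only defined on Int in this toy language.
bool type_check(const Ir& ir) {
    std::vector<Type> stack;
    for (const Instr& in : ir) {
        if (in.op == Instr::Push) { stack.push_back(in.type); continue; }
        assert(stack.size() >= 2 && "lower() always pushes two operands before an Add");
        Type rhs = stack.back(); stack.pop_back();
        Type lhs = stack.back(); stack.pop_back();
        if (lhs != Type::Int || rhs != Type::Int) return false;
        stack.push_back(Type::Int);
    }
    return true;
}

// Code gen: textual stack-machine code, standing in for LLVM IR or MIR.
std::string codegen(const Ir& ir) {
    std::string out;
    for (const Instr& in : ir)
        out += (in.op == Instr::Push) ? "push " + std::to_string(in.value) + "\n"
                                      : std::string("add\n");
    return out;
}
```

For example, "1 + true" parses and lowers fine but is rejected by type_check, because the error only becomes visible once the IR is analyzed.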

There is no reason type checking has to happen only at compile time. Code is code, after all.

Roc should define an intermediate representation, so that program fragments can be linked together easily. The textual representation of a program, or even its AST (as used by Paredit), doesn't matter that much and can be changed independently.

RISC-V's model of support

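Specific types should not be baked into the language. Instead, we can have standards that describe a platform's level of support, the way RISC-V defines a small base ISA plus optional extensions that a given platform may or may not implement.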
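One way to read this is as capability flags. In this C++ sketch, all flag and profile names are hypothetical and none come from any spec. A platform advertises which groups of primitive types it supports, much as a RISC-V core advertises a base ISA plus extensions such as M (integer multiply/divide) or F (single-precision floating point). A library then states what it requires.

```cpp
#include <cstdint>

// Hypothetical capability flags a platform could advertise, in the spirit of
// RISC-V's base ISA plus optional extensions. None of these names come from a spec.
enum Capability : std::uint32_t {
    Int32   = 1u << 0,
    Int64   = 1u << 1,
    Float32 = 1u << 2,
    Float64 = 1u << 3,
    Atomics = 1u << 4,
};

// A hypothetical minimal profile that every conforming platform must support.
constexpr std::uint32_t kBaseProfile = Int32;

// A platform satisfies a library's requirements if it provides every requested capability.
constexpr bool supports(std::uint32_t platform, std::uint32_t required) {
    return (platform & required) == required;
}

// Example: a small microcontroller target without floating point (hypothetical).
constexpr std::uint32_t kTinyMcu = kBaseProfile | Atomics;
static_assert(supports(kTinyMcu, Int32));
static_assert(!supports(kTinyMcu, Float32));
```

The language itself never mentions Float64. A library that needs it simply fails the check on platforms that don't provide it.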

What is a "type", anyway?

First, we have primitive types, which are provided by the platform. Types differ only in structure, not in the names we give them.

The platform should also provide functions that let us convert between primitive values.

union type

There should be an operator that combines two types into a new type. A value of the new union type is a value of either of the original types.

We also need an operation to get the type of a value, that is, to tell which of the original types it belongs to. With dependent types, the constraint on the return value can be described more precisely, but that is more complicated.
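C++'s std::variant gives a rough feel for this, with one caveat. It is a tagged (sum) type rather than a set-theoretic union, so std::variant<int, int> has two distinct alternatives. In this sketch, holds_alternative and index() play the role of "get the type of a value". The describe function and the type aliases are only illustrations.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// A value of IntOrString holds exactly one alternative: an int or a std::string.
using IntOrString = std::variant<int, std::string>;

// "Get the type of a value": ask which alternative is currently stored.
std::string describe(const IntOrString& v) {
    if (std::holds_alternative<int>(v))
        return "int " + std::to_string(std::get<int>(v));
    return "string " + std::get<std::string>(v);
}

// The tag is what makes this a sum rather than a set union:
// the two int alternatives stay distinguishable by index().
using Tagged = std::variant<int, int>;
constexpr Tagged second{std::in_place_index<1>, 5};
static_assert(second.index() == 1);
```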

composite type

The composite operation has no order: it is associative and commutative (up to isomorphism).
It is written as $x \times y$ in academic literature. Its identity is the unit type, which has only one possible value and is written as $1$.

There's also an operation to get one of its parts, usually written as object.field in text-based programming languages.
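In C++ terms, a struct or std::pair is a composite type, member access is the "get one of its parts" operation, and std::monostate can stand in for the unit type. The function names in this sketch are hypothetical. It shows the isomorphisms behind "associative and commutative" and "unit": pairing with the unit adds no information, and swapping the components loses none.

```cpp
#include <utility>
#include <variant>

// A composite type: a value has an x part AND a y part.
struct Point { int x; double y; };

// Projection: "get one of its parts".
constexpr int get_x(const Point& p) { return p.x; }

// std::monostate has exactly one value, so it can play the role of the unit type 1.
// Pairing with it adds no information: int * 1 is isomorphic to int.
constexpr std::pair<int, std::monostate> with_unit(int n) { return {n, std::monostate{}}; }
constexpr int without_unit(std::pair<int, std::monostate> p) { return p.first; }

// Commutativity holds up to isomorphism: swapping the parts loses nothing.
constexpr std::pair<double, int> swap_parts(std::pair<int, double> p) {
    return {p.second, p.first};
}
```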

qualitative type system

This is where types differ by name, not by structure, an approach usually called nominal typing. Rust, C, Haskell, and many other programming languages use it. This is a limitation imposed by humans, not by type theory.

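This feature has not stopped people from confusing vector spaces, affine spaces, and the set of real numbers. Famous examples include using untyped real numbers (without physical units) and losing a spacecraft. NASA's Mars Climate Orbiter was lost in 1999 because one program produced values in pound-force seconds where another expected newton-seconds.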

Well, if you have done game development and had to manually convert between "different" vector types, you know what a "qualitative type" is.

In reality, we need special types for scalar values too, so that we don't try to multiply 1 second by 1 second and expect to get the result in seconds.
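Here is a minimal C++ sketch of the "1 second times 1 second" idea. It assumes a hypothetical Quantity type that tracks the exponents of length and time in its template parameters. Libraries such as Boost.Units do this far more thoroughly; this only shows the mechanism.

```cpp
#include <type_traits>

// Hypothetical dimensioned quantity: L is the exponent of length (meters),
// T the exponent of time (seconds). The exponents live in the type.
template <int L, int T>
struct Quantity { double value; };

using Scalar  = Quantity<0, 0>;
using Meters  = Quantity<1, 0>;
using Seconds = Quantity<0, 1>;

// Addition only type-checks when both sides have the same dimensions.
template <int L, int T>
constexpr Quantity<L, T> operator+(Quantity<L, T> a, Quantity<L, T> b) {
    return {a.value + b.value};
}

// Multiplication and division add and subtract the exponents.
template <int L1, int T1, int L2, int T2>
constexpr Quantity<L1 + L2, T1 + T2> operator*(Quantity<L1, T1> a, Quantity<L2, T2> b) {
    return {a.value * b.value};
}
template <int L1, int T1, int L2, int T2>
constexpr Quantity<L1 - L2, T1 - T2> operator/(Quantity<L1, T1> a, Quantity<L2, T2> b) {
    return {a.value / b.value};
}

// 1 s * 1 s is 1 s^2, not 1 s; seconds divided by seconds is a plain scalar.
static_assert(std::is_same_v<decltype(Seconds{1.0} * Seconds{1.0}), Quantity<0, 2>>);
static_assert(std::is_same_v<decltype(Seconds{2.0} / Seconds{1.0}), Scalar>);
// Meters{1.0} + Seconds{1.0} would fail to compile: no matching operator+.
```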

Several programming languages use term rewriting to solve this. I'm not familiar with those, so please add more details here if you know about them.

Functions and Operators

Custom infix functions and precedence (pain!)

BQN does this well, with only 3 levels of precedence determined by the shape of the operator glyph. Julia supports Unicode operators, and... can you remember what those symbols mean? It's not a problem while you know the code, but even you will forget what you wrote after a few weeks.

Type of a function

Functions are program fragments. Even int main(int argc, char* argv[]) in C is a fragment, waiting for the compiler and linker to put it inside an ELF file (if you are on Linux).

Functions are usually interchangeable in a programming language if they have the same input type, output type, and effects.

In some languages, a function can take multiple arguments (inputs). But the "function" I'm talking about here has only one input and one output. You can use the "composite type" described above to combine multiple values into one. (theoretical stuff)
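For example, in C++ (all names are hypothetical), a two-argument add can be viewed in two ways. It can be a one-input function on a composite value, here a tuple. Or, curried, it can be a one-input function that returns another one-input function.

```cpp
#include <tuple>

// A two-argument function, as most languages write it.
constexpr int add2(int a, int b) { return a + b; }

// The same function with exactly one input: a composite value (a tuple).
constexpr int add_tupled(std::tuple<int, int> args) {
    return add2(std::get<0>(args), std::get<1>(args));
}

// Or curried: one input, and the output is another one-input function.
constexpr auto add_curried(int a) {
    return [a](int b) { return add2(a, b); };
}
```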

About Effects

Some programming languages say that effects are part of the return type. No. Effects are holes in a program. Effect handlers fill the holes, so the program can run. If your program has holes, you can't run it.

Typed holes

So, your function has some parts missing. You delegate those parts to the library user to fill in. The types of holes in a function form a simple set, and if you want to fill one type of hole, you must fill in that type completely. You can also create new types of holes to delegate certain functionality within your own effect handler (like how money transfers in Monero work).

We need editor support for creating effects from missing functions with one click.

function/operator overloading as global/default effect handlers

template <class T>
T GetMax(T a, T b) {
  // which ">" this is depends on T
  return (a > b) ? a : b;
}

The hole in the fragment is the ">" operator, and that's why C++ templates cannot be type checked before instantiation: we don't know if > is valid until we know T.

polymorphism via effects

C++ templates do polymorphism the best, and we could easily add effects to a template system like C++'s.

template <T: type> {
    effect wing_count<T> {
        fn wing_count(T) -> usize;
    }
}

template <T: type> {
    effect abort<T> {
        fn abort(String) -> T; // never returns, so it can stand in for a value of any type T
    }
}

template <T: type> {
    fn can_fly(animal: T) -> {wing_count<T> + abort<bool>} bool {
        if wing_count(animal) == 0 return false;
        if wing_count(animal) == 1 return abort("What?");
        return true;
    }
}

template <T: type> {
    effect partial_compare<T> {
        fn (>)(T, T) -> bool;
    }
}

handler partial_compare<Int> {
    fn (>)(a: Int, b: Int) -> bool {
        // implementation provided by platform
        // this would be LLVM IR, for example
    }
}

I think definitions like these should be separate from "Roc the language". If something can be separate, it should be. Again, I don't like built-in precedence for binary operators.

static analysis vs sandbox

If you can check the effects of untrusted code when it is loaded at runtime, you don't need a sandbox to run it.

I've also seen game developers use Lua to write game logic (trusted code), because C++ doesn't have Lua's dynamic features.

I believe we can have safety at runtime too! We only need to retain type information at trust boundaries for type checking to work. We do still need the symbolic representation of the untrusted code for static analysis, though.

If we have effect annotations for libc, then C programs that depend only on libc can be made to work anywhere.
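Here is a sketch of what effect annotations for libc could enable. The effect categories and annotations are entirely hypothetical. The idea is to infer the effects of an untrusted module from the libc functions it imports, then admit it only if those effects are allowed. Real annotations would need to be much finer grained than these.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical effect categories, one bit each.
enum Effect : std::uint32_t {
    Pure       = 0,
    Memory     = 1u << 0,
    FileSystem = 1u << 1,
    Network    = 1u << 2,
    Process    = 1u << 3,
};

// Hypothetical, deliberately coarse annotations for a few libc functions.
const std::map<std::string, std::uint32_t> kLibcEffects = {
    {"strlen", Pure},
    {"malloc", Memory},             {"free", Memory},
    {"fopen", FileSystem | Memory}, {"fread", FileSystem},
    {"socket", Network},            {"connect", Network},
    {"fork", Process},              {"execve", Process},
};

// Effects of a module = union of the effects of every libc function it imports.
// An import without an annotation has unknown effects, so inference fails.
std::optional<std::uint32_t> infer_effects(const std::vector<std::string>& imports) {
    std::uint32_t effects = Pure;
    for (const std::string& name : imports) {
        auto it = kLibcEffects.find(name);
        if (it == kLibcEffects.end()) return std::nullopt;
        effects |= it->second;
    }
    return effects;
}

// Admit untrusted code only if its inferred effects are a subset of what the host allows.
bool admit(const std::vector<std::string>& imports, std::uint32_t allowed) {
    const std::optional<std::uint32_t> effects = infer_effects(imports);
    return effects.has_value() && (*effects & ~allowed) == 0;
}
```

Under these assumptions, a module importing only strlen, malloc, and fopen would be admitted by a host that allows Memory and FileSystem, while one that also calls socket would not.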

Knowing the effects of every function makes post-mortem debugging much easier. "Edit and continue" becomes easy too. We wouldn't need to rely on hacks like recording syscalls, Linux performance-profiling hooks, or gdb hooks (used by rr and the like).

Richard Feldman (Jan 13 2022 at 16:47):

Locria Cyber (Jan 18 2022 at 15:55):

What I've talked about already exists as parts of different programming languages. The problem with "programming languages" is that each syntax usually has exactly one implementation, and implementations don't share components. Ideally, we would have the same program-graph representation for different languages, plus tools to turn text into that representation and back.
I'm experimenting with drawing programs by hand, and it's faster than typing them, since I can draw in 2D, where holes in a program are physical holes on the paper. Well, so far only I can understand the drawings, and I'm writing a compiler for this symbolic representation of programs.
Maybe I'm sick from typing too much.

Locria Cyber (Jan 13 2022 at 16:44):

Motivation for writing this document