Learning Roc: with an AI mentor, and/or with exercism.org · ideas

I recently learned Haskell, and my mentor was ChatGPT. It was a fantastic experience, it was incredibly helpful, in many different ways, to answer specific code questions, or give broad overviews of concepts and best-practices, libraries and comparisons with other languages, etc. I absolutely loved it.
Now I'm learning Roc, and sadly ChatGPT really sucks at it. It's confidently wrong every time I ask it anything. I'm better off asking questions about Elm and hoping it works in Roc.

So I'm thinking perhaps a great way to help newbies learn the language would be to provide a fine-tuned LLM who really knows Roc inside out. And if fine-tuning is too costly, perhaps we could come up with a really good prompt (e.g., with many code examples) that would turn ChatGPT into an expert.

Another great tool for learning is exercism.org. Sadly, Roc is not currently supported. Perhaps we could contribute problems that they could add to their platform?

Luke Boswell (Aug 02 2024 at 02:30):

Aurélien Geron (Aug 02 2024 at 03:27):

Haha, I should have known that your team had already done it, congratulations, that will be most useful! We should absolutely point to RocGPT on the tutorial page.

Aurélien Geron (Aug 02 2024 at 03:27):

Aurélien Geron (Aug 02 2024 at 04:06):

Sadly, custom ChatGPTs are only available for paying subscribers.
I tried Gemini, it mostly generated Rust code. :worried:
However, I tried Claude and it was much better, the code was almost functional.
Perhaps @Anton could share the prompts he used to create RocGPT? This way we people could use it with any LLM.

Aurélien Geron (Aug 02 2024 at 08:54):

I started a topic on Exercism's forum, if anyone else wants to participate in adding Roc to this great learning platform, that would be awesome! :smile:

Aurélien Geron (Aug 02 2024 at 09:04):

For those who don't know, the general idea of Exercism.org is that it helps you learn a programming language such as Roc by solving little programming puzzles (sort of like Advent-of-Code), either entirely on their website, using their online editor, or on your own machine, by downloading puzzles using their CLI (exercism download --track=roc --exercise=hello-world) and submitting your solutions also using the CLI (exercism submit). The CLI sends your code to the website, they run it and validate that it works. You score points, you make progress, you can share your code with others and ask for help, you can do a bunch of stuff.

Apparently the first thing to do to add a new language to Exercism is to decide on a unit testing framework, so that the code that people write can automatically be tested. What would be a good solution for this? There's expect, I guess, but that's not a full unit testing framework.

Anton (Aug 02 2024 at 09:13):

Aurélien Geron (Aug 02 2024 at 09:15):

Aurélien Geron (Aug 02 2024 at 09:19):

Anton (Aug 02 2024 at 09:29):

I mainly just provided it with the tutorial, pages from the website and examples from the examples repo. The tutorial alone will likely max out the input of any free AI. I'm thinking we set up a Roc assistant page up on the website were people only need to provide an API key for e.g. the clause 3.5 sonnet API and we insert all the necessary Roc information(tutorial, examples...) into the prompt. That allows anyone to use a model as cheap or expensive as they see fit.

Anton (Aug 02 2024 at 09:30):

I've also been wanting to make an example with a single Roc file that demonstrates all Roc syntax. I bet claude would already do really well with only that.

Aurélien Geron (Aug 02 2024 at 09:31):

Aurélien Geron (Aug 02 2024 at 09:33):

What's the convention for Roc file names? I see FooBar.roc, fooBar.roc, foo-bar.roc.
More generally, is there a style guide? The equivalent of Python's PEP8. (of course, there's roc format, which is great, but there's more)

Anton (Aug 02 2024 at 09:33):

btw the claude API is a great way to pay a lot less compared to the monthly subscription if you run out of free questions.

Anton (Aug 02 2024 at 09:33):

Anton (Aug 02 2024 at 09:34):

We've talked about making the formatter change the filename to follow the convention but we were doubting if that's an overreach for the formatter

Sam Mohr (Aug 02 2024 at 09:34):

Since we've changed the module headers to not have the name of the module, we should probably enforce that non-main modules are in UpperCamelCase

Luke Boswell (Aug 02 2024 at 09:35):

Sam Mohr (Aug 02 2024 at 09:35):

I've never seen an editor change the case of a filename, I think it's sufficient to just make it impossible to import them, yeah

Sam Mohr (Aug 02 2024 at 09:36):

Maybe we can do what rust-analyzer does and give a warning at the top of modules that aren't accessible from the root of the package, if it's detected that we are in a package file tree

Aurélien Geron (Aug 02 2024 at 09:38):

So the rule is supposed to be FooBar.roc for all modules? How about main.roc? I'm a bit confused.

Aurélien Geron (Aug 02 2024 at 09:39):

Btw, I'm asking because that's one of the things to decide when setting up the first exercise in a new Exercism track.

Anton (Aug 02 2024 at 09:51):

Luke Boswell (Aug 02 2024 at 09:51):

So app, package, and platforms are lower and just normal modules upper first letter.

Aurélien Geron (Aug 02 2024 at 09:54):

Notification Bot (Aug 02 2024 at 10:28):

Joshua Warner (Aug 07 2024 at 04:43):

I've been poking at the edges of building a library of Roc examples that can easily + automatically be plopped into the context window of an LLM, to effectively teach it about parts of the language, the stdlib, various platforms, etc.

Luke Boswell (Aug 07 2024 at 04:50):

Luke Boswell (Aug 07 2024 at 04:51):

I'm not sure how hard it is to wire up to an API for an LLM. But it's something Anton has been talking about.

Joshua Warner (Aug 07 2024 at 05:24):

Anton (Aug 07 2024 at 09:15):

Luke Boswell (Aug 07 2024 at 09:43):

I spent a few minutes and watched a video introduction to Hugging Face, and thought it was cool how there's all these different parts of an ML pipeline and models you can use from an API. But I also appreciated that things have changed a lot since I had anything to do with it... I once did a project with computer vision using using support vector machines.

Is is possible to take the .roc files from the repository, and encode them somehow into a more compact representation for LLMs to use?

Is it as simple as throwing files into an S3 bucket and dropping some $$ on a SaaS tool that fine tunes a model that you can then use?

Would an interactive setup, like a REPL or interpreter be useful to train or fine tune a coding model autonomously?

Anyway, random thoughts from someone who has very little experience with AI -- but it sounds cool.

Anton (Aug 07 2024 at 10:06):

We can go even simpler :) my idea was to keep it all in the frontend, users just paste in an API key of for e.g. claude sonnet 3.5 and we provide a system prompt (pre-prompt) that contains a good amount of typical Roc code. To the API we send our pre-prompt + the user's question and we display the reply. For a first step we can even only provide the pre-prompt for pasting into e.g. https://console.anthropic.com/workbench in the system prompt field.

Anton (Aug 07 2024 at 10:06):

For questions about Roc, e.g. "What are open tag unions" we have a different pre-prompt that contains the tutorial etc.

Anton (Aug 07 2024 at 12:06):

Anton (Aug 07 2024 at 12:12):

I'd also love to have a system prompt that can show/teach the LLM how to reduce Roc code to a minimal reproduction that results in a specific error, rust panic, segmentation fault...

Anton (Aug 07 2024 at 15:46):

Success :)
I removed one unused variable and and one old Num.toNat. The user request was "sieve of Eratosthenes in Roc" in combination with my general Roc system prompt. The full request+answer costs 3 cents.

app [main] { pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.12.0/Lb8EgiejTUzbggO2HVVuPJFkwvvsfW6LojkLR20kTVE.tar.br" }

import pf.Stdout
import pf.Task

main =
    limit = 100
    primes = sieveOfEratosthenes limit
    Stdout.line "Prime numbers up to $(Num.toStr limit): $(List.map primes Num.toStr |> Str.joinWith ", ")"

sieveOfEratosthenes : U64 -> List U64
sieveOfEratosthenes = \limit ->
    # Create a list of booleans, all initially set to true
    initialSieve = List.repeat Bool.true (limit + 1)

    # Mark non-prime numbers
    markedSieve = markNonPrimes initialSieve 2 limit

    # Collect prime numbers
    List.walkWithIndex markedSieve [] \primes, isPrime, index ->
        if index >= 2 && isPrime then
            List.append primes (Num.toU64 index)
        else
            primes

markNonPrimes : List Bool, U64, U64 -> List Bool
markNonPrimes = \sieve, current, limit ->
    if current * current > limit then
        sieve
    else
        newSieve = markMultiples sieve current
        markNonPrimes newSieve (current + 1) limit

markMultiples : List Bool, U64 -> List Bool
markMultiples = \sieve, prime ->
    List.mapWithIndex sieve \isPrime, index ->
        if Num.toU64 index % prime == 0 && Num.toU64 index != prime then
            Bool.false
        else
            isPrime

expect
    result = sieveOfEratosthenes 30
    result == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Anton (Aug 07 2024 at 15:51):

It would be extra cool to hook this up with a machine that can run roc check and roc test so it can correct itself.

Stream: ideas

Topic: Learning Roc: with an AI mentor, and/or with exercism.org

Aurélien Geron (Aug 02 2024 at 02:25):

Luke Boswell (Aug 02 2024 at 02:30):

Luke Boswell (Aug 02 2024 at 02:30):

Luke Boswell (Aug 02 2024 at 02:30):

Aurélien Geron (Aug 02 2024 at 03:27):

Aurélien Geron (Aug 02 2024 at 03:27):

Aurélien Geron (Aug 02 2024 at 04:06):

Aurélien Geron (Aug 02 2024 at 08:54):

Aurélien Geron (Aug 02 2024 at 09:04):

Anton (Aug 02 2024 at 09:13):

Aurélien Geron (Aug 02 2024 at 09:15):

Aurélien Geron (Aug 02 2024 at 09:19):

Anton (Aug 02 2024 at 09:29):

Anton (Aug 02 2024 at 09:30):

Aurélien Geron (Aug 02 2024 at 09:31):

Aurélien Geron (Aug 02 2024 at 09:33):

Anton (Aug 02 2024 at 09:33):

Anton (Aug 02 2024 at 09:33):

Anton (Aug 02 2024 at 09:34):

Anton (Aug 02 2024 at 09:34):

Sam Mohr (Aug 02 2024 at 09:34):

Luke Boswell (Aug 02 2024 at 09:35):

Sam Mohr (Aug 02 2024 at 09:35):

Sam Mohr (Aug 02 2024 at 09:36):

Aurélien Geron (Aug 02 2024 at 09:38):

Aurélien Geron (Aug 02 2024 at 09:39):

Anton (Aug 02 2024 at 09:51):

Luke Boswell (Aug 02 2024 at 09:51):

Aurélien Geron (Aug 02 2024 at 09:54):

Notification Bot (Aug 02 2024 at 10:28):

Joshua Warner (Aug 07 2024 at 04:43):

Luke Boswell (Aug 07 2024 at 04:50):

Luke Boswell (Aug 07 2024 at 04:51):

Joshua Warner (Aug 07 2024 at 05:24):

Anton (Aug 07 2024 at 09:15):

Luke Boswell (Aug 07 2024 at 09:43):

Anton (Aug 07 2024 at 10:06):

Anton (Aug 07 2024 at 10:06):

Anton (Aug 07 2024 at 10:06):

Anton (Aug 07 2024 at 12:06):

Anton (Aug 07 2024 at 12:12):

Anton (Aug 07 2024 at 15:46):

Anton (Aug 07 2024 at 15:51):

Anton (Aug 07 2024 at 18:03):