I always found snake_case a lot nicer to read, and I'd love to use it instead of camelCase for defs, pattern bindings, and type variables
I think we should first allow _ to be used everywhere we parse an ident that starts with a lowercase letter, and then make the formatter automatically convert all camelCase ones to snake_case
While I do agree snake case is easier to read, it's more annoying to type. I want fewer keystrokes when coding.
Well, you can type camelCase and let the formatter do it for you :smiley:
I wouldn't make uppercase letters a syntax error but a warning. This would allow you to run the formatter on it and have it clean it up.
This will also help migrate all code
I did that when I implemented the new modules syntax, and it worked great
I'm a big fan of snake_case for uncertain reasons that I'm excited to introspect on and report back here tomorrow when at a keyboard.
In languages that prefer camel case but allow snake_case, I use it and enjoy that I can tell instantly which names are mine and which are built-in/third-party. (I believe that says more about my personal preference for snake_case than it says about my desire for the visual distinction.)
I will say right now that any coding environment with autocomplete renders almost all _ typing troubles moot.
For example, I'd type uslanaTAB for user_last_name.
an argument for camelCase is that it's more consistent with the type syntax, which is PascalCase (and I think should stay that way)
that said, I also enjoy snake_case aesthetically
Yeah, I also think types and module names should remain UpperCamelCase
I think that's what every snake_case language I've used does and it seems fine
and it's worth noting that Rust has PascalCase for types and tags (or rather Rust's equivalent of tags) and snake_case for values and field names, and it seems fine
yeah, python and ruby too
I think both options are fine personally, although for some reason I do prefer how function names ending in ! look in snake_case
Same. I think that might be because the only two languages I've used where ! appears in names also happen to prefer snake_case (Rust and Ruby) :smiley:
I think starting or trailing _ does look better with snake case
oh yeah that one too :big_smile:
I also kinda like variables being distinctly separate from type names (and module). Like more distinct than the difference between PascalCase and camelCase
and using _ to ignore a variable fits in better this way
I like typing camel case more than snake. Snake case is more readable though
I don't have a preference either way, but would definitely prefer types and modules to remain UpperCamelCase.
In the super rare/probably something wrong case of someone wanting to have a multi-word type variable (like List thisFeelsWrong), I would prefer it to stay lowerCamelCase though. Just for some consistency with other types and distinguishing from normal variables. I can't think of a use case for this, but it is possible I guess
I want to suggest something so that I can vote against it: Nim is the only language I know that allows you to use either camelCase or snake_case and the compiler will automatically convert between the two, as in, you can define myFunction and then call it as my_function if you want.
It was something I really disliked about Nim, and I'm not sure I know why, although I'm sure I could've got used to it if I stuck with Nim.
I also like snake_case, and using different casings for different types of value. Prior art: Ruby and Zig do this too.
Assuming we have the formatter convert from one to the other (in either direction), one gotcha we'll need to avoid: If you define both fooBar and foo_bar, then the formatter needs to not convert, because that would be a logic change. You'd need a human to rename one to resolve the conflict.
But if you define fooBar and then reference foo_bar, should the compiler tolerate that, and should the formatter rename one to match the other? It'd be similar to the Nim problem, except with the assumption that when you eventually run the formatter there will be one correct version. It'd also be a very common case, for example if you're trying to call a standard library function with the wrong casing.
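To make the conflict case concrete, here's a rough Python sketch of a formatter pass that only renames when it's safe. This is not the actual Roc formatter; `to_snake` and `plan_renames` are made-up names, and the conversion is deliberately naive (one underscore before every uppercase letter).

```python
import re


def to_snake(name: str) -> str:
    """Naive camelCase -> snake_case: '_' before each uppercase letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def plan_renames(defined: set[str]) -> tuple[dict[str, str], set[str]]:
    """Return (safe renames, names left alone because they would collide)."""
    # Group every defined name by the snake_case name it would end up as.
    by_target: dict[str, list[str]] = {}
    for name in defined:
        by_target.setdefault(to_snake(name), []).append(name)

    renames: dict[str, str] = {}
    conflicts: set[str] = set()
    for target, sources in by_target.items():
        if len(sources) > 1:
            # e.g. fooBar and foo_bar both exist: renaming would change behavior,
            # so a human has to resolve this one.
            conflicts.update(sources)
        elif sources[0] != target:
            renames[sources[0]] = target
    return renames, conflicts


renames, conflicts = plan_renames({"fooBar", "foo_bar", "userName"})
print(renames)    # only userName is safe to rename
print(conflicts)  # fooBar and foo_bar are left untouched
```

The "define fooBar, reference foo_bar" case is a separate question, since it needs to know about references and not just definitions.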
I'm in favor of snake_case, even though camels are more friendly towards birds :)
Since we don't have the feature of _ being a marker where the piped value should go amongst the arguments.
Also, make sure we can't have a definition name be just an _, except if we meant to ignore the value.
does anyone have a preference for camelCase? I'd like to hear from that perspective too!
Richard Feldman said:
does anyone have a preference for camelCase? I'd like to hear from that perspective too!
i mean, it's faster to type, and the most popular among other languages, so it does have some beginner familiarity for people coming from java or javascript
which do you personally prefer?
I think that I'm used to snake case and it can improve readability, but between type names and type variables being camelCase, and camelCase being easier to type, I lean towards it
Do you think that would still matter if you could write camelCase and formatter fixed it?
If the auto-formatter fixed it, I'd write snake_case instead
It's so minor to me that I'd rather write the right code the first time
but you could save some time :grinning:
speedrunning.mov
lol
The main point I want to impress is the value of consistency
I think Rust is able to communicate that well with TypeNames being in a different case than variable_names
But type vars make it a bit trickier in Roc
Though by 0.01%, since the number of times you'll see a multi-word type var are extremely small
I guess we're gonna do snake_case, then. I'm here for it, I was only pushing against it because it seemed like Richard was surprised that everyone was onboard for a conversion and literally no one was disagreeing
But it works well, it's readable, yada yada
Sam Mohr said:
I guess we're gonna do snake_case, then. I'm here for it, I was only pushing against it because it seemed like Richard was surprised that everyone was onboard for a conversion and literally no one was disagreeing
well I think making informed decisions requires understanding the major different perspectives, and the status quo hadn't had any advocates in this thread :big_smile:
I mean maybe that's because absolutely everyone prefers the change, which can certainly happen, but I'd like to give some time to hear from people who maybe just haven't passed by the discussion first
Yeah, not rushing us making a decision! Just making explicit the common surprise that we've been doing camelCase this long and no one has opposed a move to snake_case yet
I also prefer underscores. Here's an interesting article that includes some arguments from both sides, though its conclusion is pro-underscores: https://whatheco.de/2011/02/10/camelcase-vs-underscores-scientific-showdown/.
I think my personal favorite is actually kebab-case, though that might not be a good use of the strangeness budget. (For non-Lisps, the usual counterargument is that the dash could be confused with a minus sign, but I've heard that it hasn't caused any confusion for Pyret, which has been used to teach hundreds of beginning programmers. Pyret requires spaces around operators.)
I didn't notice that Rust has that same convention we're considering. Another step toward ROC being Rust, Only Cuter.
Ajai Nelson said:
I also prefer underscores. Here's an interesting article that includes some arguments from both sides, though its conclusion is pro-underscores: https://whatheco.de/2011/02/10/camelcase-vs-underscores-scientific-showdown/.
I think my personal favorite is actually kebab-case, though that might not be a good use of the strangeness budget. (For non-Lisps, the usual counterargument is that the dash could be confused with a minus sign, but I've heard that it hasn't caused any confusion for Pyret, which has been used to teach hundreds of beginning programmers. Pyret requires spaces around operators.)
I actually agree that kebab-case is better than either option, but we tend to rule it out for negation reasons, e.g. -three-word-var is easy to miss the - prefix on. If we could make that work for Roc (suggested here), that'd be my vote. This ignores the importance of matching the style of other languages
I just assumed it wasn't a real option
I am definitely in the pro snake case crowd. For my own coding it won very naturally. I used camel cased languages in my first few years of programming and then slowly moved to some snake case ones and now I naturally always do snake case if possible.
The main motivator is that acronyms and abbreviations are much harder for me to read in camel case.
For me the minus sign meaning negation or subtraction is way too deeply ingrained to even consider kebab case.
I mildly prefer camel case, I've used snake case languages for most of my career, but I find camel case slightly nicer, but I wouldn't say it's easier to read, just a purely aesthetic preference
I mildly prefer camelCase. I switch between both in different contexts at work, and I like that camelCase is easier to type and also easier to edit, like changing the second half of a variable name. Similar to how I prefer ML-like spaces between function args rather than commas and spaces, there's just one less piece of punctuation to juggle. But it's minor. snake_case is definitely easier to read. The most important thing is to be consistent. Seems like there's consensus on snake_case and I have no objection.
I do like kebab-case even more than snake_case in places that allow it, because you don't have to hold shift while typing it. But it would eat up a bit of strangeness budget.
I also think it's worth trying to keep our variables written in the same way as in the languages that platforms are written in (Rust+Zig), so that if Roc is embedded in a larger application it can use the same variable names. That would mean snake_case.
so to summarize, it seems like:
does that all sound like a reasonable summary of the thread?
I think @Sam Mohr's point about type variable casing is an important one that may be overlooked.
I would also pushback against multi-word type variable names being a rarity. Sometimes it's good to be more verbose for clarity, so I think this will definitely come up for some folks.
hm, I can't think of a time I've ever seen it come up so far. :thinking:
I've done it in my own code, let me see if I can dig up an example
Well I thought I did, but I guess I didn't :sweat_smile:. But I think the point stands that there could conceivably be a case where someone wouldn't be satisfied with the clarity afforded by a single letter or word, and would want to use a multi-word type var.
Maybe type vars staying camelCase would be a good enough solution to the issue though. The one thing I wouldn't want to see is snake_case type vars, as they'd easily get confused with variables
I assume type vars stay camelCase and this just changes variable_names and function_names. Though in practice most type vars will probably just be lowercase due to being a single word. Then all types and alias and module names as PascalCase
I'm okay with that, but leaving one use of camelCase for 0.001% of lines of code :face_with_spiral_eyes: seems like a confusing surprise for most users, a weird annoyance for a few unlucky users, and a piece of trivia for power users. It would also be an extra thing to remember to document/implement in several places: "In Roc, variable names use snake_case (unless it's a type variable name, like that a you saw earlier, but don't worry, you don't need to remember that because long ones are rare)."
Would it be that weird for type variable names to use the same casing as non-type variable names? We do that today and I don't think anyone's complained.
camelCase for type vars seems more consistent to me than snake_case if you think of camelCase as AlmostPascalCase :grinning_face_with_smiling_eyes:. PascalishCase for types, snake_case for variables and functions. I seem to be in the minority regarding the commonality of multi-word type vars, but if people are right that it's super rare, then I can't imagine too many people will be upset either way, and in that case I'd vote for consistency (although we seem to disagree on which is more consistent as well :sweat_smile:)
In Roc, variable names use snake_case (unless it's a type variable name, like that a you saw earlier, but don't worry, you don't need to remember that because long ones are rare).
I would be surprised to see it framed this way. I see type variables as a special kind of type, not a special kind of variable
I think that given that it will almost never come up (again, I've literally never seen it come up organically yet), it would be really strange to see it for the first time in your life after years and be like "oh, camelCase in Roc does exist! TIL!"
seems like "lowercase is snake_case and uppercase is PascalCase" is the least surprising convention
Fair enough
I'm clearly in the minority here :sweat_smile:, but I'll just close with saying that keeping the distinction between types and variables as clear as possible seems worthwhile, and to that end, distinct casing is useful. A type variable is closer to a type than a variable in my opinion (others may disagree), and camelCase is closer to PascalCase than snake_case, thus the natural choice seems to be camelCase (for type variables specifically). I'll also point out that if multi-word type vars are so rare, then there's no difference between snake_case and camelCase in the majority of cases!
That said, types and "actual code" don't mix as much in Roc as they do in other languages where annotations are necessary, so maybe the visual distinction is less important than I'm thinking. And (while I somewhat disagree), the consensus seems to be that multi-word type vars are so rare that this probably isn't even worth a discussion.
Thank you for at least hearing my argument :smile:
ok, let's do it!
If anyone wants to open an issue (and/or just implement it), I think what needs to be done is:
for now, let's not do any compiler warnings for using the wrong style or anything like that...we can separately discuss introducing them later if desired, but at a minimum it seems clear that both styles should be accepted by the parser so that the formatter can automatically convert to the preferred style
Yesssss, sssssir #7214 is free for anyone to pick up!
I'm so glad for this change, be it however small.
nice! want to link to this discussion in case anyone is wondering where the issue came from?
let’s gooo
Richard Feldman said:
nice! want to link to this discussion in case anyone is wondering where the issue came from?
Good call, did it!
sweet, thanks!
As someone who’s never written any significant code in snake case, I’ve never understood the appeal of it. Is it just for readability? As someone used to camel case, adding in underscores everywhere feels like extra ceremony for little gain, like adding semicolons everywhere.
Does it play nicer with certain editors people use maybe?
It's more readable IMO because the underscores act like spaces
Meaning when you read it, you can visually group the words more easily
with camelCase, I have to look a little bit longer each time for where one word starts and another ends
It's minor, but it's definitely a thing for me
I don’t have the experience to make a fair comparison, but I fail to see the appeal at least. :shrug:
Norbert Hajagos said:
I'm so glad for this change, be it however small.
Within the context of someone's first 5 seconds with Roc, this could be in the top 5 biggest changes to the language since its creation!
Richard Feldman said:
and then automatically combine consecutive underscores into 1 underscore so names never have 2 underscores in a row)
What if you actually want to have 2 underscores? I wouldn't know when though
if someone raises a specific use case they have, we can talk about it, but by default that's a mistake the formatter can fix
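For reference, the "combine consecutive underscores" rewrite itself is tiny. A minimal Python sketch follows; `collapse_underscores` is a made-up name, not anything in the Roc codebase.

```python
import re


def collapse_underscores(name: str) -> str:
    """Replace every run of two or more underscores with a single underscore."""
    return re.sub(r"_{2,}", "_", name)


print(collapse_underscores("user__last___name"))  # user_last_name

# One thing an automatic rewrite like this has to watch for: two names that
# were distinct before the rewrite can end up identical after it.
print(collapse_underscores("a__b") == collapse_underscores("a_b"))  # True
```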
I’ve seen that used as a poor man’s namespacing technique. I’m in favor of disallowing that by default, though.
Yeah, I have seen it in python as essentially extra namespacing for tests
We use the two underscores a lot in Elm for namespacing. Because Elm discourages nesting, we sometimes use this naming convention for “nested” items (both record items and in constructor names)
If it is common practice in Elm to subvert an anti-nesting design decision with variable naming conventions, then we probably should take that as a sign that the anti-nesting decision is hampering good code quality and do the opposite.
A.k.a. we should disallow double underscores and encourage nested data structures where appropriate.
Kasper Møller Andersen said:
As someone who’s never written any significant code in snake case, I’ve never understood the appeal of it. Is it just for readability? As someone used to camel case, adding in underscores everywhere feels like extra ceremony for little gain, like adding semicolons everywhere.
Does it play nicer with certain editors people use maybe?
It's harder to come across a pathologically unreadable name. yesterday, I worked on the compiler. Glad I didn't find a callIndex, but rather a call_index variable. the first one I have to parse letter-by-letter to see what's there, while with the second one, I recognise the words "without reading them".
There also aren't dilemmas about acronyms. getHttpStatus is straightforward to me. Some prefer to capitalize HTTP (since that's an acronym), but they are highly discouraged to do it with camelCase. getHTTPStatus puts me off for a second, since I see HTTPS right there. Minor things, but still, not having to squint every time I read a longer var name is really nice. I also tend to use longer var names, so that's my bias.
Norbert Hajagos said:
It's harder to come across a pathologically unreadable name.
Which is fair, but still feels to me like one of those things where the cure is worse than the disease. As in, to avoid pathological cases, everything becomes more verbose. Generally the cases you mention don’t bother me much at least, but again, I don’t have the experience to compare them fairly. So :shrug:
Sam Mohr said:
anti-nesting design decision
The _anti-nesting_ was something I hated about Elm so much. They tell you: craft your types, compose them, but for some bizarre reason: composing a record of other records was a bad idea. Don't ask why, just "trust-me-bro".
I concur one reason to use snake_case over camelCase is acronyms. I have seen many real world scenarios in my career where acronyms are very inconsistent between properCamelCase and UPPERCase / innerUPPERCase for the acronym.
I have registered interest in making this my first contribution to Roc in the GH issue. Parsers are the thing I've done most in my PL career to this point, so I feel at home there.
Welcome @Anthony Bullard, I assigned the issue to you. Let us know if you need anything. :smiley:
What is the preferred place for discussion of issues in active development? Here or the GH issue?
I guess #compiler development may also be appropriate
Anthony Bullard said:
What is the preferred place for discussion of issues in active development? Here or the GH issue?
The GitHub issue is good for anyone that's helping you review the change, otherwise #compiler development is good once we are past the discussion stage and have started trying to implement stuff
Yeah, both #compiler development and #contributing work. You are much more likely to get implementation and debugging help quickly on Zulip than on a GitHub issue.
If it is about the design and there is already a thread in #ideas, you can also just continue that thread.
@Richard Feldman I just want to be super clear here. We are allowing underscores in Tags and other uppercase names?
So
SomeTag
and
Some_Tag
are going to be valid?
I only ask because the latter is not exactly aesthetically pleasing, but I guess it does allow one to escape some of the same pitfalls as you would find in lowercase names. This is obviously not my call, but I just like absolute clarity before I begin implementation
yeah I think the parser should accept both
I'm not really concerned about people adopting Pascal_Snake case :big_smile:
Great, yeah I think the aesthetics of it is its own limiting function :laughing:
Put up the first commit of my first PR. Good progress, but feedback very much appreciated.
Is there a risk people use UPPER_CASE for types and modules? Is that ok? I would associate that with a constant.
yeah I think if people wanted to use that for anything it'd be values, but that wouldn't even compile since values have to be lowercase
Personally I don't see a reason to allow either pascal with underscores or full upper
Just allows people to break standards and write code that is really different from everything else
Someone will do it and others will have to deal with it
I just generally think it's nice to have the parser be more permissive
and the formatter less so
I agree about keeping the parser permissive and the formatter strict in cases where the formatter immediately fixes the mistake the user made, but we couldn’t do that here because it could cause name collisions so I would prefer to not allow Pascal_Snake.
oh that's a good point!
fair enough - since the formatter can't fix it without potentially introducing compiler errors, let's give a parse error for underscores in uppercase names @Anthony Bullard
I’ll fix up my PR tonight. It’s a small change
awesome, thank you!
So I've added some open questions to my PR that I would like to have feedback on from the participants in this topic: https://github.com/roc-lang/roc/pull/7233
Do we wish to allow trailing '_' in lowercase idents?
definitely! This is a convention we actually want to make use of in the future :big_smile:
Do we want to allow multiple '_'s to parse successfully?
I'd say we should treat this the same way as underscores in uppercase names, and for the same reason: if the formatter quietly fixes it, that could introduce bugs, so instead we should have the compiler complain.
that said, I think both of those scenarios should be a warning - that is, the compiler pushes a Problem but otherwise accepts the name as valid, so it doesn't block you from running your program over a stylistic problem
Do we want to be strict on disallowing uppers in snake_case idents?
long term, my default thinking is that we should do the same thing here as what we do for double underscore or underscores in capitalized names (that is, warn but don't block).
However, short term I think we should actually have the formatter change this one for you, because otherwise converting all the existing code from camelCase to snake_case will take forever. :big_smile:
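Putting those three answers together as "accept the name, but record a non-blocking warning", here's a small Python sketch. It's an illustration only: `style_warnings` and the warning strings are invented here and are not the compiler's actual Problem type.

```python
def style_warnings(ident: str) -> list[str]:
    """Return style warnings for an identifier; an empty list means no complaints.

    The identifier is always accepted. Nothing here is a blocking error.
    """
    warnings: list[str] = []

    # Consecutive underscores: warn instead of having the formatter quietly fix it.
    if "__" in ident:
        warnings.append("consecutive underscores")

    if ident[:1].isupper():
        # Uppercase names (types, tags, modules) should stay PascalCase.
        if "_" in ident:
            warnings.append("underscore in uppercase name")
    elif any(ch.isupper() for ch in ident):
        # Lowercase names should be snake_case, so no uppercase letters.
        warnings.append("uppercase letter in lowercase name")

    # A single trailing '_' is deliberately fine.
    return warnings


for name in ["user_name_", "parse_node__expr", "Some_Tag", "fooBar", "SomeTag"]:
    print(name, style_warnings(name))
```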
Cool, I'll add these answers to the PR description and I'll treat these as invariants for me to track / test. I'll have to dig a bit to look at Problem and how that system works next.
What's our plan for acronyms? I remember some folks above liking snake case partly because names like parse_HTTP_response are so readable. Maybe a name shouldn't start with caps like ID_card? Maybe proper noun caps aren't allowed like latest_message_from_Jan? I'd be a little bummed if acronyms must always be lowercase, but it isn't critical.
I personally never use uppercase acronyms in snake_case. And I don’t think I’ve seen it
But I haven’t worked in many code bases that use snake_case (a little Rust and all of Starlark and Elixir)
:thinking: Hmm, I could just be imagining this being a thing other people do too, after years of doing it myself in languages like Typescript, Python, and Terraform.
I haven't seen it either
I would say we follow whatever norms there are in snake case languages. Python, Elixir, Ruby. In those languages there doesn’t seem to be a language level prohibition on uppers or multiple underscores
I think the question is would we want to be more constrained in the parser or fallback to linting / formatting to address things that we don’t want
Of course after there is consensus on exactly what we don’t want
So, I’ll post this in separate messages and people can register their feedback with :+1: and :-1:
Do we want to allow uppers in snake case identifiers?
Do we want multiple _ in the middle of identifiers?
so parse_HTTP_header and parse_node__expr are the two types of identifier we are discussing just so the examples are clear
I think both should be allowed in the parser, but:
Treating either of these as syntax errors is needlessly frustrating
You should be able to run your program fine
yeah for sure none of these should be blocking errors
like the parser should always accept them, the question is just whether it also records a warning to display to the user as a nonblocking FYI
Yes and then if we say yes to the latter, I need to decide if this is a Problem pushed during canonicalization by inspecting the identifier after parsing
So maybe we say :warning: for “provide warning” and :check: “let it be”?
Anthony Bullard said:
Yes and then if we say yes to the latter, I need to decide if this is a Problem pushed during canonicalization by inspecting the identifier after parsing
yeah this seems like the best time to do it :thumbs_up:
Random thought, possibly a terrible idea: What if the compiler internally normalizes identifiers, so that foo_bar, foo__bar, foo_BAR and fooBar are all considered the same identifier? Then the formatter would be able to standardize variable names without fear of introducing naming conflicts or other changes of behavior.
Nim does something like this so that people can use their favorite casing style for variable names. This would be a similar feature with the opposite purpose: letting the formatter help enforce a single naming convention.
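A rough Python sketch of what such a normalization could look like, purely as an illustration: `normalize` is a made-up name, and keeping the first character as-is (so uppercase and lowercase names stay distinct) is an assumption on my part, loosely modeled on how Nim compares identifiers.

```python
def normalize(ident: str) -> str:
    """Map an identifier to a key that ignores underscores and, after the first
    character, letter case."""
    if not ident:
        return ident
    head, tail = ident[0], ident[1:]
    # Keep the first character untouched so FooBar (type/tag) and fooBar (value)
    # never collapse into the same key.
    return head + tail.replace("_", "").lower()


names = ["foo_bar", "foo__bar", "foo_BAR", "fooBar"]
print({normalize(n) for n in names})                 # a single key
print(normalize("FooBar") == normalize("fooBar"))    # False
```

With a key like this, the formatter could rewrite any of the four spellings to the preferred one without changing which definition a name refers to.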
The issue with that is that if someone does not use the formatter then you‘ll have to read code that uses different formats for the same name which I would definitely like to avoid.
We _could_ do something like that in canonicalization
But I think I agree with Isaac here
Do we think many people would forgo the formatter? A while ago I argued against a change that would make the coding experience a bit worse for people like me that (so far) are still LSP-less, but even I use the formatter :sweat_smile:.
relevant from earlier in the thread:
Hannes said:
I want to suggest something so that I can vote against it: Nim is the only language I know that allows you to use either camelCase or snake_case and the compiler will automatically convert between the two, as in, you can define myFunction and then call it as my_function if you want.
It was something I really disliked about Nim, and I'm not sure I know why, although I'm sure I could've got used to it if I stuck with Nim.
Definitely feel like I'm far enough down the road where I'd like to pause and get some feedback on the PR. I've absorbed a lot about the compiler in the past 2.5 days, but I want to make sure I'm following the correct patterns and not misunderstanding expectations.
In case it got lost in the thread, here is a link to the PR: https://github.com/roc-lang/roc/pull/7233
Yeah, I read Hannes' comment and agree, I'm not a fan of that Nim feature either.
The difference would be that in Nim being able to use a mix of ways to refer to the same variable is the point. In our case we'd use it for the opposite purpose: applying a single style (through the formatter) better than we otherwise could. So it'd be a formatter-facing feature and not a user-facing feature, if that makes sense.
@Jasper Woudenberg I think I get what you are saying, but that requires canonicalization be part of formatting, no? That could significantly slow down the formatter
A very low priority question for @Richard Feldman : What is the plan for the stdlib as regards this change from camel -> snake case? Will we just rip the bandaid off - maybe with a codemod? Or will we alias and deprecate the camelCase members?
rip the band-aid off I think
I think it would be too big a project to try to make this backwards compatible, especially when the upgrade path is "run roc format and you're done" :big_smile:
I'm concerned _slightly_ that that won't quite work as well as we hope. But moving forward with that approach, getting some feedback and then pivoting if necessary seems reasonable
If we want more general Internet opinions and debates....not that we actually do, but:
https://x.com/zack_overflow/status/1860046682018447877?t=FeeTxYmEPO2aMCkjAozTGw&s=19
And probably the most interesting reply in favor of camelCase:
https://gist.github.com/redbar0n/c011f0e0c682a9e1baf3f273fddf730c
that gist is actually an interesting example of the context of whitespace vs parens-and-commas calling seeming relevant to me
the first line is:
// The following is an example from the language Kitten, but generalizes to other languages.
I'm actually not sure that it does generalize, because the point seems to be that it's harder to tell the whitespace apart from the spaces separating the arguments
but if spaces aren't separating the arguments, but rather parens and commas, that seems like a pretty relevant distinction to the point!
For sure
I also still prefer snake case for readability in their example
But also, I never make my text so small to not see the underscores
Looking at it without syntax highlighting on mobile in a bright environment, I actually agree. The underscores and kebab are much less "separated" than camel case
the underscores and kebabs definitely look a lot more like a big sea of words, whereas with camel case each token is much easier to identify and pick up, and it is easier to read generally.
I'd like to see this comparison with syntax highlighting, I'm not sure if the difference would still stand
I mean overall, the revealed preference is the most important thing imo
it's not like all the excitement in this thread about switching from camelCase to snake_case would go away if people saw a sufficiently logical argument :laughing:
that's not how syntax preferences work!
That could also just be confirmation bias with a fairly small sample of people who are susceptible to snake case. I’m personally very unexcited by snake case, but I don’t have any strong arguments over what’s already been presented, so there’s no point in banging that same drum for me. It might be that many others feel the same. Or it might be that most people just genuinely prefer snake case
Generally I’m also concerned about even the idea of a formatter changing names for me (though that doesn’t seem to be the aim anymore at least?) or just getting compiler warnings that I’ve used a “bad” name. If I’ve ended up in a situation where using double underscores feels like a reasonable solution, any compiler warning has to have some damn good reasons for being there. And “it’s not consistent with our preferences” and “you should use nesting” doesn’t feel like that, at least from here. For example, in Elm we might have several custom types relating to the same domain in the same file. My experience says that even if those custom types are technically distinct, the code can still become much more readable from adding a bit of explicit namespacing in the name, so you can tell at a glance what you’re looking at, without needing to refer back to a type signature somewhere. At the same time, I also get thoroughly pissed anytime I’m told to do something that doesn’t seem to make any sense. So any compiler warning needs to work hard to sell why I shouldn’t be doing this, and showing what I should do instead
Kasper Møller Andersen said:
So any compiler warning needs to work hard to sell why I shouldn’t be doing this, and showing what I should do instead
This is actually where I'm at. I highly doubt with a snake_case norm that people would throw out identifiers with multiple underscores willy-nilly - they would do it for a purpose. Warning there is just likely to be frustrating for people just trying to get their job done. I've actually already implemented this warning in my PR locally - and it would be annoying to roll it back - but I would rather do that than deliver a frustrating user experience, especially if it lands before the AOC release.
As for the formatter changing names, I can understand the concern. But I am putting in extra effort to make sure the conversion function creates the snake_case identifier that you would want: not just an "add a _ before an upper and then lowercase the whole thing" type of algorithm, but one that tries to understand where acronyms exist and respects digits as boundaries as well. When I have it complete (I'm traveling currently and have limited time), I'll @ you on the PR and you can check out the test cases and tell me what you think.
Did we decide to allow capitals in snake case or are those warnings?
Can someone do:
some_HTTP_var
MY_CONSTANT
SoMeThInG_hOrRiD
personally, I really want all of those to be warnings.
I have no qualms with __. It might be accidental, but it doesn't hurt consistency in the same way capitals can: equality__nested_failure_test
I think the formatter changing things from camelCase to snake_case is alright, I just got the impression it would do more I guess. Feel free to add me though :smile:
I guess what I don’t understand is why we want to put so much effort into disallowing “weird” names. It’s fair enough to choose either camel or snake case, but beyond that, it feels like needless control to me. It’s the sort of thing where somebody is going to run foul of them even when they have a reasonable use case, and there’s no good reason for it as far as I can tell.
I appreciate that perspective, but I'd at least like to have the warnings in place at the outset while we're transitioning the ecosystem.
if people complain about it getting in the way of reasonable use cases in practice we can always reevaluate in the future!
Roc is an opinionated language in general and enforces a lot of things for consistency. People used to say the same about enforcing a formatter config. Go was the first language to force everyone to use the same formatter (with no config options), and it turned out great. Enforcing consistency often adds value to the community as a whole even if it adds friction for some individuals.
All that said, I don't think the plan has anything too harsh in it. I think the full list of current rules is:
- Variable identifiers are lowercase letters and numbers with underscores interspersed.
- Variables may not start with a number or underscore followed by a number.
- There can be a leading underscore for unused vars.
- There can be a trailing underscore for reassignable vars.
- Repeated underscores are not allowed.
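To make the rules above concrete, here is a minimal sketch of a checker for them. The function name `is_valid_lower_ident` is hypothetical and this is not the Roc compiler's implementation; it also assumes ASCII-only identifiers, which the thread does not state.

```rust
/// Hypothetical checker for the identifier rules listed above.
/// Assumption: identifiers are ASCII-only. Not the Roc compiler's code.
fn is_valid_lower_ident(name: &str) -> bool {
    // Only lowercase letters, digits, and underscores are allowed.
    if name.is_empty()
        || !name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
    {
        return false;
    }

    // Repeated underscores are not allowed.
    if name.contains("__") {
        return false;
    }

    // A single leading underscore (unused variable) is fine, so skip it
    // before looking at the first "real" character.
    let body = name.strip_prefix('_').unwrap_or(name);

    // The name may not start with a digit, with or without the leading
    // underscore. A trailing underscore (reassignable variable) needs no
    // special handling because a single underscore is always permitted.
    matches!(body.chars().next(), Some(c) if c.is_ascii_lowercase())
}
```

Under these assumptions `is_valid_utf8`, `_unused`, and `count_` are accepted, while `1st`, `_1st`, `some_HTTP_var`, and `equality__nested_failure_test` are rejected. In the plan discussed here a rejection would surface as a warning rather than a hard error.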
I agree that more constraints could be dropped, but I don't think any of these constraints will significantly hinder someone.
Especially if some of the constraint failures just lead to warnings which still allow the code to compile.
And I'd be alright with having consistent naming on acronyms personally, but I'm not sure it's a rule that has large enough benefits to be worth the downsides. If you force lowercasing, you can no longer tell the difference between a SoC and a SOC for example. Not that those two exact examples are likely to occur together, but it just feels like a situation that is bound to occur in real life.
As for constants, how does Roc plan to let you match against constants? As in, can I do something like:
MY_CONSTANT = "constant"
...
when myString is
MY_CONSTANT -> ...
"someConstantString" -> ...
...
In Scala, an upper case first letter allows you to match against a value like this. That's quite subtle in practice, so I wouldn't recommend it necessarily, but it is a potential use case at least.
Finally, I don't really consider SoMeThInG_hOrRiD as a valid example, as that seems very much like "banning something for the sake of banning it", as it's not something I've ever encountered in practice.
Anyway, it's not a huge deal, I just want to be sure that we're not making rules for the sake of making rules. That tends to give more work and pain than it's worth, in my experience.
Yeah, totally fair points. And good context.
Given everything is semantically constant in roc, I don't think capitalizing constants has any real meaning in roc. Would just be capitalizing everything that isn't a function.
On the topic of caps:
I like making variables that store an environment variable the same as the environment variable they came from. So I like to have IS_DEV over is_dev.
Then again, roc being pure global environment variable flags are not super relevant I guess :sweat_smile:
I do appreciate the goal of not having to think about capitalization of acronyms, but it struck me that I don’t actually care about that in function names and variables. The vast majority of the time, I will be consuming such functions rather than defining new ones, and then my IDE will just tell me what they look like, which means I don’t have to expend any energy deciding what I think they should be called.
Instead, where I really care about them is in module names (or module aliases specifically). Because when I call functions from e.g. a GraphQL module, I always have to stop and think whether we are usually importing that module as GraphQL or Graphql in the rest of the application.
In other words, the vast majority of the time when I am spending decision power on how my acronyms should be capitalized, it’s when I’m importing modules. And it seems like we are not solving for that here, but rather only the (in my opinion) much smaller issue of function and variable names.
Eli Dowling said:
Then again, roc being pure global environment variable flags are not super relevant I guess :sweat_smile:
Roc can read environment variables just fine, so that’s a fair use case I’d say :big_smile:
Trying to get some early feedback on the use of numbers in identifiers, here are some test cases I have and what my first instinct was (which, after running test_syntax's tests, I'm not so sure about any more):
test_once(&arena, "some123", "some_123");
// nll lll lll lld _ldd ddd ddn
test_once(&arena, "thehttpstatus404", "the_http_status_404"); // _ldd
test_once(&arena, "inthe99thpercentile", "in_the_99th_percentile"); // _ldd ddl dll llu
// _lul
test_once(
&arena,
"all400serieserrorcodes",
"all_400_series_error_codes",
); // _ldd _dll
test_once(&arena, "number4yellow", "number_4_yellow"); // _ldu _dul
test_once(&arena, "usecases4cobol", "use_cases_4_cobol"); // _lul _ldu _duu
test_once(&arena, "c3po", "c_3_po"); // _udu _duu
What would everyone here expect these to be formatted to?
My biggest concern is stuff in our test cases like infinityF32, which is becoming infinity_f_32, but it feels like most people would expect infinity_f32
yeah infinity_f32 seems better to me!
Here's the cases in a more presentable format:
CAMELCASE -> SNAKE_CASE_V1
some123 -> some_123
theHTTPStatus404 -> the_http_status_404
inThe99thPercentile -> in_the_99th_percentile
all400SeriesErrorCodes -> all_400_series_error_codes
number4Yellow -> number_4_yellow
useCases4Cobol -> use_cases_4_cobol
c3PO -> c_3_po
And this is without treating digits as boundaries:
some123 -> some123
theHTTPStatus404 -> the_http_status404
inThe99thPercentile -> in_the99th_percentile
all400SeriesErrorCodes -> all400_series_error_codes
number4Yellow -> number4yellow
useCases4Cobol -> use_cases4_cobol
c3PO -> c3_po
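The two lists above can be reproduced with a sketch like the following. The function `to_snake_case` and its `digits_are_boundaries` flag are hypothetical names, not the code from the PR, and the sketch assumes ASCII identifiers. It looks at the previous, current, and next character so that a run of capitals gets a single underscore. One difference from the second list: this sketch gives `number4_yellow` rather than `number4yellow`, because it always treats a digit followed by a capital as a boundary, as the `c3_po` and `all400_series_error_codes` rows do.

```rust
/// Hypothetical camelCase -> snake_case helper (not the code from the PR).
/// Assumption: identifiers are ASCII-only.
fn to_snake_case(ident: &str, digits_are_boundaries: bool) -> String {
    let chars: Vec<char> = ident.chars().collect();
    let mut out = String::with_capacity(ident.len() + 4);

    for (i, &c) in chars.iter().enumerate() {
        let prev = if i > 0 { Some(chars[i - 1]) } else { None };
        let next = chars.get(i + 1).copied();

        let boundary = match prev {
            // Never start the identifier with an inserted underscore.
            None => false,
            Some(p) if c.is_ascii_uppercase() => {
                // A camelCase hump (aB), a capital after a digit (4Y), or the
                // last capital of an acronym that starts a new word
                // (the S in HTTPStatus).
                p.is_ascii_lowercase()
                    || p.is_ascii_digit()
                    || (p.is_ascii_uppercase()
                        && next.map_or(false, |n| n.is_ascii_lowercase()))
            }
            // A digit after a letter is a boundary only in the first variant.
            Some(p) if c.is_ascii_digit() => {
                digits_are_boundaries && p.is_ascii_alphabetic()
            }
            // Lowercase letters and existing underscores never add a boundary.
            Some(_) => false,
        };

        if boundary {
            out.push('_');
        }
        out.push(c.to_ascii_lowercase());
    }

    out
}
```

With the flag set, `infinityF32` becomes `infinity_f_32`; with it cleared, it becomes `infinity_f32` and `isValidUtf8` becomes `is_valid_utf8`. Identifiers that are already snake_case pass through unchanged in both modes, so this sketch would not remove an underscore that is already present before a digit.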
I think I prefer not treating the digits as boundaries. How aggressively will this format? Will it undo boundaries, so that it would convert some_123 to some123 ... or is this just going to leave some_123? If it will leave some_123 unchanged then it's an easy one I think.
At any rate I'd have thought number suffixes are common enough that they should normally not be boundaries. isValidUtf8 should not become is_valid_utf_8 but is_valid_utf8.
(byTheWayReallyHappyToSeeSnakeCaseComingInAndCamelCaseTakingItsHorribleUnreadableSelfBackToWhereItBelongs. Now if only someone would sacrifice ? as an operator to allow it in identifiers, we'd be in a perfect world ...)
Question mark in identifiers? What for?
I don't think I have ever seen that
Brendan Hansknecht said:
Question mark in identifiers? What for?
Ruby and Elixir I think use it for predicate functions, just to say “this returns a Boolean”
Interesting. I guess that is the equivalent to an is prefix used in some styles
I feel like in roc results are more common than booleans. So a ? operator for early return would be used more than a ? at the end of a function name, but I guess it is just a style difference/decision
Wouldn’t argue that. Though an early return from an effect function would look ugly give_me_something!?
in the parens and commas world, that will be foo!()?
I wasn't seriously suggesting it -- I think I've seen it in scheme, and I always thought it was rather elegant, because the isX convention looks more like a statement than a question. Where the convention is used it instantly signals what the function is going to do.
(In fact, it wouldn't be a bad heuristic for a Result type name ... because everyone would immediately know from the name that List.get? would return a Result, whereas List.takeLast does not. But I'm still not suggesting it, mostly because every language is short of easily typed symbols, and ? has better uses.)
Kilian Vounckx said:
JanCVanB said:
Unless you're looking for something deeper than equality, I believe that already works with myConstant since every def is constant and every Eq-able type is matchable.
It wouldn't I think? It would just match everything (same as underscore). The only difference is that it would actually bind the thing to the variable. You would get a shadow error if the name is bound somewhere already, and a variable not used error if it isn't.
Hmm, I don't understand what you're communicating here, so I infer that it involves a deep concept that I haven't perceived before. Hopefully someone more experienced can answer your question.
Anthony Bullard said:
without treating digits as boundaries
I prefer this, not because your examples look better on the lower right side (they look best on the upper right to me) but because sometimes a word will mix letters and numbers
internationalization_handler
i18n_handler
or suffixes like @Paul Stanley said above
u8_converter
I'm okay if the formatter misses some ambiguous cases like my501c3nonprofit that should be my_501c3_nonprofit
If someone's intentionally coding in camel case where a language wants snake case, it seems reasonable that a couple of variable names will be suboptimal. For the mass conversion of existing codebases, we could just double check all variable names with numbers in them (or actually all variable names, at the small scale we're at) to make sure the results feel right.
I volunteer to manually read every variable name in GitHub.com/*/*/**/*.roc if the formatter update generates a CSV of proposed conversions :nerd:
I just wanna highlight this again for opinions, since it seems like it got lost in the conversation :grinning_face_with_smiling_eyes: https://roc.zulipchat.com/#narrow/channel/304641-ideas/topic/snake_case.20instead.20of.20camelCase/near/484442330
Ok, I think messages are cleaned up now and named constant in matching was moved to #ideas > pattern matching on named constants which has all of the old context as well.
As for the message above on module names. That is an interesting point. I definitely see fights over acronyms in names.
I think what I see most often for readability is that only the first letter of an acronym is capitalized
ParseHtml in pascal case. Or GraphQl....
If it’s important to create more consistency around names with these rules in general, and likening it to a formatter that is not configurable, then module names are the names with the highest impact to me at least.
Making decisions about how I need to name a module alias with acronyms is probably 97% of the cases where I need to make naming decisions on acronyms in our large code base at work. Which is also why I don’t really care about this rule in function names: the impact for me personally is quite minuscule in that space.
Is that still true with the static dispatch proposal, where it will be much less common to see module names? Many function calls will use method syntax instead of being qualified
Here's something that doesn't have to do with the actual format itself. Having the formatter fix the casing of identifiers means that this is somewhere where the AST will be changed by the formatter, which is currently not allowed. I can think of two ways of handling this:
- A --migrate flag to the format command that allows for the formatter to change the AST. Elixir did this recently in 1.17 for exactly the same reason (allow for changes to AST in formatter, not casing as that has also been the case)
I'm leaning towards the latter as it is simpler. We could also probably add a confirmation prompt for this. I think that means the formatter would have to pass configuration for this around and only perform the migration when the flag is passed. This could be powerful for parens-and-commas migration in the future (@Richard Feldman ).
seems reasonable! I'm curious what others think.
The issue with the latter is we implemented most of the formatting functionality as a trait. So in order to accomplish this we will have to add yet another parameter to format_with_options, maybe a struct that will hold all of the immutable (per run) options.
struct FmtOptions {
snakify: bool,
// Others like parensAndCommas will come later....
}
And this will need to be drilled down all the way
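As a rough illustration of what "drilled down all the way" could look like, here is a much-simplified sketch. Only the `FmtOptions` struct, its `snakify` field, and the method name `format_with_options` come from the messages above; the `Format` trait, the `Ident` and `Def` types, the `format_def` helper, and the signature are hypothetical stand-ins, not the real formatter API. The casing conversion inside is deliberately naive.

```rust
/// Immutable, per-run formatter options (field name from the message above).
#[derive(Clone, Copy, Debug, Default)]
struct FmtOptions {
    snakify: bool,
}

/// Hypothetical stand-in for the formatter trait; the real signature differs.
trait Format {
    fn format_with_options(&self, buf: &mut String, opts: &FmtOptions);
}

/// Hypothetical leaf node: a lowercase identifier.
struct Ident(&'static str);

/// Hypothetical parent node: `name = value`.
struct Def {
    name: Ident,
    value: Ident,
}

impl Format for Ident {
    fn format_with_options(&self, buf: &mut String, opts: &FmtOptions) {
        if opts.snakify {
            // Deliberately naive conversion, just to show where the option
            // is finally consumed.
            for (i, c) in self.0.chars().enumerate() {
                if c.is_ascii_uppercase() && i > 0 {
                    buf.push('_');
                }
                buf.push(c.to_ascii_lowercase());
            }
        } else {
            buf.push_str(self.0);
        }
    }
}

impl Format for Def {
    fn format_with_options(&self, buf: &mut String, opts: &FmtOptions) {
        // The parent does not read `opts` itself; it only passes it down.
        self.name.format_with_options(buf, opts);
        buf.push_str(" = ");
        self.value.format_with_options(buf, opts);
    }
}

/// Hypothetical entry point: builds the options once per run.
fn format_def(def: &Def, opts: &FmtOptions) -> String {
    let mut buf = String::new();
    def.format_with_options(&mut buf, opts);
    buf
}
```

Formatting `Def { name: Ident("userName"), value: Ident("defaultName") }` with `FmtOptions { snakify: true }` yields `user_name = default_name`, and with `FmtOptions::default()` the identifiers are left alone. Bundling the flags in one struct means a later option only adds a field instead of another parameter on every implementation.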
What if the formatter treated numbers as word boundaries except in a hardcoded list of special cases: f32, utf8, and whatever else shows up in the builtins? It seems like most examples are better with the extra word boundary, so that, with as many exceptions as we can think of, should cover nearly every case.
hm, I don't think the formatter needs to have a concept of "word boundaries" :thinking:
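I think it's sufficient to have the rules be:
- convert uppercase letters to lowercase and add an underscore before it
- collapse consecutive underscores down to one underscore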
I don't think we need any other rules than that!
Richard Feldman said:
I think it's sufficient to have the rules be:
- convert uppercase letters to lowercase and add an underscore before it
- collapse consecutive underscores down to one underscore
My experience writing test cases is a lot of camelCase identifiers will have some very terrible snake_case analogs. Any sort of uppercase acronym will come out like _h_t_t_p instead of the more logical _http.
So in my implementation I track the previous, current, and next letter so that a series of uppers will only emit a single underscore. If you don’t want to take this approach please let me know now so I can adjust
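For comparison, the two rules quoted above, taken literally, could be sketched like this. The function name `naive_snake_case` is hypothetical and the sketch assumes ASCII identifiers.

```rust
/// Hypothetical literal implementation of the two quoted rules.
/// Assumption: identifiers are ASCII-only.
fn naive_snake_case(ident: &str) -> String {
    // Rule 1: put an underscore before every uppercase letter, then
    // lowercase it.
    let mut out = String::with_capacity(ident.len() * 2);
    for c in ident.chars() {
        if c.is_ascii_uppercase() {
            out.push('_');
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
    }

    // Rule 2: collapse consecutive underscores down to one.
    let mut collapsed = String::with_capacity(out.len());
    for c in out.chars() {
        if c == '_' && collapsed.ends_with('_') {
            continue;
        }
        collapsed.push(c);
    }
    collapsed
}
```

This handles `isValidUtf8` well (`is_valid_utf8`), but `theHTTPStatus` becomes `the_h_t_t_p_status`, which is the acronym problem described above. Looking at the neighbouring letters, so that a run of capitals is treated as one word, is what avoids it.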
Does the formatter actually need to adjust the ast? Can't it just print the identifiers "wrong" such that they are in snake case?
Brendan Hansknecht said:
Does the formatter actually need to adjust the ast? Can't it just print the identifiers "wrong" such that they are in snake case?
The formatter checks the file afterwards and ensures the AST was the same between runs
Anthony Bullard said:
Richard Feldman said:
I think it's sufficient to have the rules be:
- convert uppercase letters to lowercase and add an underscore before it
- collapse consecutive underscores down to one underscore
My experience writing test cases is a lot of camelCase identifiers will have some very terrible snake_case analogs. Any sort of uppercase acronym will come out like _h_t_t_p instead of the more logical _http.
So in my implementation I track the previous, current, and next letter so that a series of uppers will only emit a single underscore. If you don’t want to take this approach please let me know now so I can adjust
that seems fine :thumbs_up:
Brendan Hansknecht said:
Is that still true with the static dispatch proposal, where it will be much less common to see module names? Many function calls will use method syntax instead of being qualified
It would definitely impact it, but I don't think the overall importance would change much. For example, in Elm I might write
someSelectionSet |> MyGraphQLModule.runQuery
where someSelectionSet is an opaque type defined in the GraphQL library we're using, whereas we have our own function for actually making it into an HTTP request. So in Roc, with static dispatch, we'd write it as
someSelectionSet.pass_to(MyGraphQLModule.runQuery)
which I think would be a normal pattern? You can imagine the same thing for any package you're using, which might build up some API information, which it then leaves to you to send out over your acronymly-named protocol of choice.
Makes sense.
I guess the next question is how people feel about module names being snake cased too? Is there any technical reason not to do it, or is it purely a preference thing?
Pretty close to having my PR accomplish all of the above AND passes all of the tests. I have one annoying warning that I can't seem to get rust analyzer to stop emitting short of an explicit annotation. Anyone ever have a test helper function just always say that it is unused even though it's very plainly used many times?
It might be because it's only enabled with a feature
I mean it's a test helper, so it's only used in functions with the #[test] annotation
You can see it here: https://github.com/roc-lang/roc/pull/7233/files#diff-1e2afbfc20f8b630b4bb8a987e1e88e148662ff6dfefc9cdcc58d6fc27f11e03R378
I think it's because rust analyzer doesn't understand stuff only used in tests
There's proooobably a config option for it you can set in your LSP settings
If it was just my LSP I couldn't care less, but it's when you build or test as well. And no other test helper seems to have this issue.
Thanks for that, read it all the way through and then checked back on what we do elsewhere, missed having #[cfg(test)] at the top of the test module
The PR is now ready for review: https://github.com/roc-lang/roc/pull/7233
@Anton Could I ask for checks to be run against this PR?
Sure, can you take a look at the merge conflicts first? CI can not be triggered if there are conflicts
OK. That might take me a bit. I'm working on this right now
And then I'll be traveling back to Chicago
Feel free to @ me when you're done :)
I’m not sure what to make of the silence on module names. It is my clear experience that most decisions on acronym casing happen when importing modules, and I don’t think static dispatch will change that significantly. So if everyone was fond of snake case for solving acronym casing (among other things) for functions and variables, why does no one want to discuss solving it for modules?
Do you disagree with my analysis? Or was the acronym casing not all that important to begin with perhaps?
It may be nice to keep module names camel case so that they are easily identifiable in the code. I also think it's fine to just wait to see how the currently proposed changes feel in practice and iterate step by step vs making multiple changes at once.
Sorry, what I've implemented allows for snake_case in ALL lowercase identifiers, so packages/platforms could be lowercase, but modules - which are uppercase identifiers - will remain camelCase
For reference, Rust has snake case module names. But instead of using dot between a module name and a function name, it uses :: of course, which makes it easier to differentiate them.
Rust uses :: for all static member access. I'd personally prefer modules - being records, aka values - to have lowercase identifiers (maybe it's the Gopher in me?). But modules in some ways are types as well.
I do think it’s fair to just push this discussion for a later time, if it’s because it would be nicer to break it into steps:smile:
Anthony Bullard said:
Rust uses :: for all static member access. I'd personally prefer modules - being records, aka values - to have lowercase identifiers (maybe it's the Gopher in me?). But modules in some ways are types as well.
if we did that and kept . for module access, then you could never name a variable list or num or str (etc.) because then you'd be shadowing the (lowercase) module name. That doesn't sound enjoyable to me. :sweat_smile:
we could change to :: for module access like Rust does, but then . for autocomplete works less consistently, beginners have to learn when it's . and when it's :: (the latter would probably also have to be used for custom tag unions, like in Rust) etc.
also it still wouldn't address the "how to uppercase acronyms" question because tags would still be uppercase, so the question would remain for how to uppercase tags in acronyms
my personal view on this is:
Types will still need acronym casing decisions, but the reason I focus on module names is that, in my Elm experience, the vast, vast majority of the time where I need to actually decide on how I want acronyms to be cased, is when writing module names and aliases. For types and functions, I will mostly be consuming them, so I’m not expending decision power. But we always import modules with an alias at work, so I need to make a decision on casing there over and over again.
Anyway, in the grand scope of things, it’s definitely minor. It just struck me that people wanted to solve the casing problem for function names and variables only, when that is, in my view, the least impactful part of the language to solve it in :blush:
sure, but it's no worse than the status quo haha
I agree that it's not much of a selling point for snake_case
At least you could actually have consistent casing before, so in that sense it feels like a regression. It’s really just trading one kind of consistency for another I guess :sweat_smile:
I’m not sure what to make of the silence on module names.
I haven't said anything cause I don't know if I have a useful opinion. In most languages with a feature like static dispatch, I find that I don't type module names too often and when I do it is in an import line that will autocomplete to the correct capitalization. So I'm personally not too worried. I feel like it is up to each codebase and that is ok. I feel much stronger about function and variable names than I do about module names.
@Anton My PR is ready for a test run now, and I also took the opportunity to make it a single _signed_ commit after setting up commit signing.
Thanks @Anthony Bullard, I can do a quick review and test run tomorrow. Because we use some self-hosted CI servers we do need to check for malicious code before approving tests
Sounds totally reasonable to me
I'll look for more issues to tackle while I wait :-)
Awesome!
I saw my PR failed in some of the CI Manager checks, looks like when I update to resolve conflicts, I need to also fix this up:
< Slice { start: 0, length: 0 },
> Slice<roc_parse::ast::CommentOrNewline> { start: 0, length: 0 },
Make sure to do the clippy check as well before you want to run CI:
cargo clippy --workspace --tests -- --deny warnings
Clippy ran, all OK. All tests pass. Build succeeds. All merge conflicts resolved (again; this time it was pretty painful). Test run requested
Done
:fingers_crossed:
All tests passed :tada: , I'll review today
This has been merged, can you mark as complete @Agus Zubiaga ?
I guess the next task would be to update the tutorial and the builtins, and then the roc-lang org owned platforms?
Maybe we should hold off on that until after advent of code?
I think we could get the PRs ready and just hold off on merging?
in the case of the platforms, could actually go ahead and land the change + release since the current releases would be unaffected
I think we could get the PRs ready and just hold off on merging?
They could accumulate substantial conflicts
fair
I don't think there's any downside to updating platforms and packages, is there?
That should be good, but they probably should be based on purity-inference branches if those exist for that package/platform
I definitely would not leave PRs lying around in the compiler repo until PI lands and we are ready for it
I think updating the tutorial to mention that snake case is now preferred for lowercase idents would be good
I think updating the tutorial to mention that snake case is now preferred for lowercase idents would be good
Hmm, snake case has not been extensively tested on platforms and packages. There is also no example code using it, so those seem like good reasons to wait with that.
Fair
they probably should be based on purity-inference branches if those exist for that package/platform
Re purity inference, I haven't started on basic-webserver... I've been holding off until we figure out if the basic-cli thing is purity inference or platform related. I'm starting to think it's specific to the platform.
Also I think basic-webserver needs some love in the upgrade to purity inference to remove/cleanup a bunch of the glue types. It makes the most sense to do all this at the same time and they can also align with the improvements made in basic-cli (i.e. handling IO errors).
Anything I can do to help there, let me know
I haven't been wanting to rush it, and have been prioritising AoC rn.
I'm definitely tracking it and it shouldn't take very long. Just need a few hours free to make it happen.
I've got a nice Christmas break coming up so plan on getting lots of stuff done then (if I don't get distracted :upside_down:)
Last updated: Jun 16 2026 at 16:19 UTC