Add mod to the builtins · ideas · Zulip Chat Archive

We had a mod at one point but I think we removed it due to confusion and lack of common hardware support. Essentially people from many mainstream languages think they want mod, but actually want and have been using what Roc calls rem.

Then there are cases where the difference doesn't matter (i.e. when the second operand is guaranteed to be positive), in which case mod just happens to be slower than rem.

I suppose we can optimize mod into rem for the cases where their behavior is guaranteed to align.

Kevin Gillette (Oct 12 2023 at 00:55):

That said, are we certain that mod is warranted to reinclude? and if we reinclude it, are we certain that mod is a good name for it, given the confusion and pit of suboptimality that perhaps the majority of people reaching for it will fall into?

Ajai Nelson (Oct 12 2023 at 01:27):

I would prefer to reinclude mod. It's happened more than once that I've needed to define mod myself in languages that only define rem. (And it made me very happy that I didn't have to do that in Elm when I needed it!) It makes sense that one might prefer rem by default because of performance, but personally, I think mod is much nicer mathematically and useful enough that it's worth including.

When doing modular arithmetic in math, I think the thing you generally care about is whether two numbers are the same modulo n (written

x \equiv y \pmod n

). For example,

4 \equiv 7 \pmod 3

because 4 mod 3 and 7 mod 3 are both 1. Similarly,

4 \equiv -2 \pmod 3

because 4 mod 3 and -2 mod 3 are both 1. But if you use rem, that no longer works, since 4 rem 3 = 1 but -2 rem 3 = -2.

Ajai Nelson (Oct 12 2023 at 01:37):

With that said, I might be in the minority. In previous discussions, people said they thought one would want rem the vast majority of the time. That's the opposite of my experience, so maybe I just think about the operation differently than most programmers

Brendan Hansknecht (Oct 12 2023 at 02:25):

The actual mod instruction itself is so slow that the extra work should be very minimal

Brendan Hansknecht (Oct 12 2023 at 02:26):

I think the question should be around usefulness, correct defaults, and clarity to end users.

Brendan Hansknecht (Oct 12 2023 at 02:27):

Declan Joseph Maguire (Oct 12 2023 at 07:23):

This might be a bad idea, but could we bury mod in some deeper submodule, or a more specific module? I feel that standard operations should have standard implementations to prevent people building 3 dozen homebrew implementations (a dozen of which fail some edge case), but if we're afraid people will inevitably misunderstand or misuse it then I'm not sure where it could naturally go that would reliably avoid the problem. I suspect this will be a recurring problem, so we might want to think about how to systematically deal with it.

Kevin Gillette (Oct 13 2023 at 03:27):

i suspect this is more a matter of what we name it rather than where we put it. If we give it the right name, then people who actually need it will know what it means and use it anyways, while people who don't need it will realize it's not what they're looking for based on the name alone.

Kevin Gillette (Oct 13 2023 at 03:28):

People tend to favor using things with shorter names when they have the same prefix.

Ajai Nelson (Oct 13 2023 at 04:01):

If we don't think performance is a concern, are there many reasons to prefer rem over mod? Because to me, rem seems like more of a pitfall. Here's an example from Wikipedia:

Brendan Hansknecht (Oct 13 2023 at 04:02):

I think there is also the complication of do you want módulo or do you want the absolute value of the remainder

Brendan Hansknecht (Oct 13 2023 at 04:03):

I think most of the time people actually want the absolute value of the remainder

Brendan Hansknecht (Oct 13 2023 at 04:04):

For example, get all of the individual digits of a number. There you want the absolute value of the remainder.

Brendan Hansknecht (Oct 13 2023 at 04:06):

Probably we should have signedRem, unsignedRem and euclideanRem(modulo? Trying to think of a longer name to avoid people defaulting to it).

Declan Joseph Maguire (Oct 13 2023 at 07:29):

I like that idea. Give people all the options explicitly, force them to choose if they want to use it. After all, they can just import it with a new local def if they want the pithier name.

Just to spitball another idea, maybe we have a single function that also accepts a tag with the desired behaviour? This is likely a worse solution, but I thought it worth putting forward.

Richard Feldman (Oct 13 2023 at 10:58):

I wanted to try out using % instead, but I'm certainly ok with the more explicit design!

Notification Bot (Oct 13 2023 at 14:41):

Brendan Hansknecht (Oct 13 2023 at 15:08):

At this point, I think that % probably is a foot gun no matter how it is defined. It just means too many different things from various languages, the differences are subtle, and people often don't know which they want.
Wanted unsigned rem:
lowestDigit = \x -> x % 10

Wanted euclidean rem, though others work cause hash is generally a nat:
hashToIndex = \hash, list -> hash % (List.len list)

Honestly not sure of a great example where you would want signed modulo and it would get a different result from unsigned or euclidean. Like I have never written % and wanted a negative number as the result. I am sure there are cases when you want he true division remainder, but all that came to mind for me would be positive reminders, not negative.

I would argue for one of the following apis (Though please give other suggestions). All removing % as an operator.
1)

Num.rem : (Num a, [Signed, Unsigned, Euclidean]) -> Num a
# Here I would also maybe be fine with adding
Num.mod = Num.rem

# for discoverablitly should `rem` go at the beginning of the name so they are grouped alphabetically? `remUnsigned`?
Num.unsignedRem : Num a -> Num a
Num.signedRem : Num a -> Num a
Num.euclideanRem : Num a -> Num a # Maybe call it modulo or mathematicalRem or something else but equally long

3) Not sure this one is doable with the current type system or a good idea, but essentially only allow rem where it would agree with mod. Make users deal with signs always. I guess in this cause we could maybe keep % if it followed the rem semantics.

Num.rem : (Num Unsigned a) -> Num Unsigned a

Richard Feldman (Oct 13 2023 at 15:41):

a thing I like about (2) is that if you're not sure which you want, it gives you pause to read the docs for both

Richard Feldman (Oct 13 2023 at 15:41):

whereas with the other designs, you might just reach for what's most familiar without realizing there are alternatives that differ in some edge cases

Richard Feldman (Oct 13 2023 at 15:42):

personally I like the idea of switching the current design to (2) but with each of them starting with rem____ so they all autocomplete together

Richard Feldman (Oct 13 2023 at 15:42):

Brendan Hansknecht (Oct 13 2023 at 15:43):

Richard Feldman (Oct 13 2023 at 15:47):

mainly because I don't think there's a big ergonomics advantage either way, but it'll run faster in the dev backend

Richard Feldman (Oct 13 2023 at 15:48):

Richard Feldman (Oct 13 2023 at 15:49):

it'd be a different story if there's a configuration option an application might want to set, that might switch between the different modes at runtime, but I don't see that happening in practice haha

Brian Carroll (Oct 13 2023 at 17:47):

An argument against starting all the names with rem is that whenever I see people asking for "modulo" they are from a mathematical background and used to modulo arithmetic, and would search for that and not "remainder".

Brian Carroll (Oct 13 2023 at 17:48):

Jacob (Oct 22 2023 at 05:54):

Yeah, calling it mod makes more sense. Having mod and rem separate is confusing. Also, rem as the name doesn't really make much sense. Is mod just the operation and remainder is the result of the mod operation with certain assumptions? If that's true, maybe just have a sensible default (unsigned floor/euclidean per the wiki article) mod and then have the others. Although while writing that I get the want to separate into rem since writing all the variants of mod in the function name is pretty long.

Jacob (Oct 22 2023 at 06:34):

Also probably makes sense to add the different forms of integer division, which is similar. fmod fdiv, emod ediv, tmod tdiv, etc. for each rounding convention (with better names too lol).

Jacob (Oct 22 2023 at 07:02):

The only valid distinction I can think of for having separate mod and rem is that rem implies the definition of the performed mod is just the remainder of division, in which it makes sense to have a mod 0 be [divByZero]. Then mod would have a mod 0 = a.

Brian Carroll (Oct 22 2023 at 08:39):

No, they're different operations. They give the same result for positive numbers, but different results for negative numbers.

Brian Carroll (Oct 22 2023 at 09:10):

Jacob (Oct 22 2023 at 19:21):

They aren't different other than the fact that you chose the parameters of the modulo operation differently. My understanding is that the rem operation is just the modulo but with truncation used as the rounding convention rather than euclidean or flooring or something else. Modulo is irrespective of the rounding convention other than the fact that some PL's decided to arbitrarily call a specific rounding convention just mod or just rem. However there is a difference in implication of the definition used (e.g remainder after division rather than another formulism). Similar to how there are different variations of integer division depending on the rounding convention used, but its all integer division still.

Jacob (Oct 22 2023 at 19:27):

That's why to me it only makes sense to have a separate rem function to heavily imply you are using only the remainder after division definition, and hence why you get [divByZero] for a mod 0 rather than a. Which is a valid distinction, but not sure if it comes up a lot actually.

Brendan Hansknecht (Oct 22 2023 at 19:49):

While it is true that all of these are technically modulo operations, I don't think that is a useful property to focus on.

Technically speaking, modulo is an operation that returns the remainder. That said, when many people see the term mod they do not refer to the modulo operation. They refer to modular arithmetic. They are thinking about values wrapping around at a modulus (thus always a positive result). This is made much worse by the various terms that different programming languages use. mod is a very overloaded term.

Brendan Hansknecht (Oct 22 2023 at 20:00):

I think for practical programming usefulness, there are 2 main wanted operations (and a 3rd occasionally wanted operation).

1) Getting the Euclidean remainder. This is what you would normally get with modular arithmetic. It is always positive. a = bq + r and 0 ≤ r < |b|
2) Getting the signed remainder. a = bq + r and |r| < |b|. (note this generally is truncated, but could be floored or rounded)
3) Less important 3rd operations. Getting the unsigned remainder. Same as 2, but you take the absolute value after getting 2.

Brendan Hansknecht (Oct 22 2023 at 20:02):

I believe for performance reasons, 2 should stay truncated. Would need to double check some assembly instructions, but I think it is the only form of 2 supported by hardware.

Jacob (Oct 22 2023 at 20:18):

Yeah I think having a sensible default makes sense (euclidean division and a mod 0 is a). But whether its a good or bad thing other math PL's like lean also have various definitions of integer modulus using various rounding conventions which are called mod. just prepended by the first letter of the rounding convention used. So just because the word mod is in the function name doesn't necessarily mean it has to mean the number theoretic definition.

Brendan Hansknecht (Oct 22 2023 at 20:21):

Richard Feldman (Oct 23 2023 at 00:00):

I think Brendan's first API suggestion from earlier (except rename it from Num.rem to Num.mod would address :point_up: as well as the other concerns about mod vs rem, yeah?

Richard Feldman (Oct 23 2023 at 00:01):

so the idea would be to remove the current mod/rem operations from builtins, as well as the % operator, and replace them all with:

Num.mod : Num a, [Signed, Unsigned, Euclidean] -> Num a

Richard Feldman (Oct 23 2023 at 00:03):

Richard Feldman (Oct 23 2023 at 00:04):

although maybe |> Num.mod 2 Euclidean is clear enough? Or arguably even clearer? :thinking:

Brendan Hansknecht (Oct 23 2023 at 01:31):

I feel like I approve of that API currently, but will later be kinda annoyed if I use mod in a much of places and have to put Euclidean everywhere.....

That said, helper functions are a thing some file is only using 1 version of mod. So SGTM.

Brian Carroll (Oct 23 2023 at 05:32):

Brian Carroll (Oct 23 2023 at 05:38):

I have to admit, I don't actually know how valid this argument is because I don't really have mathematical background myself!

If there's anyone here who does, what do you think? If we implement this function with the tag for Signed/Unsigned/Euclidean, would mod or rem both make sense or is one of them better?

Sven van Caem (Oct 23 2023 at 11:35):

Num.mod : Num a, Num a, [Signed, Unsigned, Euclidean] -> Num a

Richard Feldman (Oct 23 2023 at 12:04):

timotree (Dec 18 2023 at 00:12):

As a data point, the Lean programming language which is targeted towards a mathematical audience settled on Euclidean division and remainder. The consensus was that besides for use-cases where you're trying to implement the semantics of another programming language, Euclidean division and remainder were the "correct" functions, and it is worth executing a few extra instructions to get the correct behavior. Discussion thread from their Zulip

timotree (Dec 18 2023 at 00:17):

Sorry if I'm missing the answer to this earlier in the thread, but is there a use case where Euclidean division and remainder give an undesired answer? If the only downside to those operations is that they can't be compiled down to a single instruction, it would seem compatible with Roc's overall approach to still make x / y and x % y use the "correct" functions, Num.divEuclid and Num.remEuclid, and have named functions like Num.divTrunc and Num.remTrunc for micro-optimization

Brendan Hansknecht (Dec 18 2023 at 00:26):

Euclidean can be wrong in the case where you are trying to get the digits of a numbers. That will give an undesired answer for negative numbers. Cause Num.remEuclid -34 10 is 6, but the user really wanted the 4.

Brendan Hansknecht (Dec 18 2023 at 00:26):

timotree (Dec 18 2023 at 00:26):

Personally, this isn't an operation that I would ever expect n % 10 to denote. I would be fine with having the shortest way to write this be Num.abs n % 10

Brendan Hansknecht (Dec 18 2023 at 00:27):

Oh for sure. Just one case I know of that is easy to mess up with Euclidean division. Would have to think more for other cases.

timotree (Dec 18 2023 at 00:30):

Actually I suppose it really does need to be written Num.abs (Num.remTrunc n 10) because Num.abs Num.minI32 % 10 is an overflow error

Brendan Hansknecht (Dec 18 2023 at 00:33):

Another case with array bounds that I have seen: do you actually want x % len to work with negative values? In some cases you intend for negative values to access from the back of the array. In many cases you assume negative values are impossible and getting a negative value there would be better to crash. So preserving the negative there and forcing the user to handle it could be safer.

Brendan Hansknecht (Dec 18 2023 at 00:33):

Cause if you expect negatives you probably will hit the crash quickly in a test or real code and write the extra code to handle it.

Brendan Hansknecht (Dec 18 2023 at 00:33):

If you don't expect negatives, you may miss it and have wrong elements returned silently until you figure out the bug.

timotree (Dec 18 2023 at 00:35):

It seems unlikely to me that you're expecting negatives are impossible but using a signed integer type

Brendan Hansknecht (Dec 18 2023 at 00:41):

This is why go doesn't support negative indices automatically wrapping like you see in python
They thought it would be too error prone and easy to miss.

Brendan Hansknecht (Dec 18 2023 at 00:41):

In roc, probably not too likely. But I have seen lots of good arguments for using I64 as your index type instead of U64.

Brendan Hansknecht (Dec 18 2023 at 00:42):

Anyway, that is pretty tangential, just trying to think of examples/cases. I think I personally be fine with picking Euclidean and just having others as named functions.

Fabian Schmalzried (Dec 20 2023 at 14:51):

Could we allow the infix % only for unsigned integers? And add a nice error message when using it with signed integers?

Becker A. (Jan 04 2024 at 01:28):

not sure if others feel this way, but in my mind rem/mod goes hand-in-hand with div. Would div & // be changed in a similar way as described above for rem/mod & %?

Brendan Hansknecht (Jan 04 2024 at 02:13):

Brendan Hansknecht (Jan 04 2024 at 02:14):

Becker A. (Jan 04 2024 at 02:34):

the gains would just be consistency & clarity among the standard library builtins, i.e. to avoid confusion as much as possible.

specifically, both div* and rem are related functions with a similar problem:

because of how similar & related rem and div are, it feels to me somewhat odd to implement them asymmetrically with different APIs (unless there was strong reasoning to do so). i.e.,

Num.mod: Int a, Int a, [Signed, Unsigned, Euclidian]

Num.divInt: Int a, Int a, [Signed, Unsigned, Euclidian]

(refactored Num a -> Int a & to use divInt since div is taken for Frac numbers)

Brendan Hansknecht (Jan 04 2024 at 03:13):

I don't think the same tags would work. It is not clear what the different div results would be. As in does signed, unsigned, or Euclidean map ro floor/trunc/etc?

Brendan Hansknecht (Jan 04 2024 at 03:15):

I think the difference between the tag or not is around correctness. I think it is much easier to pick the wrong mod function without realizing and be confused by correctness bugs than to pick the wrong form of div. The forms of div are clear as standalone functions. The forms of mod have nuance that leads to bugs and require careful comparison of the variants. As such, a tag leads to mod having a single documentation that covers all variants to help avoid bugs. Div doesn't need this treatment as each variant is clear by itself.

Richard Feldman (Jan 04 2024 at 03:22):

Richard Feldman (Jan 04 2024 at 03:24):

this is relevant historical context because consistency was already considered, and the decision was that it was worth sacrificing consistency for the sake of making an error-prone API less so

Becker A. (Jan 04 2024 at 03:25):

I agree with all the comments re: the infix operators. I was thinking less about the infix ops and more about the syntactic-sugar-free API

Becker A. (Jan 04 2024 at 03:26):

Richard Feldman (Jan 04 2024 at 03:26):

Num.divInt: Int a, Int a, [Signed, Unsigned, Euclidian]

in general this seems like a reasonable potential option, but I don't think the names Signed, Unsigned, and Euclidian are as clear here as names like Floor, Ceiling, and Truncate would be

Becker A. (Jan 04 2024 at 03:28):

sure, IIUC I think that was the core of what @Brendan Hansknecht was saying as well

Becker A. (Jan 04 2024 at 03:32):

I guess I was just thinking that mod and div are so closely related, that it'd be kind of a shame to not preserve that relation in the implementation. but on second thought I might just be picking at a nit with this thought

Becker A. (Jan 04 2024 at 03:33):

anyway, I wanted to pose the thought just to see how others felt about it. thanks for reading :pray:

Brendan Hansknecht (Jan 04 2024 at 03:40):

Makes sense. Thanks for bringing it up. Definitely good to consider when we update the api

Stream: ideas

Topic: Add mod to the builtins

Isaac Van Doren (Oct 04 2023 at 01:31):

Isaac Van Doren (Oct 04 2023 at 01:38):

Isaac Van Doren (Oct 04 2023 at 01:41):

Anton (Oct 04 2023 at 09:14):

Isaac Van Doren (Oct 04 2023 at 11:44):

Kevin Gillette (Oct 12 2023 at 00:53):

Kevin Gillette (Oct 12 2023 at 00:55):

Ajai Nelson (Oct 12 2023 at 01:27):

Ajai Nelson (Oct 12 2023 at 01:37):

Brendan Hansknecht (Oct 12 2023 at 02:25):

Brendan Hansknecht (Oct 12 2023 at 02:25):

Brendan Hansknecht (Oct 12 2023 at 02:25):

Brendan Hansknecht (Oct 12 2023 at 02:26):

Brendan Hansknecht (Oct 12 2023 at 02:27):

Declan Joseph Maguire (Oct 12 2023 at 07:23):

Kevin Gillette (Oct 13 2023 at 03:27):

Kevin Gillette (Oct 13 2023 at 03:28):

Ajai Nelson (Oct 13 2023 at 04:01):

Brendan Hansknecht (Oct 13 2023 at 04:02):

Brendan Hansknecht (Oct 13 2023 at 04:03):

Brendan Hansknecht (Oct 13 2023 at 04:04):

Brendan Hansknecht (Oct 13 2023 at 04:06):

Declan Joseph Maguire (Oct 13 2023 at 07:29):

Richard Feldman (Oct 13 2023 at 10:58):

Richard Feldman (Oct 13 2023 at 10:58):

Notification Bot (Oct 13 2023 at 14:41):

Brendan Hansknecht (Oct 13 2023 at 15:08):

Richard Feldman (Oct 13 2023 at 15:41):

Richard Feldman (Oct 13 2023 at 15:41):

Richard Feldman (Oct 13 2023 at 15:41):

Richard Feldman (Oct 13 2023 at 15:42):

Richard Feldman (Oct 13 2023 at 15:42):

Brendan Hansknecht (Oct 13 2023 at 15:43):

Richard Feldman (Oct 13 2023 at 15:47):

Richard Feldman (Oct 13 2023 at 15:48):

Richard Feldman (Oct 13 2023 at 15:49):

Brian Carroll (Oct 13 2023 at 17:47):

Brian Carroll (Oct 13 2023 at 17:48):

Jacob (Oct 22 2023 at 05:54):

Jacob (Oct 22 2023 at 06:34):

Jacob (Oct 22 2023 at 07:02):

Brian Carroll (Oct 22 2023 at 08:39):

Brian Carroll (Oct 22 2023 at 09:10):

Jacob (Oct 22 2023 at 19:21):

Jacob (Oct 22 2023 at 19:27):

Brendan Hansknecht (Oct 22 2023 at 19:49):

Brendan Hansknecht (Oct 22 2023 at 20:00):

Brendan Hansknecht (Oct 22 2023 at 20:02):

Jacob (Oct 22 2023 at 20:18):

Brendan Hansknecht (Oct 22 2023 at 20:21):

Brendan Hansknecht (Oct 22 2023 at 20:21):

Richard Feldman (Oct 23 2023 at 00:00):

Richard Feldman (Oct 23 2023 at 00:01):

Richard Feldman (Oct 23 2023 at 00:03):

Richard Feldman (Oct 23 2023 at 00:03):

Richard Feldman (Oct 23 2023 at 00:04):

Brendan Hansknecht (Oct 23 2023 at 01:31):

Brian Carroll (Oct 23 2023 at 05:32):

Brian Carroll (Oct 23 2023 at 05:38):

Sven van Caem (Oct 23 2023 at 11:35):

Richard Feldman (Oct 23 2023 at 12:04):

timotree (Dec 18 2023 at 00:12):

timotree (Dec 18 2023 at 00:17):

Brendan Hansknecht (Dec 18 2023 at 00:26):

Brendan Hansknecht (Dec 18 2023 at 00:26):

timotree (Dec 18 2023 at 00:26):

timotree (Dec 18 2023 at 00:26):

Brendan Hansknecht (Dec 18 2023 at 00:27):

timotree (Dec 18 2023 at 00:30):

Brendan Hansknecht (Dec 18 2023 at 00:33):

Brendan Hansknecht (Dec 18 2023 at 00:33):

Brendan Hansknecht (Dec 18 2023 at 00:33):

timotree (Dec 18 2023 at 00:35):

Brendan Hansknecht (Dec 18 2023 at 00:41):

Brendan Hansknecht (Dec 18 2023 at 00:41):

Brendan Hansknecht (Dec 18 2023 at 00:42):

Fabian Schmalzried (Dec 20 2023 at 14:51):

Becker A. (Jan 04 2024 at 01:28):