Stream: beginners

Topic: Code review?


view this post on Zulip Ian McLerran (Jan 23 2024 at 20:46):

So I'm currently working on putting together an ISO date/time parsing Roc package, and just hit a major milestone: full date support (no time or date/time yet). Big ask, but just wondering if anyone would be interested in giving me a code review for the work so far?

view this post on Zulip Isaac Van Doren (Jan 24 2024 at 02:38):

Awesome!

view this post on Zulip Isaac Van Doren (Jan 24 2024 at 02:40):

In parseCalendarDateBasic I would recommend pattern matching on the strs. Then you can express that it should be 3 elements and bring all of those elements into scope at the same time. That way you can avoid having to get each element and unwrapping

view this post on Zulip Isaac Van Doren (Jan 24 2024 at 03:17):

For example:

parseCalendarDateBasic : Str -> Result Utc [InvalidDateFormat]
parseCalendarDateBasic = \str ->
    when splitStrAtIndices str [4, 6] is
        [year, month, day] -> doStuff year month day
        _ -> Err InvalidDateFormat

view this post on Zulip Isaac Van Doren (Jan 24 2024 at 03:22):

If a function takes a lambda as the last argument you don't need to wrap it in parens. I'd say that's even the idiomatic way to write.

So you can do things like this and they look visually cleaner

List.map myList \elem ->
    elem * 2

view this post on Zulip Richard Feldman (Jan 24 2024 at 04:39):

for the tests, you can avoid all the unwraps by comparing to Ok, e.g.

expect unwrap (parseDate "2024-01-23") "Could not parse!" == @Utc (Num.toU128 ((19_723 + 22) * secondsPerDay * nanosPerSecond))

...becomes

expect parseDate "2024-01-23" == Ok (@Utc (Num.toU128 ((19_723 + 22) * secondsPerDay * nanosPerSecond)))

or, alternatively:

expect parseDate "2024-01-23" == ((19_723 + 22) * secondsPerDay * nanosPerSecond) |> Num.toU127 |> @Utc |> Ok

view this post on Zulip Richard Feldman (Jan 24 2024 at 04:44):

also, pattern matching on tuples can be used instead of if allOk [...] then followed by unwrap, e.g.

when (year, month, day) is
    (Ok y, Ok m, Ok d) ->
        numDaysSinceEpoch { year: y, month: m, day: d }
        |> daysToNanos |> @Utc |> Ok

    (_, _, _) -> Err InvalidDateFormat

view this post on Zulip Richard Feldman (Jan 24 2024 at 04:45):

that's a performance win because List requires a heap allocation, but tuples don't! :smiley:

view this post on Zulip Richard Feldman (Jan 24 2024 at 04:47):

btw thanks for making this! At work we already know of a future use case where we're going to want to be able to parse ISO-8601 datetime strings in Roc, so this is very nice to see :heart_eyes_cat:

view this post on Zulip Richard Feldman (Jan 24 2024 at 04:47):

I started to work on one at some point but did not get as far as you have already!

view this post on Zulip Richard Feldman (Jan 24 2024 at 04:51):

oh! Also, I think Str.split can be used in place of splitStrAtDelimiter if I'm not mistaken

view this post on Zulip Brendan Hansknecht (Jan 24 2024 at 05:47):

A few minor suggestions:

  1. In general, I would advise removing unwrap from your library overall. For a library, there is almost always a better choice for how to handle/propagate errors.
  2. change monthDaysNonLeap to a function, it will be a lot faster. That or just a flat array and search it linearly.
  3. Use List.all results Result.isOk here
  4. For strSplitAtIndices, I would convert to a List U8, split with List.sublist/takeFirst/dropFirst then convert back to a string. It will lead to simpler code.
  5. My rewrite isLeapYear using boolean and (&&) instead of if conditions and explicitly returning Bool.true/false
  6. In numLeapYearsSinceEpoch, List.range will materialize the entire list, probably fine, but may be slow/wasteful for large number of years.
  7. Why make both numDaysSinceEpoch and numDaysSinceEpochToYMD. Given the optional record params, they really have the same api.
  8. Avoid checking the length of something and then unwrapping all of the List.get functions. It is not good style. Instead, use list pattern matching with when ... is.

view this post on Zulip Brendan Hansknecht (Jan 24 2024 at 05:49):

  1. I would generally recommend against using Nat. It is actually slotted for removal in the language. I believe for this package, U16 should be good, but U32 or U64 would be fine if wanted.

view this post on Zulip Ian McLerran (Jan 24 2024 at 14:40):

Wow, thank you all for the code reviews. Can't say enough how much I appreciate everyone's time to look through my code and provide feedback. This is all awesome feedback. :star_struck:

Isaac and Richard, the tips on pattern matching multiple elements here are a huge help. Both of these will lead to much cleaner code and better error handling than my solution, and if we can get more performant with a tuple, awesome!!

Brendan, thanks for the in depth list of improvements. All of these are great tips. I'll be diving into making all these improvements later today! :working_on_it:

view this post on Zulip Ian McLerran (Jan 24 2024 at 14:48):

Btw Richard, great to hear this is something that could be useful at your work! I noticed there didn't seem to be any ISO date/time packages out there, and thought it could be a useful addition to ecosystem. Awesome to hear this could be filling a needed gap!

view this post on Zulip Ian McLerran (Jan 24 2024 at 14:51):

Also... Not sure how I missed Str.split in the API... :sweat_smile: I think I started out looking for a splitStrAtIndex equivalent, and when I didn't find that, it just went into my brain that there wan't a split function for Str. Derp!

view this post on Zulip Ian McLerran (Jan 24 2024 at 22:52):

Okay, all recommendations have been implemented! Thanks again for the review everyone!

While removing unwrap, I also removed all crash keyword usages in general. There is one instance in splitStrAtIndicesRecur where I think a crash may actually be the best failure mode, but I'm curious what others think. For now it will simply ignore the failure (which should never occur) and try to continue.

Latest code is in the repo :octopus:

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 00:16):

If it truly should never occur. Thus, it would be a bug against the library if it occurs, crash is probably the right choice.

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 00:21):

That said, in this case, you do have a bug:

expect
    res = splitStrAtIndex "🔥🔥" 1
    res == ["🔥", "🔥"]

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 00:22):

If the case was a crash, it would be hit here.

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 00:22):

And this is actually exactly why we don't have Str.splitAtIndex in the standard library.

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 00:23):

Not sure if that can be hit through the public api of your package, but that is the concern.

view this post on Zulip Richard Feldman (Jan 25 2024 at 00:35):

if you're up for it, I think an approach worth trying is to start by immediately converting to List U8 as the very first thing, and then doing all parsing in terms of that

view this post on Zulip Richard Feldman (Jan 25 2024 at 00:37):

that has the benefit of also working on raw byte streams, which is what Decode uses

view this post on Zulip Richard Feldman (Jan 25 2024 at 00:37):

so you could support parsing them from both Str as well as from List U8

view this post on Zulip Richard Feldman (Jan 25 2024 at 00:40):

to do that you'd convert things like "-" to '-' because single quote literals just compile directly to numbers (specifically a unicode code point)

view this post on Zulip Richard Feldman (Jan 25 2024 at 00:41):

I think we should probably start recommending parsing List U8 as a best practice, even if the input is a string! :big_smile:

view this post on Zulip Ian McLerran (Jan 25 2024 at 14:12):

Brendan Hansknecht said:

Not sure if that can be hit through the public api of your package, but that is the concern.

Good call here - It was possible previously to hit this by putting a multibyte character in one of the character positions where a digit would have been expected in the date string. I have now added validation to ensure the date string contains no multi-byte characters.

view this post on Zulip Ian McLerran (Jan 25 2024 at 14:49):

@Richard Feldman Yeah, I'm totally up for it. I'm beginning work on making those adaptations now. Looks like it might be in my best interest to figure out Decoders. Any pointers on where I can go for a code sample of custom decoders in use?

view this post on Zulip Richard Feldman (Jan 25 2024 at 14:50):

that's a good question! I forget if we have any material on that - maybe @Luke Boswell or @Brendan Hansknecht might know?

view this post on Zulip Ian McLerran (Jan 25 2024 at 15:16):

Doesn’t even need to be learning material, if there is any “production” code that uses decoders, I can start parsing through that…

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 15:16):

I don't think a decoder would work here? They are more meant for decoding to/from a generic format like json or protobif.

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 15:17):

Also, look up TotallyNotJson in the repo.

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:20):

you could do something like make an opaque Timestamp type which implements Decoding for strings

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:21):

that would give someone a way to decode a record from JSON while also decoding some of the fields as ISO-8601 timestamps

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:21):

I'm not sure if that's actually the best design, but it's a thought!

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 15:27):

Yeah, definitely can do that. Then a timestamp can be decoded from a string in any generic data format. Useful for end users, not useful for making the parsing nicer in this library.

This has come up a few times with Decode... I wonder if It is named poorly. I wonder if the serde folks regularly get the same question.

view this post on Zulip Ian McLerran (Jan 25 2024 at 15:31):

Thanks Brendon. I’ll take a look at it. From a brief scan at the Decode docs page, I thought there might be a more generic application with the custom decoders.

Richard, that is an interesting idea. Sounds like it could make for a pretty streamlined API for a common use case of ISO strings. Just to clarify, would timestamp be an opaque type which replaces the Utc opaque I’m currently using?

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:32):

it's a good question, I'm not sure what would be more ergonomic for end users :thinking:

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:32):

I think so, yeah

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:32):

like Timestamp := Utc implements [Decoding { ...etc }]

view this post on Zulip Ian McLerran (Jan 25 2024 at 15:36):

Is there any reason I could not (or should not) simply define Utc := U128 implements [Decoding { …etc }]?

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:37):

you can, but the trick is that you have to pick 1 way to decode it

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:38):

so if you do it that way, you're saying Utc only decodes from an ISO-8601 timestamp as opposed to an integer

view this post on Zulip Richard Feldman (Jan 25 2024 at 15:38):

which maybe is what you want, but personally a thing I like about UTC is that I can treat it as an ordinary integer :big_smile:

view this post on Zulip Ian McLerran (Jan 25 2024 at 15:38):

Gotcha. Yeah, that doesn’t make a lot of sense to do… :rolling_eyes:

view this post on Zulip Ian McLerran (Jan 25 2024 at 15:43):

Just out of curiosity, why was Utc implemented as unsigned? Seems like a date pre-epoch could be useful to store. But maybe in terms of typical comparable dates, recent dates are almost exclusively to be expected, and older dates likely only need a textual representation…

view this post on Zulip Richard Feldman (Jan 25 2024 at 16:28):

yeah actually I can see an argument for signed! :+1:

view this post on Zulip Ian McLerran (Jan 25 2024 at 16:44):

Hmm well I just plan to stick with U128 for compatibility. Keeps things simpler for me too.

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 17:02):

Feel free to push a PR to the roc projects to change UTC to signed

view this post on Zulip Ian McLerran (Jan 25 2024 at 20:08):

Brendan Hansknecht Anywhere I should make PRs besides roc-lang/basic-cli and roc-lang/basic-webserver?

view this post on Zulip Brendan Hansknecht (Jan 25 2024 at 20:52):

I think just those two

view this post on Zulip Ian McLerran (Jan 29 2024 at 17:13):

So while working to migrate the basic-webserver to use I128 instead of U128 for Utc, I discovered a bug in InternalDateTime.epochMillisToDateTimeHelp which causes it to fail to correctly convert millis to DateTime for the last day of a month or year.

I opened an issue on the repo, and submitted a corresponding PR.

view this post on Zulip Ian McLerran (Jan 30 2024 at 17:44):

Brendan Hansknecht said:

Feel free to push a PR to the roc projects to change UTC to signed

I've opened a PR for the issue on the basic-cli repo. Changes to the webserver are more involved, but should be ready for a PR soon(TM).

view this post on Zulip Ian McLerran (Jan 30 2024 at 19:15):

Okay, a matching PR for the webserver is up.

view this post on Zulip Anton (Jan 31 2024 at 10:23):

We're looking for some additional input on a change in this PR where the type of Utc was changed from U128 to I128.

deltaAsNanos calculates the absolute difference between two Utc timestamps. The signature of deltaAsNanos was changed from Utc, Utc -> U128 to Utc, Utc -> I128. This is because Num.absDiff returns the same type as its arguments (Num a, Num a -> Num a).

absDiff will never actually return a negative number. I'd like to make this behavior clear in the signature of deltaAsNanos and cast the result of absDiff with Num.toU128. I believe this prevents confusion cause someone may otherwise see the I128 and wonder how this delta can be negative. It may also lead to wrong assumptions about how this function works. In general I also think it's good practice to avoid overly broad types.

On the other hand, the cast is not strictly necessary and it is kind of nice to use U128 everywhere and never need to cast for calculations.

Thoughts?

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 15:44):

Yeah, I wish the roc type system had a bit more power so you could specify a same bit width number but signed or unsigned.

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 15:46):

Personally, I would probably force them both into U128 even if they are negative and then run abs diff.

Should get the same abs diff, avoid the chance of abs overflowing, and get a more accurate type overall.

view this post on Zulip Anton (Jan 31 2024 at 15:52):

If only one is negative the abs diff would be different though

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 15:58):

...ah yeah, you have to do some extra handling cause the negative number are put above the positive ones. (Would have to double check, but I think it may just be converting then flipping the first bit)

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 16:03):

Cause imagining in I8/U8...

I8 inputs

-7 -> 11111001
12 -> 00001100

As U8 with same bit pattern

249
12

Flip first bit to put positives above negatives

121 -> 01111001
140 -> 10001100

Get abs diff:
19

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 16:04):

Yeah, so I would do that

view this post on Zulip Anton (Jan 31 2024 at 16:21):

Is it really worth the increase in complexity to reduce the probability of overflow, given that we're dealing with timestamps? With an I128 we're still very comfortable to represent the age of the universe in nanoseconds.

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 16:25):

Guess I didn't realize the scale of an I128.

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 16:27):

That said, I really don't think it is that complex:

aCast = Num.bitwiseXor (Num.toU128 a) (Num.shiftLeftBy 1 127)
bCast = Num.bitwiseXor (Num.toU128 b) (Num.shiftLeftBy 1 127)
Num.absDiff aCast bCast

view this post on Zulip Brendan Hansknecht (Jan 31 2024 at 16:28):

Personally though, I would really love if this could be done in Num.absDiff internally and it could automatically return the unsized type to ensure that it will never panic.

view this post on Zulip Ian McLerran (Jan 31 2024 at 22:01):

Thanks for the input here guys. Brendan, I'll take your solution for handling oposing signs in the delta functions, and update both my PRs accordingly.

view this post on Zulip Ian McLerran (Jan 31 2024 at 22:12):

Changes are up, just waiting on CI.

view this post on Zulip Anne Archibald (Feb 04 2024 at 20:32):

Anton said:

We're looking for some additional input on a change in this PR where the type of Utc was changed from U128 to I128.

deltaAsNanos calculates the absolute difference between two Utc timestamps. The signature of deltaAsNanos was changed from Utc, Utc -> U128 to Utc, Utc -> I128. This is because Num.absDiff returns the same type as its arguments (Num a, Num a -> Num a).

absDiff will never actually return a negative number. I'd like to make this behavior clear in the signature of deltaAsNanos and cast the result of absDiff with Num.toU128. I believe this prevents confusion cause someone may otherwise see the I128 and wonder how this delta can be negative. It may also lead to wrong assumptions about how this function works. In general I also think it's good practice to avoid overly broad types.

On the other hand, the cast is not strictly necessary and it is kind of nice to use U128 everywhere and never need to cast for calculations.

Thoughts?

Maybe I have an unusual perspective on time, having been a pulsar astronomer and having worked on Astropy's Time objects (which use a pair of double-precision numbers to provide sufficient precision), but are times before the UNIX epoch not of interest? Wouldn't a signed number of nanoseconds actually be useful in it's own right?

I realise that date formatting gets complicated for the distant past, but honestly no more so than the present era with our leap seconds. And how long before someone tries to parse the date of the Moon landing or the "shot heard round the world"?

view this post on Zulip Brendan Hansknecht (Feb 04 2024 at 21:04):

Yeah, we just changed it to signed.


Last updated: Jul 26 2025 at 12:14 UTC