Logs: liberachat/#haskell

2025-10-27 18:37:35 × wbrawner quits (~wbrawner@static.56.224.132.142.clients.your-server.de) (*.net *.split)
2025-10-27 18:37:35 × pr1sm quits (~pr1sm@24.91.163.31) (*.net *.split)
2025-10-27 18:37:36 × notzmv quits (~umar@user/notzmv) (*.net *.split)
2025-10-27 18:37:36 × Typosit quits (b41a81e702@2001:bc8:1210:2cd8::494) (*.net *.split)
2025-10-27 18:37:36 × m1dnight quits (~m1dnight@d8D861A17.access.telenet.be) (*.net *.split)
2025-10-27 18:37:36 × Square quits (~Square4@user/square) (*.net *.split)
2025-10-27 18:37:36 × chromoblob quits (~chromoblo@user/chromob1ot1c) (*.net *.split)
2025-10-27 18:37:36 × L29Ah quits (~L29Ah@wikipedia/L29Ah) (*.net *.split)
2025-10-27 18:37:36 × jzargo2 quits (~jzargo@user/jzargo) (*.net *.split)
2025-10-27 18:37:36 × sam113101 quits (~sam@modemcable200.189-202-24.mc.videotron.ca) (*.net *.split)
2025-10-27 18:37:36 × Digit quits (~user@user/digit) (*.net *.split)
2025-10-27 18:37:36 × EvanR quits (~EvanR@user/evanr) (*.net *.split)
2025-10-27 18:37:36 × nek0 quits (~nek0@user/nek0) (*.net *.split)
2025-10-27 18:37:36 × haskellbridge quits (~hackager@96.28.224.214) (*.net *.split)
2025-10-27 18:37:36 × ridcully quits (~ridcully@p57b52b68.dip0.t-ipconnect.de) (*.net *.split)
2025-10-27 18:37:36 × elenril quits (~elenril@tutturu.khirnov.net) (*.net *.split)
2025-10-27 18:37:36 × AlexZenon quits (~alzenon@85.174.180.65) (*.net *.split)
2025-10-27 18:37:36 × gabriel_sevecek quits (~gabriel@188-167-229-200.dynamic.chello.sk) (*.net *.split)
2025-10-27 18:37:36 × op_4 quits (~tslil@user/op-4/x-9116473) (*.net *.split)
2025-10-27 18:37:36 × infinity0 quits (~infinity0@pwned.gg) (*.net *.split)
2025-10-27 18:37:36 × Beowulf quits (florian@2a01:4f9:3b:2d56::2) (*.net *.split)
2025-10-27 18:37:36 × hellwolf quits (~user@6526-1813-95c5-dfbc-0f00-4d40-07d0-2001.sta.estpak.ee) (*.net *.split)
2025-10-27 18:37:36 × Leary quits (~Leary@user/Leary/x-0910699) (*.net *.split)
2025-10-27 18:37:36 op_4_ is now known as op_4
2025-10-27 18:37:38 sam113102 is now known as sam113101
2025-10-27 18:40:45 Googulator20 joins (~Googulato@2a01-036d-0106-03fa-d161-d36f-e0e5-1b0a.pool6.digikabel.hu)
2025-10-27 18:40:45 × Googulator8 quits (~Googulato@2a01-036d-0106-03fa-d161-d36f-e0e5-1b0a.pool6.digikabel.hu) (Quit: Client closed)
2025-10-27 18:41:33 × FANTOM quits (~fantom@212.228.181.156) (Ping timeout: 256 seconds)
2025-10-27 18:41:46 haskellbridge_ is now known as haskellbridge
2025-10-27 18:42:32 <bwe> `decodeStrictText (utf8OrLatin1ToText "{\"key\": \"value\181\"}") :: Maybe (HMS.HashMap Text Text)` is going from ByteString to Text and starts decoding from Text only
2025-10-27 18:42:33 <bwe> https://paste.tomsmeding.com/8iOWvhaT
2025-10-27 18:42:55 FANTOM joins (~fantom@212.228.181.156)
2025-10-27 18:44:03 wbrawner joins (~wbrawner@static.56.224.132.142.clients.your-server.de)
2025-10-27 18:44:49 EvanR_ is now known as EvanR
2025-10-27 18:44:58 <EvanR> the heck is utf8OrLatin1ToText
2025-10-27 18:45:05 <EvanR> sounds dicey
2025-10-27 18:46:04 <EvanR> oh you defined it
2025-10-27 18:46:15 <EvanR> that sounds liable to explode in your face
2025-10-27 18:46:21 target_i joins (~target_i@user/target-i/x-6023099)
2025-10-27 18:46:34 merijn joins (~merijn@host-vr.cgnat-g.v4.dfn.nl)
2025-10-27 18:46:49 × Square2 quits (~Square4@user/square) (Ping timeout: 264 seconds)
2025-10-27 18:46:54 <EvanR> decode "{\"key\": \"valueµ\"}" is wrong because mu is not an ascii character
2025-10-27 18:47:05 <EvanR> and you're trying to specify a byte string
2025-10-27 18:47:28 <EvanR> decode "{\"key\": \"value\181\"}" is wrong because it's not utf-8
2025-10-27 18:48:43 × sord937 quits (~sord937@gateway/tor-sasl/sord937) (Quit: sord937)
2025-10-27 18:49:40 tromp joins (~textual@2001:1c00:3487:1b00:b825:23c0:1f89:fdbd)
2025-10-27 18:49:42 <bwe> EvanR: Yes, that's what I've understood now, too. So, how to get a bytestring to utf-8 text safely?
2025-10-27 18:50:01 <EvanR> "utf-8 text" is another categorical error
2025-10-27 18:50:37 <EvanR> to go from bytestring to text, whose internal encoding you don't need to know, ideally, to have to know how the original text was encoded as a bytestring
2025-10-27 18:51:00 <EvanR> guessing can never 100% work, so you just have to decide what to accept, then deal with a decoding error
2025-10-27 18:51:46 × wbrawner quits (~wbrawner@static.56.224.132.142.clients.your-server.de) (Ping timeout: 256 seconds)
2025-10-27 18:52:17 <EvanR> so the question "get a text from a utf-8 bytestring safely" is valid, and you can use decodeUtf8 :: ByteString -> Text, or one of the variants that returns Maybe
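
A minimal sketch of the safe decoding EvanR describes, assuming the `text` and `bytestring` packages. The total variant used here is `decodeUtf8'`, which returns `Either UnicodeException Text` rather than `Maybe`. The helper name `decodeUtf8Safely` is hypothetical and does not come from the discussion.

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper name; it only wraps decodeUtf8' from the text package.
-- Invalid input yields a Left value instead of throwing an exception.
decodeUtf8Safely :: BS.ByteString -> Either UnicodeException Text
decodeUtf8Safely = TE.decodeUtf8'

main :: IO ()
main = do
  -- 'v' followed by the two-byte UTF-8 encoding of U+00B5 (µ).
  let utf8Bytes = BS.pack [0x76, 0xC2, 0xB5]
  -- 'v' followed by a lone 0xB5 byte: µ in Latin-1, but not valid UTF-8.
  let latin1Bytes = BS.pack [0x76, 0xB5]
  print (decodeUtf8Safely utf8Bytes == Right (T.pack "v\181"))
  putStrLn (either (const "rejected: not UTF-8") T.unpack (decodeUtf8Safely latin1Bytes))
```

The caller decides what to do with the `Left` case, which matches the advice to pick an encoding to accept and then deal with the decoding error.
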
2025-10-27 18:52:43 <EvanR> should have been *you have to know how the original text was encoded
2025-10-27 18:53:27 × merijn quits (~merijn@host-vr.cgnat-g.v4.dfn.nl) (Ping timeout: 256 seconds)
2025-10-27 18:53:29 <EvanR> but if you are using UTF-8, then first converting to Text before using Aeson sounds like extraneous steps
2025-10-27 18:55:01 <monochrom> IMO you should assume/insist UTF-8 in JSON, first of all. Then, it's just a matter of telling Aeson you want Text. Aeson will do the utf-8 decoding for you, for free.
2025-10-27 18:55:54 <monochrom> And yeah if you do your own decoding by your own hand before letting Aeson do its job, it's an XY problem.
2025-10-27 18:56:18 wbrawner joins (~wbrawner@static.56.224.132.142.clients.your-server.de)
2025-10-27 18:56:24 <monochrom> Or in short, mauke's example is right, just do that.
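
A minimal sketch of the approach monochrom describes: hand the UTF-8 bytes straight to Aeson and ask for `Text` in the result type. This is not mauke's example, which does not appear in this log. The names `utf8Json` and `parseObject` are hypothetical, and the `aeson`, `text`, `bytestring` and `unordered-containers` packages are assumed.

```haskell
import Data.Aeson (decodeStrict)
import qualified Data.ByteString as BS
import qualified Data.HashMap.Strict as HMS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- UTF-8 encoded JSON; encodeUtf8 stands in for bytes read from a file or socket.
utf8Json :: BS.ByteString
utf8Json = TE.encodeUtf8 (T.pack "{\"key\": \"value\181\"}")

-- Aeson parses the bytes and decodes the JSON strings to Text in one step,
-- so no separate ByteString-to-Text conversion is needed beforehand.
parseObject :: BS.ByteString -> Maybe (HMS.HashMap Text Text)
parseObject = decodeStrict

main :: IO ()
main = print (parseObject utf8Json)
```

Input that is not valid UTF-8, such as a lone 0xB5 byte, is expected to come back as `Nothing` rather than as mangled text.
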
2025-10-27 18:56:41 <dminuoso> monochrom: To be fair, RFC8259 is an internet standard and it imposes UTF8 for open systems (outside closed ecosystems).
2025-10-27 18:56:55 <dminuoso> If we adopt RFC terminology, you *MUST* assume UTF-8.
2025-10-27 18:57:34 annamalai joins (~annamalai@2409:4042:4c88:4dce::9e4a:a60c)
2025-10-27 18:57:36 <monochrom> Agreed.
2025-10-27 18:58:10 <monochrom> OTOH I also have Windows zealots in mind---you know, those who use Windows-1293 or something.
2025-10-27 18:58:51 <monochrom> Or Americans---they think the whole world is Latin-1.
2025-10-27 18:58:55 <dminuoso> I have no pity for those who suffer from mojibake.
2025-10-27 18:59:02 <dminuoso> It is a curable disease.
2025-10-27 18:59:49 Digitteknohippie is now known as Digit
2025-10-27 19:00:49 <dminuoso> Worrying about Windows-1293 or Latin-1 is like worrying about measles in the year 2025.
2025-10-27 19:01:26 <monochrom> Oh, about that.
2025-10-27 19:01:52 <monochrom> Both pro-vaxxers and anti-vaxxers agree: Don't worry about measles. >:)
2025-10-27 19:02:13 <dminuoso> The Darwinist in me says: Let there be outbreaks.
2025-10-27 19:03:03 <EvanR> the rubber meets the road when you're in a trench and have to decide how to decode. And you say, fuckit assuming UTF-8 and rejecting other things is correct. Then someone tells you to use infernal decodeUtf8OrLatin1
2025-10-27 19:03:11 <Digit> monochrom: do any-vaxxers agree too?
2025-10-27 19:03:40 <EvanR> lol at "the RFC says you have to"
2025-10-27 19:03:45 <dminuoso> Digit: What is an any-vaxxer? Those who just inject anything they can get their hands on? We call those junkies.
2025-10-27 19:04:05 <EvanR> there's a documentary about that, Crank with jason statham
2025-10-27 19:04:11 <monochrom> haha
2025-10-27 19:04:46 merijn joins (~merijn@host-vr.cgnat-g.v4.dfn.nl)
2025-10-27 19:05:27 <EvanR> the corresponding policy would be to attempt to decode the input at any cost, even producing nonsense or suffering security issues
2025-10-27 19:05:34 <EvanR> never fail
2025-10-27 19:05:37 <dminuoso> EvanR: I think the biggest mistake in the world of decoding is not establishing metadata about the carried encoding in all the file formats. Very few like HTML have done it, in the rest of the world it's a horrible mixture of guesstimation and hope of standard adherence.
2025-10-27 19:05:44 <monochrom> If one must accept decodeUtf8OrLatin1, then let's make it an XYZ problem, shall we? (Go big or go home.) Use decodeUtf8OrLatin1, then re-encode to utf-8, then you can give that to Aeson.
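
The pipeline monochrom outlines could look roughly like this. The fallback decoder below is a hypothetical stand-in, not bwe's definition from the paste, and `parseLenient` is likewise an invented name. It assumes `decodeUtf8'`, `decodeLatin1` and `encodeUtf8` from `text` and `decodeStrict` from `aeson`.

```haskell
import Data.Aeson (decodeStrict)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.HashMap.Strict as HMS
import Data.Text (Text)
import qualified Data.Text.Encoding as TE

-- Hypothetical fallback decoder: try UTF-8 first, otherwise treat every
-- byte as a Latin-1 code point. decodeLatin1 never fails, so neither does this.
decodeUtf8OrLatin1 :: BS.ByteString -> Text
decodeUtf8OrLatin1 bs = either (const (TE.decodeLatin1 bs)) id (TE.decodeUtf8' bs)

-- Re-encode the guessed Text as UTF-8 so Aeson receives what it expects.
parseLenient :: BS.ByteString -> Maybe (HMS.HashMap Text Text)
parseLenient = decodeStrict . TE.encodeUtf8 . decodeUtf8OrLatin1

main :: IO ()
main =
  -- BC.pack truncates each Char to one byte, so \181 becomes the Latin-1 byte 0xB5.
  print (parseLenient (BC.pack "{\"key\": \"value\181\"}"))
```

The guess is still a guess: Latin-1 input whose bytes happen to form valid UTF-8 is silently read as UTF-8, which is the risk EvanR raises about never failing.
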
2025-10-27 19:05:56 × Frostillicus quits (~Frostilli@pool-71-174-119-69.bstnma.fios.verizon.net) (Ping timeout: 256 seconds)
2025-10-27 19:05:58 <dminuoso> s/file formats/file- and wireformats/
2025-10-27 19:06:08 <EvanR> dminuoso, that still wouldn't avoid a "failed to decode" code path
2025-10-27 19:06:13 <haskellbridge> <loonycyborg> JSON is always utf-8, otherwise it's non-conforming
2025-10-27 19:06:35 <EvanR> yes bwe is out on a limb
2025-10-27 19:06:37 <dminuoso> EvanR: If a file has a declared encoding and is lying about that, you deserve to be complained to by your users.
2025-10-27 19:06:51 <dminuoso> Where you is whoever is responsible for the software or using it.
2025-10-27 19:06:55 <EvanR> dminuoso, this sort of "plan" doesn't sound particularly worth it
2025-10-27 19:07:04 <dminuoso> EvanR: It seems to work fine for the rest of the world.
2025-10-27 19:07:08 <EvanR> reengineer all file formats ever and then it still doesn't work
2025-10-27 19:07:10 <haskellbridge> <loonycyborg> iirc there's a library that can figure out encoding without any metadata
2025-10-27 19:07:13 <dminuoso> Try using execve on an email file.
2025-10-27 19:07:17 <dminuoso> It won't just "try and make sense of it"
2025-10-27 19:07:25 <dminuoso> It will rightfully complain that it's not an ELF.
2025-10-27 19:07:44 <monochrom> Sounds like something my students may try. :)
2025-10-27 19:07:58 <haskellbridge> <loonycyborg> That reminds me how I tried to run a C program directly as a binary when I was a kid
2025-10-27 19:08:09 <EvanR> an email might actually end up being mistaken for an ELF file if you try hard enough

All times are in UTC.