Logs: liberachat/#haskell
| 2025-10-27 18:37:35 | × | wbrawner quits (~wbrawner@static.56.224.132.142.clients.your-server.de) (*.net *.split) |
| 2025-10-27 18:37:35 | × | pr1sm quits (~pr1sm@24.91.163.31) (*.net *.split) |
| 2025-10-27 18:37:36 | × | notzmv quits (~umar@user/notzmv) (*.net *.split) |
| 2025-10-27 18:37:36 | × | Typosit quits (b41a81e702@2001:bc8:1210:2cd8::494) (*.net *.split) |
| 2025-10-27 18:37:36 | × | m1dnight quits (~m1dnight@d8D861A17.access.telenet.be) (*.net *.split) |
| 2025-10-27 18:37:36 | × | Square quits (~Square4@user/square) (*.net *.split) |
| 2025-10-27 18:37:36 | × | chromoblob quits (~chromoblo@user/chromob1ot1c) (*.net *.split) |
| 2025-10-27 18:37:36 | × | L29Ah quits (~L29Ah@wikipedia/L29Ah) (*.net *.split) |
| 2025-10-27 18:37:36 | × | jzargo2 quits (~jzargo@user/jzargo) (*.net *.split) |
| 2025-10-27 18:37:36 | × | sam113101 quits (~sam@modemcable200.189-202-24.mc.videotron.ca) (*.net *.split) |
| 2025-10-27 18:37:36 | × | Digit quits (~user@user/digit) (*.net *.split) |
| 2025-10-27 18:37:36 | × | EvanR quits (~EvanR@user/evanr) (*.net *.split) |
| 2025-10-27 18:37:36 | × | nek0 quits (~nek0@user/nek0) (*.net *.split) |
| 2025-10-27 18:37:36 | × | haskellbridge quits (~hackager@96.28.224.214) (*.net *.split) |
| 2025-10-27 18:37:36 | × | ridcully quits (~ridcully@p57b52b68.dip0.t-ipconnect.de) (*.net *.split) |
| 2025-10-27 18:37:36 | × | elenril quits (~elenril@tutturu.khirnov.net) (*.net *.split) |
| 2025-10-27 18:37:36 | × | AlexZenon quits (~alzenon@85.174.180.65) (*.net *.split) |
| 2025-10-27 18:37:36 | × | gabriel_sevecek quits (~gabriel@188-167-229-200.dynamic.chello.sk) (*.net *.split) |
| 2025-10-27 18:37:36 | × | op_4 quits (~tslil@user/op-4/x-9116473) (*.net *.split) |
| 2025-10-27 18:37:36 | × | infinity0 quits (~infinity0@pwned.gg) (*.net *.split) |
| 2025-10-27 18:37:36 | × | Beowulf quits (florian@2a01:4f9:3b:2d56::2) (*.net *.split) |
| 2025-10-27 18:37:36 | × | hellwolf quits (~user@6526-1813-95c5-dfbc-0f00-4d40-07d0-2001.sta.estpak.ee) (*.net *.split) |
| 2025-10-27 18:37:36 | × | Leary quits (~Leary@user/Leary/x-0910699) (*.net *.split) |
| 2025-10-27 18:37:36 | op_4_ | is now known as op_4 |
| 2025-10-27 18:37:38 | sam113102 | is now known as sam113101 |
| 2025-10-27 18:40:45 | → | Googulator20 joins (~Googulato@2a01-036d-0106-03fa-d161-d36f-e0e5-1b0a.pool6.digikabel.hu) |
| 2025-10-27 18:40:45 | × | Googulator8 quits (~Googulato@2a01-036d-0106-03fa-d161-d36f-e0e5-1b0a.pool6.digikabel.hu) (Quit: Client closed) |
| 2025-10-27 18:41:33 | × | FANTOM quits (~fantom@212.228.181.156) (Ping timeout: 256 seconds) |
| 2025-10-27 18:41:46 | haskellbridge_ | is now known as haskellbridge |
| 2025-10-27 18:42:32 | <bwe> | `decodeStrictText (utf8OrLatin1ToText "{\"key\": \"value\181\"}") :: Maybe (HMS.HashMap Text Text)` is going from ByteString to Text and starts decoding from Text only |
| 2025-10-27 18:42:33 | <bwe> | https://paste.tomsmeding.com/8iOWvhaT |
| 2025-10-27 18:42:55 | → | FANTOM joins (~fantom@212.228.181.156) |
| 2025-10-27 18:44:03 | → | wbrawner joins (~wbrawner@static.56.224.132.142.clients.your-server.de) |
| 2025-10-27 18:44:49 | EvanR_ | is now known as EvanR |
| 2025-10-27 18:44:58 | <EvanR> | the heck is utf8OrLatin1ToText |
| 2025-10-27 18:45:05 | <EvanR> | sounds dicey |
| 2025-10-27 18:46:04 | <EvanR> | oh you defined it |
| 2025-10-27 18:46:15 | <EvanR> | that sounds liable to explode in your face |
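bwe's actual `utf8OrLatin1ToText` is only in the paste. The following is a hypothetical reconstruction of what such a helper commonly looks like, meant to show why it is dicey. The Latin-1 fallback can never fail, because every byte is a valid Latin-1 character. So malformed input is silently turned into some text instead of being rejected, and byte sequences that are valid in both encodings are always read as UTF-8. The name `utf8OrLatin1` is made up here. `decodeUtf8'` and `decodeLatin1` are the real functions from `Data.Text.Encoding` in the `text` package.

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- Hypothetical helper (not bwe's actual code): try UTF-8 first and,
-- if the bytes are not valid UTF-8, reinterpret them as Latin-1.
-- The Latin-1 branch accepts every possible byte, so this never fails.
utf8OrLatin1 :: BS.ByteString -> Text
utf8OrLatin1 bs = either (const (decodeLatin1 bs)) id (decodeUtf8' bs)

-- These two bytes are valid UTF-8 (one character, µ) and also valid
-- Latin-1 (two characters, "Âµ"). The helper always picks UTF-8, so a
-- Latin-1 producer that sent exactly these bytes would be misread.
ambiguousBytes :: BS.ByteString
ambiguousBytes = BS.pack [0xC2, 0xB5]

-- The Latin-1 reading of the same bytes, for comparison.
asLatin1 :: Text
asLatin1 = decodeLatin1 ambiguousBytes
```

The fallback also masks other encodings. For example, `utf8OrLatin1 (BS.pack [0x80])` yields the C1 control character U+0080, whereas a Windows-1252 producer meant that byte to be €.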
| 2025-10-27 18:46:21 | → | target_i joins (~target_i@user/target-i/x-6023099) |
| 2025-10-27 18:46:34 | → | merijn joins (~merijn@host-vr.cgnat-g.v4.dfn.nl) |
| 2025-10-27 18:46:49 | × | Square2 quits (~Square4@user/square) (Ping timeout: 264 seconds) |
| 2025-10-27 18:46:54 | <EvanR> | decode "{\"key\": \"valueµ\"}" is wrong because mu is not an ascii character |
| 2025-10-27 18:47:05 | <EvanR> | and you're trying to specify a byte string |
| 2025-10-27 18:47:28 | <EvanR> | decode "{\"key\": \"value\181\"}" is wrong because it's not utf-8 |
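Both of EvanR's points can be checked directly. This sketch assumes `OverloadedStrings`. With that extension, the `IsString` instance for strict `ByteString` keeps only the low 8 bits of each `Char`. As a result, both `"valueµ"` and `"value\181"` store µ as the single byte 0xB5, which is not valid UTF-8. UTF-8 encodes U+00B5 as the two bytes 0xC2 0xB5. The definition names are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Data.Either (isRight)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- A ByteString literal truncates each Char to its low 8 bits,
-- so '\181' (µ, U+00B5) becomes the single byte 0xB5.
literalBytes :: BS.ByteString
literalBytes = "{\"key\": \"value\181\"}"

-- Encoding a Text explicitly produces real UTF-8: µ becomes 0xC2 0xB5.
utf8Bytes :: BS.ByteString
utf8Bytes = encodeUtf8 (T.pack "{\"key\": \"value\181\"}")

-- True when the bytes form valid UTF-8.
validUtf8 :: BS.ByteString -> Bool
validUtf8 = isRight . decodeUtf8'
```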
| 2025-10-27 18:48:43 | × | sord937 quits (~sord937@gateway/tor-sasl/sord937) (Quit: sord937) |
| 2025-10-27 18:49:40 | → | tromp joins (~textual@2001:1c00:3487:1b00:b825:23c0:1f89:fdbd) |
| 2025-10-27 18:49:42 | <bwe> | EvanR: Yes, that's what I've understood now, too. So, how to get a bytestring to utf-8 text safely? |
| 2025-10-27 18:50:01 | <EvanR> | "utf-8 text" is another categorical error |
| 2025-10-27 18:50:37 | <EvanR> | to go from bytestring to text, whose internal encoding you don't need to know, ideally, to have to know how the original text was encoded as a bytestring |
| 2025-10-27 18:51:00 | <EvanR> | guessing can never 100% work, so you just have to decide what to accept, then deal with a decoding error |
| 2025-10-27 18:51:46 | × | wbrawner quits (~wbrawner@static.56.224.132.142.clients.your-server.de) (Ping timeout: 256 seconds) |
| 2025-10-27 18:52:17 | <EvanR> | so the question "get a text from a utf-8 bytestring safely" is valid, and you can use decodeUtf8 :: ByteString -> Text, or one of the variants that returns Maybe |
| 2025-10-27 18:52:43 | <EvanR> | should have been *you have to know how the original text was encoded |
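EvanR's advice is to decide what you accept and then handle the failure. That maps onto `Data.Text.Encoding` from the `text` package. Plain `decodeUtf8` throws an exception on invalid input. `decodeUtf8'` instead returns the error in an `Either`. `decodeUtf8With lenientDecode` never fails, but it replaces each invalid byte with U+FFFD. The two wrapper names below are hypothetical; this is a minimal sketch of the two policies, not a recommendation of either.

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Strict policy: accept only valid UTF-8 and make the caller handle failure.
decodeOrReject :: BS.ByteString -> Either String Text
decodeOrReject bs = case decodeUtf8' bs of
  Left err -> Left ("not valid UTF-8: " ++ show err)
  Right t  -> Right t

-- Lenient policy: never fail; each invalid byte becomes U+FFFD,
-- so the result no longer records that the input was damaged.
decodeOrReplace :: BS.ByteString -> Text
decodeOrReplace = decodeUtf8With lenientDecode
```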
| 2025-10-27 18:53:27 | × | merijn quits (~merijn@host-vr.cgnat-g.v4.dfn.nl) (Ping timeout: 256 seconds) |
| 2025-10-27 18:53:29 | <EvanR> | but if you are using UTF-8, then first converting to Text before using Aeson sounds like extraneous steps |
| 2025-10-27 18:55:01 | <monochrom> | IMO you should assume/insist UTF-8 in JSON, first of all. Then, it's just a matter of telling Aeson you want Text. Aeson will do the utf-8 decoding for you, for free. |
| 2025-10-27 18:55:54 | <monochrom> | And yeah if you do your own decoding by your own hand before letting Aeson do its job, it's an XY problem. |
| 2025-10-27 18:56:18 | → | wbrawner joins (~wbrawner@static.56.224.132.142.clients.your-server.de) |
| 2025-10-27 18:56:24 | <monochrom> | Or in short, mauke's example is right, just do that. |
| 2025-10-27 18:56:41 | <dminuoso> | monochrom: To be fair, RFC8259 is an internet standard and it imposes UTF8 for open systems (outside closed ecosystems). |
| 2025-10-27 18:56:55 | <dminuoso> | If we adopt RFC terminology, you *MUST* assume UTF-8. |
| 2025-10-27 18:57:34 | → | annamalai joins (~annamalai@2409:4042:4c88:4dce::9e4a:a60c) |
| 2025-10-27 18:57:36 | <monochrom> | Agreed. |
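RFC 8259, section 8.1, says JSON text exchanged between systems outside a closed ecosystem MUST be encoded in UTF-8. It also says implementations must not add a byte order mark to networked JSON, although parsers MAY ignore one. The following is a minimal sketch of an input check that follows those rules. It returns the original bytes rather than a `Text`, so they can still go straight to a JSON parser that does its own UTF-8 decoding. The function names are hypothetical, and no JSON library is used here.

```haskell
import qualified Data.ByteString as BS
import Data.Maybe (fromMaybe)
import Data.Text.Encoding (decodeUtf8')

-- The UTF-8 encoding of U+FEFF (byte order mark).
utf8Bom :: BS.ByteString
utf8Bom = BS.pack [0xEF, 0xBB, 0xBF]

-- RFC 8259 section 8.1 lets parsers ignore a leading BOM, so drop it if present.
stripBom :: BS.ByteString -> BS.ByteString
stripBom bs = fromMaybe bs (BS.stripPrefix utf8Bom bs)

-- Validate without converting: reject non-UTF-8 input and return the
-- (BOM-stripped) bytes unchanged for the JSON parser.
checkJsonBytes :: BS.ByteString -> Either String BS.ByteString
checkJsonBytes bs =
  let body = stripBom bs
  in case decodeUtf8' body of
       Left err -> Left ("JSON text is not UTF-8: " ++ show err)
       Right _  -> Right body
```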
| 2025-10-27 18:58:10 | <monochrom> | OTOH I also have Windows zealots in mind---you know, those who use Windows-1293 or something. |
| 2025-10-27 18:58:51 | <monochrom> | Or Americans---they think the whole world is Latin-1. |
| 2025-10-27 18:58:55 | <dminuoso> | I have no pity for those who suffer from mojibake. |
| 2025-10-27 18:59:02 | <dminuoso> | It is a curable disease. |
| 2025-10-27 18:59:49 | Digitteknohippie | is now known as Digit |
| 2025-10-27 19:00:49 | <dminuoso> | Worrying about Windows-1293 or Latin-1 is like worrying about measles in the year 2025. |
| 2025-10-27 19:01:26 | <monochrom> | Oh, about that. |
| 2025-10-27 19:01:52 | <monochrom> | Both pro-vaxxers and anti-vaxxers agree: Don't worry about measles. >:) |
| 2025-10-27 19:02:13 | <dminuoso> | The Darwinist in me says: Let there be outbreaks. |
| 2025-10-27 19:03:03 | <EvanR> | the rubber meets the road when you're in a trench and have to decide how to decode. And you say, fuckit assuming UTF-8 and rejecting other things is correct. Then someone tells you to use infernal decodeUtf8OrLatin1 |
| 2025-10-27 19:03:11 | <Digit> | monochrom: do any-vaxxers agree too? |
| 2025-10-27 19:03:40 | <EvanR> | lol at "the RFC says you have to" |
| 2025-10-27 19:03:45 | <dminuoso> | Digit: What is an any-vaxxer? Those who just inject anything they can get their hands on? We call those junkies. |
| 2025-10-27 19:04:05 | <EvanR> | there's a documentary about that, Crank with jason statham |
| 2025-10-27 19:04:11 | <monochrom> | haha |
| 2025-10-27 19:04:46 | → | merijn joins (~merijn@host-vr.cgnat-g.v4.dfn.nl) |
| 2025-10-27 19:05:27 | <EvanR> | the corresponding policy would be to attempt to decode the input at any cost, even producing nonsense or suffering security issues |
| 2025-10-27 19:05:34 | <EvanR> | never fail |
| 2025-10-27 19:05:37 | <dminuoso> | EvanR: I think the biggest mistake in the world of decoding is not establishing metadata about the carried encoding in all the file formats. Very few like HTML have done it, in the rest of the world it's a horrible mixture of guesstimation and hope of standard adherence. |
| 2025-10-27 19:05:44 | <monochrom> | If one must accept decodeUtf8OrLatin1, then let's make it an XYZ problem, shall we? (Go big or go home.) Use decodeUtf8OrLatin1, then re-encode to utf-8, then you can give that to Aeson. |
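monochrom's "XYZ" pipeline works as follows: guess the encoding, decode, then re-encode as UTF-8 before parsing. The sketch below defines its own `decodeUtf8OrLatin1`, named after the function mentioned in the chat. That definition is a hypothetical fallback decoder, not a function from the `text` package. The final Aeson step is omitted. The output is always valid UTF-8, which a JSON parser will accept, but nothing guarantees it is what the sender meant.

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import Data.Text.Encoding (decodeLatin1, decodeUtf8', encodeUtf8)

-- Hypothetical fallback decoder: UTF-8 if valid, otherwise Latin-1.
decodeUtf8OrLatin1 :: BS.ByteString -> Text
decodeUtf8OrLatin1 bs = either (const (decodeLatin1 bs)) id (decodeUtf8' bs)

-- Decode by guessing, then re-encode as UTF-8. Valid UTF-8 passes
-- through unchanged; anything else is reinterpreted as Latin-1.
normalizeToUtf8 :: BS.ByteString -> BS.ByteString
normalizeToUtf8 = encodeUtf8 . decodeUtf8OrLatin1
```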
| 2025-10-27 19:05:56 | × | Frostillicus quits (~Frostilli@pool-71-174-119-69.bstnma.fios.verizon.net) (Ping timeout: 256 seconds) |
| 2025-10-27 19:05:58 | <dminuoso> | s/file formats/file- and wireformats/ |
| 2025-10-27 19:06:08 | <EvanR> | dminuoso, that still wouldn't avoid a "failed to decode" code path |
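dminuoso proposes that formats declare their encoding, and EvanR replies that a failure path would remain. Both points fit one small sketch. In it, decoding follows a declared charset label, such as one taken from an HTML `<meta charset>`. Two failure cases remain: an unsupported label, and a label that does not match the bytes. The `Charset` type and the supported labels are hypothetical. Note also that browsers, following the WHATWG Encoding Standard, treat the label "iso-8859-1" as windows-1252, whereas this sketch decodes it as true Latin-1.

```haskell
import qualified Data.ByteString as BS
import Data.Char (toLower)
import Data.Either (isLeft)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- Hypothetical set of supported encodings.
data Charset = Utf8 | Latin1
  deriving (Eq, Show)

-- Map a declared label (case-insensitively) to a supported Charset.
parseCharset :: String -> Maybe Charset
parseCharset label = case map toLower label of
  "utf-8"      -> Just Utf8
  "iso-8859-1" -> Just Latin1
  _            -> Nothing

-- Decode according to the declared charset. Even with metadata there are
-- two failure paths: an unknown label, and a label that lies about the bytes.
decodeDeclared :: String -> BS.ByteString -> Either String Text
decodeDeclared label bs = case parseCharset label of
  Nothing     -> Left ("unsupported charset: " ++ label)
  Just Latin1 -> Right (decodeLatin1 bs)
  Just Utf8   -> case decodeUtf8' bs of
    Left err -> Left ("declared UTF-8 but bytes are invalid: " ++ show err)
    Right t  -> Right t
```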
| 2025-10-27 19:06:13 | <haskellbridge> | <loonycyborg> JSON is always utf-8, otherwise it's non-conforming |
| 2025-10-27 19:06:35 | <EvanR> | yes bwe is out on a limb |
| 2025-10-27 19:06:37 | <dminuoso> | EvanR: If a file has a declared encoding and is lying about that, you deserve to be complained to by your users. |
| 2025-10-27 19:06:51 | <dminuoso> | Where you is whoeever is responsible for the software or using it. |
| 2025-10-27 19:06:55 | <EvanR> | dminuoso, this sort of "plan" doesn't sound particularly worth it |
| 2025-10-27 19:07:04 | <dminuoso> | EvanR: It seems to work fine for the rest of the world. |
| 2025-10-27 19:07:08 | <EvanR> | reengineer all file formats ever and then it still doesn't work |
| 2025-10-27 19:07:10 | <haskellbridge> | <loonycyborg> iirc there's a library that can figure out encoding without any metadata |
| 2025-10-27 19:07:13 | <dminuoso> | Try using execve on an email file. |
| 2025-10-27 19:07:17 | <dminuoso> | It won't just "try and make sense of it" |
| 2025-10-27 19:07:25 | <dminuoso> | It will rightfully complain that it's not an ELF. |
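dminuoso's analogy rests on the fact that an ELF file identifies itself. It starts with the four magic bytes 0x7F, 'E', 'L', 'F'. execve(2) fails with ENOEXEC when it does not recognize a file's format, though the kernel also accepts a few other formats, such as `#!` scripts. The following is a minimal sketch of such a magic-number check. It only looks at the first four bytes, far less than a real loader validates, and the names are illustrative.

```haskell
import qualified Data.ByteString as BS

-- The ELF identification bytes: 0x7F followed by ASCII "ELF".
elfMagic :: BS.ByteString
elfMagic = BS.pack [0x7F, 0x45, 0x4C, 0x46]

-- Declared-format check: trust the magic number and reject everything
-- else, instead of trying to "make sense of" the input.
looksLikeElf :: BS.ByteString -> Bool
looksLikeElf = BS.isPrefixOf elfMagic
```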
| 2025-10-27 19:07:44 | <monochrom> | Sounds like something my students may try. :) |
| 2025-10-27 19:07:58 | <haskellbridge> | <loonycyborg> That reminds me how I tried to run a C program directly as a binary when I was a kid |
| 2025-10-27 19:08:09 | <EvanR> | an email might actually end up being mistaken for an ELF file if you try hard enough |
All times are in UTC.