
Logs: freenode/#haskell

2020-11-17 14:43:39 × oish quits (~charlie@228.25.169.217.in-addr.arpa) (Ping timeout: 272 seconds)
2020-11-17 14:43:43 <merijn> PacoV: Ah, then you probably will want to read this (which applies to basically all programming languages): https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
2020-11-17 14:45:32 × mr_yogurt quits (~mr_yogurt@5.61.211.35.bc.googleusercontent.com) (Ping timeout: 256 seconds)
2020-11-17 14:46:13 <PacoV> Ho, I remember reading this a while ago.
2020-11-17 14:46:21 <PacoV> I'll give it a second read.
2020-11-17 14:46:30 <PacoV> Thanks!
2020-11-17 14:46:47 <PacoV> BTW, my code works like a charm!
2020-11-17 14:47:34 × mlugg quits (c3c2162d@195.194.22.45) (Ping timeout: 245 seconds)
2020-11-17 14:47:35 <PacoV> Time to go buy some food before my better half comes back and I'll read those two articles! Thanks again!
2020-11-17 14:49:16 × christo quits (~chris@81.96.113.213) (Remote host closed the connection)
2020-11-17 14:51:18 <int-e> . o O ( "works like a charm" -- if you believe in it strongly enough, under the right circumstances, after a ritual sacrifice )
2020-11-17 14:52:32 mr_yogurt joins (~mr_yogurt@5.61.211.35.bc.googleusercontent.com)
2020-11-17 14:52:50 darjeeling_ joins (~darjeelin@122.245.211.11)
2020-11-17 14:52:56 <hekkaidekapus> merijn: You are apparently on a roll, carry on over at <https://github.com/haskell/haskell-language-server/pull/602>.
2020-11-17 14:53:21 <merijn> ...
2020-11-17 14:53:38 <hekkaidekapus> heh
2020-11-17 14:53:46 × bjobjo quits (~bjobjo@2a01:79c:cebf:d688::9e6) (Read error: Connection reset by peer)
2020-11-17 14:53:49 <fendor> hekkaidekapus, did I do it wrong?
2020-11-17 14:53:58 <merijn> fendor: Well, yes
2020-11-17 14:54:00 bjobjo joins (~bjobjo@2a01:79c:cebf:d688::9e6)
2020-11-17 14:54:10 <fendor> welp, I tried
2020-11-17 14:54:33 <hekkaidekapus> fendor: Sorry, I’m short on time. But merijn will take care of you :)
2020-11-17 14:54:36 cfricke joins (~cfricke@unaffiliated/cfricke)
2020-11-17 14:54:38 <merijn> with-utf8 is a terrible package that makes me wanna stab people
2020-11-17 14:54:54 <fendor> it was recommended on that reddit thread >_>
2020-11-17 14:55:19 <merijn> Yes, because the internet is filled with clueless people >.>
2020-11-17 14:55:19 <int-e> stabbing, hmm, does it have lenses?
2020-11-17 14:55:44 <merijn> fendor: The problem is there is a no-win scenario
2020-11-17 14:56:07 <int-e> locales are a terrible idea
2020-11-17 14:56:22 <merijn> fendor: Basically, *some* systems have broken environments/configurations where the encoding isn't specified, which then causes GHC to open handles with the wrong encoding, leading to encoding errors
2020-11-17 14:56:50 <fendor> right, so far I understood it
2020-11-17 14:57:01 <merijn> fendor: The problem with "always UTF-8" is that it intentionally breaks for everyone who has a properly configured system with encoding different than utf-8
2020-11-17 14:57:05 <int-e> for files, the encoding should be part of the file, not implicit in the environment
2020-11-17 14:57:42 acarrico joins (~acarrico@dhcp-68-142-39-249.greenmountainaccess.net)
2020-11-17 14:58:13 <int-e> merijn: there is no such thing :P
2020-11-17 14:58:23 <merijn> fendor: So if someone in Japan is using a UTF-16 (or UTF-32) configuration system wide or something, then you are now unable to open their files, because they're not UTF-8, but you're overriding the environment
2020-11-17 14:58:33 <fendor> and in this case, the error is more likely that the encoding of the stdout handle is wrong and it should suffice to explicitly set it, right?
2020-11-17 14:58:41 nbloomf joins (~nbloomf@2600:1700:ad14:3020:95c1:f982:82e4:2d79)
2020-11-17 14:58:43 <merijn> int-e: "encodings should be part of the file" <- sure, agreed, but that's not the world we live in
2020-11-17 14:59:04 <merijn> "intentionally breaking a feature that has existed for over 30 years to control this" is not the right solution
2020-11-17 14:59:24 <merijn> fendor: Well, but how do you know what encoding the terminal connected to stdout expects?
2020-11-17 14:59:37 × brodie quits (~brodie@207.53.253.137) (Quit: brodie)
2020-11-17 14:59:54 <merijn> fendor: You can set stdout to UTF-8 and write stuff to it, but if the terminal connected to stdout doesn't *expect* utf-8, you're outputting garbage
2020-11-17 15:00:01 × Suigintou quits (~Suigintou@92.223.89.101) ()
2020-11-17 15:00:12 <fendor> right.
2020-11-17 15:00:51 brodie joins (~brodie@207.53.253.137)
2020-11-17 15:01:00 <merijn> The only standard way to figure out what the terminal expects is to check the locale, which is what GHC does to figure out the right encoding
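The point about the locale can be checked from inside a program. The following is a minimal sketch, not something posted in the channel: it reads the encoding GHC derived from the locale and the one currently set on stdout, and shows the override that "always UTF-8" amounts to. The names `describeEncodings` and `forceUtf8Stdout` are made up for illustration; the library calls are from `base`.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, hSetEncoding, stdout, utf8)

-- | Describe the encoding GHC derived from the locale and the one
-- currently set on stdout (Nothing means the handle is in binary mode).
describeEncodings :: IO String
describeEncodings = do
  loc <- getLocaleEncoding
  out <- hGetEncoding stdout
  pure ("locale: " ++ show loc ++ ", stdout: " ++ maybe "binary" show out)

-- | Ignore the locale and force UTF-8 on stdout. If the terminal does
-- not expect UTF-8, non-ASCII output will be displayed as garbage.
forceUtf8Stdout :: IO ()
forceUtf8Stdout = hSetEncoding stdout utf8
```

Running `describeEncodings` under an unset or 'C' locale versus a UTF-8 locale should make the difference visible; the exact encoding names printed depend on the system.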
2020-11-17 15:01:21 vicfred joins (~vicfred@unaffiliated/vicfred)
2020-11-17 15:01:32 <int-e> I'd be okay with the locale specifying what happens on terminals... it even makes sense. But AFAICS it tends to be used for everything else as well, including files that may be transferred between systems, and that makes it a huge mess unless everybody agrees on the same encoding.
2020-11-17 15:01:34 <fendor> so, this is a user error that the user needs to fix?
2020-11-17 15:01:53 christo joins (~chris@81.96.113.213)
2020-11-17 15:02:10 <merijn> There was a related discussion on GHC gitlab, lemme look it up
2020-11-17 15:02:26 × vicfred quits (~vicfred@unaffiliated/vicfred) (Max SendQ exceeded)
2020-11-17 15:02:44 <merijn> How do I search for tickets I commented on on gitlab?
2020-11-17 15:02:51 × jollygood2 quits (~bc8165ab@217.29.117.252) (Quit: http://www.okay.uz/ (Session timeout))
2020-11-17 15:02:56 vicfred joins (~vicfred@unaffiliated/vicfred)
2020-11-17 15:03:06 × acarrico quits (~acarrico@dhcp-68-142-39-249.greenmountainaccess.net) (Ping timeout: 260 seconds)
2020-11-17 15:03:12 renzhi joins (~renzhi@2607:fa49:655f:e600::28da)
2020-11-17 15:04:06 × vicfred quits (~vicfred@unaffiliated/vicfred) (Max SendQ exceeded)
2020-11-17 15:04:08 Sgeo joins (~Sgeo@ool-18b982ad.dyn.optonline.net)
2020-11-17 15:04:37 vicfred joins (~vicfred@unaffiliated/vicfred)
2020-11-17 15:05:02 acarrico joins (~acarrico@dhcp-68-142-39-249.greenmountainaccess.net)
2020-11-17 15:05:21 <merijn> fendor: Related discussion: https://gitlab.haskell.org/ghc/ghc/-/issues/17755
2020-11-17 15:05:46 <merijn> fendor: In essence the problem is that the user has his/her locale unset and/or set to the 'C' locale, which errors on non-ascii
2020-11-17 15:06:19 <fendor> merijn, thanks will read it later!
2020-11-17 15:06:21 SanchayanMaity joins (~Sanchayan@106.201.35.233)
2020-11-17 15:07:34 <merijn> fendor: As for the issue you reference in the PR that is a cursed problem anyway
2020-11-17 15:07:36 nut joins (~user@roc37-h01-176-170-197-243.dsl.sta.abo.bbox.fr)
2020-11-17 15:08:00 hackage http-client 0.7.3 - An HTTP client engine https://hackage.haskell.org/package/http-client-0.7.3 (MichaelSnoyman)
2020-11-17 15:08:13 <merijn> fendor: The problem there is "can't open a file with an umlaut in the path" and I can already tell you now it's *impossible* to correctly and robustly fix/avoid this problem
2020-11-17 15:08:26 <fendor> merijn, you make me sad :(
2020-11-17 15:08:37 <fendor> thanks for the explanation! I have a better understanding now!
2020-11-17 15:08:47 <merijn> fendor: Linux (possibly all of posix) has made the retarded choice to say paths are "unspecified bytes not containing NUL or /"
2020-11-17 15:09:01 hackage http-client-openssl 0.3.3 - http-client backend using the OpenSSL library. https://hackage.haskell.org/package/http-client-openssl-0.3.3 (MichaelSnoyman)
2020-11-17 15:09:15 <merijn> fendor: Since there is no encoding information in the filesystem/file API you have no clue what encoding was used to create the file
2020-11-17 15:09:42 × SanchayanMaity quits (~Sanchayan@106.201.35.233) (Client Quit)
2020-11-17 15:09:44 × da39a3ee5e6b4b0d quits (~da39a3ee5@cm-171-98-79-192.revip7.asianet.co.th) (Ping timeout: 272 seconds)
2020-11-17 15:09:51 <merijn> And even if you know that the file was created with UTF-8 and you are using UTF-8 you *still* can't reliably open it
2020-11-17 15:10:01 SanchayanMaity joins (~Sanchayan@106.201.35.233)
2020-11-17 15:10:17 <merijn> Because ü has (at least?) two representations. As a single codepoint and as a composed codepoint
2020-11-17 15:10:31 <merijn> And the produced byte encoding in UTF-8 is different for those two
2020-11-17 15:10:59 <merijn> So depending on *how* the user types in ü you may get different byte sequence and thus non-existent paths
2020-11-17 15:11:02 <merijn> Fun times!
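The two representations of ü can be compared byte by byte. This is a small sketch, assuming the `text` and `bytestring` packages that ship with GHC; the names `precomposed`, `decomposed` and `utf8Bytes` are illustrative only.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- | ü as a single code point, U+00FC.
precomposed :: T.Text
precomposed = T.pack "\xFC"

-- | ü as 'u' (U+0075) followed by the combining diaeresis U+0308.
decomposed :: T.Text
decomposed = T.pack "u\x308"

-- | The UTF-8 bytes of a Text value.
utf8Bytes :: T.Text -> [Word8]
utf8Bytes = BS.unpack . TE.encodeUtf8
```

Here `utf8Bytes precomposed` is `[195,188]` and `utf8Bytes decomposed` is `[117,204,136]`, so a path typed one way does not match a file name stored the other way when paths are compared as raw bytes.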
2020-11-17 15:11:12 da39a3ee5e6b4b0d joins (~da39a3ee5@cm-171-98-79-192.revip7.asianet.co.th)
2020-11-17 15:11:55 <merijn> Windows was much smarter in specifying NTFS paths to be UTF-16 with a defined normalisation scheme, so you can unambiguously know how to access paths with unicode characters
2020-11-17 15:13:11 <merijn> fendor: Making people sad is what I do. People tell me what they wanna do in the POSIX API and then I spend 10 minutes telling them they're fundamentally doomed because everything is terrible :)
2020-11-17 15:14:38 <__monty__> Fun fact, linux and macOS tend to default to opposite normalization schemes for filenames, lots of fun to be had with rsync between those systems.
2020-11-17 15:16:17 Lycurgus joins (~niemand@cpe-45-46-134-163.buffalo.res.rr.com)
2020-11-17 15:16:25 × darjeeling_ quits (~darjeelin@122.245.211.11) (Ping timeout: 264 seconds)
2020-11-17 15:16:29 × brodie quits (~brodie@207.53.253.137) (Quit: brodie)
2020-11-17 15:18:33 × invaser quits (~Thunderbi@31.148.23.125) (Ping timeout: 256 seconds)
2020-11-17 15:18:55 <nut> I have an English dictionary file encoded as utf8. There's also an index file giving the offset for each word. If I use Data.Text.IO to read in the dictionary, how can I make use of the offset info for an efficient lookup?
2020-11-17 15:18:58 × mputz quits (~Thunderbi@dslb-084-058-211-084.084.058.pools.vodafone-ip.de) (Ping timeout: 260 seconds)
2020-11-17 15:20:06 <merijn> nut: offset in *what*
2020-11-17 15:20:21 <nut> offset in the dictionary
2020-11-17 15:20:47 <nut> so that when people do lookup, they don't have to pass the dictionary again and again
2020-11-17 15:20:49 <Lycurgus> likely byte offset in a flat file
2020-11-17 15:20:52 <merijn> offset in what? bytes? unicode codepoints? characters? lines?
2020-11-17 15:20:57 <nut> bytes
2020-11-17 15:21:23 <merijn> You can't really index Text in terms of bytes
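One possible way around this, sketched here rather than taken from the log, is to keep the dictionary as a strict `ByteString`, slice it by byte offset, and decode only the slice to `Text`. The function name `entryAt` is hypothetical, and the sketch assumes the index supplies (or lets you derive, for example from the next offset) a byte length for each entry.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- | Decode the entry that starts at byte offset 'off' and spans 'len'
-- bytes. Invalid UTF-8 (for example from an offset that lands in the
-- middle of a multi-byte character) is replaced with U+FFFD instead of
-- throwing an exception.
entryAt :: Int -> Int -> BS.ByteString -> T.Text
entryAt off len = TE.decodeUtf8With lenientDecode . BS.take len . BS.drop off
```

For a dictionary loaded with `BS.readFile`, `BS.drop` and `BS.take` are constant-time slices, so each lookup only pays for decoding the one entry.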

All times are in UTC.