Logs: freenode/#haskell

2020-11-17 15:21:49 <nut> ah, then it could be character, let me check
2020-11-17 15:21:50 <merijn> You can read it as ByteString, do byte indexing on that, then selectively decode text starting from an offset
2020-11-17 15:22:02 <merijn> nut: If it's character you're doomed too :)
2020-11-17 15:22:05 cosimone joins (~cosimone@2001:b07:ae5:db26:d849:743b:370b:b3cd)
2020-11-17 15:22:07 <dolio> Once it's in Text all the offsets could be wrong anyway.
2020-11-17 15:23:12 <nut> so for a Data.Text string, there's no way to move some kind of pointer within the string right?
2020-11-17 15:23:35 <merijn> nut: You can index Text "by codepoint", maybe
2020-11-17 15:24:08 Entertainment joins (~entertain@104.246.132.210)
2020-11-17 15:24:22 <nut> so basically fseek equivalent
2020-11-17 15:24:38 <merijn> nut: The real, honest answer is that: in every single programming language indexing strings is a broken clusterfuck you cannot rely on to do anything sensible (even though it may appear to do something sensible if you only ever look at ascii)
2020-11-17 15:25:19 conal joins (~conal@64.71.133.70)
2020-11-17 15:25:22 nados joins (~dan@69-165-210-185.cable.teksavvy.com)
2020-11-17 15:25:41 <nut> The offset idea does seem efficient. Without it, how does Haskell manage quick lookup?
2020-11-17 15:26:02 × SanchayanMaity quits (~Sanchayan@106.201.35.233) (Quit: leaving)
2020-11-17 15:26:20 <merijn> nut: Like I said, if the offset is in bytes you can easily read a bytestring and index that and then decode to Text "on demand"
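
[Editor's sketch] One way to read merijn's suggestion, assuming the offsets are byte offsets and the data is UTF-8 (neither is confirmed in the log). The helper name sliceText is hypothetical; BS.drop, BS.take and decodeUtf8With are real functions from the bytestring and text packages.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import           Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical helper: decode @len@ bytes starting at byte offset @off@.
-- Indexing happens on the ByteString; only the selected slice becomes Text.
-- Assumes UTF-8 input. With lenientDecode, a slice that cuts through a
-- multi-byte code point yields U+FFFD instead of throwing an exception.
sliceText :: Int -> Int -> BS.ByteString -> T.Text
sliceText off len =
  TE.decodeUtf8With lenientDecode . BS.take len . BS.drop off
```

BS.drop and BS.take are O(1) on strict ByteStrings, so the cost is dominated by decoding the slice itself.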
2020-11-17 15:26:21 SanchayanMaity joins (~Sanchayan@106.201.35.233)
2020-11-17 15:26:34 <nut> ok I see
2020-11-17 15:26:46 <nut> So i'll use the bytestring package instead of text
2020-11-17 15:26:54 Deide joins (~Deide@217.155.19.23)
2020-11-17 15:26:59 britva joins (~britva@31-10-157-156.cgn.dynamic.upc.ch)
2020-11-17 15:27:02 × conal quits (~conal@64.71.133.70) (Read error: Connection reset by peer)
2020-11-17 15:27:15 is_null joins (~jpic@pdpc/supporter/professional/is-null)
2020-11-17 15:27:27 <nut> You gave me the hint to use text instead of bytestring a few hours ago before i went to the dentist
2020-11-17 15:27:28 <merijn> nut: More practically for a dictionary I'd just read in the entire thing and create a Map
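
[Editor's sketch] A minimal version of the "read everything and build a Map" approach. The file layout is an assumption made for illustration (one entry per line, headword and definition separated by a tab); the log does not say what nut's file looks like. The names Dictionary, parseDictionary and loadDictionary are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- | Headword to definition.
type Dictionary = Map.Map T.Text T.Text

-- | Parse "headword<TAB>definition" lines (an assumed layout).
-- Blank lines are skipped; a line without a tab maps to an empty definition.
parseDictionary :: T.Text -> Dictionary
parseDictionary =
  Map.fromList . map splitEntry . filter (not . T.null) . T.lines
  where
    splitEntry line =
      let (headword, rest) = T.breakOn (T.pack "\t") line
      in  (headword, T.drop 1 rest)  -- drop the tab itself

-- | Read the whole file at once. TIO.readFile decodes using the
-- current locale's encoding.
loadDictionary :: FilePath -> IO Dictionary
loadDictionary path = parseDictionary <$> TIO.readFile path
```

After loading, Map.lookup gives O(log n) lookups with no further disk access.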
2020-11-17 15:27:35 × SanchayanMaity quits (~Sanchayan@106.201.35.233) (Client Quit)
2020-11-17 15:27:45 conal joins (~conal@64.71.133.70)
2020-11-17 15:27:50 SanchayanMaity joins (~Sanchayan@106.201.35.233)
2020-11-17 15:28:27 <nut> merijn: that would mean in memory lookup
2020-11-17 15:28:39 <nut> merijn: How would you then serialize the thing?
2020-11-17 15:28:55 × SanchayanMaity quits (~Sanchayan@106.201.35.233) (Client Quit)
2020-11-17 15:29:03 × da39a3ee5e6b4b0d quits (~da39a3ee5@cm-171-98-79-192.revip7.asianet.co.th) (Ping timeout: 265 seconds)
2020-11-17 15:29:12 SanchayanMaity joins (~Sanchayan@106.201.35.233)
2020-11-17 15:29:38 <merijn> nut: I'd just write the entire thing to disk at once and read it in at once
2020-11-17 15:30:06 <merijn> Rather than dynamically indexing an open file. You *can* dynamically index the file, but that doesn't seem worth it unless it's truly massive
2020-11-17 15:30:56 <nut> Most dictionary files I've seen have some sophisticated file format
2020-11-17 15:31:04 Guest_85 joins (5181d645@host81-129-214-69.range81-129.btcentralplus.com)
2020-11-17 15:31:22 <nut> Such as the StarDict file format or dictd.org
2020-11-17 15:31:57 bitmapper joins (uid464869@gateway/web/irccloud.com/x-asjzblgwwtdcvjsz)
2020-11-17 15:32:18 <nut> It's not massive, a few hundred M only. But I want to find out for the sake of learning
2020-11-17 15:32:23 <merijn> nut: Ah, but *that* sounds more like a different question, that sounds like "how would I parse complicated/sophisticated file formats into something usable?"
2020-11-17 15:33:25 <nut> Those file formats are designed to minimize disk accesses and at the same time give quick search times
2020-11-17 15:33:53 <merijn> @hoogle hSeek
2020-11-17 15:33:53 <lambdabot> System.IO hSeek :: Handle -> SeekMode -> Integer -> IO ()
2020-11-17 15:33:53 <lambdabot> GHC.IO.Handle hSeek :: Handle -> SeekMode -> Integer -> IO ()
2020-11-17 15:33:53 <lambdabot> UnliftIO.IO hSeek :: MonadIO m => Handle -> SeekMode -> Integer -> m ()
2020-11-17 15:33:55 <merijn> @hoogle hGet
2020-11-17 15:33:55 <lambdabot> Data.ByteString hGet :: Handle -> Int -> IO ByteString
2020-11-17 15:33:55 <lambdabot> Data.ByteString.Char8 hGet :: Handle -> Int -> IO ByteString
2020-11-17 15:33:55 <lambdabot> Data.ByteString.Lazy hGet :: Handle -> Int -> IO ByteString
2020-11-17 15:34:18 × SanchayanMaity quits (~Sanchayan@106.201.35.233) (Quit: leaving)
2020-11-17 15:34:22 <nut> Indeed, at first I though there would be a Data.Text.hSeek
2020-11-17 15:34:30 darjeeling_ joins (~darjeelin@122.245.211.11)
2020-11-17 15:34:36 <merijn> nut: If you open a file Handle you can use hSeek to jump to offsets to read bytes from there in the file, the same way you would in other languages
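
[Editor's sketch] The two functions lambdabot listed above can be combined roughly like this. The wrapper name readAt is hypothetical; hSeek, withBinaryFile and Data.ByteString.hGet are the real functions from base and bytestring.

```haskell
import qualified Data.ByteString as BS
import           System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek),
                            hSeek, withBinaryFile)

-- | Hypothetical helper: read @len@ bytes starting at absolute byte
-- offset @off@. The file is opened in binary mode so that offsets count
-- raw bytes. The result may be shorter than @len@ if the file ends early.
readAt :: FilePath -> Integer -> Int -> IO BS.ByteString
readAt path off len =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek off
    BS.hGet h len
```

Because the strict hGet finishes reading before returning, it is safe for withBinaryFile to close the handle afterwards.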
2020-11-17 15:34:38 SanchayanMaity joins (~Sanchayan@106.201.35.233)
2020-11-17 15:34:39 × toorevitimirp quits (~tooreviti@117.182.180.118) (Remote host closed the connection)
2020-11-17 15:34:57 × morbeus quits (vhamalai@gateway/shell/tkk.fi/x-sygopmpjleahuvxk) (Remote host closed the connection)
2020-11-17 15:34:59 <merijn> nut: You might also be interested in:
2020-11-17 15:35:01 <merijn> @hackage binary
2020-11-17 15:35:01 <lambdabot> https://hackage.haskell.org/package/binary
2020-11-17 15:35:19 <merijn> nut: Which is a library for decoding ByteString into custom data
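
[Editor's sketch] A small example of what decoding with binary can look like. The record layout here is invented for illustration (a NUL-terminated word followed by two big-endian 32-bit numbers) and is not taken from any format specification; IndexEntry, getEntry and decodeEntry are hypothetical names. The Data.Binary.Get functions used are real.

```haskell
import           Data.Binary.Get (Get, getLazyByteStringNul, getWord32be,
                                  runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import           Data.Word (Word32)

-- | One index record in an assumed layout:
-- NUL-terminated word, then offset and size as big-endian Word32.
data IndexEntry = IndexEntry
  { entryWord   :: BL.ByteString  -- raw bytes of the word, without the NUL
  , entryOffset :: Word32         -- where the definition starts
  , entrySize   :: Word32         -- how many bytes the definition spans
  } deriving (Eq, Show)

-- | Parser for a single record.
getEntry :: Get IndexEntry
getEntry = IndexEntry <$> getLazyByteStringNul <*> getWord32be <*> getWord32be

-- | Run the parser, returning an error message instead of throwing.
decodeEntry :: BL.ByteString -> Either String IndexEntry
decodeEntry bytes =
  case runGetOrFail getEntry bytes of
    Left  (_, _, err)   -> Left err
    Right (_, _, entry) -> Right entry
```

The offset and size decoded this way could then be used to seek into a separate data file and read the definition bytes from there.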
2020-11-17 15:35:39 <merijn> @hackage attoparsec
2020-11-17 15:35:39 <lambdabot> https://hackage.haskell.org/package/attoparsec
2020-11-17 15:36:28 <dolio> You can just use the hSeek from base. Text doesn't need to provide its own.
2020-11-17 15:36:52 <merijn> dolio: Of course hSeek and then trying to read a String is *also* cursed :p
2020-11-17 15:37:08 <nut> There is no hSeek from base
2020-11-17 15:37:18 <merijn> System.IO.hSeek ?
2020-11-17 15:37:24 <nut> at least not from Prelude
2020-11-17 15:37:45 <dolio> Prelude doesn't export everything in base.
2020-11-17 15:37:48 <nut> i see
2020-11-17 15:39:17 × SanchayanMaity quits (~Sanchayan@106.201.35.233) (Client Quit)
2020-11-17 15:39:28 <merijn> Prelude only exports a fraction of base :)
2020-11-17 15:39:28 × kritzefitz quits (~kritzefit@fw-front.credativ.com) (Read error: Connection timed out)
2020-11-17 15:43:22 × ericsagn1 quits (~ericsagne@2405:6580:0:5100:d6bc:df2c:ba38:451b) (Ping timeout: 260 seconds)
2020-11-17 15:44:15 × Guest_85 quits (5181d645@host81-129-214-69.range81-129.btcentralplus.com) (Remote host closed the connection)
2020-11-17 15:44:30 hackage hedn 0.3.0.2 - EDN parsing and encoding https://hackage.haskell.org/package/hedn-0.3.0.2 (AlexanderBondarenko)
2020-11-17 15:44:49 royal_screwup21 joins (52254809@gateway/web/cgi-irc/kiwiirc.com/ip.82.37.72.9)
2020-11-17 15:45:17 idhugo joins (~idhugo@80-62-116-101-mobile.dk.customer.tdc.net)
2020-11-17 15:46:37 × MarcelineVQ quits (~anja@198.254.202.72) (Ping timeout: 260 seconds)
2020-11-17 15:48:16 × Tario quits (~Tario@201.192.165.173) (Read error: Connection reset by peer)
2020-11-17 15:50:09 MarcelineVQ joins (~anja@198.254.202.72)
2020-11-17 15:51:33 × Franciman quits (~francesco@host-82-56-223-169.retail.telecomitalia.it) (Quit: Leaving)
2020-11-17 15:53:15 <tomjaguarpaw> merijn: Compact regions didn't help with my GC problem in the end because I realised my test cases are also generating large amounts of data! However, I did manage to combine your System.Mem.performGC and GHC.Stats suggestions with RTS options to good effect: https://stackoverflow.com/a/64878595/997606
2020-11-17 15:54:25 knupfer joins (~Thunderbi@i59F7FFD9.versanet.de)
2020-11-17 15:55:24 ericsagn1 joins (~ericsagne@2405:6580:0:5100:9c16:5b76:e160:ad6d)
2020-11-17 15:55:41 jfredett joins (~jfredett@178.162.212.214)
2020-11-17 15:56:34 kritzefitz joins (~kritzefit@fw-front.credativ.com)
2020-11-17 15:56:46 × christo quits (~chris@81.96.113.213) (Remote host closed the connection)
2020-11-17 15:57:14 oish joins (~charlie@228.25.169.217.in-addr.arpa)
2020-11-17 15:58:51 zebrag joins (~inkbottle@aaubervilliers-654-1-89-20.w86-212.abo.wanadoo.fr)
2020-11-17 16:00:03 carlomagno1 joins (~cararell@148.87.23.11)
2020-11-17 16:00:03 × carlomagno quits (~cararell@148.87.23.10) (Remote host closed the connection)
2020-11-17 16:00:26 Rudd0 joins (~Rudd0@185.189.115.98)
2020-11-17 16:02:23 <merijn> tomjaguarpaw: Well, to be fair, if your code is producing lots of data, then perhaps including it in your benchmarks isn't so wrong :p
2020-11-17 16:04:41 nuncanada joins (~dude@179.235.160.168)
2020-11-17 16:05:01 christo joins (~chris@81.96.113.213)
2020-11-17 16:05:12 Stanley00 joins (~stanley00@unaffiliated/stanley00)
2020-11-17 16:05:38 asthasr joins (~asthasr@162.210.29.120)
2020-11-17 16:07:30 × sord937 quits (~sord937@gateway/tor-sasl/sord937) (Remote host closed the connection)
2020-11-17 16:07:40 christo_ joins (~chris@81.96.113.213)
2020-11-17 16:07:42 × christo quits (~chris@81.96.113.213) (Read error: Connection reset by peer)

All times are in UTC.