Logs: freenode/#haskell

2021-03-03 21:47:15 × Boomerang quits (~Boomerang@2a05:f6c7:2179:0:9428:7cc:4edb:1705) (Remote host closed the connection)
2021-03-03 21:55:58 × romesrf quits (~romesrf@44.190.189.46.rev.vodafone.pt) (Quit: WeeChat 3.0.1)
2021-03-03 21:56:22 takuan joins (~takuan@178-116-218-225.access.telenet.be)
2021-03-03 21:57:10 × nullniverse quits (~null@unaffiliated/nullniverse) (Remote host closed the connection)
2021-03-03 21:58:29 terrorjack joins (~terrorjac@ec2-54-95-39-30.ap-northeast-1.compute.amazonaws.com)
2021-03-03 21:59:05 conal joins (~conal@64.71.133.70)
2021-03-03 22:00:16 × mputz quits (~Thunderbi@dslb-088-064-063-125.088.064.pools.vodafone-ip.de) (Remote host closed the connection)
2021-03-03 22:00:36 × puffnfresh quits (~puffnfres@119-17-138-164.77118a.mel.static.aussiebb.net) (Ping timeout: 265 seconds)
2021-03-03 22:00:47 _bin joins (~bin@2600:1700:10a1:38d0:922b:34ff:fe99:1283)
2021-03-03 22:02:12 × _bin_ quits (~bin@2600:1700:10a1:38d0:84d4:3c69:b21e:817f) (Ping timeout: 260 seconds)
2021-03-03 22:03:16 wroathe joins (~wroathe@c-68-54-25-135.hsd1.mn.comcast.net)
2021-03-03 22:05:47 × dhouthoo quits (~dhouthoo@ptr-eitgbj2w0uu6delkbrh.18120a2.ip6.access.telenet.be) (Quit: WeeChat 3.0)
2021-03-03 22:05:57 × mananamenos quits (~mananamen@193.red-88-11-66.dynamicip.rima-tde.net) (Ping timeout: 246 seconds)
2021-03-03 22:06:56 × justanotheruser quits (~justanoth@unaffiliated/justanotheruser) (Ping timeout: 240 seconds)
2021-03-03 22:07:13 × jb55 quits (~jb55@gateway/tor-sasl/jb55) (Ping timeout: 268 seconds)
2021-03-03 22:11:27 mananamenos joins (~mananamen@193.red-88-11-66.dynamicip.rima-tde.net)
2021-03-03 22:12:30 jb55 joins (~jb55@gateway/tor-sasl/jb55)
2021-03-03 22:14:14 × malumore quits (~alecs@151.62.127.229) (Ping timeout: 245 seconds)
2021-03-03 22:14:27 × mouseghost quits (~draco@wikipedia/desperek) (Quit: mew wew)
2021-03-03 22:15:50 × Pickchea quits (~private@unaffiliated/pickchea) (Quit: Leaving)
2021-03-03 22:16:48 × deviantfero quits (~deviantfe@190.150.27.58) (Ping timeout: 246 seconds)
2021-03-03 22:16:55 × softwarm quits (44695313@ip68-105-83-19.sd.sd.cox.net) (Quit: Connection closed)
2021-03-03 22:22:24 × nbloomf quits (~nbloomf@2600:1700:ad14:3020:a840:3c23:1bcc:872e) (Quit: My MacBook has gone to sleep. ZZZzzz…)
2021-03-03 22:27:19 × apache801 quits (~rishi@wsip-70-168-153-252.oc.oc.cox.net) (Ping timeout: 260 seconds)
2021-03-03 22:28:12 × aidecoe quits (~aidecoe@unaffiliated/aidecoe) (Remote host closed the connection)
2021-03-03 22:31:12 d34df00d joins (~d34df00d@104-14-27-213.lightspeed.austtx.sbcglobal.net)
2021-03-03 22:32:13 <d34df00d> Hi!
2021-03-03 22:32:48 <d34df00d> I want to scan over a byte string, skipping some bytes (depending on previous bytes), and count the number of bytes I've skipped.
2021-03-03 22:33:54 justanotheruser joins (~justanoth@unaffiliated/justanotheruser)
2021-03-03 22:34:23 <d34df00d> Without counting the removed bytes it's trivial: it's a matter of the following function:
2021-03-03 22:34:27 <d34df00d> https://bpaste.net/55BA
2021-03-03 22:34:57 <d34df00d> It processes about 100 megs per second on my machine, which is, I guess, not stellar, but not too bad either.
2021-03-03 22:35:08 <Rembane> What's the tricky bit?
2021-03-03 22:35:40 <d34df00d> If I now want to actually count the number of bytes I've skipped, it becomes funny: the following function:
2021-03-03 22:35:45 <d34df00d> https://bpaste.net/CP5Q
2021-03-03 22:36:01 <d34df00d> is about 10 times slower and its RAM usage seems to be linear in the input size.
2021-03-03 22:36:36 × Deide quits (~Deide@217.155.19.23) (Quit: Seeee yaaaa)
2021-03-03 22:36:43 <d34df00d> So, how do I solve this efficiently?
2021-03-03 22:38:32 puffnfresh joins (~puffnfres@119-17-138-164.77118a.mel.static.aussiebb.net)
2021-03-03 22:40:34 × zebrag quits (~inkbottle@aaubervilliers-654-1-83-46.w86-212.abo.wanadoo.fr) (Quit: Konversation terminated!)
2021-03-03 22:40:45 × merijn quits (~merijn@83-160-49-249.ip.xs4all.nl) (Ping timeout: 264 seconds)
2021-03-03 22:40:48 <shapr> My heuristic would be to look at the existing functions in the ByteString library to see what makes them fast
2021-03-03 22:40:57 zebrag joins (~inkbottle@aaubervilliers-654-1-83-46.w86-212.abo.wanadoo.fr)
2021-03-03 22:40:58 apache801 joins (~rishi@wsip-70-168-153-252.oc.oc.cox.net)
2021-03-03 22:41:00 <shapr> or profile the code and look for the hotspots?
2021-03-03 22:41:25 heatsink joins (~heatsink@2600:1700:bef1:5e10:dd5f:6f4f:a50:215d)
2021-03-03 22:41:43 × conal quits (~conal@64.71.133.70) (Quit: Computer has gone to sleep.)
2021-03-03 22:42:35 <d34df00d> It basically is one big hot spot.
2021-03-03 22:42:45 <d34df00d> -p is not too helpful here.
2021-03-03 22:43:00 <Rembane> d34df00d: Do what shapr says, or try to implement your function in terms of foldl' https://hoogle.haskell.org/?=&hoogle=foldl%27%20package%3Abytestring&scope= and see if that makes it faster.
2021-03-03 22:44:48 × kupi quits (uid212005@gateway/web/irccloud.com/x-cgsykjurvegvbxhp) (Quit: Connection closed for inactivity)
2021-03-03 22:45:29 <d34df00d> Aha, foldl'! I guess I'll try that first, since I've already taken a stab at profiling this stuff, and looking at the sources of bytestring scares me a little.
2021-03-03 22:45:35 × heatsink quits (~heatsink@2600:1700:bef1:5e10:dd5f:6f4f:a50:215d) (Ping timeout: 240 seconds)
2021-03-03 22:46:02 softwarm joins (44695313@ip68-105-83-19.sd.sd.cox.net)
2021-03-03 22:46:10 <Rembane> d34df00d: But before you throw away all your old code, do some measurements and see how long it takes, so you don't make it slower by mistake.
2021-03-03 22:46:44 conal joins (~conal@64.71.133.70)
2021-03-03 22:47:16 <d34df00d> Yeah, that's always a good idea! In fact I'm always running my code with +RTS -sstderr to see what MUT and GC look like.
2021-03-03 22:47:26 <monochrom> This one is well-known. It benefits greatly from BangPatterns and "!skips", or else you use seq or $! on the RHSes manually.
2021-03-03 22:47:57 <monochrom> And if you use foldl', you still need to know this.
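
[Editor's sketch] The pasted code is not reproduced in this log, so the following is only an illustration of the BangPatterns advice above, not d34df00d's function. It assumes a made-up skip rule (`shouldSkip`, which drops a byte that repeats the byte before it) and a hypothetical name `countSkips`. The point is the `!skips` accumulator, which is forced on every iteration instead of piling up `(+ 1)` thunks.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy as BSL
import Data.Word (Word8)

-- Hypothetical skip rule (an assumption, not from the log):
-- skip a byte when it repeats the byte just before it.
shouldSkip :: Word8 -> Word8 -> Bool
shouldSkip prev cur = prev == cur

-- Count how many bytes the rule would skip.
-- The bang on 'skips' forces the counter at each step.
countSkips :: BSL.ByteString -> Int
countSkips bs = case BSL.uncons bs of
  Nothing        -> 0
  Just (b, rest) -> go 0 b rest
  where
    go :: Int -> Word8 -> BSL.ByteString -> Int
    go !skips prev s = case BSL.uncons s of
      Nothing -> skips
      Just (cur, rest)
        | shouldSkip prev cur -> go (skips + 1) prev rest
        | otherwise           -> go skips cur rest
```

Loaded in GHCi, `countSkips (BSL.pack [1,1,2,2,2,3])` evaluates to 3. Whether the same change helps the original paste would have to be confirmed by measuring it.
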
2021-03-03 22:48:09 × conal quits (~conal@64.71.133.70) (Quit: Computer has gone to sleep.)
2021-03-03 22:48:45 <d34df00d> Ah, I forgot to mention that I have {-# LANGUAGE Strict #-}
2021-03-03 22:48:55 <monochrom> Because it looks like your future foldl' will be on a tuple. Well foldl' doesn't seq deeply on a tuple.
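
[Editor's sketch] A minimal illustration of the tuple remark above, again under assumptions: the skip rule (drop a byte that repeats its predecessor) and the names `Acc` and `countSkipsFold` are hypothetical and do not come from the paste. `foldl'` only evaluates the accumulator to weak head normal form, so with an ordinary `(Int, Word8)` pair the components can stay unevaluated. One common workaround is an accumulator type with strict fields.

```haskell
import qualified Data.ByteString.Lazy as BSL
import Data.Word (Word8)

-- Accumulator with strict fields: forcing the constructor
-- (which foldl' does at every step) also forces both fields.
data Acc = Acc !Int !Word8

-- Count bytes that repeat their predecessor (hypothetical rule).
countSkipsFold :: BSL.ByteString -> Int
countSkipsFold bs = case BSL.uncons bs of
  Nothing        -> 0
  Just (b, rest) ->
    case BSL.foldl' step (Acc 0 b) rest of
      Acc n _ -> n
  where
    step :: Acc -> Word8 -> Acc
    step (Acc n prev) cur
      | prev == cur = Acc (n + 1) prev
      | otherwise   = Acc n cur
```

The alternative is to keep the plain tuple and force the counter by hand with `seq` or `$!` before building the next pair, as monochrom suggests.
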
2021-03-03 22:49:26 <d34df00d> (I guess it should be equivalent in this case to all the bang patterns in the right places, right?)
2021-03-03 22:49:31 × Franciman quits (~francesco@host-82-49-79-189.retail.telecomitalia.it) (Quit: Leaving)
2021-03-03 22:49:31 <monochrom> Hrm, then I don't know. But look at core to confirm skips is non-lazy?
2021-03-03 22:50:01 <monochrom> Actually this is self-contained, I can try it out.
2021-03-03 22:50:05 <d34df00d> Deeper down the rabbit hole it is then!
2021-03-03 22:50:43 × hyperisco quits (~hyperisco@d192-186-117-226.static.comm.cgocable.net) (Ping timeout: 256 seconds)
2021-03-03 22:50:55 <monochrom> BSL = ByteString.Lazy ?
2021-03-03 22:51:15 <d34df00d> Yep.
2021-03-03 22:51:33 <monochrom> What is "first"?
2021-03-03 22:51:37 <d34df00d> Data.Bifunctor
2021-03-03 22:52:04 <d34df00d> Also, am I right that in the first variant (without the tuples and things) the BSL.pack . go . BSL.unpack fused into something O(1)-memory?
2021-03-03 22:53:25 <monochrom> No.
2021-03-03 22:53:32 jamm_ joins (~jamm@unaffiliated/jamm)
2021-03-03 22:53:32 conal joins (~conal@64.71.133.70)
2021-03-03 22:53:32 × conal quits (~conal@64.71.133.70) (Client Quit)
2021-03-03 22:54:10 <d34df00d> Hmm, why was it fast then?
2021-03-03 22:54:27 <d34df00d> And why wasn't it?
2021-03-03 22:54:36 × fendor_ quits (~fendor@77.119.128.81.wireless.dyn.drei.com) (Remote host closed the connection)
2021-03-03 22:54:43 <d34df00d> (it wasn't O(1), that is)
2021-03-03 22:54:51 gitgood joins (~gitgood@82-132-216-44.dab.02.net)
2021-03-03 22:55:13 conal joins (~conal@64.71.133.70)
2021-03-03 22:57:06 × conal quits (~conal@64.71.133.70) (Client Quit)
2021-03-03 22:57:34 × stree quits (~stree@68.36.8.116) (Ping timeout: 245 seconds)
2021-03-03 22:57:35 × jamm_ quits (~jamm@unaffiliated/jamm) (Ping timeout: 240 seconds)
2021-03-03 22:58:01 <Rembane> d34df00d: Is O(1) the size of the original string?
2021-03-03 22:58:05 <d34df00d> Yes.
2021-03-03 22:58:11 <Rembane> Cool.
2021-03-03 22:58:23 <d34df00d> Looks like nope :)
2021-03-03 22:58:44 <Rembane> It doesn't sound unfeasible, but Haskell memory is an interesting beast, let's see what monochrom says. :)
2021-03-03 23:00:27 <monochrom> I don't understand how 100MB is considered "O(1) size".
2021-03-03 23:01:07 <d34df00d> Hmm.
2021-03-03 23:01:08 <d34df00d> Hold on.
2021-03-03 23:01:31 <d34df00d> Nope, don't hold on. In my measurements, it was the size of the original string (that resided in memory anyway).
2021-03-03 23:01:47 <pjb> monochrom: the universe is finite, therefore 100 MB is O(1).
2021-03-03 23:01:57 <d34df00d> That is, replacing this function by `id` didn't change the memory consumption.
2021-03-03 23:02:00 <pjb> monochrom: anything that's inside this universe is O(1).
2021-03-03 23:02:47 <koz_> Suppose I have Foo of kind (Type -> Type) -> Type. If I write 'deriving stock (Generic)', what would the constraints on the generated instance look like?
2021-03-03 23:02:49 conal_ joins (~conal@64.71.133.70)

All times are in UTC.