Logs: freenode/#haskell
| 2021-03-03 21:47:15 | × | Boomerang quits (~Boomerang@2a05:f6c7:2179:0:9428:7cc:4edb:1705) (Remote host closed the connection) |
| 2021-03-03 21:55:58 | × | romesrf quits (~romesrf@44.190.189.46.rev.vodafone.pt) (Quit: WeeChat 3.0.1) |
| 2021-03-03 21:56:22 | → | takuan joins (~takuan@178-116-218-225.access.telenet.be) |
| 2021-03-03 21:57:10 | × | nullniverse quits (~null@unaffiliated/nullniverse) (Remote host closed the connection) |
| 2021-03-03 21:58:29 | → | terrorjack joins (~terrorjac@ec2-54-95-39-30.ap-northeast-1.compute.amazonaws.com) |
| 2021-03-03 21:59:05 | → | conal joins (~conal@64.71.133.70) |
| 2021-03-03 22:00:16 | × | mputz quits (~Thunderbi@dslb-088-064-063-125.088.064.pools.vodafone-ip.de) (Remote host closed the connection) |
| 2021-03-03 22:00:36 | × | puffnfresh quits (~puffnfres@119-17-138-164.77118a.mel.static.aussiebb.net) (Ping timeout: 265 seconds) |
| 2021-03-03 22:00:47 | → | _bin joins (~bin@2600:1700:10a1:38d0:922b:34ff:fe99:1283) |
| 2021-03-03 22:02:12 | × | _bin_ quits (~bin@2600:1700:10a1:38d0:84d4:3c69:b21e:817f) (Ping timeout: 260 seconds) |
| 2021-03-03 22:03:16 | → | wroathe joins (~wroathe@c-68-54-25-135.hsd1.mn.comcast.net) |
| 2021-03-03 22:05:47 | × | dhouthoo quits (~dhouthoo@ptr-eitgbj2w0uu6delkbrh.18120a2.ip6.access.telenet.be) (Quit: WeeChat 3.0) |
| 2021-03-03 22:05:57 | × | mananamenos quits (~mananamen@193.red-88-11-66.dynamicip.rima-tde.net) (Ping timeout: 246 seconds) |
| 2021-03-03 22:06:56 | × | justanotheruser quits (~justanoth@unaffiliated/justanotheruser) (Ping timeout: 240 seconds) |
| 2021-03-03 22:07:13 | × | jb55 quits (~jb55@gateway/tor-sasl/jb55) (Ping timeout: 268 seconds) |
| 2021-03-03 22:11:27 | → | mananamenos joins (~mananamen@193.red-88-11-66.dynamicip.rima-tde.net) |
| 2021-03-03 22:12:30 | → | jb55 joins (~jb55@gateway/tor-sasl/jb55) |
| 2021-03-03 22:14:14 | × | malumore quits (~alecs@151.62.127.229) (Ping timeout: 245 seconds) |
| 2021-03-03 22:14:27 | × | mouseghost quits (~draco@wikipedia/desperek) (Quit: mew wew) |
| 2021-03-03 22:15:50 | × | Pickchea quits (~private@unaffiliated/pickchea) (Quit: Leaving) |
| 2021-03-03 22:16:48 | × | deviantfero quits (~deviantfe@190.150.27.58) (Ping timeout: 246 seconds) |
| 2021-03-03 22:16:55 | × | softwarm quits (44695313@ip68-105-83-19.sd.sd.cox.net) (Quit: Connection closed) |
| 2021-03-03 22:22:24 | × | nbloomf quits (~nbloomf@2600:1700:ad14:3020:a840:3c23:1bcc:872e) (Quit: My MacBook has gone to sleep. ZZZzzz…) |
| 2021-03-03 22:27:19 | × | apache801 quits (~rishi@wsip-70-168-153-252.oc.oc.cox.net) (Ping timeout: 260 seconds) |
| 2021-03-03 22:28:12 | × | aidecoe quits (~aidecoe@unaffiliated/aidecoe) (Remote host closed the connection) |
| 2021-03-03 22:31:12 | → | d34df00d joins (~d34df00d@104-14-27-213.lightspeed.austtx.sbcglobal.net) |
| 2021-03-03 22:32:13 | <d34df00d> | Hi! |
| 2021-03-03 22:32:48 | <d34df00d> | I want to scan over a byte string, skipping some bytes (depending on previous bytes), and count the number of bytes I've skipped. |
| 2021-03-03 22:33:54 | → | justanotheruser joins (~justanoth@unaffiliated/justanotheruser) |
| 2021-03-03 22:34:23 | <d34df00d> | Without counting the removed bytes it's trivial: it's just the following function: |
| 2021-03-03 22:34:27 | <d34df00d> | https://bpaste.net/55BA |
| 2021-03-03 22:34:57 | <d34df00d> | It processes about 100 megs per second on my machine, which is, I guess, not stellar, but not too bad either. |
| 2021-03-03 22:35:08 | <Rembane> | What's the tricky bit? |
| 2021-03-03 22:35:40 | <d34df00d> | If I now want to actually count the number of bytes I've skipped, it becomes funny: the following function: |
| 2021-03-03 22:35:45 | <d34df00d> | https://bpaste.net/CP5Q |
| 2021-03-03 22:36:01 | <d34df00d> | is about 10 times slower, and its RAM usage seems to be linear in the input size. |
| 2021-03-03 22:36:36 | × | Deide quits (~Deide@217.155.19.23) (Quit: Seeee yaaaa) |
| 2021-03-03 22:36:43 | <d34df00d> | So, how do I solve this efficiently? |
| 2021-03-03 22:38:32 | → | puffnfresh joins (~puffnfres@119-17-138-164.77118a.mel.static.aussiebb.net) |
| 2021-03-03 22:40:34 | × | zebrag quits (~inkbottle@aaubervilliers-654-1-83-46.w86-212.abo.wanadoo.fr) (Quit: Konversation terminated!) |
| 2021-03-03 22:40:45 | × | merijn quits (~merijn@83-160-49-249.ip.xs4all.nl) (Ping timeout: 264 seconds) |
| 2021-03-03 22:40:48 | <shapr> | My heuristic would be to look at the existing functions in the ByteString library to see what makes them fast |
| 2021-03-03 22:40:57 | → | zebrag joins (~inkbottle@aaubervilliers-654-1-83-46.w86-212.abo.wanadoo.fr) |
| 2021-03-03 22:40:58 | → | apache801 joins (~rishi@wsip-70-168-153-252.oc.oc.cox.net) |
| 2021-03-03 22:41:00 | <shapr> | or profile the code and look for the hotspots? |
| 2021-03-03 22:41:25 | → | heatsink joins (~heatsink@2600:1700:bef1:5e10:dd5f:6f4f:a50:215d) |
| 2021-03-03 22:41:43 | × | conal quits (~conal@64.71.133.70) (Quit: Computer has gone to sleep.) |
| 2021-03-03 22:42:35 | <d34df00d> | It basically is one big hot spot. |
| 2021-03-03 22:42:45 | <d34df00d> | -p is not too helpful here. |
| 2021-03-03 22:43:00 | <Rembane> | d34df00d: Do what shapr says, or try to implement your function in terms of foldl' https://hoogle.haskell.org/?=&hoogle=foldl%27%20package%3Abytestring&scope= and see if that makes it faster. |
| 2021-03-03 22:44:48 | × | kupi quits (uid212005@gateway/web/irccloud.com/x-cgsykjurvegvbxhp) (Quit: Connection closed for inactivity) |
| 2021-03-03 22:45:29 | <d34df00d> | Aha, foldl'! I guess I'll try that first, since I've already taken a stab at profiling this stuff, and looking at the sources of bytestring scares me a little. |
| 2021-03-03 22:45:35 | × | heatsink quits (~heatsink@2600:1700:bef1:5e10:dd5f:6f4f:a50:215d) (Ping timeout: 240 seconds) |
| 2021-03-03 22:46:02 | → | softwarm joins (44695313@ip68-105-83-19.sd.sd.cox.net) |
| 2021-03-03 22:46:10 | <Rembane> | d34df00d: But before you throw away all your old code, do some measurements and see how long it takes, so you don't make it slower by mistake. |
| 2021-03-03 22:46:44 | → | conal joins (~conal@64.71.133.70) |
| 2021-03-03 22:47:16 | <d34df00d> | Yeah, that's always a good idea! In fact I'm always running my code with +RTS -sstderr to see what MUT and GC look like. |
| 2021-03-03 22:47:26 | <monochrom> | This one is well-known. It benefits greatly from BangPatterns and "!skips", or else you use seq or $! on the RHSes manually. |
| 2021-03-03 22:47:57 | <monochrom> | And if you use foldl', you still need to know this. |
| 2021-03-03 22:48:09 | × | conal quits (~conal@64.71.133.70) (Quit: Computer has gone to sleep.) |
| 2021-03-03 22:48:45 | <d34df00d> | Ah, I forgot to mention I have {-# LANGUAGE Strict #-} |
| 2021-03-03 22:48:55 | <monochrom> | Because it looks like your future foldl' will be on a tuple. Well foldl' doesn't seq deeply on a tuple. |
| 2021-03-03 22:49:26 | <d34df00d> | (I guess it should be equivalent in this case to all the bang patterns in the right places, right?) |
| 2021-03-03 22:49:31 | × | Franciman quits (~francesco@host-82-49-79-189.retail.telecomitalia.it) (Quit: Leaving) |
| 2021-03-03 22:49:31 | <monochrom> | Hrm, then I don't know. But look at core to confirm skips is non-lazy? |
| 2021-03-03 22:50:01 | <monochrom> | Actually this is self-contained, I can try it out. |
| 2021-03-03 22:50:05 | <d34df00d> | Deeper down the rabbit hole it is then! |
| 2021-03-03 22:50:43 | × | hyperisco quits (~hyperisco@d192-186-117-226.static.comm.cgocable.net) (Ping timeout: 256 seconds) |
| 2021-03-03 22:50:55 | <monochrom> | BSL = ByteString.Lazy ? |
| 2021-03-03 22:51:15 | <d34df00d> | Yep. |
| 2021-03-03 22:51:33 | <monochrom> | What is "first"? |
| 2021-03-03 22:51:37 | <d34df00d> | Data.Bifunctor |
| 2021-03-03 22:52:04 | <d34df00d> | Also, am I right that in the first variant (without the tuples and things) the BSL.pack . go . BSL.unpack fused into something O(1)-memory? |
| 2021-03-03 22:53:25 | <monochrom> | No. |
| 2021-03-03 22:53:32 | → | jamm_ joins (~jamm@unaffiliated/jamm) |
| 2021-03-03 22:53:32 | → | conal joins (~conal@64.71.133.70) |
| 2021-03-03 22:53:32 | × | conal quits (~conal@64.71.133.70) (Client Quit) |
| 2021-03-03 22:54:10 | <d34df00d> | Hmm, why was it fast then? |
| 2021-03-03 22:54:27 | <d34df00d> | And why wasn't it? |
| 2021-03-03 22:54:36 | × | fendor_ quits (~fendor@77.119.128.81.wireless.dyn.drei.com) (Remote host closed the connection) |
| 2021-03-03 22:54:43 | <d34df00d> | (it wasn't O(1), that is) |
| 2021-03-03 22:54:51 | → | gitgood joins (~gitgood@82-132-216-44.dab.02.net) |
| 2021-03-03 22:55:13 | → | conal joins (~conal@64.71.133.70) |
| 2021-03-03 22:57:06 | × | conal quits (~conal@64.71.133.70) (Client Quit) |
| 2021-03-03 22:57:34 | × | stree quits (~stree@68.36.8.116) (Ping timeout: 245 seconds) |
| 2021-03-03 22:57:35 | × | jamm_ quits (~jamm@unaffiliated/jamm) (Ping timeout: 240 seconds) |
| 2021-03-03 22:58:01 | <Rembane> | d34df00d: Is O(1) the size of the original string? |
| 2021-03-03 22:58:05 | <d34df00d> | Yes. |
| 2021-03-03 22:58:11 | <Rembane> | Cool. |
| 2021-03-03 22:58:23 | <d34df00d> | Looks like nope :) |
| 2021-03-03 22:58:44 | <Rembane> | It doesn't sound unfeasible, but Haskell memory is an interesting beast, let's see what monochrom says. :) |
| 2021-03-03 23:00:27 | <monochrom> | I don't understand how 100MB is considered "O(1) size". |
| 2021-03-03 23:01:07 | <d34df00d> | Hmm. |
| 2021-03-03 23:01:08 | <d34df00d> | Hold on. |
| 2021-03-03 23:01:31 | <d34df00d> | Nope, don't hold on. In my measurements, it was the size of the original string (that resided in memory anyway). |
| 2021-03-03 23:01:47 | <pjb> | monochrom: the universe is finite, therefore 100 MB is O(1). |
| 2021-03-03 23:01:57 | <d34df00d> | That is, replacing this function by `id` didn't change the memory consumption. |
| 2021-03-03 23:02:00 | <pjb> | monochrom: anything that's inside this universe is O(1). |
| 2021-03-03 23:02:47 | <koz_> | Suppose I have Foo of kind (Type -> Type) -> Type. If I write 'deriving stock (Generic)', what would the constraints on the generated instance look like? |
| 2021-03-03 23:02:49 | → | conal_ joins (~conal@64.71.133.70) |
All times are in UTC.
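
Editorial sketch for the skip-and-count question d34df00d asks above. The pasted functions are not part of this log, so this is not the original code. It only illustrates monochrom's remark that foldl' does not force the components of a tuple: the accumulator here is a data type with strict fields instead. The skipping rule `shouldSkip` (drop a byte equal to the one before it) and the names `Acc` and `countSkipped` are hypothetical, and the sketch covers only the counting half of the problem, not producing the filtered output.

```haskell
import qualified Data.ByteString.Lazy as BSL
import Data.Word (Word8)

-- Hypothetical rule, assumed for illustration only:
-- skip a byte when it equals the byte immediately before it.
shouldSkip :: Word8 -> Word8 -> Bool
shouldSkip prev cur = prev == cur

-- Accumulator with strict fields. foldl' evaluates the accumulator to
-- weak head normal form at each step; because the fields are strict,
-- that also forces the counter, so no chain of (+1) thunks builds up.
data Acc = Acc !Int !(Maybe Word8)

-- Count how many bytes the rule would skip, in a single pass.
countSkipped :: BSL.ByteString -> Int
countSkipped = skipped . BSL.foldl' step (Acc 0 Nothing)
  where
    step (Acc n Nothing) cur = Acc n (Just cur)
    step (Acc n (Just prev)) cur
      | shouldSkip prev cur = Acc (n + 1) (Just cur)
      | otherwise           = Acc n (Just cur)
    skipped (Acc n _) = n
```

In GHCi, `countSkipped (BSL.pack [1,1,2,2,2,3])` evaluates to 3. With a plain `(Int, Maybe Word8)` tuple as the accumulator and no bang patterns or `Strict` pragma, the same fold would leave the counter unevaluated until the end, which is one plausible source of memory growth of the kind discussed above.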