
Logs: liberachat/#haskell

2021-06-16 12:59:42 × hello20 quits (~hello@cpc97208-walt22-2-0-cust196.13-2.cable.virginm.net) (Ping timeout: 268 seconds)
2021-06-16 13:00:01 chomwitt joins (~Pitsikoko@athedsl-20549.home.otenet.gr)
2021-06-16 13:01:15 alx741 joins (~alx741@186.178.108.66)
2021-06-16 13:01:29 × y04nn quits (~y04nn@81.17.24.204) (Ping timeout: 252 seconds)
2021-06-16 13:02:32 eggplantade joins (~Eggplanta@2600:1700:bef1:5e10:cded:c7cb:4d63:a64a)
2021-06-16 13:03:11 × andreas303 quits (~andreas@gateway/tor-sasl/andreas303) (Quit: andreas303)
2021-06-16 13:07:13 × eggplantade quits (~Eggplanta@2600:1700:bef1:5e10:cded:c7cb:4d63:a64a) (Ping timeout: 268 seconds)
2021-06-16 13:08:07 raek joins (~raek@2001:9b1:efe:3200:d250:99ff:fec0:e153)
2021-06-16 13:08:20 × shredder quits (~shredder@user/shredder) (Ping timeout: 268 seconds)
2021-06-16 13:09:01 bmo joins (~bmo@185.209.196.142)
2021-06-16 13:10:09 henninb joins (~user@63.226.174.157)
2021-06-16 13:10:18 × krjst quits (~krjst@2604:a880:800:c1::16b:8001) (Quit: bye)
2021-06-16 13:11:44 crazazy joins (~user@130.89.171.203)
2021-06-16 13:12:17 henninb parts (~user@63.226.174.157) ()
2021-06-16 13:12:54 × obs\ quits (~obscur1ty@102.41.69.204) (Quit: Leaving)
2021-06-16 13:13:10 obs\ joins (~obscur1ty@102.41.69.204)
2021-06-16 13:14:24 × ukari quits (~ukari@user/ukari) (Remote host closed the connection)
2021-06-16 13:15:05 ukari joins (~ukari@user/ukari)
2021-06-16 13:15:18 ddellacosta joins (~ddellacos@86.106.121.100)
2021-06-16 13:17:12 × obs\ quits (~obscur1ty@102.41.69.204) (Changing host)
2021-06-16 13:17:12 obs\ joins (~obscur1ty@user/obs/x-5924898)
2021-06-16 13:17:22 × cheater quits (~Username@user/cheater) (Remote host closed the connection)
2021-06-16 13:18:43 zebrag joins (~chris@user/zebrag)
2021-06-16 13:18:55 krjst joins (~krjst@2604:a880:800:c1::16b:8001)
2021-06-16 13:20:03 × ddellacosta quits (~ddellacos@86.106.121.100) (Ping timeout: 268 seconds)
2021-06-16 13:20:15 × kayprish quits (~kayprish@46.240.143.86) (Remote host closed the connection)
2021-06-16 13:20:58 × krjst quits (~krjst@2604:a880:800:c1::16b:8001) (Client Quit)
2021-06-16 13:21:43 cheater joins (~Username@user/cheater)
2021-06-16 13:22:14 shapr joins (~user@pool-100-36-247-68.washdc.fios.verizon.net)
2021-06-16 13:22:44 eggplantade joins (~Eggplanta@2600:1700:bef1:5e10:cded:c7cb:4d63:a64a)
2021-06-16 13:23:26 × Guest9 quits (~Guest9@103.250.139.6) (Quit: Connection closed)
2021-06-16 13:24:26 × jumper149 quits (~jumper149@80.240.31.34) (Ping timeout: 244 seconds)
2021-06-16 13:24:32 × psydroid quits (~psydroidm@2001:470:69fc:105::165) (Changing host)
2021-06-16 13:24:32 psydroid joins (~psydroidm@user/psydroid)
2021-06-16 13:25:22 krjst joins (~krjst@2604:a880:800:c1::16b:8001)
2021-06-16 13:25:50 sbmsr joins (~pi@104-6-130-18.lightspeed.miamfl.sbcglobal.net)
2021-06-16 13:27:01 × eggplantade quits (~Eggplanta@2600:1700:bef1:5e10:cded:c7cb:4d63:a64a) (Ping timeout: 244 seconds)
2021-06-16 13:28:22 AgentM joins (~agentm@pool-162-83-130-212.nycmny.fios.verizon.net)
2021-06-16 13:28:34 × cheater quits (~Username@user/cheater) (Ping timeout: 244 seconds)
2021-06-16 13:29:05 cheater joins (~Username@user/cheater)
2021-06-16 13:32:17 × haltux quits (~haltux@a89-154-181-47.cpe.netcabo.pt) (Ping timeout: 252 seconds)
2021-06-16 13:32:52 <bmo> What is a good approach to parsing (very large) XML files? I started off using xml-conduit as it seems a good fit, but maybe I am wrong
2021-06-16 13:33:00 × azeem quits (~azeem@176.201.22.245) (Ping timeout: 268 seconds)
2021-06-16 13:33:06 <bmo> I currently have a minor problem with that: paste.tomsmeding.com/WyuOXLLK
2021-06-16 13:33:14 azeem joins (~azeem@176.201.43.174)
2021-06-16 13:33:34 × raehik quits (~raehik@cpc95906-rdng25-2-0-cust156.15-3.cable.virginm.net) (Quit: WeeChat 3.1)
2021-06-16 13:34:39 raehik joins (~raehik@cpc95906-rdng25-2-0-cust156.15-3.cable.virginm.net)
2021-06-16 13:34:54 <bmo> So basically I cannot assume an order on the xml-tags. With that example an entry consists of `persons` (multiple fields that contain a `Text`) and `title` which is `Text` too. The naive way of just parsing `persons` first and then the `title` breaks as soon as the XML is not following the same order (duh)
2021-06-16 13:35:32 <bmo> Is there an elegant way of parsing such XML without breaking down my `Entry`'s fields, parsing them first and then re-order+validate?
2021-06-16 13:38:00 <bmo> In that small example my current approach works for `bs0` but for `bs1` it breaks as `title` precedes the `person`s (they might actually be interleaved in reality, so `<person>...<title>...<person>...` etc.)
2021-06-16 13:39:15 nschoe joins (~quassel@2a01:e0a:8e:a190:4dc0:5be8:9ad8:a5a4)
2021-06-16 13:39:56 × jakzale quits (uid499518@id-499518.charlton.irccloud.com) (Quit: Connection closed for inactivity)
2021-06-16 13:40:52 waleee joins (~waleee@2001:9b0:216:8200:d457:9189:7843:1dbd)
2021-06-16 13:41:23 benin036 joins (~benin@183.82.207.180)
2021-06-16 13:41:45 × dunkeln quits (~dunkeln@94.129.65.28) (Ping timeout: 268 seconds)
2021-06-16 13:41:52 <shapr> Anyone want to suggest improvements to https://github.com/shapr/takedouble/blob/main/src/Takedouble.hs#L71 and the saneFile function below?
2021-06-16 13:42:06 <shapr> I feel like there's a better and/or simpler approach to that.
2021-06-16 13:42:36 dunkeln joins (~dunkeln@94.129.65.28)
2021-06-16 13:43:56 <dminuoso> bmo: AttrParser is an Alternative, so you can use this https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/Control-Monad-Permutations.html
2021-06-16 13:44:13 × aplainzetakind quits (~johndoe@captainludd.powered.by.lunarbnc.net) (Ping timeout: 268 seconds)
2021-06-16 13:45:45 <dminuoso> I'm a bit surprised, does AttrParser not do this for you already?
2021-06-16 13:46:49 Tuplanolla joins (~Tuplanoll@91-159-68-239.elisa-laajakaista.fi)
2021-06-16 13:47:07 <dminuoso> Judging from the implementation, the order shouldn't matter.
2021-06-16 13:47:28 <bmo> dminuoso, actually the attributes are valid up to permutation, true. I hadn't noticed.
2021-06-16 13:47:36 aplainzetakind joins (~johndoe@captainludd.powered.by.lunarbnc.net)
2021-06-16 13:48:05 <dminuoso> So when you said "breaks", is that what you think happens?
2021-06-16 13:48:08 <dminuoso> Have you actually tried it?
2021-06-16 13:48:29 <bmo> But my problem is with actual tags. So I have `<e> <p>x</p> <t>y</t> </e>` but sometimes `<e> <t>y</t> <p>x</p> </e>`
2021-06-16 13:49:08 <bmo> dminuoso, I just tested permuting the attributes and that works. But the permuted tags don't, which, in hindsight, is expected
2021-06-16 13:49:12 eggplantade joins (~Eggplanta@2600:1700:bef1:5e10:cded:c7cb:4d63:a64a)
2021-06-16 13:49:13 <dminuoso> Ahh
2021-06-16 13:49:15 <shapr> Hm, I think I'll convert the "get all files in all subdirectories" function into something that could run in a bunch of threads, just to see if that's faster.
2021-06-16 13:49:29 <dminuoso> bmo: Yeah I don't think permutation on tags can reasonably work in xml-conduit
2021-06-16 13:49:34 <shapr> I've read NVMe drives work best with a deep queue of requests
2021-06-16 13:49:42 <maerwald> shapr: threading over filesystem operations? :>
2021-06-16 13:49:52 <dminuoso> bmo: For starters, what does "permutation" even mean? A naive take on XML is that it's a tree.
2021-06-16 13:50:38 muto joins (~muto@d75-159-225-7.abhsia.telus.net)
2021-06-16 13:50:46 <shapr> maerwald: yeah, I think it could speed up reading a bunch of files to check for duplicates
2021-06-16 13:51:10 <shapr> maerwald: I'm also slowly working my way towards this kind of thing: https://www.tbray.org/ongoing/When/202x/2021/03/27/Topfew-and-Amdahl
2021-06-16 13:51:16 × oo_miguel quits (~pi@89-72-187-203.dynamic.chello.pl) (Quit: WeeChat 2.3)
2021-06-16 13:51:19 × sbmsr quits (~pi@104-6-130-18.lightspeed.miamfl.sbcglobal.net) (Ping timeout: 272 seconds)
2021-06-16 13:51:20 <bmo> Well within an `<e>` (I'm just using abbreviations of that example I gave) "fields" are sometimes permuted, ie. not in a particular order
2021-06-16 13:51:28 <shapr> that is, a count min sketch on top of Apache logs
2021-06-16 13:51:40 <bmo> Luckily the leaves in such an `<e>` are always small, so they wouldn't nest further.
2021-06-16 13:51:59 <shapr> maerwald: at least for that post, reading multiple pieces of a large file in different threads was faster
2021-06-16 13:52:23 ddellacosta joins (~ddellacos@86.106.121.100)
2021-06-16 13:52:48 <dminuoso> bmo: You can <|> NameMatchers together
2021-06-16 13:53:42 × eggplantade quits (~Eggplanta@2600:1700:bef1:5e10:cded:c7cb:4d63:a64a) (Ping timeout: 264 seconds)
2021-06-16 13:54:21 <dminuoso> bmo: It seems you'd have to do something along these lines:
2021-06-16 13:54:56 <bmo> dminuoso, so I'd have to (with xml-conduit that is) parse the tags into something isomorphic to `data ELeave = P Text | T Text` and then re-order+validate once I parsed all of `<e>`'s leaves?
2021-06-16 13:56:28 × ddellacosta quits (~ddellacos@86.106.121.100) (Ping timeout: 244 seconds)
2021-06-16 13:58:33 <dminuoso> bmo: Something along the lines of: data Ki = Ent | Per | Tit; isEntry :: A -> Maybe Ki; isPerson :: A -> Maybe Ki; tag (isEntry <|> isPerson) (\case Ent -> ...; Per -> ...; Tit -> ...)
2021-06-16 13:58:50 <dminuoso> This will become very awkward to write I think
2021-06-16 13:59:10 <dminuoso> Since you then have to keep track what kind of element you have consumed
2021-06-16 13:59:12 <bmo> Yeah :( I kinda wanted to avoid this somehow
2021-06-16 14:00:06 <bmo> Especially since that is a small example and the real thing is quite a bit bigger
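
The "collect the leaves, then re-order and validate" idea discussed above could look roughly like the sketch below. It covers only the validation step, not the xml-conduit parsing itself. The names `ELeaf`, `Entry` and `toEntry` are hypothetical, and `String` stands in for `Text` so that the example needs nothing beyond base.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical leaf type, modelled on `data ELeave = P Text | T Text` from the chat.
-- String is used instead of Text only to keep the sketch free of dependencies.
data ELeaf = P String | T String
  deriving (Eq, Show)

-- Hypothetical target record: any number of persons, exactly one title.
data Entry = Entry { persons :: [String], title :: String }
  deriving (Eq, Show)

-- Split the leaves by kind (keeping their relative order) and
-- require exactly one title, whatever order the leaves arrived in.
toEntry :: [ELeaf] -> Either String Entry
toEntry leaves =
  case partitionEithers (map split leaves) of
    (ps, [t]) -> Right (Entry ps t)
    (_, [])   -> Left "missing <t>"
    (_, _)    -> Left "more than one <t>"
  where
    split (P x) = Left x
    split (T x) = Right x

main :: IO ()
main = do
  print (toEntry [P "x", T "y"])          -- persons first
  print (toEntry [T "y", P "x"])          -- title first
  print (toEntry [P "a", T "y", P "b"])   -- interleaved
  print (toEntry [P "a"])                 -- no title: rejected
```

The parser then only has to produce a flat list of leaves per `<e>`, and all ordering concerns live in `toEntry`. As noted in the chat, this still means one constructor and one validation rule per field, which gets tedious for a larger schema.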
2021-06-16 14:01:21 <dminuoso> bmo: have you considered tagsoup perhaps?
2021-06-16 14:02:12 <bmo> No, so far I only considered xml-conduit and had a quick look at how I can use DtdToHaskell with HaXml but conduit seemed simpler.
2021-06-16 14:02:40 <bmo> I was not aware of tagsoup, I'll have a look at it. Thanks a lot for your assistance!
2021-06-16 14:02:52 <dminuoso> with tagsoup you can convert it straight into a plain tree, that might be much easier to work with for you
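
A minimal sketch of the tagsoup route, assuming the abbreviated `<e>`/`<p>`/`<t>` shape from earlier in the conversation. `Entry`, `leafText`, `childrenNamed` and `entryFromTree` are hypothetical helpers. `parseTags`, `tagTree`, `TagBranch`, `TagLeaf` and `TagText` come from the tagsoup package. The inputs contain no whitespace between tags, and the question of very large files is not addressed here.

```haskell
import Text.HTML.TagSoup (Tag (..), parseTags)
import Text.HTML.TagSoup.Tree (TagTree (..), tagTree)

-- Hypothetical target record: any number of persons, exactly one title.
data Entry = Entry { persons :: [String], title :: String }
  deriving (Eq, Show)

-- Concatenate the text nodes that sit directly inside an element.
leafText :: [TagTree String] -> String
leafText kids = concat [t | TagLeaf (TagText t) <- kids]

-- Text of every direct child element with the given name, in document order.
childrenNamed :: String -> [TagTree String] -> [String]
childrenNamed name kids = [leafText cs | TagBranch n _ cs <- kids, n == name]

-- Build an Entry from an <e> element; the order of <p> and <t> children is irrelevant.
entryFromTree :: TagTree String -> Maybe Entry
entryFromTree (TagBranch "e" _ kids) =
  case childrenNamed "t" kids of
    [t] -> Just (Entry (childrenNamed "p" kids) t)
    _   -> Nothing
entryFromTree _ = Nothing

main :: IO ()
main = do
  let parse = map entryFromTree . tagTree . parseTags
  print (parse "<e><p>x</p><t>y</t></e>")
  print (parse "<e><t>y</t><p>x</p></e>")
```

Because the children are looked up by name rather than consumed in sequence, both inputs should yield the same `Entry`.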

All times are in UTC.