Logs: freenode/#haskell

2021-03-29 13:50:34 <johnnyboy[m]> and then I turn it into a nice structured format, e.g. JSON, XML, CSV, HTML tables, markdown tables, or LaTeX tables
2021-03-29 13:50:41 × heatsink quits (~heatsink@2600:1700:bef1:5e10:f0bc:f236:90c7:a6f5) (Ping timeout: 252 seconds)
2021-03-29 13:51:40 <johnnyboy[m]> I did use the optparse library for parsing the command line arguments and it works great
2021-03-29 13:52:21 <tomsmeding> johnnyboy[m]: can you perhaps share the code that runs that regex?
2021-03-29 13:52:39 <tomsmeding> to double-check syntax
2021-03-29 13:52:49 × cr3 quits (~cr3@192-222-143-195.qc.cable.ebox.net) (Client Quit)
2021-03-29 13:53:09 ddellaco_ joins (~ddellacos@ool-44c73afa.dyn.optonline.net)
2021-03-29 13:54:08 cr3 joins (~cr3@192-222-143-195.qc.cable.ebox.net)
2021-03-29 13:55:11 <johnnyboy[m]> <tomsmeding "johnnyboy: can you perhaps share"> this is the file where my regexes are: https://gitlab.com/jllang/spin2latex/-/blob/master/src/Token.hs
2021-03-29 13:55:21 <johnnyboy[m]> I should really rename that project
2021-03-29 13:55:36 <[exa]> johnnyboy[m]: you should really use attoparsec
2021-03-29 13:55:39 <johnnyboy[m]> it's not really restricted to producing LaTeX tables anymore
2021-03-29 13:56:06 <tomsmeding> oh hi john :)
2021-03-29 13:56:21 <johnnyboy[m]> hi, tom :)
2021-03-29 13:57:02 × graf_blutwurst quits (~user@2001:171b:226e:adc0:3dbe:eebd:8040:b693) (Remote host closed the connection)
2021-03-29 13:57:13 <johnnyboy[m]> maybe a parser combinator library would be a good idea in the long run
2021-03-29 13:57:44 <[exa]> like, I understand the language may be regular so a "full" context-free grammar parser looks like overkill
2021-03-29 13:57:57 <johnnyboy[m]> anyway, my intention is to simply just discard most of the input text and only pick a handful of interesting fields
2021-03-29 13:58:11 deviantfero joins (~deviantfe@190.150.27.58)
2021-03-29 13:58:41 <[exa]> except, running normal attoparsecs is usually much less complex than compiling, optimizing and running the regexes
2021-03-29 13:59:26 <[exa]> also, it quite often happens that you need to do very ugly regex tricks to capture stuff that's trivial with parsers
2021-03-29 13:59:28 <tomsmeding> johnnyboy[m]: are you sure that [[:space:]] doesn't work? this page claims that it's supported (see under "Feature support"): https://hackage.haskell.org/package/regex-tdfa-1.3.1.0/docs/Text-Regex-TDFA.html
2021-03-29 13:59:46 <johnnyboy[m]> maybe there's something else wrong then
2021-03-29 13:59:48 <[exa]> and finally, you'll be ready for the moment you at some point realize you need to support parentheses.
2021-03-29 14:00:09 <johnnyboy[m]> by the way, the version in github has a mistake there
2021-03-29 14:00:23 <tomsmeding> [exa]: OP said that the text being parsed has been in the same format for ages, so it's unlikely to change
2021-03-29 14:00:23 <johnnyboy[m]> sorry, no
2021-03-29 14:00:37 <merijn> tomsmeding: That's not really relevant, though :p
2021-03-29 14:01:00 <merijn> tomsmeding: Because the attoparsec version will be simpler to read/write even if you don't have to update it
2021-03-29 14:01:08 <tomsmeding> though I do agree that parser combinators are nicer than regex, especially in Haskell, where parser combinators are so nice
2021-03-29 14:01:17 <tomsmeding> merijn: that latter point depends on your familiarity with regex ;)
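For comparison with the regex approach, here is a minimal parser-combinator sketch of the task johnnyboy describes (discard most of the input, pick out a number followed by a fixed label). It uses `Text.ParserCombinators.ReadP` from `base` rather than attoparsec, purely so that it runs without extra packages; it is not the attoparsec code anyone in the channel had in mind. The names `decimal`, `memoryLine`, `parseMemoryLine` and `extractMemory` and the sample input are made up for illustration, and the label text is taken from the grep output quoted later in the log.

```haskell
import Data.Char (isDigit, isSpace)
import Data.Maybe (listToMaybe, mapMaybe)
import Text.ParserCombinators.ReadP

-- Parse a decimal number such as "0.292" and return it as text.
decimal :: ReadP String
decimal = do
  whole <- munch1 isDigit
  _     <- char '.'
  frac  <- munch1 isDigit
  pure (whole ++ "." ++ frac)

-- One line of interest: optional indentation, a number, whitespace, the label.
-- The label is an assumption based on the grep output in this log.
memoryLine :: ReadP String
memoryLine = do
  skipSpaces
  n <- decimal
  _ <- munch1 isSpace
  _ <- string "actual memory usage for states"
  pure n

-- Run the parser on a single line; Nothing if the line does not match.
parseMemoryLine :: String -> Maybe String
parseMemoryLine line = listToMaybe [n | (n, _) <- readP_to_S memoryLine line]

-- Discard every non-matching line and keep the first match, if any.
extractMemory :: String -> Maybe String
extractMemory = listToMaybe . mapMaybe parseMemoryLine . lines

main :: IO ()
main = print (extractMemory sampleInput)
  where
    -- Invented sample input, not real tool output.
    sampleInput = "some unrelated line\n    0.292\tactual memory usage for states\n"
```

Running `main` prints `Just "0.292"`. Adding another field would mean adding another small parser next to `memoryLine`, which is roughly the readability argument being made above.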
2021-03-29 14:02:03 × nbloomf quits (~nbloomf@2600:1700:ad14:3020:cca4:232:630d:d55c) (Quit: My MacBook has gone to sleep. ZZZzzz…)
2021-03-29 14:02:16 <[exa]> tomsmeding: I've heard this a few times. Usually followed by "whew what a nice export, what if we also packed in a $non_regular_feature to make the export more colorful?"
2021-03-29 14:02:35 <tomsmeding> export != import?
2021-03-29 14:02:39 ixlun joins (~matthew@109.249.184.133)
2021-03-29 14:03:16 <[exa]> (I meant the export that comes from the other part of the program)
2021-03-29 14:03:48 tomsmeding is just trying to provide a bit of pushback to "how do I do X simple common thing with technique A? -- Well, please use technique PQR that does 10 other things but is much nicer" :)
2021-03-29 14:03:53 <tomsmeding> not trying to be hostile
2021-03-29 14:04:24 <maerwald> tomsmeding: parser combinators are more expressive and as such may not be desired :) that is following the principle of using the least powerful tool.
2021-03-29 14:04:45 <tomsmeding> [exa]: ah, I see
2021-03-29 14:04:46 <maerwald> That argument has also repeatedly been made by the LANGSEC authors
2021-03-29 14:05:35 <maerwald> treat input as a language, write a validator and use the least expressive tool
2021-03-29 14:05:52 <maerwald> in that sense, they also created a parser combinator library for C
2021-03-29 14:06:03 MarcelineVQ joins (~anja@198.254.208.159)
2021-03-29 14:06:10 <maerwald> (arguing that parser combinators are orders of magnitude safer than a hand-written parser)
2021-03-29 14:06:14 <merijn> maerwald: parser combinators are just recursive descent parsers with a convenient coat of paint
2021-03-29 14:07:00 heatsink joins (~heatsink@2600:1700:bef1:5e10:f0bc:f236:90c7:a6f5)
2021-03-29 14:07:57 <merijn> Well written recursive descent parsers are just as efficient and minimal as their corresponding LALR(k) version. But most humans find recursive descent much easier to write/think about (and better errors)
2021-03-29 14:08:29 <tomsmeding> % import Text.Regex.TDFA
2021-03-29 14:08:30 <yahb> tomsmeding: ; <no location info>: error:; Could not find module `Text.Regex.TDFA'; It is not a module in the current program, or in any known package.
2021-03-29 14:08:33 <tomsmeding> boo
2021-03-29 14:08:50 <tomsmeding> anyway johnnyboy[m]: 'match (makeRegex "\t" :: Regex) "\t" :: Bool' gives True for me
2021-03-29 14:09:05 <johnnyboy[m]> I think I'm going to replace tabs with spaces and see if I can then match against `[[:space:]]`
2021-03-29 14:09:08 <tomsmeding> just a literal tab character is apparently valid in a regex-tdfa regex
2021-03-29 14:09:19 <johnnyboy[m]> just to rule out the possibility that it's the tabs that somehow mess things up
2021-03-29 14:10:10 <[exa]> tomsmeding: in the "pushback" direction I'd probably suggest awk :]
2021-03-29 14:11:06 <johnnyboy[m]> okay, it's not the tabs
2021-03-29 14:11:14 × heatsink quits (~heatsink@2600:1700:bef1:5e10:f0bc:f236:90c7:a6f5) (Ping timeout: 245 seconds)
2021-03-29 14:11:15 <johnnyboy[m]> my regexes are just wrong somehow
2021-03-29 14:11:31 <tomsmeding> cue the rest here saying you should use parser combinators :p
2021-03-29 14:11:37 <tomsmeding> what's your source text and what's your regex
2021-03-29 14:11:52 malumore_ joins (~malumore@151.62.126.223)
2021-03-29 14:12:12 <johnnyboy[m]> https://gitlab.com/jllang/spin2latex/-/blob/master/testdata/success1.txt
2021-03-29 14:12:18 <johnnyboy[m]> that's a test file I'm using now
2021-03-29 14:12:34 waleee-cl joins (uid373333@gateway/web/irccloud.com/x-mhbpgvvowjjnvcmn)
2021-03-29 14:14:01 <johnnyboy[m]> this is my regex: https://privatebin.net/?cea173e3eb0202b4#EJcfvuZf734KwdZUG8CBhoHfiNNj6cPH3E3M8hLs4o8u
2021-03-29 14:14:07 molehillish joins (~molehilli@2600:8800:8d06:1800:6438:fe04:a25d:577)
2021-03-29 14:14:19 × ddellaco_ quits (~ddellacos@ool-44c73afa.dyn.optonline.net) (Remote host closed the connection)
2021-03-29 14:14:29 <johnnyboy[m]> so I'm looking for a number, followed by "actual memory use for states"
2021-03-29 14:14:42 <johnnyboy[m]> with whitespace (tab) between
2021-03-29 14:14:45 rj joins (~x@gateway/tor-sasl/rj)
2021-03-29 14:15:03 × malumore quits (~malumore@151.62.126.223) (Ping timeout: 268 seconds)
2021-03-29 14:15:11 <tomsmeding> johnnyboy[m]: are you sure you're skipping the initial whitespace? i.e. aren't you missing a prefix ' *'?
2021-03-29 14:15:28 <johnnyboy[m]> `$ cat success1.txt | grep -E "[0-9]+.[0-9]{3}[[:space:]]+actual"` returns `0.292 actual memory usage for states`
2021-03-29 14:16:08 <tomsmeding> yes because 'grep' allows matching at any point in a line
2021-03-29 14:16:14 <johnnyboy[m]> ah
2021-03-29 14:16:22 <tomsmeding> oh TDFA also does; ignore
2021-03-29 14:16:24 <johnnyboy[m]> ok, I'll try adding an initial [[:space:]]+
2021-03-29 14:16:31 <johnnyboy[m]> or [[:space:]]*
2021-03-29 14:16:41 ddellaco_ joins (~ddellacos@ool-44c73afa.dyn.optonline.net)
2021-03-29 14:17:10 <tomsmeding> it matches that line for me :p
2021-03-29 14:17:18 <dminuoso> Regular expressions. How to introduce long lasting bugs by carelessly bolted-on regular expressions.
2021-03-29 14:17:20 <tomsmeding> so your problem is outside of the regex I think
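A small sketch of the kind of check tomsmeding describes, assuming the `regex-tdfa` package is installed. The pattern is the one from the grep command above, with the dot written as `[.]` so that it only matches a literal dot. The sample line is reconstructed from the grep output (leading spaces and a tab are assumed), and the names `sampleLine` and `memoryPattern` are made up for illustration.

```haskell
import Text.Regex.TDFA ((=~))

-- Line modelled on the grep output quoted above; the indentation and the
-- tab separator are assumptions about the test file.
sampleLine :: String
sampleLine = "    0.292\tactual memory usage for states"

-- POSIX extended regex: digits, a literal dot, three digits, whitespace, "actual".
memoryPattern :: String
memoryPattern = "[0-9]+[.][0-9]{3}[[:space:]]+actual"

main :: IO ()
main = do
  -- Bool result: does the pattern match anywhere in the line?
  print (sampleLine =~ memoryPattern :: Bool)
  -- String result: the text of the first match (empty if there is none).
  putStrLn (sampleLine =~ memoryPattern :: String)
```

Like grep, `=~` is not anchored, so the leading whitespace does not need its own prefix in the pattern. If a check like this succeeds on the real line, the remaining suspect is the code that feeds lines to the regex.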
2021-03-29 14:17:51 <johnnyboy[m]> but I have this other regex for picking the error count
2021-03-29 14:18:11 <johnnyboy[m]> it works even if the line containing "errors: xxx" does start with something else
2021-03-29 14:18:27 <tomsmeding> regexen are like excel: computer scientists are embarrassed to admit their effectiveness
2021-03-29 14:18:33 slack1256 joins (~slack1256@dvc-186-186-101-190.movil.vtr.net)
2021-03-29 14:18:36 × molehillish quits (~molehilli@2600:8800:8d06:1800:6438:fe04:a25d:577) (Ping timeout: 258 seconds)
2021-03-29 14:18:46 × Iceland_jack quits (~user@95.149.219.0) (Ping timeout: 268 seconds)
2021-03-29 14:18:46 <tdammers> what about regexcel?
2021-03-29 14:18:48 <tomsmeding> johnnyboy[m]: indeed, for me your regex matches that line
2021-03-29 14:19:01 <tomsmeding> so I'm thinking the problem is not with the regex, but with the code that runs the regex
2021-03-29 14:19:16 <tomsmeding> tdammers: excel has gotten lambdas recently, surely it can also have regex
2021-03-29 14:19:17 Jd007 joins (~Jd007@162.156.11.151)
2021-03-29 14:19:32 <tomsmeding> oh it already does
2021-03-29 14:20:42 <tdammers> I bet it includes an email system too
2021-03-29 14:20:50 ski . o O ( <https://www.microsoft.com/en-us/research/blog/lambda-the-ultimatae-excel-worksheet-function/>,<https://www.microsoft.com/en-us/research/publication/a-user-centred-approach-to-functions-in-excel/> )
2021-03-29 14:21:04 dcbdan joins (~dcbdan@c-73-76-129-120.hsd1.tx.comcast.net)
2021-03-29 14:21:09 <johnnyboy[m]> <tdammers "I bet it includes an email syste"> I thought that was emacs
2021-03-29 14:21:53 <ski> Emacs includes an editor

All times are in UTC.