Logs: liberachat/#haskell
| 2025-11-13 10:38:38 | <kuribas`> | [exa]: how else, if you have strings? |
| 2025-11-13 10:39:13 | <[exa]> | merijn: like, the immutability shouldn't be an issue at all in that precise benchmark, it's select-only |
| 2025-11-13 10:39:23 | <[exa]> | kuribas`: tries? inverted indices? |
| 2025-11-13 10:40:50 | <kuribas`> | [exa]: The trie is slower in that benchmark |
| 2025-11-13 10:46:45 | <merijn> | kuribas`: Even then, meh |
| 2025-11-13 10:47:13 | <merijn> | kuribas`: I mean, why would a hashmap be better at strings than a tree based map? |
| 2025-11-13 10:47:58 | × | deptype_ quits (~deptype@2406:b400:3a:73c2:ebd8:a6e4:ac56:ebb9) (Remote host closed the connection) |
| 2025-11-13 10:48:12 | → | deptype_ joins (~deptype@2406:b400:3a:73c2:d739:473:9e2d:bf26) |
| 2025-11-13 10:48:18 | merijn | will takes a tree based Map over a hashmap any day |
| 2025-11-13 10:49:18 | <merijn> | Better worst case complexity, better space complexity, more flexible queries, negligible performance difference in 95% of scenarios (that might be a conservative percentage) |
| 2025-11-13 10:51:12 | <[exa]> | kuribas`: the main point is that if you really want an index over strings with only Eq possible, you're likely pushed yourself into over-generalizing the situation, and you should use integers instead of the strings |
| 2025-11-13 10:51:25 | <[exa]> | s/strings/variable-width keys/ |
| 2025-11-13 10:52:36 | <[exa]> | merijn: in the "naive" tree case the strings are super annoying (generic tree algorithms do repeated compares on shared key prefixes etc) |
| 2025-11-13 10:52:38 | <kuribas`> | [exa]: You build a sorted vector first? |
| 2025-11-13 10:52:49 | <[exa]> | no, trie |
| 2025-11-13 10:53:08 | → | poscat joins (~poscat@user/poscat) |
| 2025-11-13 10:53:42 | <merijn> | [exa]: Sure, but if you've got enough string keys for that to matter you should probably rethink your approach anyway :p |
| 2025-11-13 10:54:07 | <[exa]> | y a p |
| 2025-11-13 10:54:12 | <merijn> | At that point, just dump everything into SQLite |
| 2025-11-13 10:54:42 | <kuribas`> | I have a nested dictionary with wildcards. |
| 2025-11-13 10:54:46 | <kuribas`> | That doesn't go into SQL easily. |
| 2025-11-13 10:54:59 | <kuribas`> | with entries like ("foo", *, 2) |
| 2025-11-13 10:55:12 | × | poscat0x04 quits (~poscat@user/poscat) (Ping timeout: 256 seconds) |
| 2025-11-13 10:55:29 | <kuribas`> | and strings for addresses, driver names, device models, etc.. |
| 2025-11-13 10:55:44 | <merijn> | kuribas`: tbh 1) it's probably worth figuring out how to do it in SQLite anyway, 2) if you can turn it into a JSON encoding you can just store (and query) json blobs in SQLite :p |
| 2025-11-13 10:55:55 | <kuribas`> | Sure I can intern all the strings, that would just be another hashmap... |
| 2025-11-13 10:56:22 | <merijn> | kuribas`: Naah, with interned strings you could just use a tree map, since then [exa] comment about repeated compares goes away :p |
| 2025-11-13 10:56:43 | <kuribas`> | merijn: yeah, but the interning needs another hashmap. |
| 2025-11-13 10:57:40 | <merijn> | All I will say is that I've switched to SQLite from whatever I was using 3 times in projects, and every single time I have the same epiphany :p Which is, that I should use more SQLite in everything :p |
| 2025-11-13 10:58:34 | <kuribas`> | I have sqlite in my python project, and I feel its just now untyped queries instead of typed data processing... |
| 2025-11-13 10:58:40 | <[exa]> | kuribas`: no it requires a trie, hashmaps waste space for interning |
| 2025-11-13 10:59:13 | <merijn> | kuribas`: Have you not heard the glorious news of SQLite STRICT mode? :p |
| 2025-11-13 10:59:25 | <merijn> | https://sqlite.org/stricttables.html |
| 2025-11-13 10:59:34 | <kuribas`> | [exa]: what if I care about time more than space? |
| 2025-11-13 10:59:38 | <[exa]> | kuribas`: anyway you might have notice that you hit 2 professional hashmap haters today |
| 2025-11-13 10:59:49 | <kuribas`> | right :) |
| 2025-11-13 11:03:21 | <[exa]> | kuribas`: then you go patricia trees and/or precompiled string matchers (aho-corasick style) |
| 2025-11-13 11:03:55 | trickard_ | is now known as trickard |
| 2025-11-13 11:05:03 | × | ezzieyguywuf quits (~Unknown@user/ezzieyguywuf) (Ping timeout: 250 seconds) |
| 2025-11-13 11:06:02 | <[exa]> | like, there are great applications of hashes that actually make sense, like bloom filters and whatnot, why does everyone want hashmaps? |
| 2025-11-13 11:06:20 | [exa] | sad |
| 2025-11-13 11:06:52 | <merijn> | [exa]: JavaScript and Python popularised the use of maps. They use hashmaps, people get taught maps using the word hashmap. People internalise "hashmap good" |
| 2025-11-13 11:07:27 | <merijn> | [exa]: Especially since neither python nor JS give you easy access to non-hashmaps, people just pretend non hashmaps don't exist. |
| 2025-11-13 11:07:29 | <[exa]> | I already said "over-generalized" somewhere right. :D |
| 2025-11-13 11:08:09 | <merijn> | IMO the fact that containers only ships tree maps is absolutely based and helps default people to a good map type |
| 2025-11-13 11:09:00 | <merijn> | I also strongly approve C++ have std::map be a sensible map and requiring people to write std::unordered_map to get a hashmap |
| 2025-11-13 11:09:24 | <[exa]> | kuribas`: tbh you should start professionally hating hashmaps just for the purpose of being cool way before it becomes popularly cool in 20 years from now |
| 2025-11-13 11:09:43 | <[exa]> | ...at which point the cool thing is gonna be std::ai_map or so |
| 2025-11-13 11:10:19 | <merijn> | The only reason I'm not a professional SQLite shill is that I haven't figured out how to get paid :p |
| 2025-11-13 11:10:29 | × | Taneb quits (~username@host-95-251-57-201.retail.telecomitalia.it) (Ping timeout: 256 seconds) |
| 2025-11-13 11:10:36 | <[exa]> | merijn: I saw stuff like `using map = std::unordered_map;` from students, wasn't happy |
| 2025-11-13 11:10:46 | <merijn> | [exa]: Fail them! |
| 2025-11-13 11:10:55 | <[exa]> | anyway yeah the unordered wording is great there, tells nicely which property is lost |
| 2025-11-13 11:11:27 | <[exa]> | merijn: I told them they failed me, don't worry. :D |
| 2025-11-13 11:11:41 | <haskellbridge> | <doc> ai_map.. now that's just a hashmap where everything is hashed to embedding vectors :^) |
| 2025-11-13 11:12:24 | <haskellbridge> | <doc> i am sure that has gone through a couple hype cycles already |
| 2025-11-13 11:12:46 | <lucabtz> | merijn im a C++ developer as day job and always hated that std::map is not a hashmap but you are making me reconsider |
| 2025-11-13 11:12:54 | <merijn> | doc: I mean, replace the hashmap with a metric tree and that's actually neat |
| 2025-11-13 11:13:11 | <[exa]> | doc: no that's embedding_vector_map, that was popular 3 years ago. Behold: `insertAiMap k v = unsafePerformGemini $ "Please remember that " ++ show k ++ " saves " ++ show v` |
| 2025-11-13 11:15:50 | <[exa]> | kuribas`: anyway the most formal argument against the hashmaps that I have is literally that cache bump that you saw there in the benchmarks. With hashmaps it's unavoidable if your map grows; with sensible maps you can make it disappear using some kind of locality on a map of any size. |
| 2025-11-13 11:16:06 | <merijn> | lucabtz: A decent tree implementation like red-black or AVL tree guarantee O(log n) worst case lookup and insert (vs O(n) worst case for hashmap). Now hashmap can have better average case, but that depends on the ratio of keys to buckets. You need a certain percentage of empty buckets to avoid collissions, I don't know the optimal numbers but I'm betting at least 20-30%. So that means you must allocate |
| 2025-11-13 11:16:12 | <merijn> | 20-30% more space than you have data |
| 2025-11-13 11:16:48 | <merijn> | Additionally the ability of ordered maps to look up "smallest key bigger than X" or "smallest key larger than X" is super useful in many situations. |
| 2025-11-13 11:17:04 | <lucabtz> | merijn yeah yeah your point is valid |
| 2025-11-13 11:17:09 | <merijn> | Guaranteed stable ordering for traversals/iteration too |
| 2025-11-13 11:17:11 | <merijn> | <3 |
| 2025-11-13 11:17:15 | <[exa]> | <3 |
| 2025-11-13 11:17:32 | <lucabtz> | though you need ordering for a tree map which you dont for a hash map |
| 2025-11-13 11:17:42 | <lucabtz> | but you need a hash in the hash map |
| 2025-11-13 11:17:47 | <lucabtz> | so it isnt very different |
| 2025-11-13 11:17:48 | <merijn> | (important side note that containers indeed guarantees that foldable/traversable operation happen in ascending key order) |
| 2025-11-13 11:19:52 | <[exa]> | lucabtz: like at wurst you can commit a heinous crime and order by the hash. Not sure if the other ways is doable universally tho. |
| 2025-11-13 11:21:20 | <lucabtz> | [exa] yeah i didnt think of that, though it wouldnt work, what about hash collisions? |
| 2025-11-13 11:22:54 | <[exa]> | multimap™ |
| 2025-11-13 11:24:06 | × | merijn quits (~merijn@77.242.116.146) (Ping timeout: 256 seconds) |
| 2025-11-13 11:24:08 | → | xff0x joins (~xff0x@2405:6580:b080:900:7cd4:5734:7947:6d90) |
| 2025-11-13 11:24:18 | <[exa]> | technically you don't need short hashes because you don't need to minimize the hash table, so you can always have say 64b hash, so collisions won't hurt too much, and at worst you simply find the one key that you wanted from the tree range that the search returns |
| 2025-11-13 11:24:19 | → | trickard__ joins (~trickard@cpe-62-98-47-163.wireline.com.au) |
| 2025-11-13 11:24:39 | <[exa]> | but don't :D |
| 2025-11-13 11:25:13 | × | trickard quits (~trickard@cpe-62-98-47-163.wireline.com.au) (Ping timeout: 264 seconds) |
| 2025-11-13 11:25:13 | trickard__ | is now known as trickard |
| 2025-11-13 11:27:46 | <lucabtz> | but the tree would need to contain bins are leaves no? |
| 2025-11-13 11:29:04 | <lucabtz> | the worst case complexity would still be O(n), maybe the worst case would incredibly unlickely given the key space is so big though |
| 2025-11-13 11:30:01 | → | Lycurgus joins (~juan@user/Lycurgus) |
| 2025-11-13 11:30:35 | × | xff0x quits (~xff0x@2405:6580:b080:900:7cd4:5734:7947:6d90) (Quit: xff0x) |
| 2025-11-13 11:31:52 | <lucabtz> | and it still would be better space wise |
| 2025-11-13 11:32:02 | × | deptype_ quits (~deptype@2406:b400:3a:73c2:d739:473:9e2d:bf26) (Remote host closed the connection) |
| 2025-11-13 11:32:15 | → | deptype_ joins (~deptype@2406:b400:3a:73c2:8206:7870:138f:c565) |
| 2025-11-13 11:33:41 | → | xff0x joins (~xff0x@2405:6580:b080:900:7b8c:bddf:6c13:ed0c) |
| 2025-11-13 11:36:58 | → | merijn joins (~merijn@77.242.116.146) |
| 2025-11-13 11:42:01 | → | biberu joins (~biberu@user/biberu) |
| 2025-11-13 11:42:47 | × | merijn quits (~merijn@77.242.116.146) (Ping timeout: 256 seconds) |
| 2025-11-13 11:52:05 | × | deptype_ quits (~deptype@2406:b400:3a:73c2:8206:7870:138f:c565) (Remote host closed the connection) |
| 2025-11-13 11:52:24 | → | deptype_ joins (~deptype@2406:b400:3a:73c2:fdd5:1805:7d92:70ad) |
| 2025-11-13 11:53:57 | <[exa]> | lucabtz: not really, multimaps can have many same keys and you select the whole range |
| 2025-11-13 11:54:28 | <[exa]> | also if you'd get the O(n) worst-case with this one, you'd be equivalently screwed with the hashmaps ("replace your hash") |
| 2025-11-13 11:59:23 | → | merijn joins (~merijn@77.242.116.146) |
| 2025-11-13 12:00:04 | → | Nachtgespenst joins (~user@user/siracusa) |
| 2025-11-13 12:00:59 | × | lambdabot quits (~lambdabot@haskell/bot/lambdabot) (Remote host closed the connection) |
| 2025-11-13 12:12:37 | × | deptype_ quits (~deptype@2406:b400:3a:73c2:fdd5:1805:7d92:70ad) (Remote host closed the connection) |
All times are in UTC.