07:29:50 <mekeor> what's the best way to find out how many people visit my server through HTTP?
09:01:58 <Lemmih> mekeor: Google Analytics is quite easy.
09:03:08 <mekeor> i was thinking of a unix-tool or a happstack-way...
09:04:07 <mekeor> can i make happstack log the visits?
09:18:37 <LambdaDusk> hi... on the acid-state site, it says there is a way to have a datatype that only keeps the keys in memory and then uses mmap for the actual data... is there such an implementation already?
09:53:31 <Lemmih> LambdaDusk: I don't think so.
09:53:56 <Lemmih> LambdaDusk: Where was it referenced?
09:54:27 <LambdaDusk> Lemmih: http://acid-state.seize.it/ , "Another potential solution is to create a special data-structure like IxSet which stores only the keys in RAM, and uses mmap and/or enumerators to transparently read the values from disk without loading them all into RAM at once."
09:55:23 <LambdaDusk> I've tried to read up on mmap but I am not confident enough to believe I could implement one myself
09:57:50 <Lemmih> I think you'd be better off with either compact-map or berkeleydb.
09:58:12 <Lemmih> Or using Riak/MongoDB for blob storage.
10:04:12 <LambdaDusk> I am currently considering MongoDB, but all this type conversion stuff made me think about a solution directly in haskell
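The "keys in RAM, values on disk" structure quoted from the acid-state page can be sketched in a few lines. This is a minimal sketch under several assumptions: it uses plain file seeks (`hSeek` with `Data.ByteString.hGet`) instead of mmap, the index itself is not persisted, and there is no crash safety or concurrent access. `DiskStore`, `newStore`, `putValue` and `getValue` are hypothetical names, not part of acid-state or any existing package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Map.Strict as Map
import System.IO

-- Hypothetical store: only the index (key -> byte offset and length) lives
-- in RAM; the value bytes live in an append-only file on disk.
data DiskStore = DiskStore
  { dsPath  :: FilePath
  , dsIndex :: Map.Map String (Integer, Int)
  }

-- Create an empty store, truncating any existing file at that path.
newStore :: FilePath -> IO DiskStore
newStore path = do
  B.writeFile path B.empty
  pure (DiskStore path Map.empty)

-- Append the value to the data file and remember where it was written.
-- Re-putting a key appends a new copy; old bytes are never reclaimed here.
putValue :: String -> B.ByteString -> DiskStore -> IO DiskStore
putValue key bytes store = do
  offset <- withBinaryFile (dsPath store) AppendMode $ \h -> do
    off <- hFileSize h -- current end of file = start of the new value
    B.hPut h bytes
    pure off
  pure (store { dsIndex = Map.insert key (offset, B.length bytes) (dsIndex store) })

-- Find the offset in RAM, then read only that value's bytes from disk.
getValue :: String -> DiskStore -> IO (Maybe B.ByteString)
getValue key store =
  case Map.lookup key (dsIndex store) of
    Nothing -> pure Nothing
    Just (offset, len) ->
      withBinaryFile (dsPath store) ReadMode $ \h -> do
        hSeek h AbsoluteSeek offset
        Just <$> B.hGet h len
```

Only the `Map` of offsets stays in memory, and each lookup reads one value from disk. Because the operations live in `IO`, this does not provide the transparent, pure-looking interface the acid-state page describes.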
14:47:11 <Palmik> Lemmih, why does acid-state need to know what events are associated with the given AcidState?
14:49:16 <Lemmih> So the event log can be replayed.
14:55:22 <Palmik> OK, thanks.
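Lemmih's answer can be illustrated with a toy event log. A log on disk can only hold data (an event name and its serialized arguments), not Haskell functions, so replaying it requires a table from event names back to update functions. This is a simplified sketch of the general idea only; acid-state's actual machinery (`makeAcidic`, serialized event types) differs in detail, and `LoggedEvent`, `registry` and `replay` are hypothetical names.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Toy application state: named counters.
type Counters = Map.Map String Int

-- What the log stores: an event name plus serialized arguments.
-- Functions cannot be written to disk, so the name is all replay gets.
data LoggedEvent = LoggedEvent
  { eventName :: String
  , eventArgs :: [String]
  } deriving (Show, Eq)

-- Update functions; malformed arguments are ignored in this sketch.
incrementEv :: [String] -> Counters -> Counters
incrementEv [k] = Map.insertWith (+) k 1
incrementEv _   = id

resetEv :: [String] -> Counters -> Counters
resetEv [k] = Map.insert k 0
resetEv _   = id

-- The registry: the part that must be told which events exist, so each
-- logged name can be turned back into an update function.
registry :: Map.Map String ([String] -> Counters -> Counters)
registry = Map.fromList [("Increment", incrementEv), ("Reset", resetEv)]

-- Replay the log from an initial state, failing on an unregistered event.
replay :: Counters -> [LoggedEvent] -> Either String Counters
replay = foldM applyEvent
  where
    applyEvent st ev = case Map.lookup (eventName ev) registry of
      Nothing -> Left ("unknown event: " ++ eventName ev)
      Just f  -> Right (f (eventArgs ev) st)
```

Without the registry, a logged `"Increment"` would be an opaque string with no way to recover what it should do to the state.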
15:54:52 <stepcut> mekeor: the best way is definitely google analytics. But, it would be easy to add a wrapper that logs all requests to a database so you can do your own analyzing.
15:55:51 <stepcut> mekeor: but google analytics is more better
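Logging requests yourself, as stepcut suggests, leaves the analysis to you. Assuming a hypothetical wrapper around the Happstack handlers records one `Visit` per request (client address and path, for example taken from the request's `rqPeer` and `rqUri` fields), the counting part is plain Haskell. `Visit` and the three functions are illustrative names; the wrapper and the database storage are not shown.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- One logged request. Hypothetical record: a wrapper around the real
-- handlers would fill it in and store it before serving the page.
data Visit = Visit
  { visitClient :: String -- client IP address
  , visitPath   :: String -- requested path
  } deriving (Show, Eq)

-- Every logged request counts once.
totalRequests :: [Visit] -> Int
totalRequests = length

-- Distinct client addresses, a rough stand-in for distinct visitors.
uniqueVisitors :: [Visit] -> Int
uniqueVisitors = Set.size . Set.fromList . map visitClient

-- Request count per path, e.g. for a "most visited pages" report.
hitsPerPath :: [Visit] -> Map.Map String Int
hitsPerPath vs = Map.fromListWith (+) [(visitPath v, 1) | v <- vs]
```

Counting distinct addresses only approximates visitors: several users behind one NAT count once, and one user with a changing address counts several times.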
23:33:28 <donri> i wonder how difficult it would be to make an acid-state that uses the disk as if it was RAM, and if that would be of any use at all assuming SSD
23:35:13 <stepcut> the issue is that update/queries can use arbitrary haskell functions (as long as the code is pure), so.. you would have to figure out how to get those functions to run on data that is on the disk
23:35:30 <donri> yea
23:35:45 <stepcut> seems difficult to do in general
23:36:11 <donri> also, if sharding is solved doesn't that sort of solve disk-swapping as well? isn't the problem in both those cases with data partitioning
23:36:26 <stepcut> instead we have considered how we can make data structures which act pure to the outside world, but can store data on the disk transparently
23:36:41 <donri> or, conversely, isn't sharding as difficult to solve as disk swapping, without sacrificing the acid-state flexibility
23:37:12 <stepcut> if you have sharding, you could swap stuff on/off the disk, but you potentially have to read/write the entire data set to/from the disk for a single query.. so that is not very practical
23:37:23 <donri> i.e. our data we put in acid-state isn't partitioned in any way acid-state is aware of
23:38:38 <stepcut> partitioning likely requires special support from acid-state
23:39:24 <stepcut> but, with sharding, you are still just applying normal haskell functions to values in RAM
23:39:38 <stepcut> acid-state just has to farm some queries out to remote servers and gather the results
23:40:36 <stepcut> in theory, those remote servers could just be blocks that are swapped to disk.. but in practice that is going to result in horrible performance because many queries need to examine all the values
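The "farm some queries out to remote servers and gather the results" step can be modelled locally. This sketch assumes each shard holds a disjoint slice of the data and uses in-memory `Map`s as stand-ins for remote servers; a predicate query has to visit every shard, and the partial answers are concatenated. `Shard` and `scatterGather` are hypothetical names, not an acid-state API.

```haskell
import qualified Data.Map.Strict as Map

-- A shard holds a disjoint slice of the full data set. In this local model
-- it is just a Map; in the scheme discussed it would be a remote server.
type Shard k v = Map.Map k v

-- Scatter: run the same predicate on every shard.
-- Gather: concatenate the partial answers in shard order.
scatterGather :: (v -> Bool) -> [Shard k v] -> [(k, v)]
scatterGather p = concatMap (Map.toList . Map.filter p)
```

Because the shards are disjoint, the gathered result contains the same entries as running the filter on the union of all shards, but every shard does work for every such query.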
23:40:59 <donri> i thought with sharding no single server holds the full state, so you still don't have "just RAM"
23:41:21 <donri> or is your idea of sharding simply better support for separate states?
23:41:29 <stepcut> yes, no single server holds the full state
23:41:47 <stepcut> but everything is still in RAM somewhere
23:42:19 <stepcut> nothing has to be read from disk in order to perform the query
23:42:28 <luite> what's the plan for sharding, how do you know which server contains which data?
23:43:27 <donri> but isn't it still a difficult problem to solve because acid-state would have to fake all data being in local RAM?
23:44:17 <donri> like, way more complicated than anything acid-state is currently doing, including a hypothetical replicating backend
23:44:45 <stepcut> donri: there are some additional problems to be solved, but sharding is a bit like the remote backend
23:44:56 <donri> for example cloud haskell does impose explicit partitioning on you, with "processes" aka actors
23:45:21 <stepcut> except you have multiple remote backends and they each return a different subset of the results for the same query
23:46:02 <donri> well that just seems really complicated to do if you have no knowledge of the data structures
23:46:09 <luite> that doesn't sound like it would scale too well, run all queries on all servers?
23:46:11 <stepcut> yes, but sharding wouldn't be like that
23:46:52 <stepcut> not all data structures make sense for sharding.. and in practice, pretty much every acid-state database uses some collection type like a Map or IxSet
23:47:27 <donri> i'm just thinking, if i'm right that it's this complicated, perhaps for sharding like with disk-swapping a dedicated data structure would make more sense
23:47:36 <stepcut> so, you could have a number of different shardable datatypes.. hashtables, etc
23:47:40 <donri> yea
23:50:00 <stepcut> luite: depends on the datastructure and what you are querying.. for something like a Map, where you just do key/value lookups, you could hash the key and know what server it is on. But, if you have a simple Set, and you want to return the values that match a certain predicate, then you need to apply the predicate match across all the servers. Obviously, that is an expensive operation, so you would want to avoid that if possible
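For key/value lookups, stepcut describes hashing the key to find its server. A minimal sketch, assuming a fixed number of shards, FNV-1a as the hash (a swapped-in choice; any stable hash would do) and a list of `Map`s standing in for servers. `shardFor`, `insertRouted` and `lookupRouted` are hypothetical names.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- 64-bit FNV-1a over character code points (stand-in for any stable hash).
fnv1a :: String -> Word64
fnv1a = foldl step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- The shard that owns a key: hash modulo the number of shards.
shardFor :: Int -> String -> Int
shardFor n key = fromIntegral (fnv1a key `mod` fromIntegral n)

-- Insert into the owning shard only; the other shards are untouched.
insertRouted :: String -> v -> [Map.Map String v] -> [Map.Map String v]
insertRouted key val shards =
  [ if i == owner then Map.insert key val s else s | (i, s) <- zip [0 ..] shards ]
  where
    owner = shardFor (length shards) key

-- A lookup consults exactly one shard.
lookupRouted :: String -> [Map.Map String v] -> Maybe v
lookupRouted key shards = Map.lookup key (shards !! shardFor (length shards) key)
```

Only one shard is touched per lookup, unlike a predicate query, which must be sent to all of them. A known drawback of `hash mod n` is that changing the number of shards moves most keys; consistent hashing is the usual remedy.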
23:51:36 <stepcut> luite: but it also depends on what, exactly, you are trying to scale. If it is a CPU intensive calculation, then distributing the load across many servers might be a win. You might use that, not because you need more RAM, but because you need more CPU
23:51:45 <stepcut> a bit like map/reduce
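When the goal is spreading CPU work rather than RAM, the map/reduce shape works if each shard can summarise its slice into a small partial result that combines associatively. A sketch under that assumption, computing a mean: each shard returns a count and a sum, and the pairs are merged through a `Monoid` instance. `Partial`, `mapShard` and `reduceMean` are illustrative names.

```haskell
-- What each shard sends back: how many items it saw and their total.
data Partial = Partial { pCount :: !Int, pSum :: !Double }
  deriving (Show, Eq)

-- Partials combine associatively (up to floating-point rounding),
-- so shard results can be merged in any grouping.
instance Semigroup Partial where
  Partial c1 s1 <> Partial c2 s2 = Partial (c1 + c2) (s1 + s2)

instance Monoid Partial where
  mempty = Partial 0 0

-- Map phase, run on each shard: apply the (possibly expensive) per-item
-- function locally and summarise the results.
mapShard :: (a -> Double) -> [a] -> Partial
mapShard f xs = mconcat [Partial 1 (f x) | x <- xs]

-- Reduce phase, run on the querying node: merge partials into a mean.
reduceMean :: [Partial] -> Maybe Double
reduceMean ps = case mconcat ps of
  Partial 0 _ -> Nothing
  Partial c s -> Just (s / fromIntegral c)
```

However expensive the per-item function is, only two numbers per shard travel back to the querying node.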
23:52:13 <luite> if you know what operations you support you can do approximate queries with bloom filters locally to get a set of servers that you need to query
23:53:54 <stepcut> yeah
23:54:05 <luite> of course you then have extra data to propagate between the servers, which sounds like a lot of work to implement
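luite's Bloom filter idea can be sketched as follows, assuming each shard publishes a filter of the keys it holds and the querying node checks all filters locally before contacting any server. Bloom filters never give false negatives but may give false positives, so the result is a superset of the shards that actually hold the key. The bit array here is an `Integer`, probe positions use double hashing over two FNV-1a variants, `k` is assumed to be at least 1, and all names are hypothetical.

```haskell
import Data.Bits (setBit, testBit, xor)
import Data.Char (ord)
import Data.Word (Word64)

-- A minimal Bloom filter: m bits packed into an Integer, k probes per key.
data Bloom = Bloom
  { bloomBits   :: !Int     -- m, number of bit positions
  , bloomHashes :: !Int     -- k, probes per key (assumed >= 1)
  , bloomSet    :: !Integer -- the bit array
  }

emptyBloom :: Int -> Int -> Bloom
emptyBloom m k = Bloom m k 0

-- FNV-1a with a configurable starting value, used to derive two hashes.
fnvFrom :: Word64 -> String -> Word64
fnvFrom = foldl (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211)

-- Double hashing: probe i sits at (h1 + i * h2) mod m.
probes :: Bloom -> String -> [Int]
probes (Bloom m k _) key =
  [ fromIntegral ((h1 + fromIntegral i * h2) `mod` fromIntegral m) | i <- [0 .. k - 1] ]
  where
    h1 = fnvFrom 14695981039346656037 key
    h2 = fnvFrom 0x9E3779B97F4A7C15 key

insertBloom :: String -> Bloom -> Bloom
insertBloom key b = b { bloomSet = foldl setBit (bloomSet b) (probes b key) }

-- True means "maybe present"; False means "definitely absent".
mightContain :: String -> Bloom -> Bool
mightContain key b = all (testBit (bloomSet b)) (probes b key)

-- Shards worth contacting for a key, given each shard's published filter.
candidateShards :: String -> [Bloom] -> [Int]
candidateShards key filters = [i | (i, f) <- zip [0 ..] filters, mightContain key f]
```

A plain Bloom filter does not support deletion, and each shard's filter has to be shipped to the querying nodes whenever it changes, which is the extra propagation luite mentions.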
23:54:06 <stepcut> there is a lot of fun to be had working on acid-state :)
23:54:47 <luite> yeah to be honest, it seems that most of the "free lunch" is already in the current acid-state, and all these extensions are mainly hard work
23:55:21 <luite> still hard work in a good language :)
23:55:31 <stepcut> but fun!
23:56:33 <luite> yeah but it doesn't sound much easier than implementing mongodb or cassandra from scratch...
23:57:59 <donri> but if someone does the work, it may be much easier for users
23:58:27 <luite> right, but is it realistic that this will happen, unless some company decides to hire some full-time programmers to work on acid-state?
23:59:02 <donri> yay gsoc! ;)