--- Log opened Sun Jul 19 00:00:28 2009
17:09 < mae_> dcoutts: hello
18:23 < dcoutts_> mae_: hi
18:31 < mae_> dcoutts_: i added a Socket interface to sendfile
18:32 < dcoutts_> mae_: ah right
19:19 < stepcut> mae: lately == getting my tonsils out :(
19:20 < stepcut> mae_: and a bunch of work on the URLT stuff
19:21 < stepcut> mae_: and some stuff with formlets+HSP
19:21 < stepcut> mae_: and trying to build a site that has enough users that multimaster and sharding looks important ;)
19:25 < mightybyte> stepcut: Well, that hasn't been too hard for me.
19:25 < stepcut> mightybyte: good to hear :)
19:26 < mightybyte> It took about a month to exceed the memory on my Linode 360
19:27 < stepcut> heh
19:27 < mightybyte> And now the Linode 540 may be getting too small.
19:28 < mae_> mightybyte: hooray!
19:28 < stepcut> mightybyte: hopefully your revenue is outpacing your hosting costs?
19:28 < mightybyte> Granted, there is still plenty of scaling room in the single server arena, but it's a little worrying that there is no scaling solution that I can use right now.
19:28 < mightybyte> No, my revenue is still pretty close to zero.
19:29 < mae_> stepcut: whats the latest site? :)
19:29 < mae_> stepcut: i'm just trying to improve the scalability of the low level stuff that happstack depends on currently
19:30 < mae_> stepcut: how would you feel if i removed all the header munging stuff from the low level http code
19:30 < mae_> where you had to set content-length explicitly
19:30 < stepcut> mae_: we are doing some contract work, and an industry-specific CMS
19:30 < mightybyte> So far I've earned a grand total of $5.26 through Amazon affiliate ads.
19:30 < mae_> stepcut: neat
19:30 < stepcut> mightybyte: did you see my long post about possible sharding solutions?
19:30 < mightybyte> Yes
19:31 < stepcut> mightybyte: obviously, nothing usable yet
19:31 < mightybyte> Right now obviously revenue is more important for me, but it's annoying to have scaling issues keep knocking on my door.
19:31 < stepcut> mightybyte: do users see other users' data, or only their own? Could you simply split the data across two disconnected servers?
19:32 < mightybyte> Yes, users can see data from all over.
19:33 < mightybyte> The biggest part of the data is in one IxSet with several different indexes.
19:34 < stepcut> mightybyte: does the amount of RAM being used seem reasonable for the amount of data you believe you have? For example, if you had, say, 300 users, does 1MB per user seem a bit high?
19:34 < mightybyte> Users can see their own data, anonymized forms of everyone's data (that makes up the front page), and non-anonymized forms of data that other users have chosen to make public.
19:35 < mightybyte> I haven't done detailed calculations there but it doesn't seem absurd.
19:35 < stepcut> ok
19:35 < mightybyte> I have a little over 1100 registered users right now.
19:36 < mightybyte> And top says the app is using 378m RES and 410m VIRT.
19:37 < stepcut> so you think those users have entered 300k worth of data on average? Aren't they just putting in fairly simple things?
19:37 < stepcut> numbers of reps, name of exercise, stuff like that?
19:37 < mae_> profile pics? :)
19:38 < stepcut> mae_: I would think the image data would live on the drive, not in RAM..
19:38 < mae_> stepcut: sure, but it could live in either place :)
19:38 < stepcut> mae_: though, obviously that one happens if you code things that way...
19:38 < stepcut> mae_: yeah, I forgot that you have to actually save the files to disk if you want them that way :)
19:38 < mae_> have you guys tried "trunk" yet? (still haven't found a darcs-ish term i feel comfortable for this yet)
19:38 < mightybyte> stepcut: Well, the latest checkpoint file is 5.4 megs.  I'm not sure how much of the additional memory is legitimate usage by some of the calculated IxSet indices.
19:39 < mae_> for fileServe? (or do you not use fileServe)
19:39 < mae_> you know what would be cool
19:39 < mae_> instance Serializable Handle
19:40 < stepcut> mightybyte: yeah, 5.4MB of real data sounds reasonable. You might consider doing some memory profiling and seeing what is happening. I am not sure that 60x as much usage in RAM is reasonable
19:40 < mightybyte> Yes, it's all simple data.  The checkpoint file is more along the lines of how much data the users probably have stored.
19:40 < mae_> i.e. the data interface is a handle (when needed) but the state system stores it in an arbitrarily named file
19:41 < mae_> then we just hand the handle to sendfile
19:41 < mae_> bam
19:41 < mae_> (like emeril)
19:41 < mightybyte> stepcut: I've done some profiling already.  I'll have to do some more because I don't have the results, but one of the biggest allocators was Happstack.Data.IxSet.flatten
19:42 < stepcut> mightybyte: I can believe that
19:42 < mae_> mightybyte: for freeform text fields, are you using string or bytestring?
19:42 < mightybyte> String
19:42 < mae_> that is probably also a factor.
19:42 < mightybyte> Yeah, I've thought about that.  But I don't have a ton of strings.
19:42 < mae_> i don't know what the multiplier is but I know that bytestring is significantly more compact
19:42 < mightybyte> Much of the data is numeric.
19:42 < stepcut> mae_: yeah, it's around 10-12x
19:43 < mightybyte> Yeah, and it's worse since I'm running on a 64-bit machine.
19:43 < mightybyte> I've had requests to translate my site into another language.  If I do that, Strings actually might be needed.
19:45 < mae_> yep
19:46 < mae_> all around bytestring is more efficient though
19:46 < mae_> even on the http side
19:46 < mae_> less memory needed
19:46 < mae_> its not just state
19:46 < mae_> I would start using bytestring everywhere now
19:46 < mae_> and see if it makes a difference
19:46 < mae_> you might be surprised
19:47 < mae_> stepcut, mightybyte: do you guys use fileServe at all?
19:47 < stepcut> mae_: well, Int is more efficient than Integer... unless you need numbers bigger than 2^29 (or whatever the spec says)
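To make the Int versus Integer trade-off above concrete, here is a minimal sketch. The Haskell 2010 report only guarantees that Int covers at least the range [-2^29, 2^29 - 1]; GHC uses a full machine word. The names `intLimit`, `wrapped` and `exact` are made up for illustration, and the wrap-around shown for `wrapped` is GHC's behaviour, not something the report promises.

```haskell
-- Largest value an Int can hold on this implementation, as an Integer.
intLimit :: Integer
intLimit = toInteger (maxBound :: Int)

-- Fixed-width arithmetic: in GHC this silently wraps around to minBound.
-- (Assumption: GHC; the Haskell report leaves overflow behaviour unspecified.)
wrapped :: Int
wrapped = maxBound + 1

-- Arbitrary-precision arithmetic: the same sum computed exactly.
exact :: Integer
exact = intLimit + 1
```

Loading this in GHCi and comparing `wrapped` with `exact` shows the cost of the faster type: the Int result is wrong once the value leaves the representable range, while the Integer result stays correct.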
19:48 < mae_> stepcut: what are you referring to?
19:48 < mae_> context?
19:48 < mightybyte> mae_: I use fileServe to serve a few static images, js, etc.  But that's it.
19:49 < mae_> mightybyte: ok, great, well in the next version any static files should be given a significant boost in efficiency
19:49 < mae_> memory-wise and speed-wise
19:49 < mightybyte> mae_: I agree that bytestring could probably help, but I don't think that explains the underlying 60x difference between checkpoint size and RAM usage.
19:50 < mae_> mightybyte: have you done any profiling yet?
19:50 < mightybyte> mae_: Ok.  I haven't been paying much attention to the fileServe talk going on.
19:50 < stepcut> mae_: you said that bytestring is more efficient than String. But they aren't really the same thing...
19:50 < mightybyte> mae_: Yeah, but I'm going to try to do some more tonight.
19:51 < mae_> stepcut: oh ok, yeah i understand that. I was referring to cases where you have large text items to store (or arbitrary data)
19:51 < mae_> i mean you can put utf-8 in bytestring
19:51 < mae_> which is more efficient than string
19:51 < mae_> ok?
19:51 < mae_> happy? :)
19:51 < stepcut> mae_: Strings give you unicode characters, ByteString gives you ... bytes. So, ByteString.length does not return the number of characters in a 'bytestring' but rather the number of bytes required to represent the string as utf-8...
19:52 < mae_> right
19:53 < stepcut> mae_: so, if you need to do string operations (find the number of characters, toUpper/toLower), then you have some issues :)
19:53 < mae_> sure
19:54 < stepcut> but, I think there is work on a library that deals with that... Something which adds a Unicode layer on top of ByteString...
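The byte count versus character count distinction discussed above can be sketched with the bytestring and text packages. This is only an illustration under assumptions: the names `sample`, `byteCount`, `charCount` and `shout` are invented here, and the text library is used as one possible Unicode-aware layer, not necessarily the one stepcut had in mind at this point.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "naive" spelled with U+00EF (i with diaeresis): five characters.
sample :: T.Text
sample = T.pack "na\239ve"

-- Number of bytes needed once the text is encoded as UTF-8.
byteCount :: T.Text -> Int
byteCount = B.length . TE.encodeUtf8

-- Number of Unicode characters, independent of the encoding.
charCount :: T.Text -> Int
charCount = T.length

-- Character-level operations work on Text, not on the raw bytes.
shout :: T.Text -> T.Text
shout = T.toUpper
```

For `sample`, `charCount` gives 5 while `byteCount` gives 6, because the non-ASCII character takes two bytes in UTF-8. Code that calls length on the encoded ByteString and treats the result as a character count will be off for any non-ASCII input.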
19:55 < stepcut> mae: I do use fileServe
19:56 < stepcut> mae: I also have a hacked version that allows me to serve individual files where the name of the file on the disk is not part of the request path
19:57 < stepcut> -- serve a single file with the given mime-type
19:57 < stepcut> serveFile :: String -> FilePath -> Request -> WebT IO Response
19:57 < stepcut> serveFile mimeType fp rq =
19:57 < stepcut> the Request is used to check for the 'if-modified-since' header
19:57 < stepcut> it would be nice if happstack-server had that as a native function
19:58 < stepcut> also, we have the 'dir' guard, etc, but no, 'host' guard.
19:58 < stepcut> well, *I* have a 'host' guard, but I should put that in happstack-server so that *we* have it ;)
20:01 < stepcut> anyway, time to make cake
20:02 < mae> stepcut: have you seen the recent commits?
20:02 < mae> I basically added a new constructor to Request
20:02 < mae> called SendFile
20:03 < mae> sorry not Request
20:03 < mae> Response
20:03 < mae> http://patch-tag.com/r/happstack/snapshot/current/content/pretty/happstack-server/src/Happstack/Server/HTTP/Types.hs
20:03 < mae> line 100-106
20:04 < mae> one of the nasty issues I have been running into
20:04 < mae> is that I notice there are various places where the code attempts to set Content-Length
20:04 < mae> I want to blast all that black magic
20:04 < mae> and require the application developer (or at least at that level, not in the core http code) to set the content-length
20:05 < mae> maybe this will help performance also
20:05 < mae> so we don't have to keep calling length on the bytestring
20:06 < mae> so serveFile could be easily rewritten to use this new Response constructor
20:07 < mae> I want the separation between http and application code to be a little bit more pronounced
20:07 < mae> ie the http code never does anything "smart" with headers
20:08 < mae> but we can build a high level api which does this sort of thing
20:08 < mae> (and hopefully does this sort of thing with more well-defined semantics)
20:11 < stepcut> i am not opposed to the idea. I am not very familiar with the innards of happstack-server though, so I have no idea what the effect will be
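The separation mae describes above (low-level HTTP code that never touches headers, plus a higher-level helper that sets Content-Length explicitly) might look roughly like the sketch below. Everything in it is hypothetical: `SimpleResponse`, `renderHeaders` and `withContentLength` are stand-ins invented for illustration and are not happstack-server's types or functions. Header names are also compared case-sensitively here, which a real implementation would not do.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical, simplified response type (not happstack-server's Response).
data SimpleResponse = SimpleResponse
  { rsHeaders :: [(String, String)]
  , rsBody    :: L.ByteString
  }

-- "Low-level" side: writes out exactly the headers it was given,
-- without adding or fixing anything.
renderHeaders :: SimpleResponse -> String
renderHeaders = concatMap line . rsHeaders
  where
    line (name, value) = name ++ ": " ++ value ++ "\r\n"

-- "High-level" side: the application (or a helper library) computes the
-- body length once and records it, replacing any earlier value.
withContentLength :: SimpleResponse -> SimpleResponse
withContentLength r =
  r { rsHeaders = ("Content-Length", show (L.length (rsBody r))) : others }
  where
    others = filter ((/= "Content-Length") . fst) (rsHeaders r)
```

With this split, the length of the body is computed in one known place, and the rendering code has no hidden header logic to reason about.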
20:29 < gwern> stepcut: there's lots of stuff for utf8 on bytestring; there's the new text lib, and I think utf8-string is over bytestring
20:47 < stepcut> gwern: I believe I was thinking of the new text lib
--- Log closed Mon Jul 20 00:00:29 2009