New Search Engine - Now Live - Page 2
R1CH
Netherlands10340 Posts
And Mystlord, Badjas is correct. Updates right now take less than a second, but will grow as more data is added since the last full index. I actually had to index everything four times today to get everything how I wanted it. | ||
Mystlord
United States10264 Posts
On December 23 2009 16:54 R1CH wrote: Spelling suggestions are something I've considered adding at a later date. Infix (partial) matching on a word is supported, but would substantially increase the index size. I want to see how it performs by default first. And Mystlord, Badjas is correct. Updates right now take less than a second, but will grow as more data is added since the last full index. I actually had to index everything four times today to get everything how I wanted it. Ok then, never mind. I don't think there are any major problems with the new search engine then. The edit thing might be a problem because we'd lose a lot of new material (I'm thinking updates to topics like Stylish FPVODs and Day[9] podcasts). Regardless, if I find any bugs, I'll report them here. | ||
Cambium
United States16368 Posts
I've worked with Lucene extensively in the past; it has top-notch performance and can be coded to address all of your problems. Maybe you can have a look when you are really bored over the holidays. I'll definitely have a look at Sphinx. Good job rewriting the search engine though! Regarding the non-edit problem, can't you delete the edited content from the indices and then simply re-index it? | ||
Cambium
United States16368 Posts
Given that:
- Building indices from scratch takes 30 minutes
- Incremental updates take much less time

What we can do is keep two sets of indices, call them A and B. Find a time when TL is least busy (i.e. least stress on the server) and call this time T. We incrementally update one set of indices, A, until time T; at time T, we completely remove the other set of indices, B, and rebuild it from scratch. While B is being built, we obviously still update A, and all search queries will be run against A. After B is complete, we "dump" A, and run all subsequent queries against B. And we switch B with A again at time T. This way, the indices will be 1-day stale in the worst case. Is this feasible? | ||
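For anyone who wants to see the A/B idea in code, here is a minimal sketch in Python. It is not how Sphinx or TL actually does it: the `Index` class is a toy in-memory inverted index, and every name in it (`RotatingIndexes`, `rebuild_and_swap`, and so on) is made up for illustration. It only shows the shape of the scheme: queries always hit the live set, and a rebuild from scratch replaces it in one step, which is when edited posts become searchable.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy inverted index (hypothetical): word -> set of post ids."""
    postings: dict = field(default_factory=dict)

    def add(self, post_id, text):
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(post_id)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


def build_from_scratch(posts):
    """Full rebuild: index the current text of every post."""
    index = Index()
    for post_id, text in posts.items():
        index.add(post_id, text)
    return index


class RotatingIndexes:
    """Serve queries from one index set while the other is rebuilt."""

    def __init__(self, posts):
        self.live = build_from_scratch(posts)

    def on_new_post(self, post_id, text):
        # Incremental update: new posts are appended, old ones never change.
        self.live.add(post_id, text)

    def search(self, word):
        return self.live.search(word)

    def rebuild_and_swap(self, posts):
        # Time T: build the other set from scratch, then switch to it.
        # The old set keeps answering queries until the assignment below.
        fresh = build_from_scratch(posts)
        self.live = fresh


posts = {1: "zerg rush"}
indexes = RotatingIndexes(posts)
posts[1] = "protoss cannon"           # post 1 is edited after indexing
print(indexes.search("protoss"))      # the edit is not searchable yet
indexes.rebuild_and_swap(posts)
print(indexes.search("protoss"))      # after the swap it is
```

A real version would also have to feed posts that arrive during the rebuild into the fresh set before switching, which this single-threaded sketch skips.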
R1CH
Netherlands10340 Posts
EDIT: TL doesn't really have an off-peak time. However, the indexer can use a configurable IOPS, iosize and delay, so if we want it to take 4 hours to update, it can. Refreshing the index every day like this is certainly possible, but I'd rather avoid it, since even with very conservative settings it's still going to suck up a lot of RAM and IO. Maybe a weekly update is doable. | ||
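To put rough numbers on the throttling point, here is a back-of-the-envelope sketch in Python. It only does the arithmetic: if I/O is capped at some number of operations per second and each operation moves at most a fixed number of bytes, the run cannot finish faster than total bytes divided by the product of the two. The function and parameter names are illustrative, the delay setting is ignored, and the 4 GiB figure is a made-up example, not TL's real data size.

```python
def min_indexing_seconds(total_bytes, max_iops, max_iosize):
    """Lower bound on run time with I/O capped at max_iops operations
    per second, each moving at most max_iosize bytes."""
    if max_iops <= 0 or max_iosize <= 0:
        raise ValueError("limits must be positive")
    return total_bytes / (max_iops * max_iosize)


def iops_for_target(total_bytes, target_seconds, max_iosize):
    """IOPS cap that stretches the run to roughly target_seconds."""
    if target_seconds <= 0 or max_iosize <= 0:
        raise ValueError("target and I/O size must be positive")
    return total_bytes / (target_seconds * max_iosize)


GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

# Hypothetical: 4 GiB of I/O, capped at 40 operations/s of 1 MiB each.
print(min_indexing_seconds(4 * GIB, 40, MIB))

# Hypothetical: cap needed to spread the same work over 4 hours
# when each operation moves at most 64 KiB.
print(iops_for_target(4 * GIB, 4 * 3600, 64 * KIB))
```

This says nothing about RAM, which is the other cost mentioned above; it only shows that the run time is a knob you can turn.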
Cambium
United States16368 Posts
On December 23 2009 17:10 R1CH wrote: When I say Sphinx indexes can't be modified, I mean it. It is on their todo list for a more flexible index format though. See http://www.sphinxsearch.com/docs/manual-0.9.9.html#conf-mva-updates-pool Ah, that sucks. Hopefully they'll implement that soon. | ||
Cambium
United States16368 Posts
On December 23 2009 17:10 R1CH wrote: EDIT: TL doesn't really have an off-peak time. However, the indexer can use a configurable IOPS, iosize and delay, so if we want it to take 4 hours to update, it can. Refreshing the index every day like this is certainly possible, but I'd rather avoid it, since even with very conservative settings it's still going to suck up a lot of RAM and IO. Maybe a weekly update is doable. Ha! edit: lucene owns~! (maybe sphinx won't pick this up) | ||
R1CH
Netherlands10340 Posts
| ||
Cambium
United States16368 Posts
| ||
writer22816
United States5775 Posts
| ||
R1CH
Netherlands10340 Posts
| ||
Cambium
United States16368 Posts
| ||
Cambium
United States16368 Posts
On December 23 2009 17:24 R1CH wrote: Perhaps, I'll need to play around with the indexer options to see if the throttling is effective. A full rebuild would be good every now and then regardless to minimize the size of the delta index. It would probably improve performance as well. Lucene definitely didn't like incremental indexing (and updates) as much as I'd have liked. Maybe suggest an "optimize" function to re-organize the existing index if such a method doesn't exist yet? | ||
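For readers wondering what the "delta index" is about, here is a minimal Python sketch of a main-plus-delta setup as described in this thread: a big main index that is only replaced by a full rebuild, and a small delta holding posts added since then. This is a toy model under my own assumptions, not Sphinx's API; the class and method names (`MainPlusDelta`, `update_delta`, `full_rebuild`) are hypothetical. It shows why the delta grows until the next full rebuild and why edits to already-indexed posts stay invisible until then.

```python
class MainPlusDelta:
    """Toy model: main index plus a delta of posts newer than the main."""

    def __init__(self):
        self.main = {}         # word -> set of post ids, fixed between rebuilds
        self.delta = {}        # same shape, posts added since the last rebuild
        self.last_main_id = 0  # highest post id covered by the main index

    @staticmethod
    def _add(index, post_id, text):
        for word in text.lower().split():
            index.setdefault(word, set()).add(post_id)

    def full_rebuild(self, posts):
        # Slow path: re-read every post, so edits are picked up here.
        self.main = {}
        for post_id, text in posts.items():
            self._add(self.main, post_id, text)
        self.delta = {}
        self.last_main_id = max(posts, default=0)

    def update_delta(self, posts):
        # Fast path: only posts newer than the main index are read.
        # Cost grows with the number of posts since the last full rebuild.
        self.delta = {}
        for post_id, text in posts.items():
            if post_id > self.last_main_id:
                self._add(self.delta, post_id, text)

    def delta_post_count(self):
        return len(set().union(*self.delta.values())) if self.delta else 0

    def search(self, word):
        key = word.lower()
        return sorted(self.main.get(key, set()) | self.delta.get(key, set()))


posts = {1: "zerg rush", 2: "protoss cannon"}
idx = MainPlusDelta()
idx.full_rebuild(posts)
posts[3] = "terran bunker"
idx.update_delta(posts)
print(idx.search("terran"))   # new post found through the delta
posts[1] = "zerg rush edited"
idx.update_delta(posts)
print(idx.search("edited"))   # edit to an old post is not picked up
idx.full_rebuild(posts)
print(idx.search("edited"))   # a full rebuild picks it up
```

A periodic full rebuild resets the delta to empty, which is the "minimize the size of the delta index" point in the quote.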
R1CH
Netherlands10340 Posts
| ||
Ilikestarcraft
Korea (South)17717 Posts
On December 23 2009 16:07 R1CH wrote: Once a post is indexed, it's indexed. If a post is edited after the search engine has indexed it, any changes in the edit will not be searchable. Get your post right first time! none of my posts are going to be searchable now. Thanks for the fix, R1CH. | ||
Harem
United States11390 Posts
On December 23 2009 17:44 Ilikestarcraft wrote: none of my posts are going to be searchable now -_________- Anyways, fuck yes being able to search for 3 letter words now. Thanks R1ch. <33 | ||
darktreb
United States3016 Posts
On December 23 2009 16:23 R1CH wrote: I really don't think it's as big an issue as you think, most edits are made within a few minutes of a post being posted and there's a low chance it would get indexed in that timeframe. That's a really good point. Sounds good to me. | ||
ShaLLoW[baY]
Canada12499 Posts
| ||
H
New Zealand6137 Posts
explain it to me in terms a dumb person would understand | ||
NovaTheFeared
United States7212 Posts
| ||