Measurements of Standing Query Server Performance

relevance threshold set to 0.07 for all measurements unless relevance is the parameter being varied, or unless AutoThresholding is used.

1/24/00 Plot of total elapsed processing time for relevance factors from .01 to .10.
1/24/00 Stacked bar graph breakdown of processing times by relevance factor.
1/24/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.
1/24/00 Plot of total elapsed processing time for 3 to 18 processors.
1/24/00 Stacked bar graph breakdown of processing times for 3 to 18 processors.
1/24/00 Stacked bar graph breakdown of processing times for 3 to 18 processors excluding document parsing.

1/25/00 Plot of total elapsed processing time for relevance factors from .01 to .10.
1/25/00 Stacked bar graph breakdown of processing times by relevance factor.
1/25/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.
1/25/00 Plot of total elapsed processing time for 3 to 18 processors.
1/25/00 Stacked bar graph breakdown of processing times for 3 to 18 processors.
1/25/00 Stacked bar graph breakdown of processing times for 3 to 18 processors excluding document parsing.

1/26/00 Plot of total elapsed processing time for relevance factors from .01 to .10.
1/26/00 Stacked bar graph breakdown of processing times by relevance factor.
1/26/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.
1/26/00 Plot of total elapsed processing time for 3 to 18 processors.
1/26/00 Stacked bar graph breakdown of processing times for 3 to 18 processors.
1/26/00 Stacked bar graph breakdown of processing times for 3 to 18 processors excluding document parsing.

2/17/00 Plot of total elapsed processing time for relevance factors from .01 to .10.
2/17/00 Stacked bar graph breakdown of processing times by relevance factor.
2/17/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.
2/17/00 Plot of total elapsed processing time for 3 to 18 processors.
2/17/00 Stacked bar graph breakdown of processing times for 3 to 18 processors.
2/17/00 Stacked bar graph breakdown of processing times for 3 to 18 processors excluding document parsing.

All results above this point had various problems that makes the data suspect.


First good data for relevance and database scaling

2/21/00 Plot of total elapsed processing time for relevance factors from .01 to .10.
2/21/00 Stacked bar graph breakdown of processing times by relevance factor.
2/21/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.
2/21/00 Plot of total elapsed processing time for 3 to 18 processors.
2/21/00 Stacked bar graph breakdown of processing times for 3 to 18 processors.
2/21/00 Stacked bar graph breakdown of processing times for 3 to 18 processors excluding document parsing.

First reasonable data measuring how many standing queries system can support

Not all keywords in word table so some standing queries are not registering.
2/22/00 Plot of total elapsed processing time for 1 to 80 standing queries.
2/22/00 Stacked bar graph breakdown of processing times for 1 to 80 standing queries.
2/22/00 Stacked bar graph breakdown of processing times for 1 to 80 standing queries excluding document parsing.

Increased number of measurements from 40 to 300 to get better averaging

2/25/00 Plot of total elapsed processing time for relevance factors from .01 to .10.
2/25/00 Stacked bar graph breakdown of processing times by relevance factor.
2/25/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.

3/06/00 Plot of total elapsed processing time for 3 to 18 processors.
3/06/00 Stacked bar graph breakdown of processing times for 3 to 18 processors.
3/06/00 Stacked bar graph breakdown of processing times for 3 to 18 processors excluding document parsing.


Moved array allocation from stack to heap for 200 standing queries test

The allocation of the ConWt array new_con_wtp in addCentroid was causing a stack overflow because Linux has an 8MB limit on stack size by default.

3/07/00 Plot of total elapsed processing time for 30 to 200 standing queries.
3/07/00 Stacked bar graph breakdown of processing times for 30 to 200 standing queries.
3/07/00 Stacked bar graph breakdown of processing times for 30 to 200 standing queries excluding document parsing.


Same test as above - shows details of times on a document by document basis

Some strange effects here because the system was running out of memory.

3/08/00 Plot of total individual document processing time for 200 standing queries.
3/08/00 Stacked bar graph breakdown of individual document processing times for 200 standing queries.
3/08/00 Stacked bar graph breakdown of individual document processing times 200 standing queries excluding document parsing.


Reduced number of documents to 250 (plus original 50)

Now we're actually getting 1000 standing queries to work, but obviously things aren't working quite right still. Probably still running out of memory.

5/24/00 Plot of total elapsed processing time for 1 to 1000 standing queries.
5/24/00 Stacked bar graph breakdown of processing times for 1 to 1000 standing queries.
5/24/00 Stacked bar graph breakdown of processing times for 1 to 1000 standing queries excluding document parsing.


Here's a description of what I think is going on with memory allocation:
When a new document is added (after all the queries are registered) the function fetchVector(did) is called for every registered query. If there is no Vector in the cache then fetchVector calls DIDtoWFPs() which returns a list of word-frequency pairs which will contain as many entries as there are unique word stems in the document. For each word-freq pair a new weighting factor is calculated and added to the Vector. The Vector is a variable length array of ConWt values where a ConWt consists of a long plus a float. Thus each element in the Vector takes up 8 bytes. So if a document has 100 unique stemmed words in it, the associated Vector will be 800 bytes in length.
These Vectors are cached in a hash table. The Vectors are also added to a Centroid object, which is a subclass of a Vector. A Centroid contains all the elements of all the Vectors that are added to it. There is a Centroid for each Cluster. Since all the documents belong to some Centroid, all the Vectors get added to some Centroid and hence the amount of memory used is essentially double the amount used by the Vector caching.
So to calculate the amount of memory the SQS will allocate, use the following equation:

#docs x #SQ x (#unique_stemmed_words_in_doc x 8) x 2 = total memory allocated

In the tests run here we have 350 documents, 200 standing queries, and the average number of unique stemmed words per document looks to be around 350. So the total memory used will be:

350 x 200 x (350 x 8) x 2 = 392MB

Alternatively, you can say that each document will cause the standing query server to grow in size by 392MB/350 = 1.12MB (much larger than the documents themselves.)
The PC's in the computer cluster have 256MB of RAM and 256MB of swap space. When the tests start running there is about 300-400MB of space available (difficult to tell since Linux uses RAM for disk caching also.) Given that there are other parts of the code that also allocate memory, it's not too difficult to believe that the Vector caching and Centroids are the main reason we are using up all the memory when we get close to 350 documents, especially if we happened to have run across a sequence of larger documents. One document with 1000 words would itself be responsible for 3.2MB of memory. Looking at the size of the documents being added most are between 1000 and 8000 BYTES (not words!) in size, but there are a few around 40K BYTES in size.
The tradeoffs would appear to be between these four factors:

# documents clustered
Vector caching (speed vs memory)
memory usage/availability
# standing queries

Increase the # of documents clustered and you either need more memory (you could turn off caching) or have to reduce the number of standing queries. Increase the number of standing queries and again you either need more memory or have to reduce the number of documents clustered. Use caching and you have greater speed but less memory which reduces the number of documents clustered or the number of standing queries. Increase any one of them and the others are detrimentally affected.

During one of the test runs I plotted memory usage by the SQS versus number of documents processed. The graph looks like this (note that you have to add 50 to every point on the X axis since 50 documents are clustered when the query is registered): Memory Usage of SQS versus Number of New Documents Processed

Where things get weird in the upper right corner is where the PC was starting to run out of both RAM and swap space. The starting size of an SQS server is around 20MB before any queries are registered. I watched how the SQS grew in size during a similar test run and noted that it had grown to 53904KB after 162 queries had been registered. Since it started at 20KB this works out to (53904 - 20000)/162 = 209KB per query. Each query immediately clusters 50 documents so that is 209KB/50 = 4180B/document. Divide by 8 to get 522 unique stemmed words per document which seems a little high. However, these first 50 documents come from the serval Usenet news database whereas the new documents added to the cluster come from Clarinet news. Our Usenet news database had some problems in filtering during formation that led to containing some very large documents (1MB binary files for example where every line becomes a word).

Some things to try:
Try the same tests with just Clarinet news. Arne is building the database.
Turn off Vector caching to verify that it is the cause of the memory growth.
Try a test run with fewer new documents sent through the system so it will be able to support more standing queries.


EVERYTHING BELOW THIS IS AFTER CLUSTER, AGENTSERVER, AND MYSQL UPGRADE


Reduced number of documents to 50 (plus original 50), vector caching on

Relevance threshold set to 0.06

5/30/00 Plot of total elapsed processing time for 1 to 1000 standing queries.
5/30/00 Stacked bar graph breakdown of processing times for 1 to 1000 standing queries.
5/30/00 Stacked bar graph breakdown of processing times for 1 to 1000 standing queries excluding document parsing.

6/1/00 Plot of total elapsed processing time for 1 to 1300 standing queries.
6/1/00 Stacked bar graph breakdown of processing times for 1 to 1300 standing queries.
6/1/00 Stacked bar graph breakdown of processing times for 1 to 1300 standing queries excluding document parsing.

6/27/00 Plot of total elapsed processing time for 1 to 1600 standing queries.
6/27/00 Stacked bar graph breakdown of processing times for 1 to 1600 standing queries.
6/27/00 Stacked bar graph breakdown of processing times for 1 to 1600 standing queries excluding document parsing.

Reduced number of documents to 50 (plus original 50) and vector caching off

Relevance threshold set automatically

Didn't run this experiment.

Description of the experiment so far

Experiment Description


The results below show a comparison of Katya's Star Clustering algorithm and Fred's improvement on that algorithm (Linear Space Sampled Star algorithm) when run using a single standing query.

11/6/00 Plot of total elapsed processing time using Katya's algorithm.
11/6/00 Stacked bar graph breakdown of processing times using Katya's algorithm
11/6/00 Stacked bar graph breakdown of relevance and transmission times using Katya's algorithm.
11/6/00 Plot of total elapsed processing time using Fred's algorithm.
11/6/00 Stacked bar graph breakdown of processing times using Fred's algorithm
11/6/00 Stacked bar graph breakdown of relevance and transmission times using Fred's algorithm.


2/13/01 Plot of total elapsed processing time using Katya's algorithm.
2/13/01 Stacked bar graph breakdown of processing times using Katya's algorithm
2/13/01 Stacked bar graph breakdown of relevance and transmission times using Katya's algorithm.
2/13/01 Plot of total elapsed processing time using Fred's algorithm.
2/13/01 Stacked bar graph breakdown of processing times using Fred's algorithm
2/13/01 Stacked bar graph breakdown of relevance and transmission times using Fred's algorithm.


Graphs of Fred's algorithm with vector caching disabled.

2/22/01 Plot of total elapsed processing time using Fred's algorithm.
2/22/01 Stacked bar graph breakdown of processing times using Fred's algorithm
2/22/01 Stacked bar graph breakdown of relevance and transmission times using Fred's algorithm.


Graphs of varying relevance percentage for Fred's algorithm, varying the relevance from 0.00 to 0.20

11/15/00 Plot of total elapsed processing time for relevance factors from .01 to .20.
11/15/00 Stacked bar graph breakdown of processing times by relevance factor.
11/15/00 Stacked bar graph breakdown of processing times by relevance factor excluding document parsing.


A paper entitled Scalable Information Organization contains a description of the algorithm devised by Fred Reiss as an extension of Katya's Star clustering algorithm.


3/21/01 Plot of total elapsed processing time for 1 to 1500 standing queries.
3/21/01 Stacked bar graph breakdown of processing times for 1 to 1500 standing queries.
3/21/01 Stacked bar graph breakdown of processing times for 1 to 1500 standing queries excluding document parsing.


3/22/01 Plot of total elapsed processing time for 1 to 1600 standing queries.
3/22/01 Stacked bar graph breakdown of processing times for 1 to 1600 standing queries.
3/22/01 Stacked bar graph breakdown of processing times for 1 to 1600 standing queries excluding document parsing.

10/23/01

SQS.ppt Diagram of architecture of standing query server.
PersistentQueryC.ppt Peristant queries poster.