Author: Deepak Shenoy (CD paper for the Annual Borland Conference, San Jose, 2003)

Download the source code for this article (Compressed ZIP file, 555 KB)

Abstract

Learn to build better and more responsive web applications using Delphi. We'll cover:
  • Tools for testing your web application's performance
  • Caching WebModule instances in ISAPI applications
  • Using the ISAPI thread pool
  • Database engines and web application performance
  • Using database connection pools
  • Data compression and output buffering
  • ASP.NET optimization techniques

Introduction

Web applications are usually CGI, ISAPI or Apache modules that serve web content. While the general concept of writing web applications is familiar, the devil lies in the details. During development our energies are focussed on building the application, and we often neglect to test it under heavier loads or to measure its response times. This paper focusses on helping you build applications that can handle higher loads and respond faster.

Note: This paper covers ISAPI DLL based web applications. (There are ASP.NET optimizations too, further down.) If you are using a CGI application, this content may not apply.

Testing tools and framework

To be able to test our web applications for higher loads, we need to use an automated testing tool. I'm going to use Microsoft's Web Application Stress Tool (WAST).

We're going to test a simple application - the iserver application in the Demos\Internet\WebServ\IIS folder within your Delphi installation directory. Let's first compile it and place it under the scripts folder.

Once done, we will run WAST. WAST comes up with an initial screen like so:

Now hit Record. A web browser comes up, and you can navigate to the URL of our server application, which on my machine is http://localhost/scripts/iserver.dll. I've hit a couple of links, gone back to WAST, and stopped recording. Here's how my screen looks:

You'll notice that at the bottom you can see various entries for the pages I visited. You can change any entry, delete it, add a new one etc.

Now go to the settings node on the tree on the left. You will notice a screen like:

The settings I've chosen mean that:

  1. The script should run for 2 minutes.
  2. We are going to use 50 threads, each of which creates 2 socket connections to the server.
  3. We'll use a random delay of up to 5 seconds, so that not all threads are created at once.
  4. We're going to restrict bandwidth to 56 K; otherwise the service may appear faster in our testing than in real life.

Let's then run this test. Click on the script name ("New Recorded Script") and then select Run from the Scripts menu. The test will run for two minutes, and will give us the results as a report. You can view reports by clicking Reports in the View menu.

For the default application, compiled as is, the report looks like this:


Overview
==================================================================
Report name:                  3/17/2003 5:11:12 PM
Run on:                       3/17/2003 5:11:12 PM
Run length:                   00:02:05

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               434
Requests per Second:          3.64

Socket Statistics
-----------------------------------------------------------------------
Socket Connects:              2157
Total Bytes Sent (in KB):     174.81
Bytes Sent Rate (in KB/s):    1.47
Total Bytes Recv (in KB):     434.40
Bytes Recv Rate (in KB/s):    3.65

Socket Errors
-----------------------------------------------------------------------
Connect:                      0
Send:                         1657
Recv:                         5
Timeouts:                     0

Result Codes
Code      Description                   Count     
=======================================================================
200       OK                            111
500       Internal Server Error         318      
NA        HTTP result code not given    5         


Page Summary
Page                            Hits      TTFB Avg  TTLB Avg  Auth     
=======================================================================
GET /scripts/iserver.dll        74        3002.95   3023.19   No        
GET /scripts/iserver.dll/custo  67        6085.73   6613.07   No        
GET /scripts/iserver.dll/runqu  53        3134.34   3786.60   No        
GET /scripts/iserver.dll/custo  56        5974.11   6536.77   No        
GET /scripts/iserver.dll/runqu  24        3518.42   3518.75   No        
GET /scripts/iserver.dll/custo  36        6345.33   6640.42   No        
GET /scripts/iserver.dll/runqu  40        4295.88   4739.80   No        
GET /scripts/iserver.dll/runqu  24        3191.42   3310.71   No        
GET /scripts/iserver.dll/runqu  35        3922.29   3922.54   No        
GET /scripts/iserver.dll/custo  25        1656.96   1657.32   No        
Let's analyze this report.

The Number of hits is important: it tells you how many requests were sent, and Requests per Second gives the average request rate over the run.

Socket Errors are obviously important, and in this case, we have 1657 send errors and 5 receive errors. While the errors could be in transmission, in this case it might just be because the application was too busy to respond.

Result Codes - 200 is "OK". 500 is "Internal Server Error" - maybe because the server couldn't handle the load.

Page Summary - This shows the number of hits for each page, along with the average Time To First Byte (TTFB) and average Time To Last Byte (TTLB), both in milliseconds. These tell you how responsive the application is.
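
To put the result codes in proportion, the arithmetic is simple enough to sketch. The console program below is only an illustration (the helper name Percent is made up for this example and has nothing to do with WAST); it takes the counts from the report above and prints each as a share of the 434 hits. Roughly three out of four requests came back as an Internal Server Error.

```delphi
program ReportShares;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Share of Part in Total, as a percentage.
function Percent(Part, Total: Integer): Double;
begin
  Result := 100.0 * Part / Total;
end;

const
  // Figures copied from the WAST report above.
  TotalHits   = 434;
  OkCount     = 111;  // 200 OK
  ErrorCount  = 318;  // 500 Internal Server Error
  NoCodeCount = 5;    // no HTTP result code given

begin
  Writeln(Format('200 OK:         %5.1f%%', [Percent(OkCount, TotalHits)]));
  Writeln(Format('500 errors:     %5.1f%%', [Percent(ErrorCount, TotalHits)]));
  Writeln(Format('no result code: %5.1f%%', [Percent(NoCodeCount, TotalHits)]));
end.
```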

I'm going to show you only the report from now on, as we proceed to optimize this application. Also I'll only show the important information. (No more screenshots, that is)

Caching instances

When your web application is called (as an ISAPI, NSAPI or Apache module), each request is handled in its own thread. Within the context of this thread, your main web module is created (in this case, the TCustomerInfoModule in main.pas). When the request has been handled and the response sent, the created instance is then:
  1. Freed, if you set Application.CacheConnections to False. (It's True by default.)
  2. Cached in an internal array for reuse by the next request, if Application.CacheConnections is True.
Now what if one request is being handled when another request comes in? The application will look inside the cache - if a webmodule is available, it is used. Otherwise a new instance of the webmodule is created to handle the request. This happens until the number of currently active connections reaches the value in Application.MaxConnections, after which an exception is raised which says "Too many active connections". This shows up as an Internal Server Error on the browser.

Now the default for Application.MaxConnections is 32 - so if you have more than 32 connections active at any time, you will see internal server errors. Our test can open far more than that (50 threads with 2 sockets each, so up to 100 connections) - so let's see the results if we increase this value to 100.
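
The change itself is one line in the project file. The listing below is a sketch of what the .dpr might look like rather than a copy of the demo's actual file: the unit name Main and the variable CustomerInfoModule are assumptions based on the main.pas and TCustomerInfoModule mentioned above, so check them against your own project.

```delphi
library iserver;

uses
  WebBroker,
  ISAPIApp,
  Main in 'Main.pas' {CustomerInfoModule: TWebModule};  // assumed unit/form names

{$R *.RES}

exports
  GetExtensionVersion,
  HttpExtensionProc,
  TerminateExtension;

begin
  Application.Initialize;
  // Raise the limit on simultaneously active web modules (default is 32).
  Application.MaxConnections := 100;
  Application.CreateForm(TCustomerInfoModule, CustomerInfoModule);
  Application.Run;
end.
```

Application.CacheConnections can be set in the same place if you want to experiment with caching switched off.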


Overview
-----------------------------------------------------------------------
Number of hits:               374
Requests per Second:          3.11

Socket Errors
-----------------------------------------------------------------------
Connect:                      0
Send:                         0
Recv:                         0
Timeouts:                     0


Result Codes
Code      Description                   Count     
=======================================================================
200       OK                            216       
500       Internal Server Error         158       


Page Summary
Page                            Hits      TTFB Avg  TTLB Avg  Auth      
===================================================================
GET /scripts/iserver.dll        100       1825.33   1910.08   No        
GET /scripts/iserver.dll/custo  97        2228.18   2229.77   No        
GET /scripts/iserver.dll/runqu  88        2641.23   2664.10   No        
GET /scripts/iserver.dll/custo  58        2801.14   2861.98   No        
GET /scripts/iserver.dll/runqu  26        2587.65   2587.77   No        
GET /scripts/iserver.dll/custo  4         980.50    980.50    No        
GET /scripts/iserver.dll/runqu  1         3594.00   3594.00   No        
GET /scripts/iserver.dll/runqu  0         0.00      0.00      No        
GET /scripts/iserver.dll/runqu  0         0.00      0.00      No        
GET /scripts/iserver.dll/custo  0         0.00      0.00      No        
You'll notice that the error rate is lower (no socket errors, and 158 Internal Server Errors instead of 318), but we're still seeing errors. You can use trial and error to arrive at a value you can live with. But let's try some more optimization methods.

The ISAPI Thread Pool

ISAPI applications can use their own thread pooling mechanism to maintain application threads. Delphi provides its own unit, ISAPIThreadPool, for this - all you have to do is add it to the uses clause of your .dpr file, just below the line that contains "ISAPIApp". I've also reduced the test time to one minute, mainly because the two-minute runs took too long to wait for.

Note for Delphi 6 users: Steve Trefethen has provided an updated version of this unit for D6 users. The original unit does nothing spectacular.
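
As a sketch, the uses clause of the project file would then look something like this (Main and CustomerInfoModule are assumed names, as is the rest of the clause). The position matters: as far as I understand it, ISAPIThreadPool declares its own versions of the exported ISAPI entry points, and a unit listed later in the uses clause takes precedence, so the exports section picks up the pooled versions.

```delphi
uses
  WebBroker,
  ISAPIApp,
  ISAPIThreadPool,  // added directly below ISAPIApp
  Main in 'Main.pas' {CustomerInfoModule: TWebModule};  // assumed unit/form names
```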

Results:


Overview
=======================================================================
Number of hits:               751
Requests per Second:          12.50

Result Codes
Code      Description                   Count     
=======================================================================
200       OK                            560       
500       Internal Server Error         191       



Page Summary
Page                            Hits      TTFB Avg  TTLB Avg  Auth
=======================================================================
GET /scripts/iserver.dll        108       2406.58   2409.21   No        
GET /scripts/iserver.dll/custo  102       3232.81   3233.85   No        
GET /scripts/iserver.dll/runqu  102       5705.95   5708.00   No        
GET /scripts/iserver.dll/custo  100       7390.03   7391.45   No        
GET /scripts/iserver.dll/runqu  97        7846.37   7847.36   No        
GET /scripts/iserver.dll/custo  84        6089.39   6091.00   No        
GET /scripts/iserver.dll/runqu  70        3101.99   3102.36   No        
GET /scripts/iserver.dll/runqu  44        1688.95   1689.32   No        
GET /scripts/iserver.dll/runqu  26        680.85    681.23    No        
GET /scripts/iserver.dll/custo  18        314.83    315.28    No        
Notice that we could handle a lot more connections in one minute! (I got an average of around 350 per minute, which is still higher than the 175 per minute average I saw on the earlier application)

We're doing slightly better on internal server errors, and if you look closely in the reports section, you'll notice the errors are all on the database pages. This might be related to contention or locks, and the DBDEMOS database being in Paradox format might cause other problems too.

Conversion to a better database engine

Let's move the DBDEMOS database to an InterBase database and retest. I've also converted the application to use IBX components (since the BDE isn't very reliable in multithreaded apps).

Overview
=======================================================================
Number of hits:               371
Requests per Second:          6.17

Socket Statistics
-----------------------------------------------------------------------
Socket Connects:              1145
Total Bytes Sent (in KB):     148.00
Bytes Sent Rate (in KB/s):    2.46
Total Bytes Recv (in KB):     854.50
Bytes Recv Rate (in KB/s):    14.22



Result Codes
Code      Description                   Count     
=======================================================================
200       OK                            363
NA        HTTP result code not given    8         

Page Summary
Page                            Hits      TTFB Avg  TTLB Avg  Auth     
=======================================================================
GET /scripts/iserver.dll        89        3899.07   3901.65   No        
GET /scripts/iserver.dll/custo  49        6327.51   6330.12   No        
GET /scripts/iserver.dll/runqu  54        7037.46   7038.59   No        
GET /scripts/iserver.dll/custo  40        8859.60   8863.73   No        
GET /scripts/iserver.dll/runqu  28        11089.82  11090.32  No        
GET /scripts/iserver.dll/custo  20        7692.70   7696.75   No        
GET /scripts/iserver.dll/runqu  27        8359.89   8360.96   No        
GET /scripts/iserver.dll/runqu  21        9874.52   9875.48   No        
GET /scripts/iserver.dll/runqu  22        6795.91   6797.86   No        
GET /scripts/iserver.dll/custo  21        6338.81   6341.81   No        
This is much better - no Internal Server Errors! You can now try to see what's causing all that delay. We're getting an average of around 4 - 11 seconds per page, which is not all that great. Note that the development machine I use isn't well configured for serving web pages - it has IDE hard drives, 512 MB of 133 MHz memory and a single PIII 667 MHz processor. Today's machines and memory buses are much faster, so you should see much better performance on server-class hardware. Also the bandwidth is restricted to 56 K in testing, which is less than average.

Testing other parts of the framework

We've only tested two pages in the application - there are others that use blob fields etc. Let's build a test plan for these pages and check.

Overview
=======================================================================

Number of hits:               504
Requests per Second:          8.38
Result Codes
Code      Description                   Count     
=======================================================================
200       OK                            504       


Page Summary
Page                            Hits      TTFB Avg  TTLB Avg  Auth     
=======================================================================
GET /scripts/iserver.dll        80        1189.72   1228.75   No        
GET /scripts/iserver.dll/custo  80        6942.93   6944.74   No        
GET /scripts/iserver.dll/runqu  80        11354.19  11356.20  No        
GET /scripts/iserver.dll/emplo  80        10721.46  10724.09  No        
GET /scripts/iserver.dll/bioli  80        8370.70   8373.29   No        
GET /scripts/iserver.dll/getim  74        6645.24   6673.00   No        
GET /scripts/iserver.dll/runqu  28        6723.64   6724.79   No        
GET /scripts/iserver.dll/getim  2         6382.50   6407.50   No        
This is only slightly better - I've used a different thread pooling unit from http://www.delphi3000.com/articles/article_1693.asp that seems to give a better performance than the Borland unit.

Using a database connection pool

Right now, we have a database connection on each webmodule, which isn't very easy on memory usage. In this case you have 250 cached web modules, which means 250 connections to the database. If you don't want this, you can create a database connection pool, say of around 50 connections. This will impact performance but the service will be more reliable since it won't take so much memory and resources.

Note that if you use ADO, you don't have to do this - there is built-in connection pooling in ADO.

I've created a resource pool datamodule as a non-web datamodule - this will be a singleton instance. Before you open any query, you must get a connection from the pool. The pool will grow to a max. size of 50, and maintains a list of active and inactive connections. Each connection once created is never freed - it only adds to the pool, and once inactive will be assigned to the next request. If the pool is full when a request comes in, the system waits for a preassigned amount of time (say 10 seconds, which is the highest time that it takes to serve a page as per results above) and if there's still no connection available, it flags an error that the Server is too busy.
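
To make the idea concrete, here is a minimal, self-contained sketch of such a pool. It is not the datamodule from the download: TPooledConnection is an empty stand-in for a real connection component, the names TConnectionPool, Acquire and Release are made up for this example, and only the idle list plus a counter are kept instead of separate active and inactive lists. The demo uses a pool of 2 and a 100 ms wait so it finishes quickly; the values discussed above would be 50 and 10000.

```delphi
program PoolSketch;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils, Contnrs, SyncObjs;

type
  EPoolBusy = class(Exception);
  TPooledConnection = class(TObject);  // stand-in for a real connection

  TConnectionPool = class
  private
    FLock: TCriticalSection;
    FIdle: TObjectList;  // connections waiting for reuse (owned)
    FCreated: Integer;   // connections created so far, idle or in use
    FMaxSize: Integer;
    FReleased: TEvent;   // set each time a connection is handed back
  public
    constructor Create(AMaxSize: Integer);
    destructor Destroy; override;
    function Acquire(TimeoutMs: Cardinal): TPooledConnection;
    procedure Release(Conn: TPooledConnection);
  end;

constructor TConnectionPool.Create(AMaxSize: Integer);
begin
  inherited Create;
  FMaxSize := AMaxSize;
  FLock := TCriticalSection.Create;
  FIdle := TObjectList.Create(True);
  FReleased := TEvent.Create(nil, False, False, '');  // auto-reset event
end;

destructor TConnectionPool.Destroy;
begin
  FIdle.Free;  // frees idle connections; ones still checked out are not
  FReleased.Free;
  FLock.Free;
  inherited;
end;

function TConnectionPool.Acquire(TimeoutMs: Cardinal): TPooledConnection;
var
  Start, Elapsed: Cardinal;
begin
  Result := nil;
  Start := GetTickCount;
  repeat
    FLock.Acquire;
    try
      if FIdle.Count > 0 then
        Result := TPooledConnection(FIdle.Extract(FIdle.Last))  // reuse
      else if FCreated < FMaxSize then
      begin
        Result := TPooledConnection.Create;  // grow until the limit
        Inc(FCreated);
      end;
    finally
      FLock.Release;
    end;
    if Result <> nil then
      Exit;
    Elapsed := GetTickCount - Start;
    if Elapsed >= TimeoutMs then
      raise EPoolBusy.Create('Server is too busy');
    FReleased.WaitFor(TimeoutMs - Elapsed);  // wait for a release or timeout
  until False;
end;

procedure TConnectionPool.Release(Conn: TPooledConnection);
begin
  FLock.Acquire;
  try
    FIdle.Add(Conn);
  finally
    FLock.Release;
  end;
  FReleased.SetEvent;  // wake one waiting request, if any
end;

var
  Pool: TConnectionPool;
  A, B: TPooledConnection;
begin
  Pool := TConnectionPool.Create(2);
  try
    A := Pool.Acquire(100);
    B := Pool.Acquire(100);
    try
      Pool.Acquire(100);  // pool exhausted: waits, then raises EPoolBusy
    except
      on E: EPoolBusy do Writeln(E.Message);
    end;
    Pool.Release(A);
    Pool.Release(B);
  finally
    Pool.Free;
  end;
end.
```

In a web module you would call Acquire before opening a query and Release in a finally block, so a failed request never keeps a connection checked out.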


Overview
================================================================================
Report name:                  9/6/2003 5:16:34 PM
Run on:                       9/6/2003 5:16:34 PM
Run length:                   00:01:17

Web Application Stress Tool Version:1.1.293.1

Number of test clients:       1

Number of hits:               770
Requests per Second:          10.25

Result Codes
Code      Description                   Count     
================================================================================
200       OK                            770       


Page Summary
Page                            Hits      TTFB Avg  TTLB Avg  Auth      Query     
================================================================================
GET /scripts/iserver.dll        158       2548.14   2555.13   No        No        
GET /scripts/iserver.dll/custo  129       6387.86   6399.35   No        No        
GET /scripts/iserver.dll/runqu  83        11023.45  11024.47  No        No        
GET /scripts/iserver.dll/emplo  81        9416.53   9421.02   No        No        
GET /scripts/iserver.dll/bioli  80        4574.57   4576.85   No        No        
GET /scripts/iserver.dll/getim  80        3991.64   4011.29   No        No        
GET /scripts/iserver.dll/runqu  80        9689.76   9692.16   No        No        
GET /scripts/iserver.dll/getim  79        6849.80   6864.00   No        No        
This test ran about 15 seconds longer than the previous ones, but you can still see the performance improvement in terms of requests per second, timing and number of hits handled.

Techniques for optimization within the VCL

  1. String searching
    The PathInfo received is whatever follows the script name in the URL (for instance, in http://localhost/scripts/iserver.dll/search, "/search" is the PathInfo). Delphi ISAPI Web Modules have a TCollection (the Actions property) that stores the list of valid paths that have handlers. If your web application has a large number of PathInfo handlers, Delphi will search linearly through the TCollection items to find the right handler. You can optimize this by writing an inherited WebModule that keeps the TCollection sorted and uses something like binary search. With many handlers this can give you a big improvement.

  2. Tag handling in responses
    The TPageProducer and its descendants provide a way to handle custom tags such as <#TAGNAME>. You can see how this works in the HttpProd unit, in the ContentFromStream handler. This uses a token based approach - so it actually parses the entire HTML script! You can definitely optimize this by changing this to a different mechanism for string searching. There are a number of fast string search and replace utilities which will make your application that much faster. Plus, you can write content using your own handlers instead of using TPageProducer or TDatasetPageProducer. While this may not be visual, it definitely can improve performance.
    Here are a few string search routines:
    • FastStrings: A free string routine library with source. Postcardware.
    • Hyperstring: A commercial string utilities library
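
For the string searching point (item 1), the lookup side can be sketched without touching the web module at all. The program below leans on the fact that a sorted TStringList uses binary search in Find; the paths, the integer handler ids and the FindHandler helper are all hypothetical, and wiring this into an inherited web module's dispatching is left out.

```delphi
program PathLookup;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Returns the handler id registered for PathInfo, or -1 if there is none.
// TStringList.Find does a binary search, so the list must be sorted.
function FindHandler(Paths: TStringList; const PathInfo: string): Integer;
var
  Index: Integer;
begin
  if Paths.Find(PathInfo, Index) then
    Result := Integer(Paths.Objects[Index])
  else
    Result := -1;
end;

var
  Paths: TStringList;
begin
  Paths := TStringList.Create;
  try
    Paths.Sorted := True;  // every Add keeps the list in order
    // Hypothetical paths and ids; a real module would store its handlers.
    Paths.AddObject('/search', TObject(1));
    Paths.AddObject('/orders', TObject(2));
    Paths.AddObject('/products', TObject(3));
    Writeln(FindHandler(Paths, '/orders'));   // prints 2
    Writeln(FindHandler(Paths, '/missing'));  // prints -1
  finally
    Paths.Free;
  end;
end.
```

With a handful of paths the difference from a linear scan is negligible; it starts to matter only when the list of handlers is long.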

Changes in Internet Services Manager

You can change the properties of a web site to handle a larger number of clients. Bring up Internet Services Manager from Administrative Tools, and you'll see what you can change. From the Web Site properties you can:
  1. Change performance parameters - In the Performance tab, you can tune the site for the number of visits you expect.
  2. Trim logging - You will probably need to maintain logs, but logging takes extra time on every request. Keep your log information as concise as possible, and make sure log files don't grow too big on the server.
  3. Set Application Protection - In the virtual directory where you store your ISAPI, you can set Application Protection to Low (IIS Process), Medium (Pooled), or High (Isolated). This determines which context the DLL gets loaded in - Low loads your DLL into the IIS process space, High loads the ISAPI into a separate (COM+) application. Obviously Low will have the best performance, and High the worst. But the trade-off is reliability - with Low, a failing DLL runs inside the IIS process and can crash the web server.
  4. Application Configuration - In the "App Options" dialog box, uncheck "Enable Session State". This is only for ASP applications, so you should disable it for your ISAPI applications.

Data Compression

Read David Intersimone's excellent article about compressing data before sending. Remember that this causes extra server load since it has to now compress the packet before sending, so you must use compression only for pages that involve transfer of large amounts of data.

Note that compression is a CPU intensive task, and therefore you have to analyze where your bottleneck is - compression will not help (and may hurt) if your bottleneck is CPU usage, but it might make a difference if the issue really is a limited amount of bandwidth.

Buffering Output

Usually a user notices better performance if the web browser receives content faster - so instead of buffering all output before sending it back, you can send back at least the HTML tags that constitute the header and title, so that the user gets to see activity immediately. You can flush the response by using Response.SendStream.

Database schema optimization

For web applications that connect to databases and run queries, you will see a performance gain by optimizing the database itself. Adding indexes, views and using plans will result in shorter query time, which will decrease the total time it takes to process a web request.

How do you do this? Some pointers:

  1. Ensure all tables have primary keys. Single field primary keys are best in my opinion - they save you maintenance trouble - but whatever you choose, make sure the table has a primary key defined. This improves lookup performance.
  2. When you have a link between two tables (usually a field in one table links to the other table's primary key), use a foreign key, and perhaps an index on the foreign key depending on how often the link is used in your application SQL.
  3. I assume you're using an SQL database, and most SQL databases support SQL "plans". A plan tells you how the database executes your SQL and what indexes, keys etc. it uses to merge data. Ideally a plan should use indexes wherever it can and avoid having to merge intermediate results. You can tweak the SQL with that in mind, and in some database engines you can even specify the plan (for instance, if you are only expecting to show a few rows, you can use a plan that will return only the top few rows). I can't give you examples here - it's out of the scope of this paper.
  4. Use views to return complex SQL based data - this method not only reduces SQL parsing time, it also ensures a pre-optimized data view for you. Most database engines will pre-compile and pre-optimize a view so repeated view access is faster.

Further optimizations

There are more optimizations you can do which will make your applications faster.
  1. Hardware: Better hardware is more scalable - a faster bus, faster hard disk, larger caches, multiple processors etc.
  2. Server farms - you can use multiple server machines when a larger number of requests come in - for this you might need a load balancing server which is available from multiple vendors. Remember that you shouldn't store state in local variables or files - all state should go into databases since the next request may be handled from another machine.
  3. Data Farms: What if your bottleneck is the database engine itself? Then, you can have database "farms" - multiple data servers that will host the same database engine and the same data. You can synchronize your data across these servers by using what's known as "merge replication". Many database servers including Oracle and SQL server support this feature, and you can even set these databases up for scheduled merges every few hours.
  4. Fast Internet Connections: IIS depends quite heavily on the speed of your connection, and increasing speed or optimizing your network connection can make a difference to your application's response time.
  5. TCP/IP and IIS Optimization: Some very useful tips are available from this site.

Optimization techniques for ASP.NET

If you're using ASP.NET with C#Builder or Delphi for .NET, there are some changes you can make to ensure your application is scalable. I'll talk about some of these techniques below.

1. Disable session state

If you need to identify which user is currently requesting your page, you'd want to use the ASP.NET session state technology. But many pages you write will not need this functionality, so session state is of no importance to them. The downside of session state is server side performance: the server will try to "identify" the current user whenever session state is enabled. Unfortunately session state is enabled by default, but you can disable it for a page using the @ Page directive in your ASP.NET page.

<%@ Page EnableSessionState="false" %> 
What if you have to use session state? In ASP.NET you can choose to have session state stored in-process, in an out-of-process state server, or even in a database. Obviously performance suffers as you go out of process, and suffers further when you choose the database approach. But these latter options have upsides too (reliability and redundancy) so choose your option carefully.

2. Use Page.IsPostBack for round trips

Your page may have a ton of controls, but you may need to populate them from the database only the first time the page is requested. Every control event posts the page's data back to the server, where the event handler runs. But you don't need to populate the controls each time! (Perhaps you'll just redirect to another page.) You can figure out if this is a "postback" call, i.e. if data on the page has been posted back to the server, by checking the Page.IsPostBack property, which is true for a postback. That could save you a ton of time when processing postbacks.
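
In a Delphi for .NET code-behind unit the check might look like the fragment below. It is a sketch only: TWebForm1 and LoadCustomerList are hypothetical names, and the fragment assumes a form class descending from System.Web.UI.Page, which is where the Boolean IsPostBack property comes from.

```delphi
// Code-behind fragment; TWebForm1 and LoadCustomerList are hypothetical.
procedure TWebForm1.Page_Load(sender: System.Object; e: System.EventArgs);
begin
  // Only hit the database on the first request for the page.
  // On a postback the controls are restored from the posted page state.
  if not IsPostBack then
    LoadCustomerList;
end;
```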

3. Optimize the use of "ViewState".

If you look at most ASP.NET pages, you'll see something like this :

<input type="hidden" name="__VIEWSTATE" value="dDwzNjA1NzEwMDg
7dDw7bDxpPDA+O2k8MT47PjtsPHQ8O2w8aTwxPjtpPDI+Oz47b
Dx0PDtsPGk8MD47PjtsPHQ8QDxHb3REb3ROZXQ6IFRoZSBNaWN
yb3NvZnQgLk5FVCBGcmFtZXdvcmsgQ29tbXVuaXR5O1xlO1xlO
z47Oz47Pj47dDxAPFxlO1xlOz47Oz47Pj47dDw7bDxpPDA+O2k
8ND47aTw2Pjs+O2w8dDw7bDxpPDE+O2k8Mz47aTw3Pjs+O2w8d
Dw7bDxpPDA+Oz47bDx0PDtsPGk8Mz47PjtsPHQ8O2w8aTwxPjt
pPDM+O2k8NT47aTw2Pjs+O2w8dDxw
...
If your page doesn't have controls then you don't need this ViewState variable at all! Pages that are purely informative, or reports, are just wasting space on it. And this is especially true for DataGrids, which put a lot into the ViewState. You can disable ViewState on a per-control basis by using the "EnableViewState" attribute.
 <asp:datagrid EnableViewState="false" ... runat="server"/>
Or, you can disable it for the entire page:
 <%@ Page EnableViewState="false" %>
4. Pre-compile your web page.

The first time a web application is run, or if a file has changed, the ASP.NET runtime recompiles the ASP.NET files. This can take a while, so after you upload changes to your site, load your browser up requesting a page from each directory on your server. This ensures all the pages are "pre-compiled" so that the next user that requests a page doesn't have to wait for compiles.

5. Use .NET managed components instead of COM components

If you're used to ASP, then you might be using a number of COM components in your ASP code. Porting this to .NET might make you believe that you can reuse those COM components through COM Interop. But this has a significant effect on performance. Most commonly used COM components have .NET managed equivalents, and you can use them instead. For instance, instead of using Scripting.FileSystemObject, you can convert to using the classes in the System.IO namespace. If you have written Delphi based COM components, consider converting them to managed code using Delphi for .NET for improved performance.

6. Make optimum use of caching.

In ASP.NET there are multiple caching methods available:

  • You can cache an entire page.
  • You can cache the output of a control (if it's not something that changes often).
  • You can cache application data or user data.
Caching significantly improves performance, especially when data is static and you don't expect it to change. For instance, lookup data in tables might not change at all - in such cases simply cache it on the server so you don't have to make a costly trip to the database.

See a huge set of links on caching in .NET.

7. Tune your "web.config" files.

The web.config files for your web site contain many juicy bits of information that you can use to optimize your site. You can disable NTLM authentication (improves performance), change isolation patterns for your application etc. Read the MSDN documentation for what options are available and what might apply to you.

Conclusion

You can write ISAPI applications in Delphi, and there is scope for a lot of optimization. I hope this paper has helped you identify potential scalability problems in your application, and given you enough information to test for such problems. There are of course innumerable things you can do specific to your application - you might get better performance using different SQL in your queries, you might see a vast improvement by using a specific version of thread pooling etc. In general, web applications can be optimized to a large extent by changing the way you program web applications, and you must build such applications with one eye on scalability.