James McCaffrey

Blog


January 26

The Exponential Distribution in Software Testing

A couple of blog entries ago I described the Poisson Distribution and its applicability to software testing. A close cousin to the Poisson Distribution is the Exponential Distribution. Both distributions arise when items arrive randomly and independently (this is called a Poisson Process). Examples include HTTP connection requests arriving at a Web server, and bug reports arriving at a bug tracking database. The Poisson Distribution describes the number of items arriving in a given time interval. The Exponential Distribution describes the time between arrivals.

Suppose that HTTP connections arrive at a Web server, on average, at a rate of one every 15 minutes, and suppose your standard unit of time is one hour. Then the arrival rate is 4 arrivals per hour. This is the rate parameter and is usually given the symbol lambda. The Exponential Distribution can be used to answer questions such as, "What is the probability that, after a particular HTTP request, the next request arrives within 30 minutes?"

The Exponential Distribution is defined by the following equation: the probability that the next arrival occurs at or before some time t equals 1 minus e raised to the power of minus lambda times t, where lambda is the rate parameter and e is Euler's number = 2.718281828 approximately. For example, the probability that the next arrival in my example above (where lambda equals 4.0) occurs within 30 minutes (= 0.5 hrs.) of any particular arrival is 1 - e^(-4.0 * 0.5) = 0.8647 approximately. The screenshot below shows some examples with lambda equal to 4.0 arrivals per hour. For example, after any given arrival, the probability that the next arrival occurs in 6 minutes (i.e., 1/10 of an hour) or less is 0.3297.

A mildly tricky part of using both the Exponential and Poisson distributions is keeping your units straight. In my example, the basic unit of time is one hour. But I could have recast everything so that the basic unit of time is 15 minutes (in which case lambda would be 1.0 arrival per 15 minutes, and 30 minutes would be t = 2.0 units).

During my years as a software tester, I didn't have to use the Poisson and Exponential distributions very often, but when I did need them, they were very important. Should knowledge of the Poisson and Exponential distributions be a part of every software tester's skill set? Probably not. But should every software system testing effort have one or two testers who understand these two distributions? In most cases, probably yes.
[Screenshot: ExponentialDistribution]
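The calculation described in this entry can be sketched in a few lines of Python. This is only a minimal illustration under the assumptions stated in the text (lambda = 4.0 arrivals per hour); the function name `prob_next_arrival_within` is hypothetical and chosen for this example only.

```python
import math


def prob_next_arrival_within(rate, t):
    """Return P(next arrival occurs at or before time t) = 1 - e^(-rate * t).

    rate is lambda, the average number of arrivals per unit of time;
    t must be expressed in that same unit of time.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if t < 0:
        raise ValueError("t must be non-negative")
    return 1.0 - math.exp(-rate * t)


# lambda = 4.0 arrivals per hour, as in the example above.
lam_per_hour = 4.0
for minutes in (6, 15, 30, 60):
    hours = minutes / 60.0
    p = prob_next_arrival_within(lam_per_hour, hours)
    print(f"within {minutes:2d} minutes: {p:.4f}")

# Keeping units straight: the same 30-minute question, recast so that the
# basic unit of time is 15 minutes (lambda = 1.0, t = 2.0 units).
p_hours = prob_next_arrival_within(4.0, 0.5)
p_quarters = prob_next_arrival_within(1.0, 2.0)
print(f"hour units: {p_hours:.4f}, 15-minute units: {p_quarters:.4f}")
```

The last two lines show why the choice of unit does not matter as long as lambda and t use the same one: only the product of lambda and t enters the formula.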
January 22

Java Considered Harmful to Software Testing

This past weekend I came across a brilliant short paper entitled, "Computer Science Education: Where Are the Software Engineers of Tomorrow?" at http://www.stsc.hill.af.mil/CrossTalk/2008/01/0801DewarSchonberg.html. The mini-abstract of the paper states, "It is our view that Computer Science (CS) education is neglecting basic skills, in particular in the areas of programming and formal methods. We consider that the general adoption of Java as a first programming language is in part responsible for this decline. We examine briefly the set of programming skills that should be part of every software professional’s repertoire."

This statement reflects what I have been seeing for years: university computer science graduates are coming to industry (meaning Microsoft) less and less prepared to enter roles as top-level software developers and test engineers. And I am certain that the widespread use of Java in college computer science curricula plays a role in this problem. So the title of this entry should probably be more along the lines of, "Java Considered Harmful to All of Computer Science", but I am particularly concerned about the area of software testing. The paper I mention above clearly articulates several problems with Java, but they can be whimsically summarized by what my good friend Doug (a top systems developer at Microsoft) said to me last night: "Java takes the science out of computer science."

If I were the hypothetical King of College Computer Science, in addition to classes in Calculus, Probability, Statistics, Discrete Math, Algorithms, and Data Structures, I would require all computer science majors to take at least one class each in C Language Programming, Assembly Language Programming, C++ Programming, a functional or logic programming language such as LISP or Prolog, a Language Survey class, and finally a modern application programming language such as C# or Java.
January 19

Basic Survey Design in Software Testing

Creating, administering, and interpreting a survey is much, much trickier than you might expect. (In fact I completely mangled an argument in an earlier version of this blog entry.) Let me describe some pitfalls when you want to use the simplest type of survey, one that has a multiple choice format. Imagine you want to ask end users how the user interface of a new software application compares with the user interface of a current system. So, you create a survey which instructs respondents to select, for each of a set of 100 statements, the response which best describes the extent to which they agree or disagree with that statement. For example, one of your statements might be, "The search feature of the user interface of the new system is easier to use than the search feature of the current system." And you give survey respondents the four choices, "Strongly Disagree", "Disagree", "Agree", "Strongly Agree" for each statement. Well, let me suggest that you may have already committed several mistakes in survey design.

This is an example of a Likert scale design. The first issue is that with a Likert design you should as a rule of thumb, except in rare cases, have five responses plus an explicit n/a response. The five responses are generally some closely related form of "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree", and then, physically slightly to the right of these first five responses, you should have a sixth "Not Applicable" response. You should give survey respondents a neutral option because otherwise you are forcing a positive or negative opinion when they may be neutral on a statement. And you should give respondents an explicit not-applicable choice rather than assuming or guessing that no response at all means not applicable in some way (as opposed to an invalid response because the respondent just forgot to answer).

A second, almost certain mistake is that your survey has too many (100) statements. Survey respondents are going to get bored and quickly launch into an auto-complete mode just to finish your survey.

These are just a few of the dozens of issues with survey design. Analyzing the results of surveys is also very difficult. The moral of the story is that survey design is a very tricky task and you cannot simply use common sense. When I was a university professor, I taught an entire semester class on survey design; this is probably the bare minimum knowledge you need to create and interpret surveys in a software testing environment.

January 12

The Poisson Distribution in Software Testing

One of the problems in the field of software testing is that there is no clearly established core body of knowledge. Take the Poisson Distribution for example. Should all software testers have a basic understanding of this distribution? I'm not sure, but I know that a rudimentary knowledge of the Poisson Distribution certainly can't hurt a software tester. Very few software testers I know have even the most basic knowledge of the Poisson Distribution and the closely related Exponential Distribution.

The Poisson Distribution may appear in a system when there are items arriving over time. For example, consider user connections arriving at a Web site, or process requests arriving at a CPU. Here's a quick example. Suppose you know from historical data that a new user connection to a Web site arrives on average once every 15 minutes. And suppose you arbitrarily establish your standard unit of time as one hour. Here, the average number of arrivals is 4.0 per hour. This average number of arrivals per standard unit of time is usually given the Greek letter lambda. The probability of a specified number of arrivals, k, in the standard unit of time, is given by the equation lambda raised to the kth power times the constant e raised to the minus lambda power, all divided by k factorial. This equation sounds real ugly but is not really as bad as it sounds. I punched this example into Excel, computed the probabilities of 0 through 10 arrivals in an hour, and got the image shown below.

So what's the point? The Poisson Distribution is an example of a topic that is hard to categorize in terms of whether it should be considered required knowledge for a certain skill level of software tester. Knowing about the Poisson Distribution would certainly help a tester better understand load and stress testing. Additionally, the quantitative techniques associated with understanding the Poisson Distribution can be useful in many other software testing scenarios.

A few months ago I wrote an article for MSDN Magazine where I described four probability-related techniques and algorithms I feel are essential knowledge for all software testers: generating pseudo-random numbers, analyzing a pattern for randomness, shuffling a list of items, and generating numbers from a Normal/Gaussian distribution. See http://msdn.microsoft.com/msdnmag/issues/06/09/TestRun/. But this particular "knowledge of the Poisson Distribution for testers" scenario is analogous to education in general: some topics are directly necessary for a particular goal/major/degree, some are peripherally useful, and some may not be useful at all.

[Image: ThePoissonDistribution]
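For readers who would rather not use a spreadsheet, the Poisson equation described in this entry can be sketched in Python. This is a minimal illustration, not a reproduction of the Excel sheet from the post; the function name `poisson_pmf` is hypothetical, and lambda = 4.0 arrivals per hour is taken from the example above.

```python
import math


def poisson_pmf(k, lam):
    """Return P(exactly k arrivals in one unit of time).

    Implements lambda^k * e^(-lambda) / k!, where lam is the average
    number of arrivals per unit of time.
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    if lam <= 0:
        raise ValueError("lam must be positive")
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


# Probabilities of 0 through 10 arrivals in an hour, with lambda = 4.0.
lam = 4.0
probabilities = [poisson_pmf(k, lam) for k in range(11)]
for k, p in enumerate(probabilities):
    print(f"k = {k:2d}   P = {p:.4f}")

# The listed values do not sum to exactly 1 because k > 10 is still possible.
print(f"total for k = 0..10: {sum(probabilities):.4f}")
```

The probabilities peak around k = 3 and k = 4, which matches the intuition that about 4 arrivals per hour is the most likely outcome when the average is 4.0.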
January 6

Testing WCF Systems

Last week I developed a mini-class introducing Windows Communication Foundation (WCF) for engineers at Microsoft. One of the hardest parts of WCF is not so much the technology, but rather understanding exactly what WCF is. One way to think of WCF is as a generalization of Web Services. Let me explain.

Suppose you have a normal Web Service which contains Web Methods which expose some proprietary data in a SQL database (think book information for example). In a normal Web Service scenario, a client program (usually Internet Explorer) makes a request for book information using SOAP over HTTP to a Web server, which in turn queries the Web Service, which is usually hosted by IIS. The Web Service fetches the requested data and returns the data to the client using SOAP over HTTP.

Well, in some sense WCF takes this model and gives you options at every step of the way. The basic idea is the same: a client requests data from a server using SOAP and the data is returned using SOAP. But the WCF requesting client can be IE or any kind of program. WCF uses SOAP, but the SOAP message can ride on HTTP or any kind of protocol. The WCF server can be hosted in IIS or any kind of service. Additionally, compared to Web Services, WCF gives you extra features such as transactions and enhanced reliability. The screenshot below shows a WCF demo from the training class I created.
 
[Screenshot: MathServiceAndClient]
 
So how do you test WCF systems? On the one hand, you are still dealing with a request-response type of scenario, so testing WCF systems is similar to testing Web Services. But because there are so many variables in a WCF system (the kind of client, the transport protocol, the host), your testing effort will be significantly more complex.