


Optimal staffing in semiconductor manufacturing: A queuing theory approach


04/01/1998








Amnon Raviv, TEFEN USA, Santa Clara, California

Equipment costs comprise more than 70% of the capital investment budget of a new semiconductor manufacturing or test and assembly facility. The advent of 300-mm manufacturing promises to push the overall cost, if not equipment share, even higher. Obviously, equipment use must be one of management's primary concerns. Proper staffing of equipment operators and maintenance technicians is complex, yet it plays a major role in equipment utilization and throughput. The classic approach to manufacturing staffing is somewhat arbitrary and ad hoc. This paper presents a more scientific and proactive approach, based on the theory of constraints (TOC) and designed with the objective of improving fab throughput. The models described are based on established industrial engineering principles, and have been adopted by management teams in scores of fabs and test sites worldwide.

Productivity studies of semiconductor fabs and back-end facilities show a distinct relationship between head count and throughput (Fig. 1). Three parameters determine equipment throughput: availability (uptime), utilization (the share of available time actually used), and run rate (the speed of the machine in units/hr). Since run rates are relatively stable, equipment throughput depends mainly on the first two. Improving uptime and utilization will increase throughput, and proper staffing is one way to improve both.
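A small calculation makes this relationship concrete. The sketch below assumes that throughput per calendar hour is the product of run rate, availability, and utilization. The function name and the tool figures are hypothetical and do not come from the article's data.

```python
def effective_throughput(run_rate: float, availability: float, utilization: float) -> float:
    """Units per calendar hour: run rate x fraction of time up x fraction of uptime in use."""
    for name, value in (("availability", availability), ("utilization", utilization)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return run_rate * availability * utilization


# Hypothetical tool: 60 wafers/hr run rate, 85% uptime.
baseline = effective_throughput(60.0, 0.85, 0.70)
# Better staffing recovers idle time: utilization rises from 70% to 80%.
improved = effective_throughput(60.0, 0.85, 0.80)
print(f"baseline {baseline:.1f}/hr, improved {improved:.1f}/hr, "
      f"gain {100 * (improved / baseline - 1):.1f}%")
```

With these hypothetical figures, raising utilization from 70% to 80% at constant run rate and uptime yields 40.8 instead of 35.7 units/hr, roughly 14% more output.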

While the left-hand side of the curve in Fig. 1 is intuitive (more people means less idle time for the equipment), the right-hand side emphasizes the importance of proper staffing. The question is how to minimize production loss due to understaffing without falling into the classic pitfalls of overstaffing (reduced alertness, crowding, mediocre work quality, etc.).


Figure 1. Head count vs. throughput: Law of diminishing returns.

Traditional approaches to staffing

Traditional staffing methods are usually based on "number of activities/operator-hr" (for operators) and direct labor estimations (for maintenance technicians). Though these methods seem easy and inexpensive, they have significant disadvantages:

 They do not reflect the stochastic nature of machine operation and maintenance ("machine interference").

 They offer no distinction between constrained and nonconstrained equipment.

 They provide no insight into nonlabor costs associated with staffing decisions, such as lost productivity.

 They contain built-in inaccuracies due to differences in labor content between different equipment types.

 They do not allow for changes in layout, automation, or work methods.

 They hinder benchmarking because of differences in definitions of activities in different facilities.

A relatively small number of IC manufacturers use versions of methods-time measurement (MTM), a predetermined time-and-motion system, to quantify direct labor requirements. The first three faults above still apply to MTM. Moreover, the inherent complexity of MTM databases requires maintenance by an MTM-certified person.

A typical time allocation analysis of a process machine in a fab staffed according to these traditional methods shows up to 20% production loss due to operator- or maintenance-related idle time. (Data is based on SEMATECH research and industry productivity studies.)

A "no operator" (or "no technician") situation occurs in one of two ways. Either nobody is in the bay/work center when a machine needs service (load/unload, open chamber for clean, etc.), or the people in the work center are busy servicing other machines. The first situation is a matter of coverage and can be dealt with relatively simply, as described later. The second is a more complex issue known as "machine interference." Because production time is so valuable in this industry, any realistic staffing model should incorporate this interference.

Machine interference

Machine interference is defined as ΣPn, where Pn is the probability that n machines are waiting for service simultaneously (n = 0, ..., N, where N is the number of machines). Consider a situation in which the machines in a particular work center are the clients waiting to be serviced and the operators/technicians are the servers. For a given number of machines, different numbers of servers result in different interference levels, which in turn cause potential production loss (Fig. 2). When machine service times and frequencies are nondeterministic, modeling the nature and impact of interference on productivity becomes fairly complex.

Queuing theory model. We first used an analytical approach based on queuing theory. Analysis of service frequencies and durations on different machines showed that service requests closely resemble a Poisson process and service durations closely resemble an exponential distribution. Therefore, we modeled fab and test work centers with a multiserver, multiclient queuing model with a finite source of customers (M/M/c with a finite calling population, sometimes written M/M/c//N, where N is the number of machines). Two systems were constructed in a spreadsheet environment, one for operators and one for maintenance technicians. Both systems were modular and easy to construct.
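The finite-source M/M/c calculation can be sketched as follows. This minimal version assumes that every machine in the work center shares one call rate and one exponential service rate. The function names and work-center numbers are hypothetical. The ΣPn definition in the text does not fix a weighting, so the sketch reports two common textbook measures:

- the probability that at least one machine is waiting;
- the expected number of waiting machines divided by N, used here as the share of machine time lost to interference.

```python
from math import comb, factorial


def state_probabilities(n_machines: int, servers: int, call_rate: float,
                        service_rate: float) -> list[float]:
    """Finite-source M/M/c (machine repairman) steady state.

    p[n] is the probability that n machines need service (being served or waiting).
    call_rate: service requests per hour from ONE running machine (Poisson).
    service_rate: completions per hour by ONE busy server (exponential service).
    """
    r = call_rate / service_rate
    weights = []
    for n in range(n_machines + 1):
        w = comb(n_machines, n) * r ** n
        if n > servers:
            # Only `servers` machines can be served at once; the rest queue.
            w *= factorial(n) / (factorial(servers) * servers ** (n - servers))
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def interference_metrics(n_machines: int, servers: int, call_rate: float,
                         service_rate: float) -> dict:
    p = state_probabilities(n_machines, servers, call_rate, service_rate)
    queued = sum((n - servers) * pn for n, pn in enumerate(p) if n > servers)
    busy = sum(min(n, servers) * pn for n, pn in enumerate(p))
    return {
        "p_any_waiting": sum(pn for n, pn in enumerate(p) if n > servers),
        "interference_loss": queued / n_machines,  # share of machine time spent waiting for a server
        "workload": busy / servers,                # average server utilization
    }


# Hypothetical work center: 8 machines, each calling every 2 hr on average, 15-minute mean service.
for staff in (1, 2, 3):
    m = interference_metrics(8, staff, call_rate=0.5, service_rate=4.0)
    print(staff, {k: round(v, 3) for k, v in m.items()})
```

The product-form probabilities are exact under the Poisson and exponential assumptions. A spreadsheet can reproduce them with one row per state n.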


Figure 2. Machine interference vs. head count.


Figure 3. Inputs and outputs in the simulation model.

The first level in each system is the database: a separate spreadsheet for each equipment type, detailing all the labor associated with its operation or maintenance, including the duration, frequency, and other parameters of each task. These "tool spreadsheets" are then linked to their respective work centers. Machines of one type can be located in more than one work center, and, naturally, one work center can contain more than one type of machine. The "work center" files are all linked to the "area" file, where the user can create various production scenarios and examine their effect on the required staffing level in each work center. For operators, an "area" is usually a module (e.g., etch in a fab), whereas a "work center" can be a bay or even part of a bay. For maintenance technicians, a work center is usually a cluster or a group of cluster tools serviced by a particular group of people.
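The tool, work center, and area hierarchy can be sketched as plain data classes. This is a hypothetical structure, not the author's spreadsheet layout. Each task row carries a mean duration and a frequency per machine-hour. A work center rolls these rows up into the per-machine call rate and mean service time that a queuing model needs. Pooling different tasks into a single mean service time is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One row of a hypothetical 'tool spreadsheet'."""
    name: str
    minutes: float    # mean duration of one occurrence
    per_hour: float   # occurrences per machine per production hour


@dataclass
class ToolType:
    name: str
    tasks: list[Task]

    def call_rate(self) -> float:
        """Service requests per hour from one machine of this type."""
        return sum(t.per_hour for t in self.tasks)

    def labor_hours(self) -> float:
        """Person-hours of work generated per machine-hour."""
        return sum(t.per_hour * t.minutes for t in self.tasks) / 60.0


@dataclass
class WorkCenter:
    name: str
    machines: list[tuple[ToolType, int]] = field(default_factory=list)  # (type, count)

    def n_machines(self) -> int:
        return sum(count for _, count in self.machines)

    def call_rate_per_machine(self) -> float:
        return sum(t.call_rate() * n for t, n in self.machines) / self.n_machines()

    def labor_load(self) -> float:
        """Average number of people kept busy, ignoring interference."""
        return sum(t.labor_hours() * n for t, n in self.machines)

    def mean_service_hours(self) -> float:
        return self.labor_load() / (self.call_rate_per_machine() * self.n_machines())


@dataclass
class Area:
    name: str
    work_centers: list[WorkCenter]

    def labor_loads(self) -> dict[str, float]:
        return {wc.name: wc.labor_load() for wc in self.work_centers}


# Hypothetical etch module with one bay.
etcher = ToolType("etcher", [Task("load/unload", 5, 1.0), Task("chamber check", 10, 0.1)])
asher = ToolType("asher", [Task("load/unload", 6, 0.5)])
bay = WorkCenter("etch bay 1", [(etcher, 4), (asher, 2)])
etch = Area("etch", [bay])
print(etch.labor_loads(), round(bay.call_rate_per_machine(), 3),
      round(60 * bay.mean_service_hours(), 2))
```

`labor_load` is essentially the direct-labor estimate of the traditional methods. Because it ignores interference, the rolled-up rates are passed to the queuing model rather than used on their own.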

The simulation model. The simulation model analyzes the complex service patterns found in the maintenance world. For example, situations in which more than one person participates in preventative maintenance (PM) or where individual repair episodes are discontinuous are better modeled through simulation. The same "tool spreadsheets" are fed into a modular simulation model providing the same type of outputs as the analytical model: production loss (or uptime loss), the average workload in each cluster, and so forth (Fig. 3).
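A bare-bones discrete-event simulation shows how simulation handles service-time distributions other than the exponential. This is a generic sketch, not the author's model. It assumes:

- exponential run times between service calls;
- one technician per job;
- first-come, first-served dispatching;
- repair times drawn from a hypothetical lognormal distribution.

Multi-person PMs and interrupted repairs, which the text cites as the main reasons for simulating, would need additional logic.

```python
import heapq
import math
import random
from collections import deque


def simulate_work_center(n_machines, technicians, call_rate, service_time, horizon, seed=0):
    """Discrete-event sketch of one maintenance work center.

    Machines run for exponential times (mean 1/call_rate hours), then need a technician.
    service_time(rng) draws one job duration in hours from ANY distribution.
    Jobs are served first-come, first-served by one technician each.
    Returns (share of machine time spent waiting for a technician, technician workload).
    """
    rng = random.Random(seed)
    events, seq = [], 0          # heap of (time, tie-breaker, kind, machine)
    free, queue = technicians, deque()
    waiting = busy = 0.0

    def schedule(t, kind, machine):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, machine))
        seq += 1

    def start_job(t, machine):
        nonlocal busy
        duration = service_time(rng)
        busy += min(duration, horizon - t)   # count only work inside the horizon
        schedule(t + duration, "done", machine)

    for m in range(n_machines):
        schedule(rng.expovariate(call_rate), "call", m)

    while events:
        t, _, kind, m = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "call":
            if free > 0:
                free -= 1
                start_job(t, m)
            else:
                queue.append((m, t))
        else:  # "done": machine m returns to production, technician takes the next job
            schedule(t + rng.expovariate(call_rate), "call", m)
            if queue:
                nxt, t_req = queue.popleft()
                waiting += t - t_req
                start_job(t, nxt)
            else:
                free += 1

    waiting += sum(horizon - t_req for _, t_req in queue)  # still queued at the horizon
    return waiting / (n_machines * horizon), busy / (technicians * horizon)


def lognormal_repair(mean_hours, sigma):
    """Hypothetical skewed repair-time distribution with the given mean."""
    mu = math.log(mean_hours) - sigma ** 2 / 2
    return lambda rng: rng.lognormvariate(mu, sigma)


repair = lognormal_repair(1.0, 0.8)
for techs in (1, 2, 3):
    loss, workload = simulate_work_center(10, techs, 0.1, repair, horizon=20_000, seed=1)
    print(f"{techs} technician(s): waiting loss {loss:.1%}, workload {workload:.1%}")
```

The outputs match the analytical model's: uptime loss from waiting, and technician workload. Swapping `service_time` for another distribution requires no change to the model itself.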


Selection criteria and application of TOC concepts

As output, both the analytical and the simulation models provide production loss and workload percentages for given staffing levels. The user then inputs the maximum allowable percentages for each criterion. The model will find, for each work center, the lowest staffing level that simultaneously satisfies both criteria. Naturally, the higher the number of people assigned to a work center, the lower the production loss and the workload percentages.
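The selection rule can be sketched as a search over head counts. The sketch assumes, as the text states, that both criteria are upper limits and that both quantities fall as head count rises. The `metrics` function re-implements the finite-source M/M/c model described above. The limits and work-center figures in the usage line are hypothetical.

```python
from math import comb, factorial


def metrics(n_machines, servers, call_rate, service_rate):
    """Finite-source M/M/c: returns (interference production loss, server workload)."""
    r = call_rate / service_rate
    weights = []
    for n in range(n_machines + 1):
        w = comb(n_machines, n) * r ** n
        if n > servers:
            w *= factorial(n) / (factorial(servers) * servers ** (n - servers))
        weights.append(w)
    total = sum(weights)
    p = [w / total for w in weights]
    queued = sum((n - servers) * pn for n, pn in enumerate(p) if n > servers)
    busy = sum(min(n, servers) * pn for n, pn in enumerate(p))
    return queued / n_machines, busy / servers


def lowest_staffing(n_machines, call_rate, service_rate, max_loss, max_workload):
    """Smallest head count meeting both upper limits, or None if none does."""
    for servers in range(1, n_machines + 1):
        loss, workload = metrics(n_machines, servers, call_rate, service_rate)
        if loss <= max_loss and workload <= max_workload:
            return servers, loss, workload
    return None


# Hypothetical work center: 8 machines, one call per machine every 2 hr, 15-minute service.
print(lowest_staffing(8, 0.5, 4.0, max_loss=0.05, max_workload=0.90))
```

For these inputs, a single person leaves interference loss well above the 5% limit, so the search settles on two people.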

Overstaffing, in addition to driving up wage costs, might reduce productivity. Since both production loss and workload drop as head count increases, the optimal head count is always a trade-off between the two. According to the TOC, the overall throughput of a production unit is determined by its bottleneck: the machine with the lowest capacity relative to demand. The bottleneck machine can be easily identified in a stable flow-line production unit. An efficient unit will try to maximize the production time of its bottleneck by minimizing the bottleneck's downtime and idle time. In practice, workload is usually balanced across all work centers (typical ranges are 40-70% for maintenance technicians and 70-90% for operators). However, the TOC suggests requiring lower interference-related production loss in the bottleneck work center and allowing higher levels in the rest of the line. The model allows different workload and interference limits to be set for different work centers.

Semiconductor facilities have unstable product flows and, by design, usually have well-balanced capacity across the different machines. This can create multiple potential bottlenecks, and the fab benefits from achieving low production loss at all of them. The maximum allowable interference-related production loss usually ranges from 0-5% for bottleneck equipment and 5-10% for the rest.
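One way to apply differentiated limits is sketched below under stated assumptions. Every center whose required utilization falls within a tolerance of the most heavily loaded center is treated as a potential bottleneck. The 5-point tolerance is hypothetical, as are the center figures. The 3% and 8% limits are likewise hypothetical, chosen from inside the ranges quoted above.

```python
from dataclasses import dataclass


@dataclass
class Center:
    name: str
    machines: int
    run_rate: float       # units/hr per machine
    availability: float   # uptime fraction
    demand: float         # units/hr the line requires from this center

    def load(self) -> float:
        """Required utilization of available capacity (>1 means the center cannot keep up)."""
        return self.demand / (self.machines * self.run_rate * self.availability)


def loss_limits(centers, tolerance=0.05, bottleneck_limit=0.03, other_limit=0.08):
    """Tighter interference-loss limits for every center loaded near the maximum."""
    worst = max(c.load() for c in centers)
    return {c.name: bottleneck_limit if c.load() >= worst - tolerance else other_limit
            for c in centers}


line = [
    Center("litho", 4, 50.0, 0.85, 160.0),
    Center("etch", 6, 40.0, 0.90, 160.0),
    Center("implant", 3, 60.0, 0.92, 160.0),
]
print(loss_limits(line))
```

Here litho and implant both land within five points of the most heavily loaded center, so both receive the tighter limit, while etch keeps the looser one. The resulting limits would then feed the per-work-center staffing search.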

The table compares the performance in one area of a specific fab before and after adjusting the head count. Both operator and technician models were applied in this area. As the table shows, the models achieved a significant reduction in production loss.

The analytical vs. the simulation approach

Both the analytical and the simulation approaches have been used to model head-count requirements. When choosing between the two approaches, one should consider the following:

 While data collection efforts are similar (both models use the same "tool spreadsheets"), construction and upkeep of simulation models are definitely more complex.

 Spreadsheet applications are accessible to a wider range of users.

 Simulation is capable of dealing with distributions other than just Poisson and exponential, as well as with combinations of simple distributions or with complex work rules.

 Simulation is a dynamic tool and can provide better insight into system behavior over time.

 A maintenance simulation model can also be used as a building block for dynamic capacity, work-in-progress flow, scheduling models, etc.

 Analytical models are more smoothly assimilated within the organization.

 The analytical model can be used not only by decision makers and long-term planners, but also as a daily management tool for production managers and supervisors/leaders.

 Simulation models are used primarily for maintenance labor modeling, where PM and repairs are relatively complex and have significant impact on capacity and productivity.

Conclusion

Adequate staffing of operators and maintenance crews in both bottleneck and nonbottleneck areas is crucial to overall productivity. Advanced queuing theory and simulation models are highly cost-effective because they reduce human-related inefficiencies in the operation and maintenance of fab/test equipment. Industrial engineers have successfully developed and implemented two user-friendly packages in numerous semiconductor fabs and test facilities.

AMNON RAVIV received his BS degree in industrial engineering from Tel Aviv University. He has worked as an industrial engineer, project manager, and site manager for TEFEN Ltd., in both Israel and the US. Raviv is VP of West Coast operations for TEFEN USA, and is based in Santa Clara, CA. TEFEN USA, 1065 E. Hillsdale Blvd., Suite 400, Foster City, CA 94404; ph 800/983-3369, fax 650/577-9166.