JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 26, 347-361 (2010)

Task Scheduling for Multiprocessor Systems with Autonomous Performance-Optimizing Control

HSIU-JY HO AND WEI-MING LIN
Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249-0669, U.S.A.

Among all non-blocking non-preemptive (NBNP) scheduling techniques for processor allocation in a multicomputer system, the Largest-Job-First (LJF) technique possesses a unique ability to utilize as many processors as possible compared to others such as First-Come-First-Serve (FCFS) and Smallest-Job-First (SJF). However, a job-bypass limit, preset to preclude starvation in an NBNP platform, creates problems for all of these techniques. Scheduling becomes “mandatory blocking” whenever a job reaches this bypass limit and thus has to be scheduled for allocation in the next immediate turn. This deprives the scheduling process of the flexibility inherent in its non-blocking nature. Such an adverse effect is especially pronounced in LJF compared to the commonly used FCFS. Thus, how to balance, in real time, between employing LJF and FCFS in different situations is the main focus of this paper. We first propose an automatic control process that adjusts the algorithm based on observed performance. This process, unlike the well-known feedback-control process, adjusts the algorithm in an unbiased manner in order to decouple the measured performance from the input. We then propose two scheduling techniques that employ this control process to self-adjust, in real time, the weight between the two basic techniques. Performance results from our simulation runs show a significant improvement over plain LJF and FCFS.

Keywords: scheduling, resource sharing, multiprocessor system, supercomputing, autonomous control

1. INTRODUCTION

Fast and efficient processor allocation and job scheduling algorithms are essential components of a multi-user multicomputer operating system. First-Come-First-Serve (FCFS) is the simplest scheduling policy: it submits jobs for allocation according to their arriving order. The FCFS policy is “blocking” in nature, i.e., a request for an unavailable number of processors may block subsequent requests that are currently serviceable. Thus it can lead to undesirable resource utilization and a long average job waiting delay. Several other scheduling techniques have been proposed that apply simple heuristics in assigning priorities to jobs waiting in the queue for scheduling. These techniques are usually based on some characteristic of a job, such as the job size, the requested processing time, etc. Smallest-Job-First (SJF), Largest-Job-First (LJF), and Shortest-Job-First are among the well-known ones aiming at reducing the average waiting delay. In order to guarantee a degree of fairness, other criteria have to be incorporated into the system to avoid starvation, so that each request gets serviced within a predetermined delay. However, these techniques are still considered “blocking” techniques, since the job currently scheduled for allocation may not be allocatable and may thus block other job(s) that are allocatable at the same time. Several scheduling strategies have been proposed to remedy this “blocking” problem. In these “non-blocking” techniques, a job selected by the job scheduler for allocation is put back into the job queue if it is not allocatable, and the next one in the queue is tested for allocatability. We also assume that the job scheduling process is “non-preemptive,” i.e., a job already allocated will not be preempted by any other job still in the queue. Jobs in the queue are usually scanned for allocatability in a certain order according to certain criteria. A “non-blocking FCFS” (sometimes referred to as “first-allocatable-first” or “first-fit”) simply and repeatedly searches for the first allocatable job according to arriving order. The LJF and SJF algorithms each have a corresponding non-blocking version that searches according to job size; a non-blocking LJF is sometimes referred to as the “best-fit” algorithm. Multiqueue [8] is another often-used strategy, in which an incoming job is sent to one of several queues based on a set criterion. Chang [1] proposed a bypass-queue job scheduling strategy, a variation of the FCFS queue without the blocking problem. In [2, 3, 9-11], various backfilling approaches improve poor utilization by assigning unutilized processors to jobs that are behind in the priority queue of waiting jobs, rather than keeping those processors idle, so as to achieve better space-sharing utilization. Backfilling is further combined with gang-scheduling in [14] to improve space-sharing scheduling strategies. Scheduling techniques using neural-network approaches have also been investigated [4, 13]. Other scheduling techniques designed for special systems or special constraints are presented in [5-7, 12].

Received September 19, 2008; revised November 11, 2008; accepted December 11, 2008. Communicated by Tsan-sheng Hsu.
None of the known techniques looks into combining basic scheduling algorithms to optimize performance in real time in an autonomous manner, taking advantage of their intrinsic merits in different situations. It is well established that non-blocking scheduling techniques usually lead to smaller latency than blocking ones; thus, we focus only on non-blocking scheduling techniques in this paper. Among all non-blocking non-preemptive techniques, LJF proves to be one of the simplest and yet possesses characteristics that allow higher system utilization. However, allowing jobs to bypass others that arrived earlier may lead to starvation, and this problem is usually more prominent in non-blocking techniques than in blocking ones. Many mechanisms have been developed to handle starvation, including round-robin, bypass limits, etc. Due to the non-preemptive nature of this study, we assume the bypass-limit mechanism, in which a job cannot be bypassed by more than a preset number of jobs arriving later. A job-bypass limit preset to preclude starvation in a non-blocking platform leads to an undesirable effect in all techniques: the scheduling process degenerates into a simple “blocking FCFS” whenever a job reaches this bypass limit and thus has to be scheduled for allocation in the next immediate turn. The more often such a “mandatory blocking” situation occurs, the more its performance degrades. This adverse effect is especially pronounced in non-blocking LJF compared to non-blocking FCFS, because more bypassing takes place under LJF. Throughout the rest of this paper, “non-blocking” is assumed whenever FCFS or LJF is mentioned unless otherwise specified. In general, this adverse effect from strictly following LJF may easily offset its intrinsic benefit. Thus, how to find
a balance in real time between employing LJF and FCFS in different situations, while taking bypass values into consideration, is the main focus of this paper. We first propose an automatic control process that allows automatic adjustment of certain system parameter settings based on observed performance changes. This process, unlike the well-known feedback-control process, adjusts the setting in an unbiased manner in order to decouple the measured performance from the varying input; this is needed since the performance observed may be influenced by the input rate at the time of measurement. We then propose two different scheduling techniques that employ this control process to self-adjust a weight between the two basic techniques in real time. Note that the proposed algorithms perform the scheduling in real time: they re-evaluate the situation every time a new job arrives or a job is allocated, and the overhead of this re-evaluation is minimal, adding no delay to the actual job allocation process. Our simulation results consistently show significant improvement from our technique over LJF and FCFS.

2. SIMULATION SETUP

The system in our simulation consists of 40,000 processors. Note that, as mentioned earlier, techniques developed in this paper are easily applicable to most general resource-sharing systems in which the resource pool contains a number of identical resource units. Assumptions and parameters for our simulations are as follows. The number of processors requested by an incoming task is generated such that it is either uniformly distributed over (1:5000) or normally distributed with a mean of μ and a standard deviation of σ. Two sets of task sequences are generated for testing: Task Set A: uniformly distributed with σ ≈ 1460; Task Set B: normally distributed with μ = 2500 and a varying σ (100 ≤ σ ≤ 600) so that behaviors under various size distributions can be compared. Service time of incoming tasks is normally distributed with a mean of 100 and a standard deviation of 25 time units. Inter-task arrival time (in time units) is exponentially distributed with an average inter-arrival time of 1/λ time units, i.e., the average arrival rate equals λ. 600,000 requests (tasks) are simulated in each simulation run to ensure an equilibrium state is reached, if possible; this is needed to determine whether the task inter-arrival time (1/λ) is too small for the task queue to remain in equilibrium or a state of saturation has been reached.
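The workload described above can be sketched as a small Python generator; the function and field names are our own, the distributions follow the text, and the clamping of sampled sizes to the valid range is our assumption:

```python
import random

def generate_tasks(n, size_dist="uniform", mu=2500, sigma=1460, lam=1/7.2):
    """Sketch of the simulation workload (function and field names are ours).

    Sizes: uniform over 1..5000 (Task Set A) or normal(mu, sigma) (Task
    Set B), clamped to the valid range (our assumption).  Service times:
    normal(100, 25), floored at 1.  Arrivals: exponential inter-arrival
    times with average rate lam.
    """
    tasks, t = [], 0.0
    for _ in range(n):
        t += random.expovariate(lam)               # next arrival time
        if size_dist == "uniform":
            size = random.randint(1, 5000)
        else:
            size = max(1, min(5000, round(random.gauss(mu, sigma))))
        service = max(1.0, random.gauss(100, 25))  # service time in time units
        tasks.append({"arrival": t, "size": size, "service": service})
    return tasks
```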

3. LJF VERSUS FCFS

As aforementioned, among all non-blocking scheduling techniques for processor allocation, LJF has proven to be very meritorious due to its policy of fitting the unused resource with the largest available requests, and thus tends to lead to higher utilization. To ensure fairness in such a non-preemptive system, bypasses need to be curtailed to prevent starvation, a problem that may leave a small job never allocated. Usually a bypass limit (bpl) is set such that no more than this many jobs are allowed to bypass any one job. Once a job (or jobs) in the queue reaches this limit, scheduling has to follow strictly blocking FCFS until all jobs’ bypass values (bpv) again fall below the limit. When such a “mandatory blocking” happens, all benefits from the
scheduling “flexibility” of the non-blocking technique disappear, and thus a longer waiting delay ensues. The more often such a mandatory blocking occurs, the more adverse its effect on the average waiting latency. With the bypass limit imposed, LJF has a higher tendency than FCFS to lead to such a blocking scenario due to its scheduling nature. The gain in performance from achieving higher utilization by LJF may easily be offset by this adverse effect from blocking, and the tighter (smaller) the bypass limit, the less flexible LJF becomes. To see how each of the two techniques performs under different mandatory-blocking urgency situations, a simulation is carried out to show, for a given leading (maximal) bpv value in the queue, the contributing queue length (CQL) to the overall average queue length (which is proportional to scheduling latency). That is, the contributing queue length is calculated as

CQL(bpv) = q(bpv) ⋅ n/N

where CQL(bpv) denotes the CQL value when the leading bypass value equals bpv, q(bpv) is the average queue length when the leading bypass value equals bpv, n is the number of simulation time units in which that situation holds, and N is the total number of simulation time units. Note that the CQL value under a given bpv exactly accounts for the portion of the overall average queue length that is due to the situation when the maximal bypass value equals the given bpv. Fig. 1 shows the comparison results under three different input rates when the bypass limit is set to 50, using simulation task set A.

Fig. 1. Comparison results of CQL(bpv) between LJF and FCFS when the input rate equals (a) 1/7.2, (b) 1/7.3, and (c) 1/7.4.
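The CQL computation can be made concrete with a short sketch; as our own simplification, we assume a log containing one (leading bpv, queue length) sample per simulation time unit:

```python
from collections import defaultdict

def contributing_queue_length(samples):
    """CQL(bpv) = q(bpv) * n / N for each leading bypass value bpv.

    samples: one (leading_bpv, queue_length) pair per simulation time unit.
    Since q(bpv) is the mean queue length over the n units with that leading
    bpv, the product q(bpv) * n is simply the queue length summed over them.
    """
    N = len(samples)
    summed = defaultdict(float)     # queue length summed per leading bpv
    for bpv, qlen in samples:
        summed[bpv] += qlen
    return {bpv: total / N for bpv, total in summed.items()}
```

By construction, the CQL values summed over all bpv recover the overall average queue length, matching the decomposition stated in the text.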

These results clearly show that when the maximal bypass value in the queue is low, LJF clearly outperforms FCFS, contributing very minimally to latency. When the bypass value approaches the set bypass limit, however, LJF’s sudden increase in CQL contributes significantly to the overall latency. This tradeoff is consistent across the different input rates shown in the figure, except that the tradeoff appears at an earlier maximal bypass value when the input rate becomes smaller.

4. REAL-TIME PERFORMANCE-OPTIMIZING CONTROL

Our goal in this paper is to develop scheduling algorithms that can self-adapt to the constantly changing input rate and varying task characteristics to optimize performance using a combination of different techniques. In this section, we briefly describe the general methodology to be adopted for such a development. Our target problem possesses two critical characteristics very different from those suited to typical control processes. First, the absence of a reference value (e.g., the set speed in cruise control) for the control system to adjust toward renders the known control techniques of very limited applicability to our problem. Second, the input (or, more precisely, the “load” on the system) is constantly changing, so any reference value produced dynamically may become outdated very quickly. These input load characteristics, e.g., generation rate, task sizes, etc., are essentially random processes in most real-life circumstances. Typical examples include incoming tasks in a computer system, incoming traffic on a local area network, incoming calls to a telephone system, and incoming traffic on a highway system. Note that profiling using history information, as some known techniques for specific applications have done, is purposefully excluded from the proposed method, since we believe the database thus produced neither truly reflects the real-time nature (in terms of perturbation in load) of the target systems nor is capable of handling potential changes in system characteristics. Expert systems, neural networks, and other intelligent control techniques that rely on training with known patterns similarly cannot address these intrinsic problems. Similar to a simple feedback control process, our proposed algorithm incurs an automatic adjustment to some system setting(s) periodically, based on performance observed in a set “window” of time.
Since there is no optimal “goal” known beforehand to adjust toward, we propose a very simple automatic adjustment algorithm as a base algorithm that aims at approaching the optimal setting gradually and autonomously. Fig. 2 displays the algorithm proposed for this purpose. In the current window, performance P_t is observed and compared with the performance from the last window, P_{t-1}. If it leads

Fig. 2. An automatic performance-optimizing feedback control algorithm.

to an improvement, the adjustment operator op_t remains the same as the previous operator op_{t-1}; otherwise, the operator is “reversed”. A simple example of such a pair of reversed operators is “increase” versus “decrease”. The exact control setting applied at the end of this window, d_t, is then derived by adjusting the previous value d_{t-1} by a margin Δd with the new operator; for example, d_t = d_{t-1} + Δd if op_t is “increase”. There are four main components in this proposed control process: (1) Adjustment Parameter(s), (2) Performance Indicator(s), (3) Performance Sampling Window Period, and (4) Performance Measurement, as described in the following. The algorithm proposed so far seems straightforward; however, several pitfalls remain. When measuring performance in a window, one has to be careful that the measured performance is not wrongly influenced by the current input rate (load), which would bias the comparison between two windows. Note that, since the input rate (load) is essentially random, it is unreasonable to assume that a similar load exists in all windows. Consider a simple real-life example of two banks with different customer-serving capacities, μA and μB, respectively, with μB < μA. If the corresponding customer incoming (load) rates λA and λB satisfy λA < λB < μB < μA, then, assuming no load accumulation, the throughputs PA and PB would satisfy PA < PB, which wrongly indicates the capacity superiority between the two banks. Judging superiority between two sampling windows faces the same challenge. In order to achieve a fair performance comparison between adjacent windows, the notion of a “qualified sampling point” is first proposed.
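The window-based adjustment loop of Fig. 2 can be sketched as follows; the callback interface and function names are our own illustration, not the paper’s notation:

```python
def autonomous_control(observe, apply_setting, d0, delta, windows):
    """Sketch of the window-based adjustment of Fig. 2 (interface is ours).

    observe() returns the performance measured over one window (higher is
    better); apply_setting(d) installs a new value of the controlled
    parameter.  The operator (+1 = "increase", -1 = "decrease") is kept
    when performance improves and reversed when it degrades.
    """
    d, op = d0, +1
    apply_setting(d)
    prev = observe()
    history = [d]
    for _ in range(windows - 1):
        cur = observe()
        if cur < prev:          # performance degraded: reverse the operator
            op = -op
        prev = cur
        d += op * delta         # d_t = d_{t-1} +/- delta
        apply_setting(d)
        history.append(d)
    return history
```

With a concave performance curve, the setting climbs toward the optimum and then oscillates within about one margin Δd of it, which is the intended behavior of the base algorithm.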
A point (time instance) is considered “qualified”, so that the performance at that point may be measured for comparison, only when the load at that point exceeds a predetermined threshold value. When comparing the two techniques (or, more precisely, two setting values) in adjacent windows, with capacities μA and μB respectively, the best threshold value is the smaller of μA and μB. When the input load falls below the smaller of μA and μB, it is virtually impossible to tell which “technique” is better, since both would produce the same throughput assuming no residual effect. Theoretically, only when the input load rises above the smaller of μA and μB is the performance measurement likely to reveal the superiority of one over the other. In general, the higher such a “qualifying threshold value” is set, the more revealing the measured performance is of the actual capacity/capability of the techniques being observed. Several important factors need to be addressed when deciding whether a point is considered qualified. In our scheduling problem, since it is not practical to assume the processing time of a job is known beforehand, one universal way to calibrate the input load is to base it on the number of processors requested within a predetermined time frame stretching backward from the current time. That is, one can assume that each job generated has a “duration of effect” on the system. As shown in the example in Fig. 3, each job generated (indicated as a darkly shaded box) has a uniform “duration of effect” (indicated as the accompanying lightly shaded box). With this, the input load at each time point can easily be accumulated on the fly. As aforementioned, a threshold for load qualification still has to be set properly in order to obtain reliable results.
Let l(i) denote the “input load” measured at time instance i and l_th be the threshold for the load to be considered “qualified”. For example, if l_th is set to 12,


Fig. 3. Input load calibration.

then out of the five sampling points in this figure only tb and tc are considered “qualified”. A better alternative to this static load calibration is a dynamic “duration of effect” derived from the average latency of the most recently finished jobs. In our algorithm, such a “duration of effect” is determined using the average latency of the ten preceding jobs, so as to closely reflect a job’s effect duration.
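A minimal sketch of this load calibration (the static, uniform variant; the job and sampling-point structures are our own illustration):

```python
def input_load(jobs, t, effect):
    """Load at time t under a uniform "duration of effect" (static variant
    of Fig. 3): total processors requested by jobs generated in the last
    `effect` time units.  jobs: (generation_time, size) pairs (names ours).
    """
    return sum(size for gen, size in jobs if t - effect < gen <= t)

def qualified_points(jobs, times, effect, l_th):
    """Sampling points whose measured load exceeds the threshold l_th."""
    return [t for t in times if input_load(jobs, t, effect) > l_th]
```

The dynamic variant described above would replace the fixed `effect` with the average latency of the ten most recently finished jobs.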

5. PROPOSED SCHEDULING ALGORITHMS

Our proposed scheduling algorithms employ the performance-optimizing control methodology proposed in section 4 to maximize the benefits from each of the composing algorithms.

5.1 Hybrid Scheduling Technique (HST)

Our first proposed scheduling algorithm, the Hybrid Scheduling Technique (HST), controls a simple weight between the FCFS and LJF techniques in an attempt to optimize their benefits in real time. In essence, this technique is geared toward reducing the number of mandatory blocking occurrences in LJF while improving on the relatively lower utilization of FCFS before mandatory blocking arises, without considering the bypass-value urgency directly. A new scheduling order of the jobs waiting in the queue is decided by the following formula:

WO(x) = w ⋅ Oa(x) + (1 − w) ⋅ Os(x)    (1)

where WO(x) is the new weighted scheduling order for job x and w is the adjustment parameter to be autonomously adjusted in real time. Oa is the arriving order and Os denotes the size order, with each order value increasing from 1 to n (n being the number of jobs in the queue). WO(x) is then used to determine the order of scheduling. As with the control parameter d_t in the algorithm of Fig. 2, the control adjustment is applied to w in real time. Obviously, when w is set to 0 the scheduling follows LJF, while it becomes strictly FCFS when w is set to 1. The adjustment margin Δw (Δd in Fig. 2) is fixed at 0.1 in our process, with the operator op_t being either “increase” or “decrease”. By employing the automatic control method, this algorithm aims at achieving a completely autonomous process of adjusting the use of the two algorithms in real time for constant performance improvement.

Fig. 4. An example using HST with bpl = 50.

Fig. 4 gives an example showing a snapshot of six jobs in the queue with different requested sizes. The first two scheduling orders displayed are for FCFS and LJF, respectively, i.e., when w = 1 and w = 0. The last one gives the new scheduling order under our proposed HST technique when w is adjusted to 0.3. Note that this algorithm does not take bpv(x) into consideration directly, instead hoping that the automatic adjusting process is able to adjust somewhat according to it.

5.2 Bypass-Damping Hybrid Scheduling Technique (BDHST)

Noting that the bypass values are not directly taken into consideration in the proposed HST technique when deciding on the new scheduling order, our second proposed technique, the Bypass-Damping Hybrid Scheduling Technique (BDHST), further prioritizes the scheduling order based on an additional factor, the mandatory blocking pressure. This pressure for job x can be quantified as bpl − bpv(x), the number of additional bypasses allowed before job x demands a mandatory blocking. The rationale is to assign a higher priority to jobs with a small such number (i.e., higher pressure). The proposed scheduling technique again employs the optimizing control methodology by controlling a weight factor w to balance between the mandatory blocking pressure, bpl − bpv(x), and the job size preference, Os(x):

WO(x) = w ⋅ (bpl − bpv(x)) + (1 − w) ⋅ Os(x) ⋅ C.    (2)

The additional constant factor, C = bpl/Q, is incorporated into the formula to ensure proper normalization between the two weighted orders, where Q indicates the current queue length. Note that this technique, compared to the HST algorithm, implicitly takes the arriving order into consideration, since

if bpl − bpv(x1) ≤ bpl − bpv(x2) then Oa(x1) < Oa(x2).    (3)

This comes from a simple observation that, with jobs a, b, and c in the queue in the given arriving order, if c bypasses b, then it bypasses a as well. In addition, this algorithm does not reward jobs with earlier arriving order unless they are under critical mandatory blocking pressure. That is, if most of the jobs in the queue have small bpv values, they all receive relatively low priority weight from the bpl − bpv(x) factor, which is not true in the HST technique. Although the arriving order could still be used as an adjustment factor, as adopted in the proposed HST algorithm, pending an effective control process, this BDHST algorithm looks for a better adjustment factor in bpl − bpv(x) to provide more effective control. This is based on the notion that arriving order does not necessarily contribute much to the scheduling decision-making criterion unless there are bypass values close to the set limit. Also note that, similar to the HST algorithm, this algorithm behaves exactly like LJF when w is set to 0, while it follows FCFS when w is set to 1 due to its implicit consideration of arriving order, as shown in Eq. (3). The same example used for the HST algorithm in Fig. 4 is re-applied here with this new algorithm under the same situation, shown in Fig. 5.

Fig. 5. An example using BDHST with bpl = 50.

Note that each of the two jobs at the front of the queue is now assigned an earlier scheduling order, compared to that from the HST, due to their higher immediate blocking urgency, although w = 0.3 actually favors size order more. In our simulation runs, performance of the proposed techniques is compared with the non-blocking FCFS and LJF techniques. We also include some simulations performed with a fixed weight value for the sake of reference.
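The two weighted orders of Eqs. (1) and (2) can be sketched together as follows; the tuple layouts, rank conventions, and tie-breaking are our own choices, with size rank 1 assigned to the largest job so that w = 0 reproduces LJF:

```python
def hst_order(queue, w):
    """Eq. (1): WO(x) = w*Oa(x) + (1-w)*Os(x).

    queue: list of (job_id, arrival_time, size) tuples (field names ours).
    Oa ranks by arrival (earliest = 1); Os ranks by size (largest = 1),
    so w = 1 reproduces FCFS and w = 0 reproduces LJF.
    """
    Oa = {j[0]: r for r, j in enumerate(sorted(queue, key=lambda j: j[1]), 1)}
    Os = {j[0]: r for r, j in enumerate(sorted(queue, key=lambda j: -j[2]), 1)}
    wo = {jid: w * Oa[jid] + (1 - w) * Os[jid] for jid, _, _ in queue}
    return sorted(wo, key=wo.get)          # job ids in ascending WO(x)

def bdhst_order(queue, w, bpl):
    """Eq. (2): WO(x) = w*(bpl - bpv(x)) + (1-w)*Os(x)*C, with C = bpl/Q.

    queue: list of (job_id, bpv, size) tuples; bpv is the job's current
    bypass count and Q the queue length.
    """
    Q = len(queue)
    C = bpl / Q                            # normalizes the two weighted orders
    Os = {j[0]: r for r, j in enumerate(sorted(queue, key=lambda j: -j[2]), 1)}
    wo = {jid: w * (bpl - bpv) + (1 - w) * Os[jid] * C
          for jid, bpv, _ in queue}
    return sorted(wo, key=wo.get)
```

At w = 1, `bdhst_order` sorts purely by blocking pressure, which by Eq. (3) is consistent with arriving order; at w = 0, both functions reduce to pure size order.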

6. SIMULATION RESULTS

6.1 Utilization versus Load Threshold

We first carry out a simulation to verify our claim on the dependency of system performance (utilization in our case, as described in section 4.4.2) on the input load (rate). Fig. 6 demonstrates this relationship with simulation task set A and λ = 1/7.2. Four tests are given here: FCFS, LJF, and the two proposed techniques, HST and BDHST. In each test run, utilization is measured only when the “load” is above the set threshold. From these results, we can see that the higher the load threshold set for measurement qualification, the higher the measured performance. This supports our claim and the correctness of our load measurement. In addition, the differences in utilization between the two


Fig. 6. System utilization versus load threshold, with uniformly distributed job sizes, comparing (a) HST and (b) BDHST with FCFS and LJF, λ = 1/7.2.

proposed techniques compared to FCFS and LJF also clearly indicate the potential of the two proposed techniques to enhance performance. Theoretically, the highest potential reachable by a technique is given when the load threshold is set to infinity. From this argument, FCFS and LJF can each only reach a maximum utilization of about 98.6%, while the two proposed techniques can reach over 99.0%. In our proposed techniques, the load threshold l_th is set at 40,000.

6.2 Scheduling Performance Comparison

Note that FCFS and LJF are the special cases w = 1 and w = 0, respectively. To compare each of the two proposed techniques with FCFS and LJF, we first run the control technique with a sequence of fixed w values to see where the optimal w setting lies. The results show that, for each of the two control techniques, under large job-size variation ((a) uniform distribution), a small fixed weight (w = 0.3) outperforms all others, while, when the variation becomes smaller as in (b) and (c), the best weight value gradually increases to 0.5 and then 0.9, respectively. This observation verifies our claim of how different job-size variances call for different combinations of the two techniques: the smaller the variance, the less job size is needed as a scheduling decision-making criterion. Based on the fixed-w-setting simulation results, we then compare our proposed real-time control techniques, using a dynamic weight value, to FCFS, LJF, and the fixed weight with the best result. Figs. 7 and 8 give the complete comparison. In the case when the job-size variation is large, as in (a), our real-time control techniques easily outperform FCFS and LJF. Although not matching the best fixed weight value in terms of queue delay, our techniques can sustain the same traffic rate as the best fixed case.
For the small-variation cases in (b) and (c), our proposed techniques at least match, if not outperform, the better of FCFS and LJF. In general, BDHST is better than HST, as expected, due to its direct consideration of the mandatory blocking pressure.


Fig. 7. Performance comparison between dynamic weight control (HST) and the fixed weight under different job size distributions: (a) Task set A (σ ≈ 1460); (b) Task set B (σ = 600); (c) Task set B (σ = 100).

Fig. 8. Performance comparison between dynamic weight control (BDHST) and the fixed weight under different job size distributions: (a) Task set A (σ ≈ 1460); (b) Task set B (σ = 600); (c) Task set B (σ = 100).

6.3 Analysis of Performance Improvement

As aforementioned, the proposed control techniques aim at taking advantage of the merits of FCFS and LJF while avoiding their corresponding deficiencies. We further compare the proposed techniques with FCFS and LJF in terms of the number of mandatory blockings encountered and the average number of idle processors during the simulation. Figs. 9 and 10 show the results for these two factors, respectively, using task set A. Both the HST and BDHST techniques trade off some of FCFS’s benefit in the number of mandatory blockings for additional gain in processor utilization; both sacrifice a small margin in blocking compared to FCFS but achieve a utilization close to that posted by LJF. In HST, the tradeoff is more pronounced: the number of blockings increases by about 60,000 over FCFS (still much less than that of LJF), and the reduction in the number of idle processors from FCFS is about 20% (about 10%-15% more than that of LJF). In BDHST, the tradeoff in blocking is relatively small (about 20,000 to 40,000 more blockings than FCFS), but the gain in utilization is much more significant: a close-to-30% drop in the number of idle processors, almost matching the number


Fig. 9. Comparison among scheduling techniques in terms of the number of mandatory blockings.

Fig. 10. Comparison among scheduling techniques in terms of the number of idle processors.

Fig. 11. Comparison results of CQL(bpv) between the HST and the composing techniques when the input rate equals (a) 1/7.2, (b) 1/7.3, and (c) 1/7.4.

by LJF. This result clearly demonstrates that our proposed dynamic control techniques are capable of combining two simple algorithms to reach a higher plateau that neither of the two algorithms can attain on its own, without any artificial interference. To compare the

Fig. 12. Comparison results of CQL(bpv) between the BDHST and the composing techniques when the input rate equals (a) 1/7.2, (b) 1/7.3, and (c) 1/7.4.

Contributing Queue Length (CQL) under different leading bypass values, Fig. 11 displays the results from the HST compared to its two composing techniques, and Fig. 12 shows the results from the BDHST. From these comparisons, we can conclude that both proposed techniques maintain a more balanced CQL throughout the whole spectrum of bypass values. Compared to LJF, they tend to allow more queue length to build up while the bypass values are still low, while resisting mandatory blocking from happening as often. Compared to FCFS, they still retain part of LJF’s benefit of low latency before blocking happens, and they do not deviate much from FCFS when bypass values are high.

7. CONCLUSION

In this paper, we propose two non-blocking job scheduling techniques that combine the advantages of two well-known techniques under different situations using a novel real-time optimizing control methodology. Significant improvement in latency and sustained traffic is observed from this approach. There is obviously room for improvement in designing other scheduling techniques by adopting this control methodology, simply by identifying other, more effective control factors, which is the key advantage of the methodology. Furthermore, the intrinsic merit of this methodology lies in its wide applicability to many resource-sharing cases, as suggested earlier in the paper. The authors have started working on two other resource-sharing systems, freeway access control and router traffic control, with both showing very promising results.




Hsiu-Jy Sandy Ho (侯秀枝) received her Master of Science and Ph.D. in Electrical Engineering from the University of Texas at San Antonio (UTSA) in 2004 and 2007, respectively. During her doctoral study, she was awarded the HEB Dissertation Fellowship in 2005. Dr. Ho has since become an engineer at Harris Stratex Networks, Inc. Her research interests are in autonomous control and computer networks.


Wei-Ming Lin (林維明) received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, Taiwan, in 1982, and the M.S. and Ph.D. degrees in Electrical Engineering from the University of Southern California, Los Angeles, in 1986 and 1991, respectively. He joined the University of Texas at San Antonio (UTSA) in 1993, has been a Professor of Electrical Engineering there since 2004, and has also served as the Associate Dean for Graduate Studies in the College of Engineering since 2006. Dr. Lin has published more than 100 technical papers in international journals and conferences in the areas of distributed and parallel computing, computer architecture, computer networks, autonomous control, and Internet security. He has served on the program committees of many international conferences and as the program chair for the International Conference on Computer Applications in Industry and Engineering 2004. He has received Best Paper Awards at three different international conferences, and has been awarded numerous grants from NSF, DOD, ONR, AFOSR, and other agencies.