# Grouped Clock Gated Flip-Flop Array for Low Power Applications

A. Boobalan<sup>1</sup>, V. Meenakshi<sup>2</sup>

<sup>1</sup>PG Scholar, <sup>2</sup>Assistant Professor

<sup>1,2</sup>Department of Electronics and Communication Engineering
Sona College of Technology, Salem, India.
flowerboobalan@gmail.com<sup>1</sup>, meena.vijay27@gmail.com<sup>2</sup>

Abstract: In real-time processors, most of the power dissipation is due to dynamic power consumption. Clock gating is used to avoid unwanted switching activity, but the extra logic gates it requires cause area and power overheads. These overheads can be reduced by a clock grouping technique that groups several flip-flops (FFs) driven by the same clock signal, but grouping alone is not a complete solution. A clock gated multi-bit flip-flop gives an efficient result in terms of area and power. In a multi-bit flip-flop, the clock generation logic for the slave nodes is shared, which reduces the number of logic gates required per flip-flop. Clock grouping is done based on the position of bits, and each group has a single multi-bit flip-flop. A common data driven clock gating logic is added to each group to reduce the power consumption.

Keywords: Clock gating, Clock network, Multi-bit flip-flops, Power reduction

#### I. INTRODUCTION

Power consumption, caused by the switching of active devices and the leakage of inactive devices, is a major issue in VLSI design [1,2]. The major part of the power dissipation is due to dynamic power consumption, and clocking is the dominant power-consuming element in modern integrated circuits. In recent VLSI designs, advancing technologies enable denser integration and higher operating frequencies, which also increase the power dissipated in the chip [2,3]. Reducing power consumption is therefore increasingly important. Various methods are used to reduce the dynamic power consumption of sequential circuits. In activity sensitive clock tree construction [7,8], the power consumption of the clock network is reduced by taking modules, clock edges and control signals into account. The clock tree is built by combining two modules or internal nodes with similar activity patterns [8], so that the number of active periods of the clock is reduced and, with it, the power consumption of the clock tree. Clock skew arises while the power consumption is minimized, which leads to area and power overheads in the clock tree construction [7,8].

In activity driven clock gating [2,9], the power consumption of a synchronous digital system is reduced by minimizing the power consumed by the clock signal. An activity driven clock tree is constructed [9] in which sections of the clock tree are disabled by gating the clock signals. Gating requires additional control signals and gates, so there is a tradeoff between the amount of clock tree gating and the total power consumption of the clock tree; the number of clock gates is lowered according to the switching activity of the clocked modules [7,9]. Constructing the activity driven clock tree raises two problems: the clock tree construction problem and the clock gate insertion problem. In activity driven clock gating, clock skew also arises while the power consumption is minimized [9].

Clock gating [1,2,6] is one of the most important and widely used techniques for reducing dynamic power consumption and area. The clock net has high switching activity, which results in larger power dissipation in the adders [2]. This dissipation is avoided by removing the clock from part of the device, which is known as clock gating. Clock gating is generally used to reduce power consumption by avoiding unwanted switching activity [1]. It is applied at all levels: system architecture, block design, logic design and gates. In [6], clock gating is implemented under on-chip variations (OCV). The clock gating must be carefully designed for successful timing closure under the influence of OCV, which no longer guarantees an ideal clock. To account for OCV in timing closure, the multi-level gated clock structure should be considered.

Gated-clock design [7] is one of the approaches to reducing dynamic power consumption. The set of strategies termed DPM [7] is used to reduce power consumption in a digital system. These strategies disable logic circuits that perform no functional operation during a given time slot. The FF clock is disabled with an approach called gated clock [2,7]. The gated clock design approach depends on the technological parameters of the adopted gates and offers a significant power reduction. A power reduction of 10% with an equivalent error of 3% is presented in [7]. Decreasing the number of clock gaters yields an extra power reduction. Adaptive clock gating [2] is another technique for achieving more power reduction. Here the clock enable signal (clk\_en) is the output of an XOR gate that compares the present data input with the current output of the FF. If the current output and the current input differ, clk\_en is high and this is an active period. In a sleep period, the current output and the current input have the same value, so clk\_en is low and the clock is gated.

Data driven clock gating is the clock gating methodology proposed in [1]. It is used in many synchronous circuits to reduce dynamic power consumption. The clock signal is disabled when the FF is not going to change state in the next clock cycle [1], [2]. Data driven clock gating causes area and power overheads that must be considered. To reduce them, several FFs are grouped and driven by the same gated clock signal, which is generated from a joint enabling signal obtained by ORing the enabling signals of the individual FFs [1]. Data driven gating is based on the toggling activity of the constituent FFs.

The multi-bit flip-flop (MBFF) [1], [4] is another FF grouping method that reduces dynamic power consumption. An MBFF combines two or more FFs in a single cell such that the inverters driving the clock signal are shared among all FFs in the group [3], [4]. MBFF grouping depends mainly on the physical proximity of the individual FFs. The benefits of multi-bit flip-flops are lower power due to fewer clock buffers and fewer clock sinks, a simpler clock network topology, less dynamic power and area per FF, and easier skew control. Multi-bit flip-flops [3], [4] are among the best methods for saving both area and power. In [4], MBFFs are applied at the post-placement stage for power optimization. In addition, the MBFF performs well even if the clock cannot be turned off, and it further reduces the dynamic power consumption and the total number of inverters driving the clock pulse [3], [4]. MBFFs are applied at post-placement [4] to achieve more clock power saving while respecting timing slack and placement density constraints and simultaneously reducing the interconnecting wire length.

Data driven gating is a known technique for saving clock power: it stops the clock pulse to flip-flops when it is not required [1], [2]. The saving from clock gating depends mainly on the logic functions. Multi-bit flip-flops are now more appropriate for saving both device area and power. This raises the question of which flip-flop type is best to place in a group to increase the power reduction. Compared with other flip-flop types, the D flip-flop is a good choice for implementation because its excitation table and truth table are the same.

This paper proposes combining the multi-bit flip-flop with clock gating to obtain a larger power reduction along with savings in clock power and area. A clock gated multi-bit flip-flop gives an efficient result in terms of area and power. In a multi-bit flip-flop, the clock generation logic for the slave nodes is shared, which reduces the number of logic gates required per flip-flop. Data driven clock gating logic is assigned to each group to reduce power consumption. Clock grouping is done based on the position of bits, and each group has a single multi-bit flip-flop. The paper briefly discusses the combined data driven clock gated multi-bit flip-flop and the clock grouping technique. The clock grouping method overcomes the overhead issues of data driven clock gating. The data driven clock gated MBFF is implemented in a MAC unit, and experimental results are obtained for 2FF, 4FF and 8FF flip-flop groupings. The MAC function with clock gating is also compared with the MAC function with the combined data driven clock gated multi-bit flip-flop.

#### II. CLOCK GATING

Clock enabling signals are designed during the system and clock design phases, where the inter-dependencies of the various functions are well understood. It is very difficult to define such signals at the gate level, particularly in control logic [1], since the inter-dependencies among the various flip-flops depend on synthesized logic. There is a big gap between the block disabling that is driven from the HDL and what can be achieved with knowledge of the data regarding flip-flop activity and how the flip-flops are correlated with each other. The major part of the power dissipation in synchronous digital circuits is due to the clock net, and this power is controlled by the clock gating technique [1], [2]. By removing the clock from the gated segment, clock gating eliminates unwanted switching there. Clock gating [1], [5] is one of the vital techniques for reducing clock network power. Adding clock gating cells to a system reduces its dynamic power and area.



Figure 1: Enabling of the Signal

Figure 2: Data Driven Clock Gating

A clock gating cell can be built in three ways: latch based, flip-flop based and gate based. RTL clock gating [1], [2] is one of the most common techniques and is used for improving efficiency and optimization. The gated clock is the simplest methodology for optimizing power and is applied at the gate level, RTL and system level. Figure 1 shows how the clock can be disabled in the next clock cycle when the FF does not need it. An XOR gate compares the FF's current output with the present data input, which will appear at the output in the next cycle [2]. The XOR output, clk\_en, indicates whether a clock pulse will be required in the next cycle. The clock driver is then replaced by a 2-input AND gate, known as the clock gater.
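The XOR-and-AND gating of a single flip-flop described above can be illustrated with a small behavioral model. This is only a sketch: the function names `gated_ff_trace` and `pulses_delivered` are hypothetical, the clock is modeled as one pulse per loop iteration, and glitch and timing effects are ignored.

```python
def gated_ff_trace(d_inputs, q_init=0):
    """Simulate one D flip-flop whose clock is gated by clk_en = D XOR Q.

    Returns a list of (d, clk_en, q_after) tuples, one per clock cycle.
    """
    q = q_init
    trace = []
    for d in d_inputs:
        clk_en = d ^ q           # XOR: 1 only if the stored value would change
        gated_clk = 1 & clk_en   # 2-input AND of the clock pulse with clk_en
        if gated_clk:
            q = d                # the FF only captures D when it is clocked
        trace.append((d, clk_en, q))
    return trace


def pulses_delivered(trace):
    """Count the clock pulses that actually reached the flip-flop."""
    return sum(clk_en for _, clk_en, _ in trace)


if __name__ == "__main__":
    trace = gated_ff_trace([0, 1, 1, 1, 0, 0])
    print(trace)
    print("clock pulses delivered:", pulses_delivered(trace))
```

In this model the output still follows the data input in every cycle, while the flip-flop is clocked only in the cycles where the input differs from the stored value.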

Data driven clock gating maximizes clock disabling at the gate level: a flip-flop's clock is disabled when the flip-flop is not going to change state in the next clock cycle. It needs extra logic and interconnect to generate the clock enabling information, which causes area and power overhead [1], [2]. To reduce this overhead, several flip-flops are grouped together and share a common gated clock signal, so that unnecessary clock pulses are removed for the whole group. The clock of a flip-flop can be disabled in the next cycle by XORing its output with the present data input, which will appear at its output in the next cycle [1], [2]. The outputs of k XOR gates are ORed together to create a joint gating signal for k flip-flops, which is applied to a latch to avoid glitches. The combination of a latch with an AND gate is commonly called an integrated clock gate (ICG). A single integrated clock gate is amortized over k flip-flops. There is a clear tradeoff between the hardware overhead and the number of disabled clock pulses: increasing k reduces the hardware overhead, but it also lowers the probability that the clock is disabled, because the joint enable is the OR of k enable signals [1]. Figure 2 shows data driven gating with latch based AND gates. Let the average activity factor of an FF be denoted by p (0 < p < 1). Assuming a uniform physical clock tree structure and, as the worst case, independent FF toggling, the number k of jointly gated FFs that maximizes the power saving is the solution of

$$(1-p)^k \ln(1-p)\,(c_{FF} + c_W) + \frac{c_{latch}}{k^2} = 0 \tag{1}$$

where $c_{FF}$ is the FF's clock input capacitance, $c_W$ is the unit-size wire capacitance, and $c_{latch}$ is the latch capacitance including the wire capacitance of its clock input. Table I shows how the optimal k depends on p.

Table I. Toggling Probability Dependency of Optimal FF Group Size

| p | 0.01 | 0.02 | 0.05 | 0.1 |
|---|------|------|------|-----|
| k | 8    | 6    | 4    | 3   |
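Equation (1) has no closed-form solution for k, but it can be solved numerically. The sketch below uses bisection. The capacitance values are hypothetical normalized numbers chosen only for illustration (the text does not state the capacitances behind Table I), and the function names `equation_1` and `optimal_k` are likewise assumptions.

```python
import math


def equation_1(k, p, c_ff, c_w, c_latch):
    """Left-hand side of equation (1) for a (possibly non-integer) k."""
    return (1 - p) ** k * math.log(1 - p) * (c_ff + c_w) + c_latch / k ** 2


def optimal_k(p, c_ff=0.8, c_w=0.2, c_latch=0.65, tol=1e-9):
    """Solve equation (1) for k by bisection.

    The default capacitances are hypothetical normalized values.
    The bracket [1, -2/ln(1-p)] is used because k^2 (1-p)^k is increasing
    on that interval, so equation (1) has at most one root inside it.
    """
    lo = 1.0
    hi = -2.0 / math.log(1 - p)
    if (hi <= lo
            or equation_1(lo, p, c_ff, c_w, c_latch) <= 0
            or equation_1(hi, p, c_ff, c_w, c_latch) >= 0):
        raise ValueError("no sign change in the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if equation_1(mid, p, c_ff, c_w, c_latch) > 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in (0.01, 0.02, 0.05, 0.1):
        k = optimal_k(p)
        print(f"p = {p}: k = {k:.2f}, rounded = {round(k)}")
```

With the assumed ratio $c_{latch}/(c_{FF}+c_W) = 0.65$, the rounded roots coincide with the k values of Table I. That agreement is a property of the illustrative value chosen here, not a statement about the capacitances actually used to produce the table.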

The clock enabling signals of the grouped flip-flops should be highly correlated. Data driven clock gating reduces the total dynamic power consumed by the clock tree by more than 10% [1]. Reference [5] reports a 20% power saving.

#### III. MULTI-BIT FLIP-FLOP

The multi-bit flip-flop is an effective power-saving methodology that merges single-bit flip-flops in the design [3]. It decreases the total flip-flop area and dynamic power effectively. Clocks are generated with inverter-based clock buffers. In the multi-bit flip-flop technique, the available flip-flops are replaced with multi-bit flip-flops to reduce the clock power [3], [4]. This power reduction is achieved by reducing the number of clock buffers, which is possible when several flip-flops share the clock buffers.



Figure 3: Minimum Sized Inverter of Different Technology

As CMOS technology progresses, the driving capability of inverter-based clock buffers improves significantly [4]. The driving capability of a clock buffer is evaluated by the number of minimum-sized inverters that it can drive within a given rise or fall time [3]. Figure 3 shows the maximum number of minimum-sized inverters that a clock buffer can drive in different processes. Several flip-flops share a common clock signal to avoid wasting clock power. After this replacement, the positions of the flip-flops change, and so does the wire length of the nets connecting pins to a flip-flop. To avoid routing congestion and timing violations [3], [4], it is essential to consider the placement density constraint and the timing constraint. Merging flip-flops subject to these timing constraints achieves the power reduction. Sharing the inverters among flip-flops [3], [4] reduces the number of inverters in the multi-bit flip-flop method. The MBFF performs well even if the clock cannot be turned off, and it further reduces the dynamic power consumption and the total number of inverters driving the clock pulse [3], [4]. The multi-bit flip-flop methodology combines two 1-bit flip-flops into one 2-bit flip-flop. Each 1-bit flip-flop has an inverter, a master latch and a slave latch. Merging single-bit flip-flops into a multi-bit flip-flop removes the duplicate inverters and reduces the total clock dynamic power consumption [3]. The multi-bit flip-flop is easily implemented in ASIC designs and gives the benefits of reduced clock skew in sequential gates, smaller area and delay due to shared transistors, and lower clock power in banked sequential components [3], [4]. Figure 4 shows the MBFF concept.



Figure 4: Merging Of Two 1-Bit Flip-Flops into One 2-Bit MBFF

#### IV. COMBINED DATA DRIVEN CLOCK GATED MULTI-BIT FLIP-FLOP

Multi-bit flip-flop cells reduce the dynamic power consumption: the shared inverter reduces the clock power inside the flip-flop. Combining the multi-bit flip-flop with data driven clock gating gives a further power reduction and effectively reduces area and skew. In a multi-bit flip-flop, the clock generation logic for the slave nodes is shared, which reduces the number of logic gates required per flip-flop. Clock grouping is done based on the position of bits, and each group has a single multi-bit flip-flop. A common data driven clock gating logic is added to each group to reduce the power consumption.

The proposed data driven clock gated multi-bit flip-flop is shown in Figure 5. The multi-bit flip-flop handles two data bits independently with a single clock gating logic, which reduces the area required for the clock gate. To process 2-bit data, the clock gated multi-bit flip-flop requires two XOR gates to detect whether a clock pulse is needed or the old data should be retained. A latch followed by an AND gate acts as the gated clock generation system.
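The behavior of the 2-bit gated cell can be sketched with a small cycle-level model. The class name `GatedMBFF2` is hypothetical, and the model assumes that the two XOR outputs are ORed into one joint enable, as in the data driven gating scheme of Section II; the latch is represented only by the fact that the enable is evaluated once per cycle.

```python
class GatedMBFF2:
    """Behavioral model of a 2-bit flip-flop with one shared clock gate."""

    def __init__(self):
        self.q = [0, 0]          # stored bits Q1, Q2
        self.clock_pulses = 0    # pulses that passed the AND gate

    def cycle(self, d1, d2):
        en1 = d1 ^ self.q[0]     # XOR for bit 1: does Q1 need to change?
        en2 = d2 ^ self.q[1]     # XOR for bit 2: does Q2 need to change?
        clk_en = en1 | en2       # joint enable (assumed OR of both XORs)
        if clk_en:               # AND gate passes the clock pulse
            self.q = [d1, d2]    # both bits are clocked together
            self.clock_pulses += 1
        return tuple(self.q)


if __name__ == "__main__":
    ff = GatedMBFF2()
    data = [(0, 0), (1, 0), (1, 0), (1, 1), (1, 1), (0, 1)]
    print([ff.cycle(d1, d2) for d1, d2 in data])
    print("clock pulses:", ff.clock_pulses, "of", len(data))
```

The outputs track the inputs in every cycle, and the shared clock is pulsed only when at least one of the two bits changes.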



Figure 5: Combined Data Driven Clock Gated Multi-Bit Flip-Flop

The combined data driven clock gated multi-bit flip-flop is obtained by combining the merged flip-flops with clock gating. Clock gating alone [1], [5] reduces power consumption but is less efficient; the method proposed in this paper achieves a larger power reduction. The merged flip-flop has several data input pins and data output pins, a clock signal and a reset pin. It consumes less power than single-bit flip-flops. Figure 6 shows an example of a two-bit flip-flop cell.



| clk | D1 | Q1 | D2 | Q2 |
|-----|----|----|----|----|
|     | L  | L  | L  | L  |
|     | L  | L  | H  | H  |
|     | H  | H  | L  | L  |
|     | H  | H  | H  | H  |
|     | X  | D1 | X  | D2 |

Figure 6: Dual-Bit Flip-Flop Cell.

Table II. Truth Table of Dual-Bit Flip-Flop Cell

Table II shows the truth table of the dual-bit flip-flop cell. When CLK is high, the value of D1 passes to Q1 and the value of D2 passes to Q2; otherwise Q1 and Q2 keep their values. The experiments described in Section VI show the resulting power reduction.

#### V. IMPLEMENTATION OF DATA DRIVEN CLOCK GATED MBFF USING MAC UNIT

This section describes the implementation of the data driven clock gated MBFF in a multiply-accumulate (MAC) unit. The MAC operation computes the product of two numbers and adds it to an accumulator. The hardware unit that performs this operation is called a multiplier-accumulator. The MAC operation modifies an accumulator a as a = a + (b\*c). The block diagram of the MAC unit is shown in Figure 7.



Figure 7: MAC Unit Operation

The MAC unit above has a total of 32 flip-flops, with 16 flip-flops in each register. In the registers, the flip-flops are merged into MBFFs to reduce the dynamic power consumption, the area and the clock pulses. Clock grouping is based on the position of bits, and 2FF, 4FF and 8FF groupings are obtained by combining the flip-flops in MBFFs. Only two flip-flops can be merged in a single multi-bit flip-flop.
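One way to picture position-based grouping is to partition the bit indices of a register. The sketch below is an assumption about how such a partition could look, not the authors' procedure: it assumes that adjacent bit positions are merged into 2-bit MBFFs and that consecutive MBFFs then share one gating group. The function name `group_by_bit_position` is hypothetical.

```python
def group_by_bit_position(n_bits, ffs_per_group, mbff_width=2):
    """Partition bit positions 0..n_bits-1 into gating groups of MBFFs.

    Each MBFF holds `mbff_width` adjacent bits (assumed adjacency), and
    each gating group holds ffs_per_group // mbff_width MBFFs.
    """
    if ffs_per_group % mbff_width or n_bits % ffs_per_group:
        raise ValueError("group size must divide evenly")
    mbffs = [tuple(range(i, i + mbff_width))
             for i in range(0, n_bits, mbff_width)]
    per_group = ffs_per_group // mbff_width
    return [mbffs[i:i + per_group] for i in range(0, len(mbffs), per_group)]


if __name__ == "__main__":
    for size in (2, 4, 8):
        groups = group_by_bit_position(16, size)
        print(f"{size}FF grouping: {len(groups)} groups, "
              f"{len(groups[0])} MBFF(s) per group, first group {groups[0]}")
```

For a 16-bit register this gives 1, 2 and 4 MBFFs per group for the 2FF, 4FF and 8FF groupings respectively.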

#### VI. EXPERIMENTAL RESULTS

The MAC unit is designed using the data driven clock gated MBFF, and simulations are run for the MAC with the data driven clock gated MBFF and with clock gating. The MAC function has a 16-bit input that is divided into two separate 8-bit inputs, in\_1 [7:0] and in\_2 [7:0]. The inputs are applied to the multiplier, which computes their product as the 16-bit output multi\_ [15:0]. The multiplier output is applied to the register ar\_out [15:0], which temporarily stores the bits. The adder adds the outputs of the two registers and produces the final output, out [15:0]. The output waveform of the MAC function is shown in Figure 8.

Simulation waveform, 0 to 1000 ns. Signal values in order of appearance (x denotes an undefined value):

| Signal            | Successive values    |
|-------------------|----------------------|
| in_1[7:0]         | 0, 2, 13, 4, 16      |
| in_2[7:0]         | 0, 8, 32, 2, 32, 16  |
| multi_ou          | 0, 64, 26, 128, 256  |
| ain[15:0]         | 0, 64, 26, 128       |
| aout[15:0]        | x, 0, 64, 90, 218    |
| (label not shown) | 0, 64, 90            |
| out[15:0]         | x, 0, 64, 90         |

Control signals at 1000 ns: set = 0, clk = 1, reset = 0.

Figure 8: Output Waveform of MAC Function
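The values in the Figure 8 waveform can be checked against the MAC update a = a + (b\*c) with a short model. This is a minimal sketch: the function names are hypothetical, the pipeline registers are not modeled, and wrapping results to 16 bits is an assumption based on the 16-bit signal widths.

```python
def mac_step(acc, b, c, width=16):
    """One multiply-accumulate step; results wrap to `width` bits (assumed)."""
    mask = (1 << width) - 1
    product = (b * c) & mask
    return product, (acc + product) & mask


def run_mac(pairs):
    """Apply mac_step to a sequence of (b, c) inputs, starting from acc = 0."""
    acc = 0
    products, accumulated = [], []
    for b, c in pairs:
        product, acc = mac_step(acc, b, c)
        products.append(product)
        accumulated.append(acc)
    return products, accumulated


if __name__ == "__main__":
    # Input pairs read from the in_1 / in_2 rows of the waveform.
    products, accumulated = run_mac([(2, 32), (13, 2), (4, 32)])
    print("products:   ", products)
    print("accumulated:", accumulated)
```

For the input pairs (2, 32), (13, 2) and (4, 32), the model gives products 64, 26, 128 and accumulated values 64, 90, 218, which agree with the multiplier and accumulator values in the waveform.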

The MAC function is designed both with the combined data driven clock gated MBFF and with clock gating, and the power of the two designs is compared in this section. In clock gating, the flip-flops are grouped to reduce the dynamic power consumption, and the clock grouping is based on the toggling of the flip-flops. In the data driven clock gated multi-bit flip-flop, the flip-flops are merged into MBFFs and data driven clock gating is added to each group to reduce the power further; here the clock grouping is based on the positions of the bits. In the MAC unit design, the output waveform is the same for the data driven clock gated MBFF and for data driven gating, but the power differs because of the flip-flop grouping. With the combined data driven clock gated MBFF method, different power figures are obtained for the 2FF, 4FF and 8FF groups. Flip-flops can be merged in groups such as 2FF, 4FF, 8FF and 16FF so that the power consumption is reduced compared to clock gating. The data driven clock gated MBFF is an effective method for reducing the clock power consumption. A single multi-bit flip-flop combines two flip-flops: it is a multi-bit flip-flop of order 2 that stores 2 bits, which can be set and reset independently through set and reset pins that are also 2 bits wide. Figure 9 shows the power comparison of the data driven clock gated multi-bit flip-flop and clock gating.

| Grouping using gating | Grouping using MBFF | MAC power, clock gating (mW) | MAC power, data driven clock gated MBFF (mW) |
|-----------------------|---------------------|------------------------------|----------------------------------------------|
| 2FF                   | 1 MBFF              | 1285                         | 750                                          |
| 4FF                   | 2 MBFF              | 2595                         | 1427                                         |
| 8FF                   | 4 MBFF              | 3385                         | 2024                                         |

Figure 9: Power Comparisons of Data Driven Clock Gated MBFF and Clock Gating
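The relative saving for each grouping follows directly from the figures above. The short calculation below only restates that arithmetic; the dictionary and function names are hypothetical.

```python
# Power figures in mW taken from the comparison above:
# grouping -> (clock gating, data driven clock gated MBFF)
POWER_MW = {
    "2FF": (1285, 750),
    "4FF": (2595, 1427),
    "8FF": (3385, 2024),
}


def saving_percent(baseline_mw, improved_mw):
    """Relative power reduction of `improved_mw` versus `baseline_mw`, in %."""
    return 100.0 * (baseline_mw - improved_mw) / baseline_mw


if __name__ == "__main__":
    for grouping, (gating, mbff) in POWER_MW.items():
        print(f"{grouping}: {saving_percent(gating, mbff):.1f}% saving")
```

The 2FF row works out to about 41.6%, which matches the improvement quoted in the conclusion; the 4FF and 8FF rows give about 45.0% and 40.2%.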

#### VII. CONCLUSION

The multi-bit flip-flop in combination with data driven clock gating is an effective implementation methodology that reduces power consumption by merging single-bit flip-flops. The combined design applies the data driven clock gating technique to a multi-bit flip-flop that can store 2 bits of data. The complexity of clock grouping is also reduced by half because of the MBFF. Even under timing and placement density constraints, the clock power saving at the post-placement stage using multi-bit flip-flops can still be substantial. The experimental results for the MAC function implemented with the clock gated multi-bit flip-flop show a 41.6% power improvement over the conventional clock gating technique. These results indicate that the multi-bit flip-flop is a very effective method for low-power designs.

#### REFERENCES

- [1] S. Wimer and I. Koren, "Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 4, Apr. 2014.

- [2] S. Wimer and I. Koren, "The Optimal Fan-Out of Clock Network for Power Minimization By Adaptive Gating," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 10, pp. 1772–1780, Oct. 2012.
- [3] I. H.-R. Jiang, C.-L. Chang, Y.-M. Yang, E. Y.-W. Tsai, and L. S.-F. Cheng, "INTEGRA: Fast Multi-Bit Flip-Flop Clustering for Clock Power Saving Based on Interval Graphs," in *Proc. Int. Symp. Phys. Design*, 2011, pp. 115–121.
- [4] Y.-T. Chang, C.-C. Hsu, M. P.-H. Lin, Y.-W. Tsai, and S.-F. Chen, "Post-Placement Power Optimization with Multi-Bit Flip-Flops," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design*, Nov. 2010, pp. 218–223.
- [5] A. Bonanno, A. Bocca, A. Macii, E. Macii, and M. Poncino, "Data Driven Clock Gating for Digital Filters," in *Proc. 19th Int. Workshop*, 2010, pp. 96–105.
- [6] M. S. Hosny and W. Yuejian, "Low Power Clocking Strategies in Deep Submicron Technologies," in *Proc. IEEE Int. Conf. Integr. Circuit Design Technol.*, Jun. 2008, pp. 143–146.
- [7] W. Aloisi and R. Mita, "Gated-Clock Design of Linear-Feedback Shift Registers," *IEEE Trans. Circuits Syst. II, Brief Papers*, vol. 55, no. 5, pp. 546–550, Jun. 2008.
- [8] W. Shen, Y. Cai, X. Hong, and J. Hu, "Activity-Aware Registers Placement for Low Power Gated Clock Tree Construction," in *Proc. IEEE Comput. Soc. Ann. Symp. VLSI*, Mar. 2007, pp. 383–388.
- [9] C. Chunhong, K. Changjun, and S. Majid, "Activity-Sensitive Clock Tree Construction for Low Power," in *Proc. Int. Symp. Low Power Electron. Design*, 2002, pp. 279–282.
- [10] A. Farrahi, C. Chen, A. Srivastava, G. Tellez, and M. Sarrafzadeh, "Activity-Driven Clock Design," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 20, no. 6, pp. 705–714, Jun. 2001.