More Than a Device: Function Implementation in a Multi-Gate Junctionless FET Structure

Abstract: The miniaturization of transistors to keep pace with Moore's Law in Integrated Circuits (ICs) is rapidly approaching physical limits. Among the various approaches in the literature for extending Moore's Law, single device-based computing shows promise by achieving more functionality in a smaller footprint. However, existing single device-based approaches either embed only primitive logic, and are therefore inefficient in performance, or require exotic devices such as spin logic devices and memristors, which involve costly, non-conventional manufacturing steps. Previously, we introduced the concept of embedding logic in a single device based on Crosstalk Computing, where deterministic signal interference between nano-metal lines is leveraged for logic computation. This paper elaborates on the methodology of realizing complex Boolean functions through TCAD-based modeling and simulation, quantifies the results, and compares them against existing approaches. Core to our approach are a multi-gate Junctionless FET-based device, methodical placement of the independent gates, and manipulation of device parameters and dimensions. This paper shows the implementation of various complex logic functions along with the primitive gates in the proposed device. Our benchmark results show, on average, 8x density benefits and 8x lower power consumption than CMOS-based implementations. In terms of delay, the elementary and complex logic devices show characteristics comparable to their 14 nm PTM counterparts. Such realization of complex functions in a stand-alone device is compatible with the existing fabrication process.


Introduction
The proliferation of Integrated Circuits (ICs) has played a vital role in global socioeconomic progress. With the slowdown of Moore's law [1][2][3], it is imperative to find solutions to continue this progress. Among the various solutions available in the literature [4], embedding logic in a single device by manipulating device parameters provides one of the most promising alternative pathways [4][5][6]. If a single device can exhibit a Boolean function by itself, instead of requiring multiple devices per Boolean logic unit, the ensemble collapses to the footprint of a single device. However, such approaches are mostly confined to embedding only primitive cells like NAND/NOR in a single device and show only modest density benefits [7,8], or they rely on exotic devices such as the Complementary Resistive Switch [4] and Bipolar Memristors [5], which require costly non-conventional manufacturing processes [6].
We propose implementing complex Boolean logic in a stand-alone device similar to a multi-gate Junctionless FET by utilizing a novel computing technique called Crosstalk Computing [8]. In Crosstalk Computing, metallic nano-lines acting as aggressors are arranged compactly so that, whenever signal transitions occur on these lines, the sum of their crosstalk interference is induced through virtual coupling capacitance onto another metal nano-line called the victim; the transitioning signals are the inputs, and the net induced charge is the output. The coupling strength between the input and output metal lines and the net induced charge determine which logic function is computed. We emulate this aggressor-victim scenario in our proposed multi-gate Junctionless device, where the independent gates act as aggressors and the silicon fin of the device acts as the victim. By placing the independent gates judiciously within the device, we can control the formation of accumulation or inversion in the device fin and thereby obtain the saturation current at the output. Device geometry, gate placement, and manipulation of the device parameters are the keys to achieving the desired logic function.
Previously, we introduced the concept of implementing logic functions in a single device using Crosstalk Computing, but that work was limited to primitive gates [8]. This paper details the methodology for implementing complex Boolean functions in a single device. We implemented the device using Technology Computer-Aided Design (TCAD) SProcess [9], and we used TCAD SDevice [10] to characterize the device and verify its Boolean functionality. Extending our previous work, the key contributions of this paper are as follows:
• Details of the fabric construct, including material aspects, geometric parameter variation, dimensions, and usage.
• Implementation of two primitive gates (AND, OR) and two complex Boolean functions (AB+BC+CA and B+AC), with results that verify the functionality of the logic gates.
• Comparison with CMOS counterparts: for the primitive gates, a 6x density gain and, on average, an 8x power reduction; for the complex functions, an average density benefit of 13x, an average power reduction of 10x, and delay comparable to the CMOS counterparts.
The rest of the paper is organized as follows: Section 2 discusses the approach for embedding functionality in the device. Section 3 describes the evaluation methodology. Sections 4 and 5 discuss the detailed results for the implemented complex logic and compare the results. Finally, Section 6 concludes the paper.

Approach for Embedding Functionality of the Device
We follow a bottom-up modeling approach in which a multi-gate Junctionless FET is the core functional unit. Control of the independent gates, use of the inversion/depletion modes, and customization of the device parameters are the keys to performing logic operations in the single proposed device. The device is first built with the Sentaurus TCAD process simulator and then characterized with TCAD device simulation. From the device simulation, the ON current is extracted from the I-V characteristics of the device to identify the logic state. For implementing Boolean functionality in the device, the Crosstalk Computing concept is adopted for its high density benefit [8]. To emulate Crosstalk Computing, the independent gates of the device act as aggressors, and the fin acts as a virtual victim. As in the interference-based aggressor-victim scenario, when a voltage is applied to a gate, the device operates in either the inversion or the depletion region depending on the voltage level. For gate voltages below the threshold voltage, the device remains in the depletion region, is only partially ON, and produces logic 0 at the output. For gate voltages above the threshold voltage, the device is in the inversion region, is fully ON, and produces logic 1 at the output. Following this principle, elementary and complex logic are implemented in the device. In this paper, inputs A, B, and C are mapped to Gate-1, Gate-2, and Gate-3, respectively. The gate material and gate dielectric were selected after reviewing numerous references [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. TiN is chosen as the gate material for its work function of 4.4 eV. As a mid-bandgap metal, TiN has a work function that varies from 4.4 to 4.6 eV because of its granularity. By tuning the work function, appropriate potential barriers can be established for gate control.
HfO2 is selected as the gate dielectric because its high dielectric constant is essential for tuning the gate potential and for gate control. With these material choices, the device is implemented in SProcess and SDevice.

Elementary Logic
Two independent gates on a Junctionless FET implement the elementary logic gates (OR and AND). Input voltages are applied to the independent gates, in this case Gate-1 (Input A) and Gate-2 (Input B), by voltage sweeping (Figures 1(i & ii)). When the applied voltage is in the subthreshold range, charge is depleted in the silicon fin. When part of the silicon fin (Figure 1(i(a))) becomes neutral (i.e., no longer depleted), the threshold voltage is reached, and bulk current starts to flow in the neutral silicon [6]. As the depletion decreases with increasing gate voltage, the diameter of the neutral channel increases. At this point, the device is in inversion (Figures 1(i(b-d))). When the device reaches the flat-band voltage, the entire channel region becomes neutral (Figure 1(i(d))). Further increasing the voltage leads to full inversion. Here, the two gates act as aggressors and the silicon fin acts as the victim, which resembles a Crosstalk setup. For example, in two-input OR logic, when both gates are 'OFF,' the depletion regions beneath the two gates overlap. Hence, no current flows, and the output is logic 0, as depicted in Figure 1(i(a)). However, when one gate is 'ON' and the other is 'OFF,' the depletion region is removed and partial inversion occurs, creating a conduction path (channel) between the source and drain (Figures 1(i(b-c))). The device is then in the 'ON' state, and the output is logic 1. When both gates are 'ON' and the drain voltage is constant, the depletion region disappears completely and full inversion takes place, widening the channel to a fully ON state and producing a logic 1 output (Figure 1(i(d))).
Similarly, for the AND operation, when both gates are in the 'OFF' state, the depletion regions overlap, as shown in Figure 1(ii(a)). When one of the gates is 'ON,' the depletion region withdraws from the area under that gate as the voltage increases, and that area proceeds toward inversion. However, the depletion region remains under the 'OFF' gate. As a result, the transistor remains in the 'OFF' state (Figures 1(ii(b-c))). Only a voltage above the threshold voltage at both gates removes all depletion regions and puts the transistor in the 'ON' state (Figure 1(ii(d))).
We propose a double-gate Junctionless device based on the above principle, as shown in Figure 2, implementing 2-input AND and OR gates. Figures 2(a & b) illustrate the OR device and the log plot of its I-V characteristics, respectively. As listed in Table 1, the device has a gate length of 14 nm, a TiN gate, and a silicon fin 20 nm wide and 20 nm high. The device has an ON current of 10 µA and an OFF current of 0.32 pA with a threshold voltage of 0.3 V (Figure 2b). As can be seen from Figures 2(a & c), the main difference between the AND and OR devices is the orientation of the two independent gates. For the OR logic implementation, the gates are placed across from each other so that each has equal control of the channel. Owing to this arrangement, either gate alone or both gates together can create a partial inversion region within the channel (logic 1 at the output) whenever an input voltage is applied. In contrast, the gates of the AND device are placed diagonally to each other, which ensures the AND operation. For the AND logic implementation, the gates are placed on opposite sides, far away from each other. As a result, if only one gate has a logic 1 input, it does not have enough control over the fin to create partial inversion and therefore produces logic 0 at the output. However, when both gates have logic 1 inputs, the device moves from the depletion region to the partial inversion region and produces a logic 1 output. The AND and OR devices are modeled using the geometric parameters in Table 1. Table 2 lists the total current for all input combinations of both 2-input AND and OR logic. For the 00 input combination, the AND device draws 1 pA and the OR device draws 20 pA, so both stay in the OFF state. For the 01 and 10 input combinations, the AND device has a total current of 25 nA, which is lower than the ON current obtained from the device I-V characteristic because of the gate orientation.
For the OR device, however, the total current is 10 µA, which exceeds the AND device's ON current and produces the ON state (logic 1) at the output. When both gates have logic 1 inputs, the AND device has a total current of 1 µA and reaches the ON state, while the OR device draws 20 µA and stays in the ON state.
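As a quick consistency check, the total currents quoted above for Table 2 can be thresholded in software to recover the AND and OR truth tables. The sketch below is only an illustration: it assumes the 1e-7 A logic-1 threshold that the Results section applies to the complex devices, and the names `read_logic` and `truth_table` are hypothetical helpers, not part of any TCAD tool flow.

```python
# Total drain currents (A) quoted for Table 2, keyed by the input pair (A, B).
AND_CURRENTS = {(0, 0): 1e-12, (0, 1): 25e-9, (1, 0): 25e-9, (1, 1): 1e-6}
OR_CURRENTS = {(0, 0): 20e-12, (0, 1): 10e-6, (1, 0): 10e-6, (1, 1): 20e-6}

# Assumption: reuse the 1e-7 A logic-1 threshold stated for the complex devices.
I_THRESHOLD = 1e-7


def read_logic(current, threshold=I_THRESHOLD):
    """Map a total drain current to a logic level (1 if at or above threshold)."""
    return 1 if current >= threshold else 0


def truth_table(currents):
    """Threshold every input combination's current into a logic output."""
    return {inputs: read_logic(i) for inputs, i in currents.items()}


if __name__ == "__main__":
    for name, currents in (("AND", AND_CURRENTS), ("OR", OR_CURRENTS)):
        for (a, b), out in sorted(truth_table(currents).items()):
            print(f"{name}: A={a} B={b} -> {out}")
```

Any threshold between 25 nA and 1 µA would give the same AND table, so the exact choice of 1e-7 A is not critical for these two devices.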

Complex Logic
We now show the implementation of two different complex Boolean functions, AB+BC+CA and B+AC. Both devices have three independent gates, matching the number of inputs of the logic functions. Using Crosstalk Computing, Junctionless devices with three independent gates can efficiently implement different complex multi-level logic functions, such as AB+BC+CA and B+AC. The function AB+BC+CA is the carry-out expression of a full adder; hence, this device can provide the carry logic of a full adder. The device can be customized based on the input configuration of the logic function, for example by varying the gate location and the gate oxide thickness [8].
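The full-adder carry claim can be checked exhaustively. The short sketch below (plain Python, with hypothetical function names) compares AB+BC+CA, the three-input majority function, against the carry-out of a one-bit full adder over all eight input combinations. It covers only the carry; a complete full adder also needs the sum output, A XOR B XOR C.

```python
from itertools import product


def majority(a, b, c):
    """AB + BC + CA, the function targeted by the three-gate device."""
    return (a & b) | (b & c) | (c & a)


def full_adder_carry(a, b, c):
    """Carry-out of a one-bit full adder: 1 when the arithmetic sum is 2 or 3."""
    return (a + b + c) >> 1


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, majority(a, b, c), full_adder_carry(a, b, c))
```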
The implementation of the Boolean function AB+BC+CA is shown in Figures 2(e & g), and the device dimensions are given in Table 1. For this function, the output is logic 1 when at least two inputs transition from low to high. To achieve this functionality, we customized the device with three independent gates of equal length, each contributing to partial inversion of the fin. However, the oxide thicknesses of the gates are chosen to be different to limit excess current flow during switching. Our previous work [8] shows that, with all other parameters kept the same, current flow can be limited by increasing the doping concentration and the gate oxide thickness. The oxide thickness for Gate-1 (Input A) and Gate-3 (Input C) is kept at 9 nm to limit excess current generation, and for Gate-2 (Input B) it is 2 nm. HfO2 is used as the gate oxide. This Junctionless FET has a gate length of 14 nm and a fin width of 22 nm with a height of 20 nm. Figure 2f is the log plot of the device's I-V characteristic. The device has an ION of 2.2 µA and an IOFF of 1e-10 A/µm with a threshold voltage of 0.53 V. Figure 2g depicts the device structure that realizes the B+AC logic function. This device has a fin width of 42 nm, a fin height of 20 nm, and a 14 nm TiN gate. HfO2 is used as the gate oxide for this device as well. Gate-1 (Input A) and Gate-3 (Input C) have a gate oxide thickness of 9 nm, and Gate-2 has 2 nm. Figure 2g exhibits the log plot of the device's I-V characteristics. The device has an ON current of 1.2 μA and an OFF current of 6.5e-10 A with a threshold voltage of 0.53 V. To satisfy the B+AC logic condition, we increase the fin width. As a result, Gate-2 (Input B) has a wider area over which to control inversion/partial depletion [10]. The other two gates contribute equally to producing current. In this device, Gate-2 (Input B) should produce more current than Gate-1 (Input A) and Gate-3 (Input C).
To achieve this, Gate-2 (Input B) is intentionally made wider with a 2 nm oxide thickness so that it works as the dominant gate/input: it produces more current and reaches inversion faster than the other two gates, which fulfills the logic. Gate-1 and Gate-3 have smaller gate areas than Gate-2, and their current is further limited by the 9 nm oxide thickness. This gate arrangement fulfills the B+AC logic. The same Crosstalk setup applies here: the three gates act as aggressors, and the silicon fin acts as the victim.
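One way to summarize the gate-placement and dominant-gate reasoning above is as threshold logic: each gate gets a weight reflecting how strongly it controls the fin, and the output is 1 when the weighted sum of the inputs reaches a threshold. This is an abstract analogy introduced here for illustration, not a model extracted from the TCAD simulations; the weights and thresholds below are hypothetical choices that reproduce the four target functions, with Gate-2 given double weight to mirror its dominant role in the B+AC device.

```python
from itertools import product


def threshold_gate(weights, threshold):
    """Return a Boolean function that outputs 1 when the weighted input sum
    reaches the threshold. The weights are an abstract stand-in for each
    gate's control over the fin (placement, width, oxide thickness)."""
    def gate(*inputs):
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)
    return gate


# Hypothetical weight/threshold choices, not values from the TCAD devices.
AND2 = threshold_gate((1, 1), 2)          # neither gate alone suffices
OR2 = threshold_gate((1, 1), 1)           # either gate alone suffices
MAJ3 = threshold_gate((1, 1, 1), 2)       # AB + BC + CA: any two equal gates
B_PLUS_AC = threshold_gate((1, 2, 1), 2)  # Gate-2 (input B) is dominant

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "MAJ3 =", MAJ3(a, b, c), "B+AC =", B_PLUS_AC(a, b, c))
```

In this picture, B alone already meets the threshold of 2, while A or C alone does not, which is the qualitative behavior the wider, thin-oxide Gate-2 is designed to produce.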

Methodology
To obtain a functioning device, it is essential to examine its electrical characteristics. The device is designed with Synopsys Sentaurus Process [9], which emulates the physical process steps, and its characteristics are analyzed with Sentaurus Device [10]. The SDevice model solves the Poisson and carrier continuity equations to characterize the current behavior. The silicon band structure and the effect of bandgap narrowing are calculated with the OldSlotboom model. The ON and OFF currents are extracted from the I-V characteristics, and logic 0 and logic 1 are determined from the value of the ON current. Together, the AC and DC behavior determine the overall performance of a device (Table 3). Average power, leakage power, and delay are the key parameters for examining the DC and AC characteristics of a device. Average power and leakage power were extracted from the I-V characteristics using the basic power formula, P = VI. Propagation delay is calculated using the equation given in [13], $\tau = \frac{C_g V_{dd}}{I_{on}}$, where $C_g$ is the gate capacitance, $V_{dd}$ is the drain voltage, and $I_{on}$ is the ON current. For the ON condition, a current of 1 µA is regarded as logic 1. The ON and OFF currents of the 14 nm PTM CMOS counterparts are extracted from HSPICE simulations, as are their average and leakage powers. To obtain the propagation delay of the PTM implementations, the critical path of each circuit is identified and its delay is calculated in HSPICE. The power and delay of the proposed devices and their 14 nm PTM CMOS counterparts are summarized in Tables 4 and 5.
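The delay and power quantities above reduce to simple arithmetic. The sketch below assumes the delay takes the standard intrinsic-delay form τ = C_g·V_dd/I_on, that leakage power is the supply voltage times the OFF current, and that the Power Delay Product (PDP) is average power times delay. The numerical inputs in the example are placeholders chosen for illustration, not values reported in this paper.

```python
def intrinsic_delay(c_gate, v_dd, i_on):
    """tau = C_g * V_dd / I_on, in seconds (F * V / A)."""
    return c_gate * v_dd / i_on


def static_power(v_dd, current):
    """P = V * I, in watts; with the OFF current this gives leakage power."""
    return v_dd * current


def power_delay_product(p_avg, delay):
    """PDP = P_avg * tau, in joules: average energy per switching event."""
    return p_avg * delay


if __name__ == "__main__":
    # Hypothetical placeholder inputs, not taken from the paper.
    c_g = 1e-17   # 10 aF gate capacitance
    v_dd = 0.8    # 0.8 V supply
    i_on = 1e-6   # 1 uA ON current
    i_off = 1e-12  # 1 pA OFF current
    tau = intrinsic_delay(c_g, v_dd, i_on)
    p_leak = static_power(v_dd, i_off)
    print(f"delay = {tau:.3e} s, leakage power = {p_leak:.3e} W")
    print(f"PDP at 0.3 uW average power = {power_delay_product(0.3e-6, tau):.3e} J")
```

Keeping every quantity in SI units (F, V, A, W, s) makes the PDP come out directly in joules; for single devices at this scale it lands in the attojoule range.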

Results and Discussion
The present study confirms functional implementation in standalone devices (Figures 2(a & c), Figures 2(e & g)) and shows the efficiency of the devices' multi-gate functionality. Together, these findings validate the proposed AB+BC+CA and B+AC devices as complex logic. Figure 3i depicts selected input combinations (000, 010, 100, 101, 110, and 111) for the AB+BC+CA logic, and Table 3 lists all possible input combinations, the corresponding gate currents, and the total current. For a logic 1 output, we take 1e-7 A as the threshold current, which yields the AB+BC+CA logic. For logic 0 inputs at all gates, the depletion regions overlap with each other; as a result, the device is in weak inversion, producing 1e-10 A at the output, and is therefore in the 'OFF' state. For a logic 1 input at only one gate, the same happens: the device produces a 5.2e-8 A output current and remains in the 'OFF' state. For logic 1 inputs at two gates (the 011, 110, and 101 switching conditions), the depletion regions withdraw, and the conducting channel established between the source and drain produces an output current of 8.5e-7 A. As a result, the device shifts to the 'ON' state. The same phenomenon occurs for logic 1 inputs at all three gates: the device produces an output current of 2 µA and a logic 1 output. For the B+AC device, Table 3 likewise lists the current through each gate and the total current. As Table 3 shows, for logic 0 inputs at all three gates, the gates cannot gain enough control to achieve inversion; consequently, the device remains in the depletion region and draws 1e-10 A, producing logic 0. The same occurs for a single logic 1 input at Gate-1 or Gate-3 (input combinations 100 and 001): in each case, the device remains in the depletion region and produces a 5.7e-8 A output current, i.e., the OFF state.
For the 010 input, the device has a 5.5e-7 A output current and produces a logic 1 output, because Gate-2 is the dominant gate in this device: when its input transitions high, the device enters the partial inversion region and is therefore ON (logic 1 output). For the 011, 101, and 110 input combinations, multiple inputs are high, so the device enters the partial inversion region and produces a logic 1 at the output; for the 011 and 101 inputs, the output current is 1.6 µA. The same holds for the 111 input: the device produces a logic 1 output due to partial inversion, with a total output current of 2.5 µA from the three gates.
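The readout rule for the two three-gate devices can likewise be checked against their target functions, using the total currents quoted above and the 1e-7 A threshold. In this sketch, the 110 case of the B+AC device is left out because no current is given for it in the text, and the helper names are hypothetical.

```python
# Logic-1 threshold current (A) stated in the text for the complex devices.
I_THRESHOLD = 1e-7

# Total output currents (A) quoted in the text, keyed by inputs (A, B, C).
MAJ_CURRENTS = {
    (0, 0, 0): 1e-10,
    (0, 0, 1): 5.2e-8, (0, 1, 0): 5.2e-8, (1, 0, 0): 5.2e-8,
    (0, 1, 1): 8.5e-7, (1, 0, 1): 8.5e-7, (1, 1, 0): 8.5e-7,
    (1, 1, 1): 2e-6,
}
# B+AC device; the 110 case is omitted because no current is quoted for it.
B_PLUS_AC_CURRENTS = {
    (0, 0, 0): 1e-10,
    (0, 0, 1): 5.7e-8, (1, 0, 0): 5.7e-8,
    (0, 1, 0): 5.5e-7,
    (0, 1, 1): 1.6e-6, (1, 0, 1): 1.6e-6,
    (1, 1, 1): 2.5e-6,
}


def read_logic(current, threshold=I_THRESHOLD):
    """1 if the total current reaches the threshold, else 0."""
    return int(current >= threshold)


def majority(a, b, c):
    return (a & b) | (b & c) | (c & a)


def b_plus_ac(a, b, c):
    return b | (a & c)


def mismatches(currents, func):
    """Input patterns whose thresholded current disagrees with the target function."""
    return [bits for bits, i in sorted(currents.items()) if read_logic(i) != func(*bits)]


if __name__ == "__main__":
    print("AB+BC+CA mismatches:", mismatches(MAJ_CURRENTS, majority))  # []
    print("B+AC mismatches:", mismatches(B_PLUS_AC_CURRENTS, b_plus_ac))  # []
```

The smallest gap between a logic-0 and a logic-1 current is roughly one decade (5.7e-8 A versus 5.5e-7 A for the B+AC device), which bounds how far the threshold could move before a case flips.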
As Table 3 shows, all switching cases behave the same for the two devices except the 010 input, for which the B+AC device produces a logic 1 output. In the second device, the wider Gate-2 acts as the dominant gate and controls the channel, driving it toward the partial inversion state. Both figures show that for multiple high inputs such as 011, 101, 110, and 111, one gate acts as the dominant gate, aggregating the currents from the other gates to produce the total current. Average power, leakage power, delay, and Power Delay Product (PDP) are also calculated for the proposed AND, OR, AB+BC+CA, and B+AC devices, compared with 14 nm CMOS implementations, and presented in Tables 4 and 5. In terms of average power, the proposed AND and OR devices consume 0.3 µW and 0.115 µW, respectively, whereas the 14 nm PTM AND and OR gates consume 3.74 µW and 6.19 µW. The difference arises mainly from the reduced transistor count: the proposed device is a single device that embeds the function, whereas the CMOS counterpart consists of several transistors with more resistive paths. As a result, the CMOS counterpart consumes much more power during multiple switching events than the proposed device. The leakage power of the AND and OR devices is 0.064 nW and 0.016 nW, respectively; this minimal leakage power results from their very small OFF currents of 0.92 nA and 0.023 nA, respectively. For comparison, a 15 nm Junctionless Single Gate Tunneling FET (JLSGTFET) has an ON current of 9.91e-4 A and an OFF current of 2.8e-13 A [25].
Transistor count is also a reason for the minimal leakage power: the 14 nm PTM counterparts use more transistors than the proposed device, and the larger transistor count, together with the resistive paths, increases leakage power. The proposed AB+BC+CA and B+AC devices have leakage powers of 0.098 nW and 0.412 nW, respectively, whereas the 14 nm PTM counterparts have 0.574 nW and 0.373 nW. The AB+BC+CA device's low leakage power stems from its tiny OFF current, while the B+AC device's leakage is slightly higher than that of its PTM counterpart. The AB+BC+CA and B+AC devices have delays of 11.84 ps and 25.51 ps, respectively; the larger fin width of the B+AC device causes its longer delay. The 14 nm PTM counterparts have delays of 59.48 ps and 61.1 ps for AB+BC+CA and B+AC, respectively, calculated from the critical-path netlist. The proposed AND and OR devices have delays of 4.45 ps and 3.93 ps, respectively. Because of its small gate length, the proposed AND device takes longer to achieve partial depletion, so its delay is longer than that of the proposed OR device. The 14 nm PTM AND and OR gates have delays of 37.19 ps and 29.49 ps, respectively. In terms of power, the proposed devices consume less than the 14 nm PTM AND and OR gates, and their delays are lower than those of the CMOS counterparts. The PDP indicates the average energy consumed per switching event. The proposed AND and OR devices have PDPs of 1.34e-6 J and 1.83e-6 J, whereas the CMOS counterparts have much higher PDPs because of their larger transistor counts. The same holds for complex logic: the proposed devices have a lower PDP than the CMOS implementations.
Achieving such Crosstalk logic behaviour in a single device enables denser circuit design. The proposed methodology requires fewer transistors, with significant benefits: the AB+BC+CA logic requires only one transistor in the proposed approach, whereas a CMOS implementation involves twenty transistors. For B+AC logic, the proposed approach requires one transistor, whereas CMOS involves eight. The proposed AB+BC+CA circuit occupies 10 times less area than its CMOS counterpart regardless of the technology node, and the B+AC circuit occupies 3 times less. In terms of power, the proposed devices consume on average 8x less power than existing CMOS architectures, and the delays of both proposed complex logic devices are in the picosecond range, so they can be considered high-speed devices.

Random Dopant Fluctuation Analysis
Random dopant fluctuation (RDF) is analyzed for the junctionless devices with TCAD [14]. RDF is a form of process variation that arises from the discrete, random number and placement of implanted impurity atoms. RDF causes threshold voltage fluctuation, degradation of the ON current, and an increase in the OFF current. The RDF effect is simulated on both the elementary and the complex logic devices.
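For intuition about why the number of discrete dopants matters, a common first-order idealization treats the dopant count in a small volume as Poisson-distributed, so its relative spread is 1/√N. The sketch below is not the TCAD RDF model used in [14]; it simply compares that analytic value with a seeded Monte Carlo estimate for N between 100 and 500, the range of discrete dopant counts used in this study. The function names are hypothetical.

```python
import math
import random
import statistics


def relative_dopant_spread(mean_count):
    """Poisson statistics: standard deviation / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method.
    Adequate for means of a few hundred; exp(-lam) underflows above ~745."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def monte_carlo_spread(mean_count, trials=2000, seed=1):
    """Estimate sigma/mean of the dopant count from seeded random samples."""
    rng = random.Random(seed)
    counts = [sample_poisson(mean_count, rng) for _ in range(trials)]
    return statistics.stdev(counts) / statistics.mean(counts)


if __name__ == "__main__":
    for n in (100, 200, 300, 400, 500):
        print(f"N={n}: analytic {relative_dopant_spread(n):.3f}, "
              f"sampled {monte_carlo_spread(n):.3f}")
```

Under this idealization the relative spread falls from 10% at 100 dopants to about 4.5% at 500, which is one reason device-to-device variation is usually discussed in terms of the dopant count in the active volume.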
For elementary logic, the AND device is selected, and the ON and OFF currents of the regular AND device are compared with those of AND devices with RDF. The number of discrete random dopants is varied from 100 to 500, and the results are compared in Table 6. As the table shows, the regular AND device and the RDF devices operate in the same current range, with some fluctuation. For the 00 switching condition, the device with 400 discrete dopants has the highest current, which is still acceptable. For the 01 switching condition, the device with 200 discrete dopants has the highest current.
In every case, some devices draw slightly more current, but the deviations remain small because the RDF devices and the regular devices share the same doping concentration. The same reasoning applies to the threshold voltage.
The AB+BC+CA device is analyzed with RDF as the complex logic case in Table 7, again varying the number of discrete dopants from 100 to 500. Compared with the regular device, the devices with RDF operate in the proper current range with slight fluctuations. The threshold voltage remains the same for the regular device and all RDF cases, and both the ON and OFF currents are in the proper range. For the 011 switching case, the device with 100 discrete dopants has the maximum current, within the acceptable range. For the 111 switching case, the device with 200 discrete dopants has the highest current in the range. Because all the devices are junctionless with a uniform doping concentration throughout, they retain an acceptable current range, and the random dopant effect causes only very small current fluctuations. Although the RDF problem can be addressed, the proposed devices face other difficulties. They will be hard to fabricate because their gate length is very small and the gate locations must be precise to obtain the correct logic output. Adjacent devices will affect each other's performance if they are placed too close together, so a minimum spacing between devices must be maintained to obtain noise-free output. Scaling down the devices may also affect their noise margins. These complexities remain open for the proposed devices.

Conclusion
Embedding functionality in a single device is a novel way of performing logic computation. In this paper, we proposed a new scheme for implementing a device as a functional component and presented a detailed framework for the device structure needed to achieve different logic functions in a standalone device. Previously, we implemented primitive gates in a single device; in this work, we implemented the complex logic functions AB+BC+CA and B+AC in single devices and demonstrated their functionality. The devices were also examined under random dopant fluctuation and perform well under RDF conditions. This standalone device approach can yield 8x higher density than conventional CMOS technology, with 8x lower power consumption than the CMOS counterpart. The delays of the elementary and complex logic devices are comparable with those of their 14 nm PTM counterparts, and the devices show promising overall performance.

Conflict of interest
There is no conflict of interest for this study.