Deep reinforcement learning (RL) has attracted growing research interest in manufacturing in recent years, but applications implemented on real equipment remain rare. One reason is that an agent must explore its environment many times before it learns which actions maximize the reward. During training, random or exploratory actions can be disastrous in many real-world applications, so agents are usually trained in computer-generated simulation environments. In this paper, we present an RL experiment applied to temperature control of a chamber for ultra-precision machines. The RL agent was built in Python with the PyTorch framework using a Deep Q-Network (DQN) algorithm, and its action commands were sent to National Instruments (NI) hardware, which ran C code at a sampling rate of 1 Hz. For communication between the agent and the NI data acquisition unit, a data pipeline was constructed using the Popen class of Python's subprocess module. The agent was trained to control the temperature while reducing energy consumption through a reward function that considers both temperature bounds and energy savings. The results demonstrate the effectiveness of the RL approach for a multi-objective temperature control problem.
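The agent-to-hardware pipeline and the two-term reward described above can be illustrated with a minimal sketch. It is not the authors' implementation: the child process here is a small Python stand-in for the NI-side C program, and the line protocol ("action in, temperature,power out"), the temperature band, the energy weight, and the toy thermal response are all assumptions chosen for illustration. The hard-coded action list stands in for the output of a DQN policy.

```python
import subprocess
import sys

# Hypothetical stand-in for the NI-side C program (assumption): it reads one
# integer action per line and replies with "temperature,power" for that step.
CHILD_CODE = r"""
import sys
temp = 22.0
for line in sys.stdin:
    action = int(line)
    power = 100.0 * action          # assumed heater power in W per action level
    temp += 0.05 * action - 0.02    # toy thermal response, not a chamber model
    print(f"{temp:.3f},{power:.1f}", flush=True)
"""


def reward(temp_c, power_w, low=21.9, high=22.1, energy_weight=0.001):
    """Assumed reward shape: +1 inside the temperature band, -1 outside,
    minus a penalty proportional to the power drawn in this step."""
    in_band = 1.0 if low <= temp_c <= high else -1.0
    return in_band - energy_weight * power_w


def step(proc, action):
    """Send one action down the pipe and read back one measurement line."""
    proc.stdin.write(f"{action}\n")
    proc.stdin.flush()
    temp_s, power_s = proc.stdout.readline().strip().split(",")
    return float(temp_s), float(power_s)


def run_episode(actions):
    """Drive the child process with a fixed action sequence and score each step."""
    with subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered text pipes
    ) as proc:
        results = []
        for action in actions:
            temp, power = step(proc, action)
            results.append((temp, power, reward(temp, power)))
        return results


if __name__ == "__main__":
    for temp, power, r in run_episode([0, 1, 1, 0]):
        print(f"T={temp:.3f} C  P={power:.1f} W  reward={r:.3f}")
```

In a real setup the command list passed to `Popen` would name the compiled data-acquisition executable, and the 1 Hz pacing would come from that program blocking until its next sample is ready.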
Chemical mechanical planarization (CMP) is a wafer planarization process that combines chemical reactions initiated by the slurry with mechanical action from pad asperities. As CMP progresses, temperature deviations develop across the pad surface. An increase in process temperature increases the material removal rate (MRR), so the pad temperature distribution is closely related to within-wafer non-uniformity (WIWNU). In this study, the pad temperature distribution is modeled from an energy perspective, and a slurry supply location is suggested to reduce the temperature deviation. An expression for the supplied energy was derived by defining a micro area and substituting the applied pressure, relative velocity, and process time. The energy and temperature distributions were found to be quite consistent, and the temperature peak matched well with the highest friction heat point (HFHP). Based on the model expression, the slurry injection position was set to the pad center, the HFHP, and the wafer center, and the change in temperature distribution was measured. A comparative analysis was also carried out against the existing method, which uses multiple nozzles rather than a single nozzle. The temperature deviation was reduced by about 18.5% when slurry was supplied to the HFHP with a single nozzle and by 24.7% when the largest flow rate was supplied with multiple nozzles.
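To make the energy-perspective idea concrete, the following sketch estimates the friction energy delivered to each pad radius from pressure, relative velocity, and process time. It is not the expression from the study but a simplified stand-in under stated assumptions: uniform pressure, a constant friction coefficient `mu`, and equal pad and wafer angular speeds, in which case the relative velocity is uniform and equal to `omega * d`, where `d` is the pad-to-wafer center distance. All numeric values and function names are hypothetical. Under these assumptions the energy peak, a rough analogue of the HFHP, falls at the pad radius that spends the largest fraction of each revolution under the wafer.

```python
import math


def contact_fraction(r, d, wafer_radius):
    """Fraction of one pad revolution that a pad point at radius r spends
    under a wafer of the given radius centered at distance d from the pad center."""
    if r <= 0.0:
        return 1.0 if d < wafer_radius else 0.0
    if d < wafer_radius and r <= wafer_radius - d:
        return 1.0  # pad circle lies entirely under the wafer
    if r < abs(d - wafer_radius) or r > d + wafer_radius:
        return 0.0  # pad circle never passes under the wafer
    cos_theta = (r * r + d * d - wafer_radius ** 2) / (2.0 * r * d)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta) / math.pi


def energy_density(r, mu, pressure, omega, d, wafer_radius, process_time):
    """Friction energy per unit pad area (J/m^2) at pad radius r, assuming
    equal pad and wafer angular speeds so that v_rel = omega * d everywhere."""
    v_rel = omega * d
    return mu * pressure * v_rel * contact_fraction(r, d, wafer_radius) * process_time


# Hypothetical process values, chosen only for illustration.
MU = 0.4                 # friction coefficient (assumed)
PRESSURE = 20e3          # applied pressure in Pa (assumed)
OMEGA = 2.0 * math.pi    # 60 rpm in rad/s (assumed)
D = 0.20                 # pad-to-wafer center distance in m (assumed)
WAFER_RADIUS = 0.15      # 300 mm wafer
PROCESS_TIME = 60.0      # s (assumed)

radii = [i * 0.001 for i in range(1, 400)]
profile = [
    energy_density(r, MU, PRESSURE, OMEGA, D, WAFER_RADIUS, PROCESS_TIME)
    for r in radii
]
peak_radius = radii[profile.index(max(profile))]

if __name__ == "__main__":
    print(f"peak energy at pad radius {peak_radius:.3f} m")
```

For this simplified geometry the peak can also be found analytically by minimizing the cosine term with respect to `r`, which gives `r = sqrt(d**2 - wafer_radius**2)`; the numerical profile can be checked against that value.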