Manufacturing systems are increasingly required to operate in high-mix, low-volume production environments, where process flexibility is crucial. One effective way to achieve this flexibility is through the use of multiple processing alternatives (MPA), allowing a product to be produced using different process plans or component structures. In MPA environments, scheduling decisions must address both the selection of processing alternatives for each product and the execution order of the resulting production tasks. Additionally, processing times often vary due to machine conditions and process variability, further complicating scheduling. This study introduces a dual-network-based deep reinforcement learning method for scheduling in manufacturing systems with multiple processing alternatives. The framework utilizes two Q-networks to learn both the selection of processing alternatives and the dispatching rules. Computational experiments demonstrate that the proposed method effectively reduces both the average makespan and its variability compared to a genetic algorithm-based approach, particularly as the problem size increases, showcasing its effectiveness in the face of processing time uncertainty.
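The two-level decision described above (first a processing alternative, then a dispatching rule) can be illustrated with a minimal sketch. It is not the authors' implementation: plain lists of Q-values stand in for the outputs of the two Q-networks, and the names `epsilon_greedy` and `two_stage_decision` are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random index with probability epsilon, else the argmax index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def two_stage_decision(q_alternative, q_dispatch, epsilon=0.0, seed=0):
    """Pick a processing alternative and a dispatching rule.

    q_alternative and q_dispatch are assumed stand-ins for the outputs of
    the two Q-networks at the current state (one value per action).
    """
    rng = random.Random(seed)
    alternative = epsilon_greedy(q_alternative, epsilon, rng)
    rule = epsilon_greedy(q_dispatch, epsilon, rng)
    return alternative, rule
```

With `epsilon=0.0` the choice is purely greedy, so each network's highest-valued action is returned.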
The increasing adoption of industrial robot arms in advanced manufacturing has heightened the need for flexible trajectory planning methods that go beyond traditional offline programming (OLP) tools, which are often expensive, proprietary, and limiting. This study introduces an OLP-free pipeline designed to generate robot trajectory data and optimize paths for six-degree-of-freedom (6-DOF) robot arms using discrete reinforcement learning. Initially, five-axis NC code derived from CAD/CAM data is transformed into tool center point (TCP) trajectories through coordinate transformations. An analytical inverse kinematics solver then produces multiple joint solutions for each TCP pose, creating a discrete action space from which the learning agent can select feasible joint configurations along the trajectory. A reward function that considers variations in joint velocity and acceleration, as well as pose error, facilitates the simultaneous optimization of motion smoothness and tracking accuracy. The optimized trajectories are validated using an open-source physics simulator, showing enhanced motion stability, accuracy, and collision safety compared to conventional OLP-based paths. This proposed framework provides a flexible and cost-effective alternative to commercial OLP tools and lays a scalable foundation for future applications in automated and collaborative manufacturing systems.
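A reward of the kind described above could be sketched as follows. The abstract does not give the actual functional form or weights, so the finite-difference velocity and acceleration terms, the absolute-value penalties, and the weights `w_vel`, `w_acc`, and `w_err` are all assumptions made for illustration.

```python
def smoothness_reward(prev_q, curr_q, next_q, pose_error,
                      dt=1.0, w_vel=1.0, w_acc=1.0, w_err=1.0):
    """Negative cost combining joint velocity, joint acceleration, and pose error.

    prev_q, curr_q, next_q are joint-angle lists at three consecutive
    trajectory points; next_q is the candidate inverse kinematics solution.
    """
    # Finite-difference joint velocities before and after the current point.
    vel_in = [(c - p) / dt for p, c in zip(prev_q, curr_q)]
    vel_out = [(n - c) / dt for c, n in zip(curr_q, next_q)]
    # Finite-difference joint accelerations at the current point.
    acc = [(vo - vi) / dt for vi, vo in zip(vel_in, vel_out)]
    vel_cost = sum(abs(v) for v in vel_out)
    acc_cost = sum(abs(a) for a in acc)
    return -(w_vel * vel_cost + w_acc * acc_cost + w_err * pose_error)
```

Under these assumptions, a candidate joint solution that continues the current motion scores higher than one that forces a sudden change in joint velocity.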
The Facility Layout Problem (FLP) aims to optimize the arrangement of facilities to enhance productivity and minimize costs. Traditional methods struggle with the complexity and non-linearity of modern manufacturing environments. This study introduces an approach that combines Reinforcement Learning (RL) and simulation to optimize manufacturing line layouts. A Deep Q-Network (DQN) agent learns to reduce unused space, improve path efficiency, and maximize space utilization by optimizing facility placement and material flow. Simulations were used to validate the layouts and evaluate their performance in terms of production output, path length, and bending frequency. The RL-based method offers a more adaptable and efficient solution to the FLP than traditional techniques, addressing both physical and operational optimization.
In this paper, we propose a deep Q-network-based resource allocation method for efficient communication between a base station and multiple Unmanned Aerial Vehicles (UAVs) in environments with limited wireless resources. The method focuses on maximizing the throughput of UAV-to-Infrastructure (U2I) links while ensuring that UAV-to-UAV (U2U) links meet their data transmission time constraints, even when U2U links share the wireless resources used by U2I links. The deep Q-network agent uses the Channel State Information (CSI) of both U2U and U2I links, along with the remaining time for data transmission, as its state, and determines the optimal Resource Block (RB) and transmission power for each UAV. Simulation results demonstrate that the proposed method significantly outperforms both random allocation and CSI-based greedy algorithms in terms of U2I link throughput and the probability of meeting U2U link time constraints.
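The state and action described above can be illustrated with a small sketch. The flat state layout, the single action index that jointly encodes an RB and a power level, and the names `build_state` and `decode_action` are assumptions; the abstract does not specify the encoding.

```python
def build_state(u2i_csi, u2u_csi, remaining_time):
    """Concatenate the CSI of both link types and the remaining transmission time."""
    return list(u2i_csi) + list(u2u_csi) + [remaining_time]


def decode_action(action_index, num_resource_blocks, power_levels_dbm):
    """Map one discrete action index to a (resource block, power level) pair."""
    num_actions = num_resource_blocks * len(power_levels_dbm)
    if not 0 <= action_index < num_actions:
        raise ValueError(f"action index must be in [0, {num_actions})")
    # Row-major layout: each resource block owns len(power_levels_dbm) indices.
    rb, power_idx = divmod(action_index, len(power_levels_dbm))
    return rb, power_levels_dbm[power_idx]
```

Encoding both choices in one index keeps the output layer of a deep Q-network a single vector of length `num_resource_blocks * len(power_levels_dbm)`.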
Environmental issues have recently become a global concern, and countries worldwide are working toward carbon neutrality. In the automotive industry, the focus has shifted from internal combustion engine vehicles to eco-friendly vehicles such as Electric Vehicles (EVs), Hybrid Electric Vehicles (HEVs), and Fuel Cell Electric Vehicles (FCEVs). Research on eco-driving, that is, driving methods that reduce vehicle energy consumption, has been actively conducted. Conventional cruise mode driving control is not an optimal driving strategy across varied driving environments. To maximize energy efficiency, this paper investigated an eco-driving strategy for EVs based on reinforcement learning. A longitudinal dynamics-based electric vehicle simulator that accounts for road slope was constructed using MATLAB Simulink. Two reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG) and Deep Q-Network (DQN), were applied to minimize the energy consumption of EVs on sloped roads. The agents were trained in the simulator to maximize rewards and derive an optimal speed profile. We compared the learning results of the DDPG and DQN algorithms and examined how the parameters of each algorithm affected the results. The simulation showed that the energy efficiency of EVs improved compared to cruise mode driving.
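The role of road slope in a longitudinal dynamics model can be illustrated with the standard traction force balance (inertia, grade, rolling resistance, aerodynamic drag). This is a simplified Python sketch, not the authors' Simulink model: the vehicle parameters are hypothetical defaults, drivetrain losses are ignored, and negative power is treated as fully recovered.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def traction_force(speed, accel, slope_rad,
                   mass=1500.0, c_rr=0.01, rho=1.2, cd_a=0.6):
    """Force at the wheels (N) needed to follow the given speed and acceleration."""
    inertial = mass * accel
    grade = mass * G * math.sin(slope_rad)
    rolling = c_rr * mass * G * math.cos(slope_rad)
    drag = 0.5 * rho * cd_a * speed ** 2  # cd_a is drag coefficient times frontal area
    return inertial + grade + rolling + drag


def energy_joules(speeds, slopes, dt=1.0, **vehicle):
    """Sum traction power over a sampled speed profile (forward Euler)."""
    total = 0.0
    for k in range(len(speeds) - 1):
        accel = (speeds[k + 1] - speeds[k]) / dt
        power = traction_force(speeds[k], accel, slopes[k], **vehicle) * speeds[k]
        total += power * dt
    return total
```

A reinforcement learning reward for eco-driving could then penalize the energy returned for each step, which is one way to link the speed profile to consumption.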
In recent years, research on machine learning techniques that can be integrated with existing suspension control algorithms to enhance control performance has advanced considerably. Machine learning, especially with neural networks, often requires many samples, which makes it challenging to maintain robust performance in diverse, changing environments. To overcome this obstacle, the present study applied reinforcement learning, which can generalize to complex situations not previously encountered, a capability that is crucial for suspension control under varying road conditions. The effectiveness of the proposed control method was evaluated on different road conditions using the quarter-vehicle model. The impact of training data was assessed by comparing models trained under two distinct road conditions. In addition, validation of the reinforcement learning-based control method demonstrated its potential for enhancing the adaptability and efficiency of suspension systems under various road conditions.
Deep reinforcement learning (RL) has attracted research interest in the manufacturing area in recent years, but applications implemented on real systems are rarely found. This is because agents have to explore the given environments many times until they learn how to maximize the rewards for the actions they provide to the environments. During training, random actions or exploration by agents may be disastrous in many real-world applications, and thus people usually train agents in computer-generated simulation environments. In this paper, we present an RL experiment applied to temperature control of a chamber for ultra-precision machines. The RL agent was built in Python with the PyTorch framework using a Deep Q-Network (DQN) algorithm, and its action commands were sent to National Instruments (NI) hardware, which ran C code at a sampling rate of 1 Hz. For communication between the agent and the NI data acquisition unit, a data pipeline was constructed using the subprocess module and its Popen class. The agent was made to learn temperature control while reducing energy consumption through a reward function that considers both temperature bounds and energy savings. The effectiveness of the RL approach to a multi-objective temperature control problem was demonstrated in this research.
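A pipeline built on `subprocess.Popen` can be sketched as below. This is an illustration only: a small Python child process stands in for the hardware-side C program, and the line-based text protocol, the fake temperature reading, and the helper names `open_pipeline`, `send_action`, and `close_pipeline` are all assumptions rather than details from the paper.

```python
import subprocess
import sys

# Hypothetical stand-in for the hardware-side program: it acknowledges each
# command line with a fixed, fake temperature reading.
CHILD_SOURCE = r"""
import sys
for line in sys.stdin:
    command = line.strip()
    if command == "quit":
        break
    print(f"ack {command} temp=20.0", flush=True)
"""


def open_pipeline():
    """Start the child process with line-buffered text pipes."""
    return subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def send_action(proc, action):
    """Write one action command and block until the child replies with one line."""
    proc.stdin.write(f"{action}\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()


def close_pipeline(proc):
    """Ask the child to exit, close the pipes, and return its exit code."""
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.stdin.close()
    code = proc.wait(timeout=5)
    proc.stdout.close()
    return code
```

In an agent loop, each call to `send_action` would correspond to one control step, with the reply parsed into the next observation.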