Object Picking and Constrained Placement by Visual Reasoning
Yufei Zhu <firstname.lastname@example.org>
2023-11-15 – 2024-06-01
Pick-and-place is one of the most common tasks for robot manipulators today. For example, picking up an object and placing it in a specific container is a common use case in industry. This task usually assumes that the target container is specified and that the object can be placed anywhere within it. However, that is not always the case. Sometimes objects must be placed in the container in a specific pose so that they stack tightly and use the space efficiently.
Additionally, we may encounter more complex constraints when gathering objects, beyond simply placing them in a container. The objective of this project is to develop a robotic system that can plan where to pick and place objects based on predefined constraints and can place each object in a specified pose.
In our project, we decompose the task into three modules. The first is a detection module that detects the positions of the objects and containers and estimates their properties, such as color and shape.
The second is a planning module that generates a pick-and-place plan based on the predefined constraints and the detected object properties. We propose to start with a simplified offline planning framework in which all plan steps are generated before the execution phase begins. This approach should be fast because it requires fewer interactions with the environment, but it carries a higher risk of failure: a single step that fails to execute causes the whole plan to fail. Depending on the outcome, the next phase would involve adaptive planning, in which the generation of each step is interleaved with execution and the updated state is fed back to a large language model (LLM) for progressive planning.
The third is an in-hand perception and control module that picks and places the object according to the plan. Between pick and place, this module must also estimate the in-hand object pose in order to align the object with the target container. Prior work on 6D object pose estimation exists, but it focuses on objects that are placed on a table or grasped by a two-finger gripper. Grasping with a multi-finger hand may cause larger and more complex regions of occlusion. In this project, we will develop a model that can estimate the poses of novel objects without annotation labels for those objects, and we will apply the model in this module.
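The difference between the offline and adaptive planning schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the state representation, the `Step` type, the rule-based `plan_next` function (a stand-in for the LLM planner), and the simulated `make_flaky_executor` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Assumed state format: object name -> current location ("table" or a container).
State = Dict[str, str]


@dataclass(frozen=True)
class Step:
    obj: str     # object to pick
    target: str  # container to place it in


def plan_next(state: State, goal: State) -> Optional[Step]:
    """Hypothetical stand-in for the planner: return one step, or None when done."""
    for obj in sorted(goal):
        if state.get(obj) != goal[obj]:
            return Step(obj, goal[obj])
    return None


def plan_offline(state: State, goal: State) -> List[Step]:
    """Generate every step up front, assuming each step has its intended effect."""
    simulated = dict(state)
    plan: List[Step] = []
    while (step := plan_next(simulated, goal)) is not None:
        plan.append(step)
        simulated[step.obj] = step.target
    return plan


Executor = Callable[[State, Step], bool]


def run_offline(state: State, goal: State, execute: Executor) -> bool:
    """Execute a precomputed plan; a single failed step fails the whole plan."""
    for step in plan_offline(state, goal):
        if not execute(state, step):
            return False
    return True


def run_adaptive(state: State, goal: State, execute: Executor,
                 max_steps: int = 20) -> bool:
    """Interleave planning and execution, replanning from the updated state."""
    for _ in range(max_steps):
        step = plan_next(state, goal)
        if step is None:
            return True
        execute(state, step)  # a failure leaves the state unchanged; we replan
    return plan_next(state, goal) is None


def make_flaky_executor(fail_on_calls: Set[int]) -> Executor:
    """Simulated robot: the listed call numbers (1-based) fail, others succeed."""
    calls = 0

    def execute(state: State, step: Step) -> bool:
        nonlocal calls
        calls += 1
        if calls in fail_on_calls:
            return False  # e.g. the grasp slipped; the object stays where it was
        state[step.obj] = step.target
        return True

    return execute


goal = {"red_cube": "red_bin", "blue_ball": "blue_bin"}
start = {"red_cube": "table", "blue_ball": "table"}

# The first execution attempt fails in both runs.
offline_ok = run_offline(dict(start), goal, make_flaky_executor({1}))
adaptive_ok = run_adaptive(dict(start), goal, make_flaky_executor({1}))
print(offline_ok, adaptive_ok)
```

In this toy setting the offline run aborts on the first failed step, while the adaptive run simply replans from the unchanged state and retries, at the cost of one planner call per executed step.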