Machine Learning for Software Engineering
Earn a PhD Under the Supervision of Alexander Egyed!
Engineering is an inherently creative process that requires rigorous attention to detail. However, engineering is also a collaborative, human-centric process with many ad hoc activities. Automation is rare in engineering, not just during programming but also during modeling, testing, and maintenance. This PhD topic explores uses of machine learning in the context of software engineering.
At the most basic level, we envision applying machine learning to reason with incomplete, uncertain, and/or incorrect software engineering knowledge. Justifiably, most engineering tools follow the philosophy of “garbage in, garbage out”. How could a tool reason correctly in the presence of errors? How could a tool reason at all if its input is incomplete? Yet we must. Much of engineering is about uncertainty, incompleteness, and incorrectness. If our tools are only useful once we have complete and correct information, then arguably they are not useful during much of the engineering process.
This topic therefore requires us to explore concrete applications in the software engineering domain. The set of applications is flexible and could include:
- Change Impact: Change impact prediction is about understanding what is affected by a change. For example, much like an online shopping platform uses machine learning to recommend products (if you are buying this, then you might also need that), an engineering platform could recommend code changes (people who have changed this code also changed that code).
- Artifact Traceability: Experiments have shown that engineers who know where requirements are implemented in the code (traceability links) are not only 20% faster during maintenance but also 60% more correct in identifying the changes necessary when a requirement changes. Although the benefit of traceability links is clearly demonstrated and substantial, creating and maintaining them today is an error-prone, human-intensive activity that requires upfront investment (i.e., capturing trace links during development for later use during maintenance). Moreover, reasoning with traceability links is problematic if the links and the various engineering artifacts are incomplete and/or partially incorrect. Machine learning could help by recommending trace links.
- Variability: Fast-changing technologies and increasingly specialized customer demands require custom-tailored systems. Software development captures this diversity by providing systems that are customizable through features. Unfortunately, such customizations typically allow a very large number of feature combinations (n independent optional features alone yield up to 2^n combinations) that need to be explored during modeling, programming, and testing. In most industrial settings, it is impossible to test all possible feature combinations. Machine learning could identify interesting feature combinations to explore.
- Adaptability and Optimization: Pluggable, adaptable systems are a prerequisite for self-repairing and self-optimizing processes. Take, for example, the manufacturing domain, where we need to understand the processes used to produce products. These processes need to be self-adaptable to accommodate changing product variations, and self-optimization must ensure the efficient use of resources (machines and workers) with minimal downtime. This involves trade-offs among changing environmental conditions, supply chain problems, and even evolving worker capabilities and manufacturing machines. Machine learning could help explore the many variables and their effects on one another, in this and other application domains (e.g., smart cities, health care).
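The co-change recommendation described under Change Impact (“people who have changed this code also changed that code”) can be sketched with simple association-rule mining over a version history. This is a minimal Python sketch, not the method pursued in this PhD topic. It assumes the history is available as a list of commits, each given as a set of changed file names; the file names, the function names, and the min_support and min_confidence thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_rules(commits, min_support=2, min_confidence=0.5):
    """Mine rules "file A changed -> file B likely changes too".

    commits: list of sets of file names changed together (hypothetical format).
    Returns {file: [(other_file, confidence), ...]}, best first.
    """
    file_count = Counter()  # number of commits touching each file
    pair_count = Counter()  # number of commits touching each unordered pair
    for changed in commits:
        files = sorted(set(changed))
        file_count.update(files)
        pair_count.update(combinations(files, 2))

    rules = {}
    for (a, b), together in pair_count.items():
        if together < min_support:
            continue  # too rare to trust
        # Confidence of src -> dst: fraction of commits touching src that also touched dst.
        for src, dst in ((a, b), (b, a)):
            confidence = together / file_count[src]
            if confidence >= min_confidence:
                rules.setdefault(src, []).append((dst, confidence))
    for targets in rules.values():
        targets.sort(key=lambda t: (-t[1], t[0]))  # highest confidence first
    return rules


def recommend(rules, changed_file):
    """Files that historically co-changed with changed_file, best first."""
    return [other for other, _ in rules.get(changed_file, [])]


# Hypothetical commit history.
history = [
    {"Order.java", "OrderTest.java"},
    {"Order.java", "OrderTest.java", "Invoice.java"},
    {"Order.java", "Invoice.java"},
    {"Customer.java"},
    {"Order.java", "OrderTest.java"},
]
rules = mine_cochange_rules(history)
print(recommend(rules, "Order.java"))  # ['OrderTest.java', 'Invoice.java']
```

Here, a change to Order.java would prompt a suggestion to also review OrderTest.java (changed together in 3 of the 4 commits touching Order.java) and Invoice.java (2 of 4), while Customer.java yields no suggestions. Support and confidence play the same role as in “bought this, also bought that” recommendations; a realistic tool would likely need richer signals than raw co-change counts.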