Machine Learning for Software Engineering

Earn a PhD Under the Supervision of Alexander Egyed!

Engineering is an inherently creative process that requires rigorous attention to detail. However, engineering is also a collaborative, human-centric process with ad hoc activities. Automation in engineering is rare, not just during programming but also during modeling, testing, and maintenance. This PhD topic explores uses for machine learning in the context of software engineering. On the most basic level, we envision the application of machine learning for reasoning with incomplete, uncertain, and/or incorrect software engineering knowledge. Justifiably, most engineering tools follow the philosophy of “garbage in, garbage out”. How could a tool reason correctly in the presence of errors? How could a tool reason at all if the input is incomplete? Yet, we must. Much of engineering is about uncertainty, incompleteness, and incorrectness. If our tools are only useful once we have complete and correct information, then arguably these tools are not useful during much of the engineering process. This topic therefore requires us to explore concrete applications in the software engineering domain. The set of applications is flexible and could include:

  • Change Impact: Change Impact Prediction is about understanding what is affected by changes. For example, much like an online shopping platform uses machine learning to recommend products (if you are buying this then you might need that also), an engineering platform could recommend code changes (people who have changed this code also changed that code).
  • Artifact Traceability: Experiments have shown that engineers who know where requirements are implemented in the code (traceability links) are not only 20% faster during maintenance but also 60% more correct in identifying the changes necessary when a requirement changes. While traceability links have a clearly demonstrated and strong benefit, creating and maintaining them today is an error-prone, human-intensive activity that requires upfront investment (i.e., trace link capture during development for later use during maintenance). Moreover, reasoning with traceability links is problematic if the links and the various engineering artifacts are incomplete and/or partially incorrect. Machine learning could be helpful in recommending trace links.
  • Variability: Fast-changing technologies and increasingly specialized customer demands require custom-tailored systems. Software development captures this diversity by providing systems that are customizable through features. Unfortunately, these customizations typically allow for a very large number of feature combinations that need to be explored during modeling, programming, and testing. In most industrial settings, it is impossible to test all possible feature combinations. Machine learning could identify interesting feature combinations to explore.
  • Adaptability and Optimization: Pluggable, adaptable systems are a prerequisite for a self-repairing and self-optimizing process. Take, for example, the manufacturing domain, where we need to understand the manufacturing processes for producing products. These processes need to be self-adaptable to allow for changing product variations. Self-optimization must ensure the efficient usage of resources (machines and workers) with minimal downtime. There is a trade-off among changing environmental conditions, supply chain problems, and even evolving worker capabilities and manufacturing machines. Machine learning would be useful to help explore the many variables and their effect on one another, in this and other application domains (e.g., smart cities, health care).
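
To make the Change Impact idea above more concrete, here is a minimal sketch of a co-change recommender ("people who have changed this code also changed that code"). It is only an illustration under stated assumptions, not the method pursued in this PhD topic: the commit history is a hypothetical hand-written list of file sets, the function names `mine_co_changes` and `recommend` are invented for this example, and a simple conditional frequency stands in for a learned model.

```python
from collections import Counter
from itertools import combinations


def mine_co_changes(commits):
    """Count how often each file changed and how often each ordered pair changed together."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = set(files)  # ignore duplicate entries within one commit
        file_counts.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return file_counts, pair_counts


def recommend(changed_file, file_counts, pair_counts, min_confidence=0.5):
    """Return (file, confidence) pairs, best first.

    confidence = (commits touching both files) / (commits touching changed_file)
    """
    total = file_counts.get(changed_file, 0)
    if total == 0:
        return []  # no history for this file, so nothing to recommend
    scored = [
        (other, count / total)
        for (src, other), count in pair_counts.items()
        if src == changed_file and count / total >= min_confidence
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical commit history: each entry lists the files changed together.
commits = [
    ["parser.py", "lexer.py"],
    ["parser.py", "lexer.py", "README.md"],
    ["parser.py", "ast.py"],
    ["lexer.py"],
]

file_counts, pair_counts = mine_co_changes(commits)
for name, confidence in recommend("parser.py", file_counts, pair_counts):
    print(f"{name}: changed together in {confidence:.0%} of parser.py commits")
```

In this toy history, only `lexer.py` passes the 0.5 confidence threshold for `parser.py`. A real change impact predictor would have to cope with exactly the incomplete and partially incorrect histories described above, which is where learning-based approaches come in.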

PhD Topic

Software Engineering


Alexander Egyed


About the PhD Advisor

Alexander Egyed is a professor of Software-Intensive Systems at the Johannes Kepler University, Austria (JKU) and the scientific head of the area of Cognitive Robotics and Shop Floors of the Pro2Future competence center for smart production. He received his doctorate from the University of Southern California, USA in 2000 and then worked in industry until joining University College London, UK in 2007 and JKU in 2008. At JKU, he built up a 35-person research group that is most recognized for its work on software and systems design, particularly on variability, consistency, traceability, and testing/monitoring. Dr. Egyed has over a hundred refereed scientific book, journal, and conference contributions with nearly 6000 citations to date. He was recognized as a top scholar in software engineering in Communications of the ACM, Springer Scientometrics, and Microsoft Academic Search. He was also named an IBM Research Faculty Fellow in recognition of his contributions to consistency checking, and received Recognition of Service Awards from the IEEE and ACM, Best Paper Awards from ICSME, ECSA, COMPSAC, and WICSA, and an Outstanding Achievement Award from USC. Dr. Egyed has served as program chair, steering committee member, and editorial board member. He is a member of the IEEE, IEEE Computer Society, ACM, and ACM SIGSOFT.