A Hybrid Model for Task Completion Effort Estimation

November 15, 2016

This paper is about integrating existing models from the effort estimation literature using ensemble learning techniques.

Here you can see the preprint of the paper:

A Hybrid Model for Task Completion Effort Estimation – SWAN 2016 paper

 

And here is a link to my presentation slides on this paper:

A Hybrid Model for Task Completion Effort Estimation – SWAN 2016 presentation slides

 

Paper abstract:

Predicting time and effort of software task completion has been an active area of research for a long time. Previous studies have proposed predictive models based on either text data or metadata of software tasks to estimate either completion time or completion effort of software tasks, but there is a lack of focus in the literature on integrating all sets of attributes together to achieve better performing models.

We first apply the previously proposed models on the datasets of two IBM commercial projects called RQM and RTC to find the best performing model in predicting task completion effort on each set of attributes. Then we propose an approach to create a hybrid model based on selected individual predictors to achieve more accurate and stable results in early prediction of task completion effort and to make sure the model is not bound to specific attributes and is consequently applicable to a larger number of tasks. Categorizing task completion effort values into Low and High labels based on their measured median value, we show that our hybrid model provides 3-8% more accuracy in early prediction of task completion effort compared to the best individual predictors.
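To make the two ideas in the abstract more concrete, here is a minimal Python sketch. It labels efforts as Low or High relative to their median, then combines several predictors into one "hybrid" prediction. Please note the assumptions. The combination rule here is a plain majority vote, chosen only for illustration; the abstract does not say how the paper's ensemble works. The helper names (`label_by_median`, `hybrid_predict`, `accuracy`), the predictor names and all the data are hypothetical toy values. They say nothing about the RQM/RTC results.

```python
from collections import Counter
from statistics import median


def label_by_median(efforts):
    """Label each effort value 'Low' or 'High' relative to the median.

    Assumption: values equal to the median count as 'Low'.
    """
    m = median(efforts)
    return ["High" if e > m else "Low" for e in efforts]


def hybrid_predict(predictions_per_model):
    """Combine per-task labels from several predictors by majority vote.

    predictions_per_model holds one label list per predictor; all lists
    must have the same length (one entry per task). Ties are broken in
    favor of the earliest predictor's label.
    """
    combined = []
    for task_labels in zip(*predictions_per_model):
        counts = Counter(task_labels)
        top = max(counts.values())
        # First label (in predictor order) that reached the top vote count
        combined.append(next(label for label in task_labels if counts[label] == top))
    return combined


def accuracy(predicted, actual):
    """Fraction of tasks whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data: measured efforts for six tasks and labels from three
# hypothetical predictors (text-based, metadata-based, and a third one).
efforts = [2, 5, 8, 1, 13, 3]
actual = label_by_median(efforts)

text_model = ["Low", "High", "Low", "Low", "High", "High"]
meta_model = ["Low", "High", "High", "High", "High", "Low"]
other_model = ["High", "Low", "High", "Low", "High", "Low"]

hybrid = hybrid_predict([text_model, meta_model, other_model])
for name, preds in [("text", text_model), ("meta", meta_model),
                    ("other", other_model), ("hybrid", hybrid)]:
    print(name, round(accuracy(preds, actual), 2))
```

The toy data was chosen so that each predictor makes different mistakes. In that case a vote can cancel out individual errors. This is the general intuition behind combining predictors built on different attribute sets, not a claim about how much it helps in practice.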

Wordbook – the ultimate tool for language learners

April 13, 2016

Have you recently tried to learn a new language? Or maybe to extend your vocabulary in a language you had already started learning? How do you keep track of the new words and phrases that you learn every day? Here I have a tool suggestion for you:

Wordbook – Build your own dictionary

It matters to memorize new words and phrases in the context in which you learned them. It matters to provide definitions, examples, synonyms and antonyms when saving your new words, and most importantly, it matters to classify them in a way that simplifies the process of reviewing them!

Yon URL shortener service

December 26, 2015

Four years ago, I started a new project called Yon, an intelligent URL shortening service aimed at Persian-speaking Internet users. Within a few months it had engaged many users, and without much additional effort it became more and more popular over time. Some of the key features behind Yon's success are:

Yon homepage

A screenshot of the Yon URL shortener service

Although Yon has never limited itself to Persian-speaking Internet users, and people from all around the world use it nowadays, they are still a minority of its users. That is despite the fact that the website's user interface does not even support English yet. Making the website bilingual would therefore be a good first step toward internationalizing it, and it is one of the top-priority features to add to this online service in the near future.

I will write more about newer aspects of this service in upcoming blog posts.

-Aliweb


Ontology sucks or helps? Depends on what you want out of it

December 25, 2015

“Fortunately”, there are various conflicting definitions of the word “ontology”, whose role is to clarify and explicitly define the entities in a specific domain. This confirms that defining entities and categorizing them within specified boundaries is a difficult job, even when it comes to the word ontology itself. Therefore, in many cases it is better for this job to be done by, or to benefit from, collective intelligence rather than a limited number of people, even if they are professionals. Those professionals could still provide us with instructions and recommendations based on their knowledge and experience.

Anyway, let's provide some definitions that are more relevant to what we are going to talk about. In philosophy, ontology deals with the subject of existence. In the context of AI and knowledge sharing, it could be defined as “a specification of a conceptualization”. It is concerned with clarifying the entities in a specific domain and their relations to each other. Unsurprisingly, we have different types of relations, including, but not limited to, being a super-concept or sub-concept of another entity. Therefore, we could divide these entities into interconnected groups, and each of these groups could belong to one or more higher-level concepts/entities as well as having its own sub-entities. Now the question is: what does the overall scheme of this network of entities in a specified domain look like? Is it a hierarchical tree or a graph? And if it doesn't have an exact tree structure, does it look enough like a tree that we could represent it in a hierarchical form with some kind of makeup, say by providing shortcuts from some entities to the actual location they belong to? It depends. If we are supposed to sort the books of a physical library, we could perhaps use tree-like hierarchies, but how about representing relationships between the products of an online shop? Or categorizing posts in an online forum or questions on a Q&A website? And, even more crucially, how about categorizing web pages so that a search engine can provide them to users based on their queries?
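The tree-versus-graph question, and the "shortcut" workaround, can be illustrated with a small Python sketch. Each concept maps to the set of its super-concepts. A hierarchy is a strict tree only if it has one root and every concept has at most one parent. When a concept has several parents, we keep one parent as its "home" location and turn each of the others into a shortcut, much like a symbolic link. The helper names, the concept names, and the rule "the alphabetically first parent becomes home" are all hypothetical choices for illustration. They are not taken from any ontology standard.

```python
def is_tree(parents):
    """True if the hierarchy has one root and every concept has at most one parent.

    parents maps each concept to the set of its super-concepts (roots map to an
    empty set). Cycles are not checked; the input is assumed to be acyclic.
    """
    roots = [c for c, ps in parents.items() if not ps]
    return len(roots) == 1 and all(len(ps) <= 1 for ps in parents.values())


def to_tree_with_shortcuts(parents):
    """Approximate a multi-parent hierarchy by a tree plus shortcuts.

    Each concept keeps one 'home' parent (here, arbitrarily, the alphabetically
    first one); every other parent gets a shortcut pointing to that concept,
    much like a symbolic link in a file system.
    """
    tree, shortcuts = {}, []
    for concept, ps in parents.items():
        ordered = sorted(ps)
        tree[concept] = ordered[0] if ordered else None
        shortcuts.extend((extra, concept) for extra in ordered[1:])
    return tree, shortcuts


# A physical library: every section sits in exactly one place.
library = {"Library": set(), "Science": {"Library"}, "Physics": {"Science"}}

# An online shop: a fitness tracker is both an electronics and a sports product.
shop = {
    "Products": set(),
    "Electronics": {"Products"},
    "Sports": {"Products"},
    "Fitness tracker": {"Electronics", "Sports"},
}

print(is_tree(library), is_tree(shop))  # prints: True False
tree, shortcuts = to_tree_with_shortcuts(shop)
print(tree["Fitness tracker"], shortcuts)  # prints: Electronics [('Sports', 'Fitness tracker')]
```

The library fits a tree naturally. The shop only fits one once we pick a somewhat arbitrary home for the fitness tracker and add a shortcut. That arbitrariness is exactly why the answer to "tree or graph?" depends on the domain.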

Human beings deal with categorization and classification problems throughout their daily lives, and this is not limited to sharing information on the web or in a library. They need to classify their belongings, their knowledge, and their plans and schedules; otherwise their lives become inefficient. They might also agree on some rules together to be more organized and to make things work better by enforcing these fixed rules, called standards. Standards are quite different from classification, which separates entities into categories according to specified principles, so the result of a classification may change when the entities change.

To elaborate further on the concept of ontology and its pros and cons, let's talk about the concept of a “community of practice” (COP).
Communities of practice are groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly. However, learning is not necessarily what brings the community together; it could be an incidental result of the community members' interactions. Many properties could be assumed for a community to be a COP, but we can say that at least three characteristics are crucial:


  1. The domain: members share a common domain of interest.
  2. The community: members engage in joint activities and discussions in pursuit of their shared interest.
  3. The practice: members do not merely share an interest; they are all practitioners.

A community of practice is not an anomaly; such communities have been around as long as human beings have been learning together. Some assertions about the properties of COPs, based on early theoretical writing about them, may be partially or entirely wrong today. For instance, a traditional claim says that COPs are self-organizing, but we know that they mostly need some kind of articulation to be effective. Another assertion says that COPs are informal, which is again false in many cases; there are many formal COPs.
As we can see, the properties proposed for COPs have been accepted or rejected over time, and more characteristics gained agreement as the number of known communities grew and their similarities were studied further. Now we have a kind of ontology in the domain of communities that specifies the boundaries of a COP with three crucial, widely agreed characteristics, and this ontology was the outcome of the increasing participation of this domain's users in describing its entities.

Finally, returning to the question posed in the title and the brief answer given there: we should know what we want out of an ontology before making judgments about it. Do we want to build it on our own and use it as the only solution to every knowledge-sharing problem? Then it sucks. But aren't there hybrid solutions available? Couldn't we benefit both from what we, the professional ontologists, think and from what others think? For example, we could produce some general high-level categories from human-generated labels and experiences through “bottom-up” concept hierarchy generation, or even provide a worldview made of human-generated experience combined with a “symbolic-link-aided approximated conceptual hierarchy”.

P.S.: These two critical reviews of Clay Shirky's article “Ontology is Overrated” are worth reading:
Clay Shirky – “Ontology is overrated”: a review
Clay Shirky’s Viewpoints are Overrated

Ingredients to cook delicious CSCW research

Doing research simply means the systematic use of some set of theoretical and empirical tools to try to increase our understanding of some set of phenomena or events.

That is how Joseph E. McGrath, the author of the article “Methodology matters: doing research in the behavioral and social sciences”, describes it.

In fact, both of the first two articles address the fact that what you get out of a study depends heavily on the method you use and on your philosophical stance. However, the first article, “Selecting Empirical Methods for Software Engineering Research”, concentrates on software engineering research, which makes it more relevant to me, while the second reading discusses research in the social sciences.

The first reading notes that we cannot say with certainty which research method suits which research problem, and that various local factors should be considered when selecting a method, including available resources, access to subjects, the opportunity to control the variables of interest, and the skills of the researcher. Furthermore, it points out that because each method has its own flaws, comprehensive research strategies that draw on multiple research methods are more viable, so that the weaknesses of each method can be addressed and compensated for by the others.

First, we must determine what kind of research question we are asking. Potential questions include:

One important factor is your philosophical stance, which strongly affects what evidence and which answers to your research question(s) will satisfy you. Here are four important philosophical stances:

To classify possible research methods, we could introduce five major classes of methods for software engineering research, although not all researchers in this area will necessarily agree on the names and scopes of these classes:

Furthermore, the second reading says that a research process always involves some content of interest, some ideas that give meaning to that content, and some techniques that enable the researcher(s) to study them. Accordingly, the author depicts three domains of research in behavioral and social science: the substantive domain (the content), the conceptual domain (the ideas), and the methodological domain (the techniques).


The author then describes eight research strategies and groups them into four quadrants as follows:

  1. Field strategies, including the field study and the field experiment. Both emphasize that the behavior system under study is natural, in the sense that it would occur whether or not the researcher were there and whether or not it were being observed as part of a study.
  2. Experimental strategies, including the laboratory experiment and the experimental simulation. In contrast to those of Quadrant 1, they involve concocted rather than natural settings. The laboratory experiment and the experimental simulation are strategies that involve “actor-behavior-context” systems that would not exist at all were it not for the researcher's interest in doing the study.
  3. Respondent strategies, including the sample survey and the judgment study. They concentrate on the systematic gathering of participants' responses to questions or stimuli formulated by the experimenter, in contrast to observing participants' behavior within an ongoing behavior system. Studies are usually done under neutral conditions of room temperature, lighting, and chair comfort to nullify any effects of the behavior setting or context on the judgments that are the topic of study.
  4. Theoretical strategies, including formal theory and computer simulation. The inclusion of these two strategies reminds us of the importance of the theoretical side of the research process. One of the more powerful general strategies for research is the simultaneous use of one of the theoretical strategies and one of the empirical strategies.

The author of this paper tries to highlight the fact that the results of a study always depend on its methods, and, like the first reading, confirms that a combination of multiple methods should be used. He also makes the important point that the results of a study should not be interpreted in isolation: researchers should always consider other evidence and other studies of the same research question.

In the third reading, the author mentions some conceptual models of CSCW, which are largely descriptive, and then presents her own framework, which is currently descriptive as well but is meant to be developed further through additional investigation. She defends her Model of Coordinated Action (MoCA) by arguing that it frees us from having to decide on only one common field of work and one “clear-cut goal”. She also mentions that they chose the word “action” to emphasize the importance of goal-directedness implied by the word “work”. MoCA has seven dimensions:

  1. Synchronicity: As in Johansen's matrix, this dimension concerns a continuum of coordinated action ranging from synchronous (at the same time) to asynchronous (at different times).
  2. Physical Distribution: Again similar to Johansen's matrix, this continuum concerns whether all actions occur in the same geographic location or in different places. It emphasizes that working from different locations over a long period can be a serious problem.
  3. Scale: Addresses the number of participants involved in the collaboration. This matters because a lot of articulation work is needed to organize the members of a team.
  4. Number of Communities of Practice: Focuses specifically on the notion of different cultural communities: when people from different disciplines come together to collaborate, how can we manage and resolve their differences?
  5. Nascence: Distinguishes unestablished from established coordinated actions.
  6. Planned Permanence: Refers to the planned or intended permanence of a coordinated action, which similar works address less. It is important because it is usually impossible to say how long a coordinated action will continue, and hard to say when things have reached a stable state.
  7. Turnover: Refers to the rapidity with which participants enter and leave. This dimension covers collaborations ranging from closed, private ones, where participants leave slowly, if at all, to fully open, public ones, where many participants may come and go.

You can watch her presentation of this work at a Stanford seminar:
https://www.youtube.com/watch?t=1&v=bQLvPhnEvyY

As a new researcher in software engineering, I found the first reading surprisingly useful. It drew a comprehensive picture of the art of doing empirical research in software engineering, although many of its key points are not limited to this area. It elegantly shows us how to pose a valid and valuable research question as the key starting point of research. It then describes what to consider when looking for an answer to that question and emphasizes that the way we think, including our philosophical expectations and stances, has a considerable impact on the potential answers. Next, it highlights the importance of having a theory, which acts as a lens through which the world is observed. It then provides useful information about various empirical methods in software engineering and how they can be combined to compensate for each other's shortcomings. Finally, it explains how to collect data and how to validate results based on the stance and method used. The second reading makes a similar effort and in some cases provides more detail on methods that the first reading didn't discuss in depth. It also reminds me how closely interconnected the social sciences and collaborative software engineering are. Finally, the third reading presents a framework of coordinated action that tries to add value to related work and models in CSCW studies by discussing aspects (dimensions, in the author's words) that other works focused on less, while acknowledging many useful works done in the past.