Wordbook – the ultimate tool for language learners

April 13, 2016

Have you recently tried to learn a new language? Or maybe to extend your vocabulary in a language you have already started learning? How do you keep track of the new words and phrases you learn every day? Here I have a tool suggestion for you:

Wordbook – Build your own dictionary

It matters to memorize new words and phrases in the context you learned them in. It matters to record definitions, examples, synonyms and antonyms when saving your new words, and most importantly, it matters to classify them in a way that simplifies the process of reviewing them!
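
To make that concrete, here is a rough sketch of what such a saved word entry might look like. This is my own illustration of the idea, not Wordbook's actual data model:

```typescript
// A hypothetical vocabulary entry; field names are invented for illustration.
interface WordEntry {
  word: string;
  language: string;
  definitions: string[];
  examples: string[];   // the contexts the word was learned in
  synonyms: string[];
  antonyms: string[];
  categories: string[]; // the classification that drives review sessions
}

const entry: WordEntry = {
  word: "ubiquitous",
  language: "en",
  definitions: ["present, appearing, or found everywhere"],
  examples: ["Smartphones have become ubiquitous in daily life."],
  synonyms: ["omnipresent", "pervasive"],
  antonyms: ["rare", "scarce"],
  categories: ["adjectives", "formal", "GRE"],
};
```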

Yon URL shortener service

December 26, 2015

Four years ago, I started a new project called Yon, an intelligent URL shortening service aimed at Persian-speaking Internet users. Within a few months it had succeeded in engaging many users, and without any further effort it became more and more popular over time. Some of the key features of Yon that drove this success are:


A screenshot from Yon URL shortener service
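
For curious readers, a quick aside on mechanics: the sketch below shows the textbook way such a service can map long URLs to short codes, by encoding sequential IDs in base62. It is a generic illustration, not Yon's actual implementation:

```typescript
// A generic URL-shortener sketch (not Yon's real code): each long URL gets a
// sequential ID, which is encoded in base62 to produce a short code.
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function encodeId(id: number): string {
  let code = "";
  do {
    code = ALPHABET[id % 62] + code;
    id = Math.floor(id / 62);
  } while (id > 0);
  return code;
}

const urls: string[] = []; // the array index doubles as the numeric ID

function shorten(longUrl: string): string {
  urls.push(longUrl);
  return encodeId(urls.length - 1);
}

function resolve(code: string): string | undefined {
  let id = 0;
  for (const ch of code) id = id * 62 + ALPHABET.indexOf(ch);
  return urls[id];
}
```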

Although Yon never limited itself to Persian-speaking users, and nowadays people from all around the world use it, they still constitute a minority of users. And that is despite the fact that the website's user interface does not even support English yet. Making the website bilingual would therefore be a good first step toward internationalization, which is one of the top priorities among the features to be added to this online service in the near future.

As things progress, I will write more about other aspects of this service in future blog posts.

-Aliweb

Does ontology suck or help? Depends on what you want out of it

December 25, 2015

"Fortunately", there are various conflicting definitions of the word "ontology", whose role is to clarify and explicitly define the entities in a specific domain. This confirms that defining and categorizing entities within specified boundaries is a difficult job, even when it comes to the word ontology itself. Therefore, in many cases this job is better done by, or at least aided by, collective intelligence rather than by a limited number of people, even if they are professionals; those professionals can still provide instructions and recommendations based on their knowledge and experience.

Anyway, let's provide some definitions more relevant to what we are going to talk about. In philosophy, ontology deals with the subject of existence. In the context of AI and knowledge sharing, it can be defined as "a specification of a conceptualization". It is concerned with clarifying the entities in a specific domain and their relations to each other. Unsurprisingly, there are different types of relations, including, but not limited to, being a super- or sub-concept of another entity. We can therefore divide these entities into interconnected groups, where each group may belong to one or more higher-level concepts/entities while also having its own sub-entities. Now the question is: what does the overall scheme of this network of entities in a specified domain look like? Is it a hierarchical tree or a graph? And if it doesn't have an exact tree structure, does it look sufficiently tree-like that we could represent it hierarchically with some kind of makeup, say by providing shortcuts from some entities to the actual locations they belong to? It depends. If we are sorting the books of a physical library, tree-like hierarchies may work, but how about representing the relationships between the products of an online shop? Or categorizing posts in an online forum, or questions on a Q&A website? And, even more crucially, how about categorizing web pages in order to serve them to users based on their search queries, as a search engine does?
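
To make the "tree with shortcuts" idea concrete, here is a minimal sketch; the node names are invented for illustration, and this is my own example rather than something from the readings:

```typescript
// An "almost-tree" category scheme: every node has one primary parent, plus
// optional shortcut links to other places in the hierarchy where it belongs.
interface CategoryNode {
  name: string;
  children: CategoryNode[];
  shortcuts: CategoryNode[]; // symbolic links to nodes elsewhere in the tree
}

function node(name: string, children: CategoryNode[] = []): CategoryNode {
  return { name, children, shortcuts: [] };
}

// Example: "Audiobooks" lives under "Books" but is also reachable from "Audio".
const audiobooks = node("Audiobooks");
const books = node("Books", [node("Paperbacks"), audiobooks]);
const audio = node("Audio", [node("Music")]);
audio.shortcuts.push(audiobooks); // the shortcut keeps the tree rendering intact

// A depth-first rendering still looks like a tree; "Audiobooks" simply
// appears in two places.
const root = node("Shop", [books, audio]);
```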

Human beings deal with categorization and classification problems in their daily lives, and not just when sharing information on the web or in a library. They need to classify their belongings, their knowledge, and their plans and schedules; otherwise their lives become inefficient. They may also agree on shared rules to become more organized and make things work better. These enforced fixed rules, called standards, are quite different from classification, which divides entities into separate categories according to specified principles, so that the result of a classification may change when the entities themselves change.

To elaborate further on the concept of ontology and its pros and cons, let's talk about the concept of a "community of practice".
Communities of practice are groups of people who share a concern or a passion for something they do and who learn how to do it better as they interact regularly. However, learning is not necessarily what brings the community together; it can be an incidental result of the members' interactions. Many properties could be required of a community for it to be a COP, but we can say that at least three characteristics are crucial:


  1. The domain: members share a common domain of interest.
  2. The community: members engage in joint activities and discussions in pursuit of their shared interest.
  3. The practice: the community doesn't merely have a shared interest; its members are practitioners.

The concept of a community of practice is not an anomaly; such communities have been around for as long as human beings have been learning together. Some assertions about the properties of a COP, based on early theoretical writing about them, may be partially or totally wrong today. For instance, a traditional claim says that COPs are self-organizing, but we know that they usually need some kind of articulation to be effective. Another assertion says that COPs are informal, which is again false in many cases: plenty of formal COPs exist.
As we can see, the properties proposed for COPs have been accepted or rejected over time, and more characteristics gained agreement as the number of known communities grew and their similarities were studied further. We now have a kind of ontology in the domain of communities, one that specifies COP boundaries with three crucial and widely agreed-upon characteristics; it is the outcome of the growing participation of this domain's own members in describing its entities.

Finally, back to the question posed in the title and its brief answer: we need to know what we want out of ontology before judging it. Do we want to build it on our own, as the only solution to every knowledge-sharing problem in every case? Then it sucks. But aren't hybrid solutions available? Couldn't we benefit both from what we, the professional ontologists, think and from what everyone else thinks, say by producing some general high-level categories from human-generated labels and experience on a "bottom-up" concept-hierarchy-generation basis, or even by providing a worldview made of human-generated experience combined with a "symbolic-link-aided approximated conceptual hierarchy"?
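
As a toy illustration of the "bottom-up" idea, the following sketch (entirely hypothetical, with made-up data and thresholds) derives parent/child candidates from user-generated tags, using the simple heuristic that a tag subsuming another co-occurs with it most of the time while being much more frequent overall:

```typescript
type Tagged = { tags: string[] };

function inferParents(items: Tagged[]): Map<string, string> {
  const freq = new Map<string, number>();
  const co = new Map<string, number>(); // co-occurrence counts, keyed "a|b"
  for (const item of items) {
    const tags = [...new Set(item.tags)];
    for (const t of tags) freq.set(t, (freq.get(t) ?? 0) + 1);
    for (const a of tags)
      for (const b of tags)
        if (a !== b) co.set(`${a}|${b}`, (co.get(`${a}|${b}`) ?? 0) + 1);
  }
  const parents = new Map<string, string>();
  for (const [child, childFreq] of freq) {
    for (const [cand, candFreq] of freq) {
      const together = co.get(`${cand}|${child}`) ?? 0;
      // cand subsumes child if child nearly always co-occurs with it and
      // cand is clearly the more general (more frequent) tag. The 0.8 and
      // 2x thresholds are arbitrary choices for this toy example.
      if (cand !== child && together / childFreq >= 0.8 && candFreq >= 2 * childFreq)
        parents.set(child, cand);
    }
  }
  return parents;
}

// Toy data: "jazz" and "rock" end up under "music" purely from tagging behavior.
const items: Tagged[] = [
  { tags: ["music", "jazz"] },
  { tags: ["music", "jazz"] },
  { tags: ["music", "rock"] },
  { tags: ["music", "rock"] },
  { tags: ["music"] },
];
console.log(inferParents(items)); // Map { "jazz" => "music", "rock" => "music" }
```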

P.S.: These two critical reviews of the article "Ontology is Overrated" are worth reading:
Clay Shirky – “Ontology is overrated”: a review
Clay Shirky’s Viewpoints are Overrated

Ingredients for cooking delicious CSCW research

Doing research simply means the systematic use of some set of theoretical and empirical tools to try to increase our understanding of some set of phenomena or events.

says Joseph E. McGrath, the author of the article "Methodology matters: doing research in the behavioral and social sciences".

In fact, the first two articles both address the fact that what you get out of a study depends entirely on the method you use and on your philosophical stance. However, the first article, "Selecting Empirical Methods for Software Engineering Research", concentrates on software engineering research, which makes it more relatable for me, while the second reading talks about research in the social sciences.

The first reading notes that we cannot say with certainty which research method suits which research problem; various local factors should be considered when selecting a method, including available resources, access to subjects, the opportunity to control the variables of interest, and the skills of the researcher. Furthermore, it points out that because each method has its own flaws, comprehensive research strategies that combine multiple methods are more viable, so that the weaknesses of each method can be compensated for by the others.

First, we must see what kind of research question we are asking. Potential questions include:

One important factor is your philosophical stance, which dramatically affects what evidence and which answers to your research question(s) will satisfy you. Here are four important philosophical stances:

To classify possible research methods, we can introduce five major classes for software engineering research, though not all researchers in this area necessarily agree on the names and scopes of these classes:

Furthermore, the second reading says that a research process always involves some content of interest, some ideas that give meaning to it, and some techniques that enable the researcher(s) to study them. The author depicts three domains of research in the behavioral and social sciences:


The author then mentions eight research strategies and groups them into four quadrants, as follows:

  1. Field strategies including field study and field experiment. Both emphasize that the behavior system under study is natural, in the sense that it would occur whether or not the researcher were there and whether or not it were being observed as part of a study.
  2. Experimental strategies including laboratory experiment and experimental simulation. In contrast to those of Quadrant 1, they involve concocted rather than natural settings. The laboratory experiment and the experimental simulation are strategies that involve “actor-behavior-context” systems that would not exist at all were it not for the researcher’s interest in doing the study.
  3. Respondent strategies, including the sample survey and the judgement study. They concentrate on the systematic gathering of participants' responses to questions or stimuli formulated by the experimenter, in contrast to observing participants' behavior within an ongoing behavior system. Studies are usually done under neutral conditions (room temperature, lighting, chair comfort) to nullify any effects of the behavior setting or context on the judgments that are the topic of study.
  4. Theoretical strategies including formal theory and computer simulation. The inclusion of these two strategies reminds us of the importance of the theoretical side of the research process. One of the more powerful general strategies for research is the simultaneous use of one of the theoretical strategies and one of the empirical strategies.

The author of this paper tries to highlight the fact that the results of our experiments always depend on our methods, and, like the first reading, it confirms that a combination of multiple methods should be used. It also mentions the important fact that the results of a study should not be interpreted in isolation: researchers should always consider other evidence and other studies on the same research question.

In the third reading, the author reviews some conceptual models of CSCW, which are largely descriptive, and then presents her own framework, which is currently descriptive as well but is meant to be developed further through future investigation. She defends her model of "coordinated action" by claiming that it frees us from having to settle on a single common field of work and a single "clear-cut goal". She also mentions that they chose the word "action" to step away from the goal-directedness implied by the word "work". MoCA has seven dimensions, as follows:

  1. Synchronicity: As in Johansen's matrix, this concerns a continuum of coordinated action ranging from being conducted synchronously, at the same time, to asynchronously, at different times.
  2. Physical Distribution: Again similar to Johansen's matrix, this continuum concerns whether all actions occur in the same geographic location or in different places. It emphasizes that working from different locations over a long period is a serious challenge.
  3. Scale: Addresses the number of participants involved in the collaboration; it matters because a lot of articulation work is needed to organize the members of a team.
  4. Number of Communities of Practice: Focuses specifically on the notion of different cultural communities: when people from different disciplines come together to collaborate, how can these differences be managed and resolved?
  5. Nascence: Distinguishes not-yet-established coordinated actions from established ones.
  6. Planned Permanence: Refers to planned or intended permanence, a dimension less addressed in similar works. It is important because it is usually impossible to say how long a coordinated action will continue, and hard to say when things have reached a stable state.
  7. Turnover: Refers to the rapidity with which participants enter and leave. This dimension covers collaborations ranging from closed, private ones, where participants leave slowly if at all, to fully open, public ones that may have many participants.

You can watch her presentation of this work at a Stanford seminar:
https://www.youtube.com/watch?t=1&v=bQLvPhnEvyY

As a new researcher in software engineering, I found the first reading surprisingly useful. It draws a comprehensive picture of the art of doing empirical research in software engineering, though many of its key points are not limited to this area. It elegantly shows how to propose a valid and valuable research question as the key starting point of any research effort. It then describes what to consider when looking for an answer to that question and emphasizes that the way we think, along with our philosophical expectations and stances, considerably shapes the potential answers. It then highlights the importance of having a theory, which acts like a lens through which the world is observed, provides solid information about the various empirical methods in software engineering and how they can be combined to compensate for each other's shortfalls, and finally explains how to collect data and how to validate results given the stance and methods used. The second reading makes a similar effort and in some places provides more detail on methods the first reading didn't discuss in depth. It also reminded me how closely interconnected the social sciences and collaborative software engineering are. Finally, the last reading presents a framework of coordinated action that tries to add value to related works and models in CSCW studies by discussing aspects (dimensions, in the author's words) that other works have focused on less, while still acknowledging much useful work done in the past.

Towards technical aspects of CSCW: Web 2.0 and Collaborative Visualization

Many years have passed since the advent of Web 2.0. There have been lots of conferences, journals, and articles around it, many individuals have written a great deal about it on the web and in print, and we have experienced many Web 2.0 websites, especially since social media became popular. Still, there is no consensus on what it exactly means, and many cast doubt on the claims of the numerous websites that have counted themselves as members of this world.


In fact, Web 2.0, just like many other areas, doesn't have a sharply specified boundary, but it does have some common principles that constitute its core, and we could say that the more of these principles a particular web app follows, the more it resembles a Web 2.0 app. Let's take a look at some of the most important principles, with some details and examples:

  1. The Web As Platform
    Some companies, like Google and eBay, started their existence on the web and are totally web-based. They differ in important ways from Microsoft-like companies (considering what Microsoft was at the time this reading was published): they run multi-user (or, better said, many-user) applications, they sell services rather than software packages, they deal with huge amounts of data, and they deliver their services over a network, the Internet.
    Much of Web 2.0's success comes from a concept called "the long tail". It refers to the fact that the collective power of many small sites makes up the bulk of the web's power, an idea that many successful web-based companies (search engines, advertising networks, web analytics services, etc.) make use of. The author says:

    The Web 2.0 lesson: leverage customer-self service and algorithmic data management to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head

  2. Harnessing Collective Intelligence
    So why do something with inaccurate, incomplete automated algorithms when the users themselves could do it best, without any special effort?
    For example, many websites need to classify a large amount of data. We could define some fixed categories up front, so that every user who generates new content has to use them, but some Web 2.0 websites instead let users define the classification points themselves, as tags, hashtags, and topics. Delicious, Flickr, and the StackExchange Q&A network, by supporting tags, and now social media sites like Facebook and Twitter, by supporting hashtags, introduce folksonomy in place of static, taxonomy-based classification of web content (see the tagging sketch after this list).
    As another example, Google Translate makes use of its users' contributions to learn and improve the quality of its translations.
    Likewise, Gmail uses "report as spam/phishing" feedback to learn which content users are likely to find unwanted and bothersome.
    Sites like Foursquare, TripAdvisor, etc. benefit from user reviews, and there are lots of other examples that harness users' intelligence in other ways.
  3. Data is the Next Intel Inside
    Web 2.0 apps almost always deal with large amounts of data, so much so that they are sometimes called infoware rather than software. Many of these web apps have giant distributed databases spread across different locations to provide higher performance, scalability, reliability, and accessibility. The data may be owned by the corporation itself, or by the users, as with online drives such as Google Drive.
  4. End of the Software Release Cycle
    Internet-era software is delivered as a service, not as a product. This fact leads to a number of fundamental changes: operating the service becomes a core competency, and because users help us improve the quality of our service and enrich our data, they become a kind of co-developer, in a reflection of open-source development practices.
  5. Lightweight Programming Models
    Supporting lightweight programming models allows systems to be loosely coupled. By providing a simple programming interface, we let other applications communicate with ours easily, and by using other available services and free, open-source software we benefit from what they provide without having to reinvent the wheel.
  6. Software Above the Level of a Single Device
    Although even a basic web app already spans two devices, a host and a client, in Web 2.0 we see web applications being used collaboratively by many users, and usable from many different devices: PCs and their web browsers, smartphone apps, tablets, and even TVs and other multimedia devices. We can use them from anywhere, with our data kept synchronized so that we always have access to our latest changes, if any.
  7. Rich User Experiences
    Technologies used in web apps have advanced a great deal since the advent of the web in order to improve the user experience: richer CSS effects and later CSS3, the advent of JavaScript for more dynamic client-side behavior, and then Flash for GUI-like experiences. But we could say the real revolution came with Ajax, which is itself a combination of several technologies: the DOM, data interchange using XML and XSLT (and now JSON), asynchronous data retrieval using the XMLHttpRequest object, and JavaScript. Ajax aimed to bring a desktop-like user experience to the web, so that users don't have to wait long or take special actions to see changes while browsing, and so that bandwidth and resources aren't wasted recomputing what was already processed. Google started this revolution in a significant way with Gmail and Google Maps, and nowadays all modern social networks, as well as popular services like Google Analytics and the online word processors in Google Drive or Microsoft OneDrive, use it for features such as type-ahead search suggestions, auto-loading the rest of a document as you scroll, and much more (a minimal sketch of an Ajax-style request follows this list).
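
Here is the tagging sketch promised above: a minimal, hypothetical illustration (not any particular site's code) of the difference between a fixed taxonomy and a user-driven folksonomy, plus the tag index that makes browsing by tag possible.

```typescript
// Taxonomy: content must fit one of the predefined categories.
const CATEGORIES = ["news", "sports", "culture"] as const;
type Category = (typeof CATEGORIES)[number];

interface TaxonomyPost {
  title: string;
  category: Category; // exactly one, chosen from a fixed list
}

// Folksonomy: users attach any labels they like; structure emerges bottom-up.
interface FolksonomyPost {
  title: string;
  tags: string[]; // free-form, user-invented labels
}

// Browsing by tag is just an index from each tag to the posts carrying it.
function indexByTag(posts: FolksonomyPost[]): Map<string, FolksonomyPost[]> {
  const index = new Map<string, FolksonomyPost[]>();
  for (const post of posts) {
    for (const tag of post.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag)!.push(post);
    }
  }
  return index;
}
```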

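And here is the promised Ajax-style sketch: a hypothetical type-ahead search box that fetches suggestions asynchronously and re-renders only the suggestion list, never the whole page. The /suggest endpoint and the element IDs are invented for illustration, and I use the modern fetch API where the article's era would have used XMLHttpRequest directly.

```typescript
// Fetch suggestions for the current query; only this small payload travels
// over the network, not a whole new page. The "/suggest" endpoint is made up.
async function suggest(query: string): Promise<string[]> {
  const response = await fetch(`/suggest?q=${encodeURIComponent(query)}`);
  if (!response.ok) throw new Error(`suggest failed: ${response.status}`);
  return response.json(); // e.g. ["web 2.0", "web services", "webmail"]
}

// Wire it to an input box; only the suggestion list is re-rendered.
const input = document.querySelector<HTMLInputElement>("#search")!;
const list = document.querySelector<HTMLUListElement>("#suggestions")!;
input.addEventListener("input", async () => {
  const items = await suggest(input.value);
  list.innerHTML = items.map((s) => `<li>${s}</li>`).join("");
});
```
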
We can categorize Web 2.0 artifacts into several major groups; here are three of them:

  1. Social Media: Social media actually isn't a new concept; it was commonplace in the past. So why is it making news now? Because in the 19th and 20th centuries we had broadcast tools like TV and radio whose content was generated by only a small number of people, the very opposite of social media. Now we are back to social media, but this time it is far cheaper, thanks to the Internet, and your voice can reach a big audience in the twinkling of an eye. It is not a waste of time, because it helps us communicate with each other better and gives us easier access to public information. It can help revolutions take place, because it facilitates communication and coordination, but it doesn't actually "cause" revolutions or other unwanted (or maybe wanted?!) incidents. And most important of all, it is not a fad, because it existed long ago in history; the mass-media era was actually the historical anomaly. We have had historical equivalents of concepts like blogs, microblogs, and even Instagram: pamphlets, coffeehouses, and photo albums!
    Even though social media may change its form again, as it did in the modern era, it is here to stay.
  2. Online Wikis: Wikipedia, the major example of an online wiki, was founded to bring a free encyclopedia to every individual. It uses a free license, so everyone can use it freely, even for commercial purposes, and can copy and share its content. It has no budget source other than donations from the public, the people who use it and love it. It doesn't need much funding, however, because most things are self-managed, or better said user-managed: it has only one employee, a software developer; its servers are managed by volunteer system admins; and the whole foundation is organized by a ragtag band of people.
    To maintain the quality of service, it relies mostly on social policies together with some software features. The most important of these is the neutral-point-of-view policy: editors don't try to write "the truth" about a topic, because everyone can have a different idea of what the truth is. Instead, they write what is evident and backed by high-credit references.
    Finally, social rules are left completely open-ended in the software: there is no voting system to decide which edits should be reverted. Instead, users write out the reasons for the positions they hold, and by weighing those arguments the final decision is made manually.
  3. Open-source software community: Open-source software is popular because it is transparent, it improves through the collective efforts of many individuals, and in many cases it offers better quality than even commercial products. Examples like the Apache and Nginx web servers, the MySQL and Postgres DBMSes, and many other development tools attest to its popularity relative to paid products, at least in software-based businesses. This comes back to the "harnessing collective intelligence" principle mentioned in the previous section, which shows the power of group work; it is also why a giant search engine like Google counts more on the number of backlinks to a website, or the click-through rate of its search results, than on any other factor when ranking websites for different search queries.

Collaborative Visualization

Collaboration has been named one of the biggest challenges for visualization and visual analytics, because the problems analysts face in the real world are becoming increasingly large, complex, and broad in scope. Additionally, interaction with digital information is increasingly becoming a social activity, and social media is a good example of this.

We should also mention that traditional visualization techniques and tools are typically designed for a single user interacting with a visualization application on a conventional computing device, not for use by groups or on novel technological artifacts. For all these reasons, collaborative visualization looks set to be a growing research area, and to grow even faster in the future.

While collaborative visualization benefits from work in other disciplines, there are many challenges, aspects, and issues unique to the intersection of collaborative work and visualization that this research area has to handle. There are several definitions of collaborative visualization:

There is an old definition that emphasizes the goal of research in this area:
“Collaborative visualization enhances the traditional visualization by bringing together many experts so that each can contribute toward the common goal of the understanding of the object, phenomenon, or data under investigation.”
More recently, the term social data analysis has been used to describe the social interaction that is a central part of collaborative visualization:
“[Social data analysis is] a version of exploratory data analysis that relies on social interaction as source of inspiration and motivation.”

The authors of the article define collaborative visualization as follows:

Collaborative visualization is the shared use of computer-supported, (interactive,) visual representations of data by more than one person with the common goal of contribution to joint information processing activities.

which is derived from a general definition for visualization as:

The use of computer-supported, interactive, visual representations of data to amplify cognition.


The article then describes some important application scenarios for collaborative visualization, such as its use on the web to visualize large amounts of data for many users, and its applications in scientific research, command and control, environmental planning, and mission planning.

In the future, we will have more data collected from a greater variety of sources. More people will want to view and analyze this data simultaneously, and they will want to access it through ever more novel devices. We need to present this data in a form that is more summarized and more understandable to them.

If you want more information about collaborative visualization, this web page may be useful:
http://yon.ir/RIuv