Thoughts Toward a Giant EduGraph


In my recent Talking With Talis podcast about Dave Pattern's release of library data, I floated out the term "Giant EduGraph" to describe some of the kinds of aggregations of data we were talking about. The term has actually been bouncing around my head for a while, enough that I've included it in a few conference proposals.

It's about time, then, that I start to think through more details of what I actually mean. I'd also like to address some of Richard's questions about the state of the standards for making something like it happen.

This sketch roughly lays out the component areas of data I imagine being part of a Giant EduGraph. The heart of it, of course, is data about ideas and subjects, people, and the resources used in teaching and learning. Another essential element is to keep the approach person-centric, focusing the data on people and groups of people (more on that below in the distinction between a course and a course group).



[Image: sketch of the component data areas of a Giant EduGraph]

Ideas and Subjects
Here's the pile of data about subjects people are studying, as well as similarly intangible things like teaching methods and theoretical approaches.
People
This covers the people in the classrooms (both students and teachers) and the people who are indirectly part of a class, like the authors of books, journals, or other readings, the artists whose work is studied, the historical figures discussed in class, etc. Notice that, especially when looking across institutions, a person who is directly a part of one class might very well be indirectly part of another, as when a class at one school reads a book or article by a professor at another institution.
Teaching and Learning Resources and Tools
This covers the usual suspects of books and other materials from the bookstore, resources in the library, and online resources. The "tools" part also covers the technologies and devices used in the course of study: things like blogs, GPS devices, paint, electron microscopes, etc.

In the nine domains that surround those three core aspects, there are obviously overlaps and cross-connections: people show up in course groups and organizations, ideas show up in events and online spaces. Any effort at a Venn diagram would probably be self-defeating. When we get to looking at example data, the relationships should come out clearly.

Let me break out a little of what I have in mind for each domain, starting with what might be least intuitive, the distinction between a "Course" and a "Course Group". Part of the problem is the ambiguity of the term "course". We typically use the word to refer to two very different things: the administrative item that is listed in a course catalog (e.g., English 101), and the actual collection of knowledge, discussion, and ideas shared by real people in a particular semester (the section of English 101 I'm taking this semester). To get at a model for describing the intellectual life of an institution, those need to be kept distinct. And so in the diagram a "Course" is the administrative item. The other thing is, at heart, a group of people. So a "Course Group" is a group of people studying and thinking together for a term, after which they get credit for the "Course".
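To make the distinction concrete, here is a minimal sketch in Python. The class names, fields, and sample people are my own illustrative assumptions, not terms from any published ontology; the only point is that a Course is a catalog entry while a Course Group is a set of people tied to one term.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Course:
    """The administrative catalog item, e.g. English 101."""
    code: str
    title: str


@dataclass
class CourseGroup:
    """The people studying together in one section during one term."""
    course: Course
    term: str
    section: int
    members: set[str] = field(default_factory=set)


# Hypothetical sample data.
eng101 = Course(code="ENG101", title="English 101")

# Two different groups of people, both earning credit for the same Course.
fall_group = CourseGroup(eng101, term="Fall 2008", section=1, members={"alice", "bob"})
spring_group = CourseGroup(eng101, term="Spring 2009", section=1, members={"carol"})
```

Keeping the people on the Course Group rather than on the Course is what lets the same catalog entry be shared by many different groups over time.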

Campus/Learning Places
Usually, a campus place is just the straightforward location where a course group meets. But in the case of studying abroad or of field work, the learning place isn't actually on campus, so I'm calling those learning places.
Events and Lectures
Straightforward here -- just the events and lectures taking place on campus, including things like conferences, speeches, sporting events, etc. When we get down to the nitty-gritty, we'll want to distinguish live and virtual lectures, but for this view from high above, we don't need to worry about that.
Institutions
Yep, top-level institution. But also the sub-institutions within one. Here again, our natural language introduces ambiguity for describing this cleanly. For example, there's "college" as in "College of William and Mary" and there's "college" as in "College of Arts and Sciences" within a particular university. Then add in the various centers, institutes, etc. When we zoom in to the details we'll be poked by the devils lurking therein, but for now we can go with the loose intuitive notion.
Requirements and Credentials
I'm trying here to distinguish between what are really prerequisites, as when English 101 is a prerequisite of English 102, and a broader requirement for graduation. I'm calling the latter a graduation credential, and often these are aggregates that reflect different journeys through one's education. For example, start at the top with the graduation credential "Completion of Major". One person will fulfill that credential with a major in math, another with a major in art. The former will need to complete a credential in the math department, "Calculus Sequence"; the latter will not.
Library Data
In part, this simply reflects the library's holdings. But I also have in mind what Dave Pattern did with the University of Huddersfield Library's circulation data -- providing anonymized information about usage patterns.
Clubs and Organizations
The student groups that are active in an institution.
Online Spaces
I'm using "Spaces" to generically cover all the online environments where students interact, including blogs, forums, wikis as well as services like Second Life, Twitter, Flickr, YouTube, etc.

So, where are we in terms of existing RDF vocabularies for representing this?

  • Campus/Learning Places: Geonames already has a surprising amount of this info as linked open data. Places smaller than a building, like a room, would need to be added.
  • Events and Lectures: The Event Ontology. Natch.
  • Institutions: Talis has produced the Academic Institution Internal Structure Ontology (AIISO). Some work might be needed to extend this to match with US institutional structure.
  • Requirements and Credentials: Here there's a gap. I'm working on a University Ontology that aims to fill it.
  • Library Data: For bibliographic data, we have the Bibliographic Ontology. As libraries move to Functional Requirements for Bibliographic Records (FRBR), we might be able to work with the RDF representation of FRBR. Regarding circulation data, things might develop if and when libraries start opening up their data. But for the Huddersfield data, I put terms into openvocab when I RDFized it.
  • Clubs and Organizations: Friend Of A Friend (FOAF). Natch.
  • Online Spaces: Semantically Interlinked Online Communities (SIOC). Natch. Almost all of the data I'm working with at Semantic UMW is in SIOC and FOAF.
  • Courses: AIISO works well here, too, with the same caveat about matching it up with US structures and roles.
  • Course Groups: FOAF again.

We also have Simple Knowledge Organization System (SKOS), Tags, and Dublin Core to work with, among others. The Linked Data cloud, and especially DBpedia at its center, is ready to go a long way toward providing the needed URIs to represent ideas, subjects, and resources.

So, at least for the individual elements, the needed standards are in place, and in many cases quite mature. Because my vision of a Giant EduGraph is clearly person-centric -- that's why I distinguish Courses and Course Groups -- the use of FOAF and SIOC will be extensive. Stitching them together is the next step. Talis has also produced a Resource List Ontology for relating courses as represented in AIISO with the materials they use. It does not distinguish between Courses and Course Groups, and so does not offer the person-centric approach I'm advocating, but I do not think that will cause difficulties. The University Ontology I'm working on aims to provide that stitching.

Great, but the next question is usually, "But who's going to input all that data?" It's an issue, but perhaps not so big a one as is often thought. First, a lot of data about people and groups can be scraped out of blogs and wikis. Second, much of the course data is available in the online catalogs and lists of course offerings. With some clever URI-minting algorithms, those pages (along with pages about individual faculty members) can be used to scrape out information about courses, course groups, departments, campus locations, requirements, credentials, and more. Third, plenty of social sites, as well as Wikipedia, demonstrate that people are quite happy to contribute information when they see the value that comes from doing so (and, yes, if the interface is easy -- admittedly, this is a Semantic Web sticking point so far).
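As a rough idea of what URI minting could look like, here is a small sketch. The base URI, the path layout, and the function names are all hypothetical; the only property that matters is that scraping the same catalog listing twice yields the same URI, so data from different pages can link up.

```python
import re

BASE = "http://example.edu/id"  # hypothetical base URI, not a real service


def slug(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mint_course_uri(dept: str, number: str) -> str:
    """URI for the administrative Course, e.g. English 101."""
    return f"{BASE}/course/{slug(dept)}-{slug(number)}"


def mint_course_group_uri(dept: str, number: str, term: str, section: int) -> str:
    """URI for one Course Group: a section of a Course in a given term."""
    return f"{mint_course_uri(dept, number)}/{slug(term)}/section-{section}"


print(mint_course_uri("English", "101"))
print(mint_course_group_uri("English", "101", "Spring 2009", 1))
```

Because the slug ignores case and stray punctuation, "English 101" on a catalog page and "ENGLISH  101" on a schedule page would mint the same URI under these assumptions.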

So let's take a stab at looking at how this kind of data might be valuable. This will be a quick one, but I'll be writing up more use cases in the future.

Say the University of the Semantic Web is an institution with a campus in Washington, D.C. One course they offer is Library Studies 301: The Web. The particular emphasis of one professor is information design, and some of the work by Edward Tufte is on the reading list. As it happens, during the Spring 2009 semester Edward Tufte will be giving a lecture on campus, and attendance is required of the students in section 1 of the course that semester.

So far, we have an institution, a course, a course group, and an event. Additional info is also available: where the campus is, where the event is, and a topic for both the event and the course group.

Now, let's throw in Alice, a part-time paralegal who is considering going back to school to get an advanced degree in Library Studies. She is especially interested in information design. She's also new to DC.

If the information we have so far is in a Giant EduGraph, we'd have something like this:



[Image: graph of the example data for the University of the Semantic Web scenario]

There's a convergence around a subject, Information Design. There's another convergence around places. USW is in DC, and the meeting place of the course that semester is in DC, and the lecture takes place at another location in DC. A third convergence happens around the person, Edward Tufte.
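The convergences can be sketched with plain subject-predicate-object tuples. The predicate names and identifiers below are placeholders I made up for illustration, not terms from FOAF, AIISO, or any other vocabulary. A "convergence" here just means a node that several different things point to.

```python
from collections import defaultdict

# (subject, predicate, object); all names are illustrative placeholders.
TRIPLES = [
    ("USW", "locatedIn", "Washington, D.C."),
    ("LS301", "offeredBy", "USW"),
    ("LS301-Section1-Spring2009", "groupFor", "LS301"),
    ("LS301-Section1-Spring2009", "topic", "Information Design"),
    ("LS301-Section1-Spring2009", "meetsIn", "Washington, D.C."),
    ("LS301-Section1-Spring2009", "reads", "Edward Tufte"),
    ("LS301-Section1-Spring2009", "attends", "Tufte Lecture"),
    ("Tufte Lecture", "topic", "Information Design"),
    ("Tufte Lecture", "speaker", "Edward Tufte"),
    ("Tufte Lecture", "locatedIn", "Washington, D.C."),
    ("Alice", "interestedIn", "Information Design"),
    ("Alice", "livesIn", "Washington, D.C."),
]


def convergences(triples, minimum=2):
    """Return objects that at least `minimum` distinct subjects point to."""
    pointing = defaultdict(set)
    for subject, _predicate, obj in triples:
        pointing[obj].add(subject)
    return {obj: subs for obj, subs in pointing.items() if len(subs) >= minimum}


for node, subjects in convergences(TRIPLES).items():
    print(node, "<-", sorted(subjects))
```

With this toy data the three nodes that come back are the subject, the place, and the person, which are the three convergences described above.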

How can a Giant EduGraph be helpful here?

Let's imagine -- okay, I know this might be a stretch, but run with me for a bit -- let's imagine that we've figured out a wonderful interface onto a Giant EduGraph. Alice searches in it for events in DC about "Information Design". She discovers the lecture, and hits the "Expanded Information" view, which tells her that students from that course group will be attending. That, in turn, leads her to more information about the course and the institution.

Or a different tack. Alice searches in it for institutions where they study Edward Tufte. Here, the "Expanded Information" view tells her that a particular course group at USW studies him, and that they will be attending his upcoming lecture.

Or. Alice has heard of this upstart university called the University of the Semantic Web. She's quite the social butterfly, so she asks for events associated with it. The "Expanded Information" view now tells her about the lecture, and that students from that particular course group will be attending.

Oh. And because the data also links the course group to where they are blogging, all those approaches can lead her to their posts, where she can get a sense of who these students (potentially classmates) are from their writing.
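I still don't know what the interface looks like, but the lookups behind it are simple pattern matches over the graph. Here is a hedged sketch of Alice's first search and an "Expanded Information" step. The triples, predicate names, blog URL, and function names are all invented for illustration.

```python
# (subject, predicate, object); all names and the blog URL are placeholders.
TRIPLES = {
    ("TufteLecture", "type", "Event"),
    ("TufteLecture", "topic", "Information Design"),
    ("TufteLecture", "locatedIn", "Washington, D.C."),
    ("TufteLecture", "speaker", "Edward Tufte"),
    ("LS301-Section1", "type", "CourseGroup"),
    ("LS301-Section1", "attends", "TufteLecture"),
    ("LS301-Section1", "groupFor", "LS301"),
    ("LS301-Section1", "blogsAt", "http://example.edu/blogs/ls301"),
    ("LS301", "offeredBy", "USW"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def subjects(p, o):
    """All subjects that have predicate `p` with object `o`."""
    return {s for s, _, _ in match(p=p, o=o)}


def events_about(topic, place):
    """Alice's first search: events on a topic in a place."""
    return subjects("type", "Event") & subjects("topic", topic) & subjects("locatedIn", place)


def expanded_information(event):
    """Follow links out from an event to groups, courses, institutions, blogs."""
    info = []
    for group, _, _ in match(p="attends", o=event):
        for _, _, course in match(s=group, p="groupFor"):
            info.append({
                "group": group,
                "course": course,
                "institutions": [o for _, _, o in match(s=course, p="offeredBy")],
                "blogs": [o for _, _, o in match(s=group, p="blogsAt")],
            })
    return info


for event in events_about("Information Design", "Washington, D.C."):
    print(event, expanded_information(event))
```

The other two searches (by person, by institution) would be the same kind of walk started from a different node.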

So she attends the lecture. Because some of the members of the course group have additional info out in their FOAF files (like Flickr accounts and pictures), she recognizes the professor and some of the students and starts a conversation.

Reality Check

How far away from this are we? FOAF and SIOC offer all the terms we need for the personal and social bits, including interests. Geonames has a good chunk of the mappable data about locations, so Alice can find the lecture. DBpedia currently has entries (URIs) for "Information Design", "Washington, D.C.", many buildings in DC, and "Edward Tufte". All have plenty of other info allowing more meshups of data. AIISO can probably handle most, if not all, of this scenario. I'm attacking the missing bits around course groups in the University Ontology. As I said above, the Event Ontology is ready to handle info about the lecture itself.

The first big missing thing is a really good approach to interfaces on this. That "Expanded Information" view I mentioned? I have no idea what it looks like. I think it will have to be different from approaches we have now -- information possibilities are getting more complicated, and user interfaces will have to get more complicated. That calls on people to be able to deal with more complicated interfaces. That's fine and nothing new. How complicated was the card catalog, compared to what we use now? We adjusted to that just fine, and we'll adjust to new interfaces again.

And, obviously, the application that pulls all this info together and also makes it easy and happy to add the info just isn't here. But it'll come. There are too many possibilities and benefits to higher education -- and to those who participate in higher education -- for those apps not to appear. They will explode the possibilities for making social, intellectual, and academic connections. They will also attack problems that higher education hasn't been able to deal with yet. There will be plenty of cultural and legal hurdles to surmount as we also address some technical and interface-related issues. But we'll get there.


Hi Patrick,
I wondered if you'd seen the work, funded by JISC here in the UK, around XCRI (eXtended Course Related Information). Although it is a specification for the XML interchange of course information, I wondered what parallels there are with your own University Ontology and if there could be a mapping between them. XCRI seems to have some good early adoption amongst UK universities, including Oxford and the Open University (http://www.xcri.org/Adopters.html).
Also, you mention some extensions to the AIISO ontology, developed by Talis. We'd be interested in further discussion around how you would see this evolving to support US institutions.
Thanks,
Chris

Chris,

Wow...I hadn't seen that. Many thanks for pointing this out!

It looks like the parallels are quite close and many mappings would be straightforward. Course/Presentation in XCRI is the same Course/Course Group distinction I'm working with, for example, and it looks like many of the properties on Presentation are compatible with how I'm imagining Course Groups, with the differences seeming to just reflect the different data models of RDF and XML. For example, the property "placesAvailable", especially if used with a machine-readable value, looks like the univ:basedAt property I've been using, which has a range of univ:Campus. And some of the properties I see in XCRI, like attendanceMode, I realize I should have an equivalent for.

Some mappings would be a little complex, but hopefully doable. For example, I've been using a class Term so that a university has a resource for start and end dates, and the course group info points to that. The university defines and publishes a Term with start and end date triples:

ex:Spring2009 a univ:Term ;
    univ:startDate "2009-01-15" ;
    univ:endDate "2009-05-15" .

ex:Eng101Sect1-09 a univ:CourseGroup ;
    univ:Term ex:Spring2009 .

Mapping XCRI start and end dates onto this model would require looking up the right Term. Then again, maybe I need to rethink that system to make it more flexible?
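A rough sketch of that lookup, in Python rather than RDF. The Spring 2009 dates mirror the triples above; the Fall 2009 dates, the function name, and the containment rule are assumptions made up for illustration.

```python
from datetime import date

# Published Terms: name -> (start, end). Spring 2009 mirrors the triples
# above; the Fall 2009 dates are invented for this example.
TERMS = {
    "ex:Spring2009": (date(2009, 1, 15), date(2009, 5, 15)),
    "ex:Fall2009": (date(2009, 8, 24), date(2009, 12, 11)),
}


def find_term(start, end, terms=TERMS):
    """Return the Term whose dates contain the given start and end, or None."""
    for name, (term_start, term_end) in terms.items():
        if term_start <= start and end <= term_end:
            return name
    return None


print(find_term(date(2009, 1, 15), date(2009, 5, 15)))
print(find_term(date(2009, 6, 1), date(2009, 7, 1)))
```

The second lookup finds nothing, since a summer short course falls outside every published Term. That is the kind of case that makes me wonder about flexibility.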

The example above about univ:basedAt might be complex for a similar reason, and it's one that gets to the question about AIISO and US schools. (It's also something that I'm struggling with now, and is completely wrong in the current draft of the ontology!) I'm calling a univ:Campus a subclass of aiiso:OrganizationalUnit, so it is not itself a physical place, but it will have a property univ:city to make that association. I don't know what's typical for UK institutions, but US institutions are increasingly creating campuses in remote locations, sometimes the other side of town, sometimes overseas. The courses and faculty across campuses might or might not overlap. So I think that for US schools that -- or some other subclass of aiiso:OrganizationalUnit -- would be needed in the model.

Yikes! comment running to post-length! Thanks . . . this is still a work in progress, and the XCRI reference helps a lot. And I'd love to talk more about AIISO, since I certainly don't want to redo work that Talis has already done!

Patrick
