Ellen Harvey, BookBusiness
MIT Press director of technology Bill Trippe explains how the online cognitive science resource MIT CogNet was built and how the platform has become a blueprint for future products.
Developing a proprietary reading platform is no easy task, but it can prove quite lucrative if done right. MIT Press's recently updated MIT CogNet is one example. More than 10 years in the making, the platform gives researchers access to over 700 books, 6 journals, and 13 reference works in the cognitive sciences. MIT Press sells annual subscriptions to institutions and some individual researchers for up to $3,658. More than 100 academic institutions subscribe to the platform, and MIT Press fully intends to expand it, says director of technology Bill Trippe.
MIT Press's work on CogNet is paying off in other ways, says Trippe. The platform's core software, which MIT Press developed in-house in the early 2000s, is being repurposed for new subscription products that serve other disciplines in which MIT Press publishes. With an established XML-first workflow and a conversion partner for more complex titles, Trippe anticipates that new online subscription products can be rolled out to users in less than a year.
Here Trippe explains how CogNet's latest update was executed and outlines plans for future online products.
What new workflows or processes did you have to put in place to create this CogNet update?
We actually converted everything to XML. Then, for display purposes, the user can view the content either as a PDF or as HTML that's rendered on the fly from the XML. Most of these users are undergraduates, graduate students, and professors, and PDFs are still the preferred reading format for a lot of these folks.
For the actual conversion, we used Data Conversion Laboratory for the most complex books, the ones with a lot of math, a lot of figures, a lot of tables. The easier books, long-form narrative nonfiction, we converted ourselves. Newer books go through our in-house XML workflow, so we're able to load them directly into the platform.
Was MIT Press’s XML workflow created for CogNet or was that something the press already had in place before taking on this project?
Some of this predates me, but the press has been developing an XML workflow for seven or eight years, so it predates the work on the new CogNet. At this point, we produce about 200 new books per year. About 80% of our titles come through the XML workflow, and we produce both print and digital editions from it.
The other 20% are more complicated, more mathematically oriented titles. We typically outsource their composition and conversion rather than burden the staff with that.
How profitable has this platform been for MIT Press since launching? Has it opened up any new business opportunities for the press?
It’s a great slice of our business. We sell print books and journals. We have a healthy digital program overall. Having CogNet is a really nice third leg of the stool. It enables us to stand up a collection of content, sell it on a subscription basis, and have the recurring revenue from that. We have a very high retention rate of subscribers as well.
The other thing we've done is build CogNet as both a new product and a platform. We did all the software development ourselves. We built CogNet in Drupal 7, and the intent from the beginning has been to take the core platform and populate it with other disciplines. We're about to announce the second of these products, in the field known as art, science, and technology. Picture someone with a PhD in computer science who also does installation art or computer-generated music. This new platform, which is going to be called Arteca, will include three journals and about 200 books in art, science, and technology. We're probably going to announce a third one after that. That's still in the planning stage, but this has been the intent all along.
What’s the timeframe for rolling out these new products?
We're actually pretty confident that Arteca will go into beta in under six months, which contrasts quite a bit with the work we had to put in to build the platform and stand CogNet up. I think once we decide to launch a third collection, we could probably also do that in less than a year. There are a couple of reasons we can move this quickly. One is that we've been at this for a while; we have a workflow, and I think we do a good job of managing our digital assets. The other is that we have a partner that can do these types of projects at an enormous scale.
What future plans do you have for CogNet and Arteca?
One of the really big drivers for us, even in the next few months, will be including more multimedia and more data. The multimedia is pretty obvious: podcasts, accompanying videos.
The data side is more complex. Data publishing is starting to become really important, especially in the sciences. The imperative now, for a variety of reasons going all the way back to the government agencies that fund this research, is that the data be made available with the published article. So a neuroscience journal with data sets such as a collection of MRIs needs to include those data sets with the published material.
It sounds easy, but the devil is in the details. First of all, the files can be massive. Secondly, the files can be opaque. If it’s a custom format, you might need a specific reader or piece of software to interpret that. We really are going to have to work very hard.
We’ll tackle the multimedia aspect of it first, probably with podcasts and with video, in that order. Then we’re going to look at these data publishing requirements. I would say within a year we will have added all of that.
What advice do you have for other publishers looking to launch an online reading platform like CogNet?
Some of it is general advice about software development overall, which is don’t boil the ocean. We focused very much on the specific use cases of this first audience of cognitive science researchers and what they needed to get out of our content: Really good search, really good rendering, really good navigation, and making it very easy for them to harvest the content that they know they want. By focusing on those use cases, we were then able to create a pretty rational development plan for ourselves.
It's also important to have the right vendor, one who understands your systems. If you have to spend a lot of time on QA and on educating your partner about the nuances of cognitive science, for example, the hidden costs can really spike, and it can slow the whole process down. We just couldn't take on a ton of cleanup and extra QA. We needed the conversion process to be really efficient.