IBM Data Asset eXchange Adds New Data Sets


IBM Data Asset eXchange (DAX), an online hub for developers and data scientists to find free and open data sets under open data licenses, has announced a host of new data-related assets and user experience enhancements.

The newly added TensorFlow Speech Commands data set, for instance, contains over 65,000 short audio clips of 30 common spoken English words. The WikiText-103 data set features over 100 million text tokens extracted from the set of verified ‘Good’ and ‘Featured’ articles on Wikipedia.

The Oil Reservoir Simulations data set consists of 60,000 physics-based simulated oil reservoirs generated by IBM researchers.

Moreover, the Wikipedia Entity Graph data set developed by IBM Research is a knowledge graph of Wikipedia entities, in which each entity is supplemented by a context document that represents all the contexts in which the entity appears on Wikipedia.

Additionally, the Mono Lake Surface Water Extent Landsat8 data set contains Landsat 8 satellite imagery that IBM Research post-processed to measure the surface water extent of Mono Lake from 2013-04-18 to 2019-12-31.

The Taranaki Basin Curated Well Logs data set consists of a curated set of 407 oil wells located along the western coast of New Zealand.

The SimpleQuestions Relation Detection and WebQSP Relation Detection data sets are sets of entity relation annotations generated by IBM Research from underlying question-answering data sets.

For existing data sets, IBM Data Asset eXchange has added seven new Watson Studio notebooks as well as three Watson Studio projects (a new class of data assets that package multiple notebooks together).

Along with these notebooks, DAX has added eight new data sets to the exchange, featuring domains such as oil extraction, remote sensing, and speech recognition.

Since launching the exchange in 2019, the Center for Open-Source Data & AI Technologies (CODAIT) team has been adding new data sets, as well as resources that help users explore them. Next, CODAIT plans to improve the way DAX displays data set previews.