
data science platform

A data science platform is software that includes a variety of technologies for machine learning and other advanced analytics uses. It enables data scientists to plan strategy, uncover actionable insights from data and communicate those insights throughout an enterprise within a single environment.

Data science projects typically involve a number of disparate tools, each designed for a different step of the data modeling process. A centralized environment is therefore important, giving data science teams a single place to collaborate on those projects.

To enable data-driven business decisions, enterprises are investing in data science platforms and advanced analytics capabilities. A single, integrated platform can lead to better results and, therefore, greater business value.

Data science platforms offer flexible and collaborative environments, enabling organizations to incorporate data-driven decisions into operational and customer-facing systems to enhance business outcomes and improve the customer experience.

Capabilities of data science platforms

The best data science platforms provide the scalability of elastic compute resources and the flexibility of open source tools. The most popular data science tools are continually changing, so it's critical that a data science platform keep up with these changes.

A good data science platform will also incorporate best practices that have been developed and refined over years of software engineering. One of those best practices is version control, which enables a data science team to collaborate on projects without losing work that has already been done. Additionally, a quality data science platform will align with any type of data architecture.

To facilitate better collaboration among data scientists, a data science platform also:

  • Encourages people to work together on a model from conception to final development and also provides each team member with self-service access to data and resources.
  • Ensures that all of the users' contributions -- including data visualizations, data models and code libraries -- are kept in a shared location that's accessible to the entire team. This enables the data scientists to hold better discussions around research projects, share best practices and reuse code, making data science repeatable and easily scalable.
  • Enables data scientists to move analytical models into production without requiring help from DevOps. Additionally, a data science platform makes the data models available behind an application programming interface (API) so the data scientists don't always have to ask engineers for assistance (a minimal serving sketch follows this list).
  • Helps data scientists offload low-value tasks, such as reproducing past results, running reports, scheduling jobs and configuring environments for non-technical users.
  • Enables new hires to start working quickly because a centralized platform makes it easier to preserve the work of the people who leave.
  • Allows a data scientist to use any desired tool or package without disturbing the work of the rest of the team.
  • Easily scales out compute resources so the data scientist can run experiments that demand a lot of computation.
  • Offers a cost-efficient and scalable storage layer that can consume huge amounts of data at a high rate, quickly extract the relevant pieces of data, support data sharing and bring together disparate datasets so they can be used in a single application.
  • Enables all stakeholders to view the results of the work via dashboards and static reports. The platform should also be able to retrain models based on direct feedback from the business person who needs to solve a problem.
  • Offers tools that enable data scientists to deploy multiple versions of the same model for testing as well as tools that monitor the health of their models.
  • Supports multiple compute engines and analysis techniques working together at the same time in the same platform.
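
To make the API deployment and model versioning points above concrete, here is a minimal sketch in Python. It is an illustration only, not a feature of any particular platform: it assumes two hypothetical model files, model_v1.joblib and model_v2.joblib, and uses the Flask web framework to expose each version behind its own endpoint so both can be tested side by side.

# Minimal sketch: serving two versions of the same model behind an HTTP API.
# The model files below are hypothetical; a real data science platform would
# package and deploy them for you.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load each model version once at startup so both stay available for testing.
models = {
    "v1": joblib.load("model_v1.joblib"),
    "v2": joblib.load("model_v2.joblib"),
}

@app.route("/predict/<version>", methods=["POST"])
def predict(version):
    # Return predictions from the requested model version, or 404 if unknown.
    model = models.get(version)
    if model is None:
        return jsonify({"error": "unknown model version: " + version}), 404
    features = request.get_json()["features"]  # e.g. [[5.1, 3.5, 1.4, 0.2]]
    prediction = model.predict(features).tolist()
    return jsonify({"version": version, "prediction": prediction})

if __name__ == "__main__":
    app.run(port=8080)

A business application or dashboard could then request a prediction from either version, for example by sending a POST request with a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]} to /predict/v1 or /predict/v2, which is what lets a team compare model versions before promoting one to production.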
This was last updated in April 2019
