
- Is Open Source ideal for AI use in business?
- The IBM AI ladder is a multi-step information architecture
- Data and digital transformation
- Cloud Pak for Data as a Service
- Architecture of Cloud Pak for Data as a Service
- Watson Studio Tools
- Let’s subscribe to Watson Studio!
- Starting a new project in CPDaaS
- Data upload
- Data analysis
- Bringing models into production
- Conclusions
- Let’s dive into code!
Is Open Source ideal for AI use in business?
By the time you leave university, you will already be using many tools and platforms from the open-source world. Open source offers a vast number of tools, essentially free of charge, and active communities usually support them, so most of your development needs are covered.
This is fine for research or study purposes, but far fewer of these tools and platforms remain suitable once you move into production.
The cost-free aspect of a platform is not the most important feature once you reach the production phase. At that point you begin to see the gap between the learning world and the working world, even though it may still seem that your open-source environment should cover every phase of production.
Why make a change from something that already works, and that’s open, to a closed environment you don’t know yet?
There are many good reasons to make this change. In production, you need security, governance, and integration to establish your professionalism with the customer. Your company needs to manage data – at least, many companies do, though few use data properly.
Appropriate tools are needed to manage real-world data complexity – IBM has proprietary tools to help you achieve this.
IBM bridges the gap. The company offers an open source-style starting point, adding tools and services that allow you to reach production.
You can continue to use most of your preferred open-source tools, but you also get an easy-to-use environment in which to share your model with all the data stakeholders, including business experts. On top of that come production tools that let you deliver a working solution that is robust, secure, and easy to maintain and update.
We need to make a change now
The whole planet is going through a transformation that is happening partly through technology and partly within ourselves. If you are a developer and want to be part of this big change, consider joining the IBM Call for Code Global Challenge 2021.
The Challenge is a great way to improve Earth’s health and increase the scope of your future. Join the IBM Call for Code Challenge now!
More information will be delivered online during the Data and AI Forum Italy, an event dedicated to the central role of data in the transformation journey of Italian organizations. Create your free IBM Cloud account here.
The IBM AI ladder is a multi-step information architecture
The proposed IBM architecture brings all the data value chain elements together on the same platform.
Cloud Pak for Data is based on a 4-step model: collect, organize, analyze, and infuse. Some of its services are included in the Watson Studio solution portfolio. You can find more information on the so-called AI ladder by consulting this technical article.
Data and digital transformation
The environment as a whole is rich in modules and options. Data scientists will feel at home in the analysis area, and the platform makes it simple to explain the contribution Cloud Pak for Data can make to the production process to stakeholders whose expertise lies in business rather than data.
All components can be found on a single, consistent platform, and every contribution can be taken into account. Business users can understand the data analyst's work, and most technical developers can complete the full cycle. Many languages can be used simultaneously (Python, Scala, R…), so every contribution adds to the available tools. The user interface is easy to learn – it becomes familiar in minutes – and supports different levels: viewer, developer, or analyst, each with a place on the platform.
This set-up means that users get the best of all worlds and can clearly understand the implications of each contribution: the security expert shares their experience in a way that is easily understood by other technical experts, for whom security may not be the strongest suit. The same applies to governance and other relevant aspects.
All of this expertise is available to all team members and will be ready and waiting when a problem arises.
Cloud Pak for Data as a Service
Cloud Pak for Data is a comprehensive platform that houses many services, Watson Studio among them.
Cloud Pak for Data provides users with an integrated set of capabilities for collecting and organizing data into a trusted, unified view and the ability to create and scale AI models across your business.
Cloud Pak for Data as a Service includes these features:
- Streamlined administration:
- No installation, management, or updating of software or hardware
- Easy to scale up or down
- Secure and compliant
- A subscription with a single monthly bill
- Integrated experience for working with data:
- Connect to and catalog data sources on any Cloud
- Provision, populate, and use a governed data lake
- Run an end-to-end data science lifecycle
- Access AI services to transform customer interactions
Architecture of Cloud Pak for Data as a Service
Cloud Pak for Data as a Service provides a single, unified interface for a set of core services and their related services.
With Cloud Pak for Data as a Service, you can create these types of services from the integrated services catalog:
- Core services to govern data, analyze data, run and deploy models
- Services that supplement core services by adding tools or computation power
- IBM Cloud database services to store data for use in the platform
- Watson OpenScale, Watson Assistant, and other Watson services that have their own UIs or provide APIs for analyzing data
The sample gallery provides data assets, notebooks, and projects. Sample data assets and notebooks provide examples of data science and machine learning code. Sample projects contain a set of assets and detailed instructions on how to solve a particular business problem.
Integrations with other Cloud platforms can be configured so that users can easily create connections to data sources on external platforms.
Users can create connections to other Cloud data sources or on-premises databases to work with data without moving it.
This illustration shows the functionality included in the common platform, the core services, and the supplementary services.
The following functionality is provided by the platform:
- Administration at the account level, including user management and billing
- Storage for projects, catalogs, and deployment spaces in IBM Cloud Object Storage
- Global search for assets and artifacts across the platform
- Platform assets catalog for sharing connections across the platform
- Role-based user management within collaborative workspaces across the platform
- Common infrastructure for assets, projects, catalogs, and deployment spaces
- A services catalog for provisioning additional service instances
Watson Studio provides the following types of functionality in projects:
- Tools to prepare, analyze, and visualize data, and build models
- Environment definitions to provide compute resources
Watson Machine Learning provides the following functionality:
- Tools to build models in projects
- Tools to deploy models and manage deployed models in deployment spaces
- Environment definitions to provide compute resources
Watson Knowledge Catalog provides the following functionality:
- Catalogs to share assets
- Governance artifacts to control and enrich catalog assets
- Categories to organize governance artifacts
- Tools to prepare data in projects
Watson Studio Tools
This section gives just a hint of the scope of the Watson Studio tools. The starting webpage is located here. As a data scientist, you may work on a project where the quantity of data is impossible to manage with traditional Python libraries such as Pandas.
Cloud services give you a data engine that can work on data in parallel: Spark, a solution integrated into the Cloud version of Watson Studio, will help you do this. Using Spark, your job can complete in a limited time, even if the same data set would require a very long processing time with ordinary Python tools.
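As a rough illustration, here is a minimal PySpark sketch of the kind of job this enables: reading a large CSV and aggregating it in parallel. The file path and column names are placeholders, not part of the original example.

```python
# Minimal PySpark sketch: read a large CSV and aggregate it in parallel.
# The file path and the columns "customer_id" and "amount" are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("large-csv-demo").getOrCreate()

# Spark partitions the file across executors instead of loading it all
# into a single machine's memory, as pandas would.
df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

totals = df.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))
totals.show(10)
```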
Let’s subscribe to Watson Studio!
The Cloud service page has a header. If you start from Watson Studio and use only that one service, the header shows Watson Studio. If you add any other Cloud Pak for Data service, the page header changes to Cloud Pak for Data.
You can go to this page to join Watson Studio for free. The page automatically appears in your language area.
Access many IBM services including Watson Studio here.
Starting a new project in CPDaaS
GitHub feels like a good place to start right now. There is a short list of direct steps to perform:
- Enter the project
- Enter the GitHub page
- Create a repository
- Create a data asset
- Create a data set
- Create models
At this point, it becomes clear that you are no longer in the GitHub-based world but have joined a larger world that will help you to develop the best solution.
As a technician, I access CP4D in overview mode. Here I see tools for managing:
- Roles
- Governance
I am the admin and can add new people with different roles:
- Admin
- Editor, who can work on the code
- Viewer, who can view assets without changing them
I can also write a readme file to serve as the project diary.
The asset list can be displayed; environments are among these assets.
You can easily develop your model, making the best choices for your project in the process. This means that you can show a complete project to any project stakeholders, from team members to clients or prospects, and exploit many options.
The Lite plan is described on the CP4D page. You are given a certain quantity of resources through tokens called Capacity Unit Hours, or CUHs. The free plan includes a limited amount of CUHs; you can buy more should you need them.
Data upload
It is now time to upload data to test your model.
You have a list of data assets, and you need to share one of these datasets. This is straightforward: simply drag and drop onto the column on the far right-hand side, named “Data”. You can also upload datasets from your computer's desktop.
The Data column has three tabs: “Load”, “Files”, and “Catalog”.
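For cases where drag and drop is not convenient, data assets can also be saved from inside a notebook. Below is a hedged sketch using project_lib, the helper library available in Watson Studio notebooks; the project ID, access token, and the DataFrame contents are placeholders.

```python
# Hypothetical sketch: save a DataFrame as a project data asset with
# project_lib, the helper library available inside Watson Studio notebooks.
# The project ID and access token are placeholders from the project settings.
import pandas as pd
from project_lib import Project

project = Project(project_id="<project-id>",
                  project_access_token="<access-token>")

df = pd.DataFrame({"company": ["A", "B"], "bankrupt": [0, 1]})

# Store the CSV so it appears in the project's data asset list.
project.save_data("Bankruptcy.csv", df.to_csv(index=False), overwrite=True)
```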
Being competent with code is important these days, but the specific language used can change over time. Many tasks currently assigned to Python coding may be accomplished with Scala in the near future, and large quantities of tasks will be coded automatically for security and integration purposes. Rich platforms will become the reference choice.
Returning to the data: in this example we select the CSV file named Bankruptcy. You can see its content without opening a programming notebook – one of the simplified options IBM Cloud Pak offers. Many other small tasks are performed directly, so you avoid opening a notebook just for these operations.
The CSV data are categorized without issuing any commands inside a notebook, and you can easily access a lot of information, such as the data types (string rather than numeric).
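For comparison, here is what the same quick inspection would look like in code: a minimal pandas sketch, assuming the Bankruptcy CSV is available as a local file.

```python
# The profiling the platform shows visually, reproduced in code for
# comparison. Assumes the Bankruptcy CSV is available as a local file.
import pandas as pd

df = pd.read_csv("Bankruptcy.csv")

print(df.dtypes)      # per-column types: object (string) vs. int64/float64
print(df.describe())  # basic statistics for the numeric columns
print(df.head())      # preview the first rows
```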
You can then create a job – a task that can be relaunched like a function: you see this in action here.
You can have access to many of these data options even if you are not a skilled Python programmer. In most cases, expertise is unnecessary and basic programming knowledge is enough.
What’s more important is that a skilled person in a specific field – such as finance in our example – can make an important contribution. We are dealing with financial data in our “bankruptcy” file example, so an accountant can easily discover all the incorrect data categories a programmer might overlook.
Data analysis
Data pre-processing can be done in a few simple steps. We’re not dealing with this subject here, so let’s assume it’s all ready to go.
Now you must create your notebook.
You are still on the Assets page of the Cloud Pak for Data framework. Clicking the blue button opens the “Choose asset type” window.
The first thing to look at is the notebook option list. Somebody on the team has already worked on the notebook for this example, so you can see all its specifications now.
Keep the processing costs in mind. Previews consume no CUHs, so you can get free advice from all the data stakeholders.
You can also load models, monitor them, and even delete some of them without ever leaving the notebook – a really important feature.
Looking at the proposed code inside the notebook, you can perform some important checks, such as seeing which libraries are used and in which versions (in Python this can easily lead to confusion!).
In this example, we see that the scikit-learn framework has been used, in version 0.23. You then have to check for any possible compatibility issues.
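Such a check takes a couple of lines in the notebook itself; a minimal sketch, assuming scikit-learn and pandas are the libraries of interest.

```python
# Check which versions the notebook environment provides, so the code
# matches the versions it was written against (0.23.x in this example).
import sklearn
import pandas as pd

print("scikit-learn:", sklearn.__version__)
print("pandas:", pd.__version__)
```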
Some lines of code can be generated automatically while importing data or files. Rather than writing these lines by hand, the best approach is to let an automatic code generator apply all the rules and produce compatible, secure code. You can also take previously handwritten code based on a different framework or library, have it analysed, and let the automatic code generator write the missing lines, allowing the old code to run in the new notebook executed inside Watson Studio or Cloud Pak for Data.
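The generated snippets typically take a shape like the following sketch, which reads a project file from IBM Cloud Object Storage with the ibm_boto3 client; every credential, endpoint, and bucket name here is a placeholder that the generator fills in for you.

```python
# Typical shape of an auto-generated data-import snippet. All credential
# values, the endpoint URL, and the bucket name are placeholders that the
# generator fills in automatically.
import pandas as pd
import ibm_boto3
from ibm_botocore.client import Config

cos_client = ibm_boto3.client(
    service_name="s3",
    ibm_api_key_id="<api-key>",
    ibm_service_instance_id="<service-instance-id>",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.eu-de.cloud-object-storage.appdomain.cloud",
)

# Stream the object and load it into a pandas DataFrame.
body = cos_client.get_object(Bucket="<bucket-name>", Key="Bankruptcy.csv")["Body"]
df = pd.read_csv(body)
df.head()
```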
Bringing models into production
Now it’s time to input the API key (generated automatically from the IBM Cloud website).
Specify a resource location: Frankfurt, in this case.
You can now manage the contents of this model. If it performs well, you can easily promote it to the “Deployment Space” area, a repository of models.
To recap our earlier actions: we analyzed the CSV file named Bankruptcy and created a related model. This model has been named XGB_Bankruptcy and has been loaded into the deployment space. It is now ready to be used through API calls.
The deployment space shows us at a glance many different ways to call it: cURL, Java, JavaScript, Python, and Scala code. We just have to fill in the API key, add the model specifications, and the model is ready!
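As an illustration of what that call looks like from Python, here is a hedged sketch using the ibm-watson-machine-learning client; the API key, space ID, deployment ID, and feature fields are placeholders, and the URL shown is the Frankfurt regional endpoint.

```python
# Hedged sketch: score the deployed model with the
# ibm-watson-machine-learning Python client. The API key, space ID,
# deployment ID, and feature fields below are placeholders.
from ibm_watson_machine_learning import APIClient

wml_credentials = {
    "apikey": "<api-key>",                    # generated on IBM Cloud
    "url": "https://eu-de.ml.cloud.ibm.com",  # Frankfurt region endpoint
}

client = APIClient(wml_credentials)
client.set.default_space("<deployment-space-id>")

payload = {
    "input_data": [{
        "fields": ["feature_1", "feature_2", "feature_3"],
        "values": [[0.12, 3.4, 1.0]],
    }]
}

result = client.deployments.score("<deployment-id>", payload)
print(result)  # predictions returned by the model
```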
Conclusions
When open-source tools are used, only a data scientist understands all the modeling phases. The value nonetheless increases if you can engage business people through a clear process.
Security, scalability, and governance remain problems to be solved. Thanks to the CPDaaS platform and the whole Watson Studio suite of tools, all of these issues are tackled on the platform itself, without involving a dedicated ICT department: you deal directly with the relevant components only.
Let’s dive into code!
As previously stated, more information will be delivered online during the Data and AI Forum Italy, an event dedicated to the central role of data in the transformation journey of Italian organizations.
Above all, let me draw your attention once again to the IBM Call for Code Challenge. This Challenge is a great way to contribute to improving the planet’s health and to broaden the scope of your future. Create your free IBM Cloud account here to discover more.