Monday, 31 August 2020

Custom Distribution Service : Phase 3 Blogpost

Hello everyone,

This is the final blog post for the Custom Distribution Service project during the Google Summer of Code timeline. I have mixed feelings as we near the finish line of one of the most amazing open source programs out there. However, it is time to wrap things up and leave the project in a state where it can be built upon and extended further. This phase has been super busy with bug fixes, testing, and getting the project hosted, so let us get straight into the phase 3 updates.

Fixes and Code quality assurance

Set Jenkinsfile agent to linux

We realised that the build was failing on Windows and that there was no real use case for running it on Windows right now; it may appear on a future roadmap. Therefore, we decided to run the tests only on Linux agents on the Jenkins server.

Backend port error message

Spring Boot serves a default error page on port 8080, so we wanted to replace it with a custom message on the backend. The major takeaway here is that we needed to implement the ErrorController interface and return a custom message from it. This was technical debt from the last phase and was completed and merged during this phase; a sketch of the approach follows the PR link below.

  • Pull Request #92
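For readers unfamiliar with the pattern, here is a minimal sketch of such a controller, assuming Spring Boot 2.x (where getErrorPath() is still required); the class name and message are illustrative, not the project's actual code.

import org.springframework.boot.web.servlet.error.ErrorController;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CustomErrorController implements ErrorController {

    // Spring Boot routes all unhandled errors to /error by default
    @RequestMapping("/error")
    public String handleError() {
        // Illustrative text; the service returns its own custom message
        return "This endpoint is not served by the Custom Distribution Service.";
    }

    @Override
    public String getErrorPath() {
        return "/error";
    }
}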

PMD Analysis

In order to enhance the quality of the code, the PMD source code analyser was applied to the project. It helped me catch a large number of issues: the initial PMD check reported approximately 162 errors. We realised some of them were not relevant and others could be fixed later.

FindBugs Analysis

Another code quality tool we included in this phase was FindBugs. It caught around 5-10 bugs in my code, which I immediately resolved. Most of them were around closing the HTTP request resources, and the easy fix was try-with-resources, as sketched below.
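Here is a minimal sketch of that fix, assuming Apache HttpClient 4.x; the class and method names are illustrative rather than the project's actual code.

import java.io.IOException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class UpdateCenterClient {

    public String fetch(String url) throws IOException {
        // Both the client and the response are closed automatically,
        // even if reading the entity throws
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            return EntityUtils.toString(response.getEntity());
        }
    }
}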

JaCoCo Code Coverage

We needed to make sure most of the code we write has proper branch and line coverage. Therefore we decided to include the JaCoCo code coverage reporter, which helped us find uncovered lines and the areas where coverage needs to improve.

Remove JCasC generation

While developing the service we quickly realised that generation of the war package broke if we included a configuration-as-code section but did not provide a path to the corresponding required yml file. Therefore we decided to remove the casc section altogether. It may come back in a future patch.

  • Pull Request link: #127

  • Issue link: #65

Minor Fixes

  • Logging Fix: #99

  • Docs Fix : link: #120

  • Update Center Dump Fix : link: #125

  • Class Path Fix: link: #126

  • Release Drafter Addition: link: #136

Front end

There was no navigation link to the community configurations, so one was added here. Now it is easier to navigate to the community page from the home page itself.

Docker updates

Build everything with Docker

This was one of the major changes this phase with respect to making the service very easy to spin up locally. It should greatly help community adoption, since it eliminates the tools one needs to install locally. Previously the process was to run Maven locally, generate all of the files, and then copy their contents into the container. With this change we generate all of the files inside the Docker container itself, allowing the user to run just a couple of commands to get the service up and running.

The major changes we made to the Dockerfile were:

a) Copy all of the configuration files and pom.xml into the container.

b) Run mvn clean package inside the container, which generates the jar.

c) Run the jar inside the container.

Hosting updates

This was originally on the future roadmap; however, the infra team approved it and was super helpful in making the process as smooth as possible. Thanks to Gavin, Tim and Oblak for making this possible. Here is the Google Group discussion.

The project has now been hosted here as a preview. It still needs some fixes to be fully functional.

  • Infra Docker PR #131

  • Infra Project Addition PR link: #393

Testing Updates

Unit test the services

With respect to community hosting and adoption, one of the most important milestones for this phase was to test the majority of the code, and we completed that with flying colors. All of the services have been completely unit tested, which is a major accomplishment. For testing the services we decided to go with WireMock, which can be used to mock external services. Kezhi's comment helped us understand what we needed to do, since he had done something quite similar in his GitHub Checks API project.

So we mocked the update-center URL with WireMock and made sure we were getting the expected response, with the appropriate control flow logic tested:

// JUnit rule that starts a local WireMock server for the test
// (setup shown for context; the port is illustrative)
@Rule
public WireMockRule wireMockRule = new WireMockRule(8089);

wireMockRule.stubFor(get(urlPathMatching("/getUpdateCenter"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody(updateCenterBody)));

Add Update Center controller tests

Another major testing change involved testing the controllers. For this we decided to use the WireMock library in Java to mock the server response when the controllers were invoked.

For example: if I have a controller that serves an API at /api/plugin/getPluginList, the underlying service response can be stubbed out while the system is under test. So we use something like this to test it:

when(updateService.downloadUpdateCenterJSON()).thenReturn(util.convertPayloadToJSON(dummyUpdateBody));

When the particular controller is called, the underlying service is mocked and returns the response we provide. For more details, the PR is here.
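To make the shape of such a test concrete, here is a minimal sketch using Mockito with Spring's MockMvc; the controller and service names, the stubbed String return type, and the test wiring are assumptions for illustration, not the project's actual test.

import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.test.web.servlet.MockMvc;

@RunWith(SpringRunner.class)
@WebMvcTest(PluginController.class)
public class PluginControllerTest {

    @Autowired
    private MockMvc mockMvc;

    // The real service is replaced by a Mockito mock in the test context
    @MockBean
    private UpdateService updateService;

    @Test
    public void returnsStubbedPluginList() throws Exception {
        // Hypothetical stub: assumes the service returns the update center JSON as a String
        when(updateService.downloadUpdateCenterJSON()).thenReturn("{\"plugins\": []}");

        mockMvc.perform(get("/api/plugin/getPluginList"))
                .andExpect(status().isOk());
    }
}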

Add Packager Controller Tests

Along with the update center controller tests, another controller that needed to be tested was the packager controller. We also needed to make sure that all the branches for the controllers were properly tested. Additional details can be found in the PR below.

Docker Compose Tests

One problem we faced the entire phase was the Docker containers. We regularly found that changes in the codebase would sometimes break the container build, or the internal APIs would seem to malfunction. To counteract that, we decided to add some local tests: I introduced a set of bash scripts that do the following:

a) Build the container using the docker-compose command.

b) Run the container.

c) Test the APIs using the exposed port.

d) Teardown the running containers.

User Documentation

We also included a user documentation guide to make it super easy to get started with the service.

Future Roadmap

This has been a super exciting project to work on and I can definitely see this project being built upon and extended in the future.

I would like to talk about some of the features that remain and can be taken up in a future roadmap discussion:

a) JCasC Support:

Description: Support generating a Jenkins Configuration as Code file by interactively asking the user, for each plugin they select, what configuration they want. For example, if the user selects the Slack plugin, we need to ask questions like: what is the Slack channel? What is the token? On the basis of the answers, a casc file is generated. This feature was initially planned to go into the service, but we realised it is a project in its own right.

b) Auto Pull Request Creation:

Description: Allow users to create a configuration file and immediately open a pull request on GitHub without leaving the user interface. This was originally planned using a GitHub bot and we started the work on it, but since it was unclear whether the service would be hosted, we put the development on hold. You can find the pull requests here:

  • Github Controller #72

  • Pull Request Creation Functions #66

c) Synergy with Image Controller

Description: This feature requires some planning, some of the questions we can ask are:

a) Can we generate the images (i.e. an Image Controller)?

b) Can we have the service act as a multipurpose generator?

Statistics

This phase has been the busiest of all and has involved a lot of work, more than I had initially expected. Although lines of code added are not an indication of work done, adding over 800 lines of code is a real personal milestone for me.

  • Pull Requests Opened: 26

  • Lines of Code Added: 1096

  • Lines of Docs Added: 200

GitHub Checks API Plugin Project - Coding Phase 3

This blog post is about our phase 3 progress on the GitHub Checks API Project; you can find our previous blog posts for phase 1 and phase 2.

At the end of this summer, the GSoC journey for the GitHub Checks API Project comes to an end as well. In this blog post, I'll show you our work during the last month:

  • Pipeline Support

  • Rerun Request Support

  • Git SCM Support

  • Documentation

All the above features will be available in our planned 1.0.0 version of Checks API Plugin and GitHub Checks Plugin.

Coding Phase 3 Demo

Pipeline Support

The pipeline support allows users to directly publish checks in their pipeline script without depending on any other consumers.

Pipeline Checks

The check in the above screenshot is published by this script:

publishChecks name: 'pipeline check', title: 'pipeline ', summary: '# A pipeline check example',
        text: "## This check is published through the pipeline script",
        detailsURL: 'https://ci.jenkins.io'

If you want to publish checks to GitHub, please install the GitHub implementation and refer to the GitHub API documentation for the requirements for each field. A default value (build link) for detailsURL will be provided automatically.

This feature can be useful when many stages exist in your pipeline script and each takes a long time: you can publish a check for each stage to keep track of the build.

Rerun Request Support

The rerun request allows GitHub users to rerun failed builds. When a build fails (which leads to a failed check), a Re-run button is added automatically by GitHub.

Failed Checks

By clicking the Re-run button, Jenkins will reschedule a build for the last commit of this branch.

Since all checks of a commit are produced by a single build, you don't have to rerun all failed checks: rerunning any one of them will refresh all checks.

Git SCM Support

Thanks to Ullrich's great help, the GitHub Checks Plugin now supports Git SCM. This means now you can publish checks for your freestyle project or any other projects that use Git SCM.

Documentation

The Consumers Guide and Implementation Guide are now available. As a Jenkins developer, you can now start consuming our API or even provide an implementation for SCM platforms besides GitHub.
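For Java consumers, publishing a check could look roughly like the sketch below, based on the builder-style API the guides describe; if any class or method name differs in the released plugin, treat it as an assumption.

import hudson.model.Run;
import hudson.model.TaskListener;
import io.jenkins.plugins.checks.api.ChecksConclusion;
import io.jenkins.plugins.checks.api.ChecksDetails;
import io.jenkins.plugins.checks.api.ChecksPublisher;
import io.jenkins.plugins.checks.api.ChecksPublisherFactory;
import io.jenkins.plugins.checks.api.ChecksStatus;

public class ExampleChecksConsumer {

    // Publishes a simple completed check for the given run
    void publishExampleCheck(Run<?, ?> run, TaskListener listener) {
        ChecksDetails details = new ChecksDetails.ChecksDetailsBuilder()
                .withName("example check")
                .withStatus(ChecksStatus.COMPLETED)
                .withConclusion(ChecksConclusion.SUCCESS)
                .build();

        ChecksPublisher publisher = ChecksPublisherFactory.fromRun(run, listener);
        publisher.publish(details);
    }
}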

Acknowledgment

The whole GitHub Checks API project started as a Google Summer of Code project. Much appreciation to my mentors (Tim and Ullrich) for their great help during the whole summer. Also huge thanks to the Jenkins GSoC SIG and the whole community for the technical support and resources.

Introducing the Databricks Web Terminal

Introduction

We’re excited to introduce the public preview of the Databricks Web Terminal in the 3.25 platform release. Any user with “Can Attach To” cluster permissions can now use the Web Terminal to interactively run Bash commands on the driver node of their cluster.

The new Databricks web terminal provides a fully interactive shell that supports virtually all command-line programs. The terminal is not intended for running Apache Spark jobs; however, it is a convenient environment for installing native libraries, debugging package management issues, or simply editing a system file inside the container.

New Databricks Web Terminal for running command-line programs via an interactive shell.

Motivation

Running shell commands has been possible through %sh magic commands in Databricks Notebooks. In addition, in some environments, cluster creators can set up SSH keys at cluster launch time and SSH into the driver container of their cluster. Both these features had limitations for power users. The new web terminal feature is more convenient and powerful than both these existing methods and is now our recommended way of running shell commands on the driver.

We heard from our users that they want a highly interactive shell environment which supports any command-line tools, including popular editors such as Vim or Emacs. They asked for interactive terminal sessions to install arbitrary Linux packages or download files. These were not convenient or possible with %sh magic commands.

SSH offers an interactive shell, but it is limited to a single user whose keys are registered on the cluster. Many users share clusters, and most of them want the convenience of an interactive shell. In addition, system administrators and security teams are not comfortable with opening the SSH port to their virtual private networks. The web terminal addresses all these limitations: as a user, you do not need to set up SSH keys to get an interactive terminal on a cluster.

How to get started

Web terminal is in Public Preview (AWS|Azure) and disabled by default. Workspace admins can enable the feature (AWS|Azure) through the Advanced tab. After this step, users can launch web terminal sessions on any clusters running Databricks Runtime 7.0 or above if they have “Can Attach To” permission.

There are two ways to open a web terminal on a cluster. You can go to the Apps tab under a cluster's details page and click on the web terminal button.

Or when inside a notebook, you can click on the Cluster dropdown menu and click the “Terminal” shortcut.

The new Databricks Web Terminal can be accessed via a “Terminal” shortcut inside a notebook’s Cluster dropdown menu.

If you are launching a cluster and wish to restrict web terminal access on it, you can do so by setting the DISABLE_WEB_TERMINAL=true environment variable. Also note that high concurrency clusters (AWS|Azure) with table ACLs (AWS|Azure) or credential passthrough (AWS|Azure) do not allow web terminal access.

Please see our user guide (AWS|Azure) for more details about using the web terminal on Databricks.

Limitations & Future Plans

When a web terminal session is not actively used for several minutes, it times out and a new Bash process is created, which means you can lose your active shell session. To avoid this, we recommend managing your sessions with tools like tmux.

High concurrency clusters that have either table access control or credential passthrough enabled do not support the web terminal.

When a user logs out of Databricks or their permission is removed from a cluster, their active web terminal sessions are not terminated. Please refer to these security considerations (AWS|Azure) for more details. We are working on addressing these issues before the feature becomes generally available.

--

Try Databricks for free. Get started today.

The post Introducing the Databricks Web Terminal appeared first on Databricks.

Forty percent of Americans back Trump executive order on TikTok: Reuters/Ipsos poll

By Raphael Satter and Chris Kahn
WASHINGTON/ NEW YORK (Reuters) - Forty percent of Americans back President Donald Trump's threat to ban videosharing app TikTok if it is not sold to a U.S. buyer, according to a Reuters/Ipsos national poll, suggesting that many support the effort to separate the social media upstart from its Chinese parent.
The poll published Monday, which surveyed 1,349 adult respondents across the United States, found that 40% backed Trump's recent executive order forcing China's ByteDance to sell its TikTok operations in the United States by Sept. 15. Thirty percent of the respondents said they opposed the move, while another 30% said they didn't know either way.
The responses were largely split along party lines, and many of those who agreed with Trump's order said they do not know much about TikTok. Among Republicans, for example, 69% said they supported the president's order ...


Read More on Datafloq

Google says Denmark is reviewing its taxes there

COPENHAGEN (Reuters) - Danish tax authorities have initiated a review of Google's accounts in Denmark to determine whether the tech giant has any outstanding tax obligation, the company said on Monday.
Google's Danish unit, Google Denmark Aps, said in its financial report for 2019 that tax authorities had "commenced a review of the open tax years concerning the company's tax position".
The Danish tax authority declined to comment on the review. Denmark's Prime Minister Mette Frederiksen is one of several European leaders who have advocated for multi-national tech companies to pay more tax in countries where they operate.
Google, owned by parent firm Alphabet Inc, employs more than 100 people in Denmark and earned revenue of 284 million Danish crowns ($45.4 million) there last year. It said in the financial statement it had not made any provisions for the tax review.
...


Read More on Datafloq

Apple, Tesla bid up ahead of share split open

(Reuters) - The high-flying shares of Apple Inc and Tesla Inc gained more ground on Monday, ahead of their first official trading following a split into smaller portions that makes it easier for retail investors to own the shares.
It will be Apple's latest stock split since a 7-for-1 move in 2014 and its fifth since going public in 1980.
Splitting stocks is a way for companies to make it less expensive to buy individual shares although moves by some retail brokerages to offer slices or fractions of shares to smaller investors has made the impact increasingly marginal.
Shares of the Cupertino-California-based company, which have rallied nearly 30% since it announced its surprise 4-for-1 stock split and blockbuster quarterly results on July 30, were priced at $126.56, up 1.4% when compared to Friday's split-adjusted close, in pre-market trade.
...


Read More on Datafloq

Understanding the Value of Customer-Centric NPD

New product development is a misleading term. Despite having the word "development" in it, NPD is a multifaceted process with many stages. Essentially, it starts with recognizing an opportunity and ends with delivering a product to the market, so the key steps commonly include:

  • Outlining the strategy

  • Generating ideas

  • Fleshing out the concept

  • Gathering business intelligence

  • Developing the product

  • Testing and refinement

As you can see, the actual development is only a fraction of what you'll need to do. So, full cycle product development companies are increasingly relying on the study of customer expectations and communicating these expectations to all the involved teams. Fortunately, once established, the customer-centric culture can yield many long-term benefits to business operations.

What Makes a Successful NPD?

Because of its complexity and the number of people involved, NPD needs to be strategically aligned to achieve success. First, it should be systematic, in that the flow of ideas should be coordinated and promoted through innovation management. This can be anything from using common tools such as product lifecycle management and brainstorming to appointing innovation managers. This approach is necessary for creating a culture of innovation crucial for long-term success. Second, new product development has to be team-centered. That is, the teams involved in the process should ...


Read More on Datafloq

Git Plugin Performance Improvement: Final Phase and Release

Since the beginning of the project, the core value which drove its progress was "To enhance the user experience for running Jenkins jobs by reducing the overall execution time".

To achieve this goal, we laid out a path:

  • Compare the two existing git implementations i.e CliGitAPIImpl and JGitAPIImpl using performance benchmarking

  • Use the results to create a feature which would improve the overall performance of git plugin

  • Also, fix existing user reported performance issues

Let's take a journey to understand how we've built the new features. If you'd like to skip the journey, you can go directly to the major performance improvements and minor performance improvements sections to see what we've done!

Journey to release

The project started with deciding to choose a git operation and then trying to compare the performance of that operation by using command line git and then with JGit.

Stage 1: Benchmark results with git fetch

git-fetch-results

  • The performance of git fetch (average execution time/op) is strongly correlated to the size of a repository

  • There exists an inflection point on the scale of repository size after which the nature of JGit performance changes (it starts to degrade)

  • After running multiple benchmarks, it is safe to say that for a large repository, command line git is the better choice of implementation.

  • We can use this insight to implement a feature which avoids JGit with large repositories.

Stage 2: Comparing platforms

The project was also concerned that there might be important differences between operating systems. For example, what if command line Git for Windows performed very differently than command line Git on Linux or FreeBSD? Benchmarks were run to compare fetch performance on several platforms.

Running the git fetch operation for a 400 MiB repository on:

  • AMD64 Microsoft Windows

  • AMD64 FreeBSD

  • IBM PowerPC 64 LE Ubuntu 18

  • IBM System 390 Ubuntu 18

The result of running this experiment is given below:

Performance on multiple platforms

The difference in performance between git and JGit remains consistent across all platforms, so benchmark results on one platform are applicable to all platforms.

Stage 3: Performance of git fetch and repository structure

git repo diagram

The area of the circle enclosing each parameter signifies the strength of the positive correlation between the performance of a git fetch operation and that parameter. From the diagram:

  • Size of the aggregated objects is the dominant player in determining the execution time for a git fetch

  • Number of branches and Number of tags play a similar role but are strongly overshadowed by size of repository

  • Number of commits has a negligible effect on the performance of running git fetch

After running these experiments from Stage 1 to Stage 3, we developed a solution called the GitToolChooser, which is explained in the next stage.

Stage 4: Faster checkout with Git tool chooser

This feature takes the responsibility of choosing the optimal implementation away from the user and hands it to the plugin, which recommends an implementation based on the size of the repository. Here is how it works.
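In spirit, the decision reduces to a size threshold; here is a rough sketch (not the plugin's actual code), with the ~50 MiB cut-off taken from the benchmark discussion later in this post.

public class GitToolChooserSketch {

    // Approximate threshold below which JGit tends to win on a warm JVM
    private static final long THRESHOLD_KIB = 50L * 1024; // ~50 MiB

    // Returns the recommended implementation for a repository of the given size
    public static String recommendImplementation(long repoSizeKiB) {
        return repoSizeKiB < THRESHOLD_KIB ? "jgit" : "git";
    }
}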

git perf improv

The image above depicts the performance enhancements we made over the course of the GSoC project. In some cases, these improvements let the checkout step finish in half the time it used to take.

Let’s talk about performance improvements in two parts.

Major performance improvements

Major performance enhancements

Building TensorFlow (~800 MiB) using a Jenkins pipeline, there is over a 50% reduction in the overall time spent completing a job! The result is consistent across multiple platforms.

The reason for such a decrease is that JGit degrades in performance on large repositories. Since the GitToolChooser is aware of this, it recommends command line git instead, which saves the user time.

Minor performance improvements

Note: Enable JGit before using the new performance features to let GitToolChooser work with more options. Here's how.

git minor perf

Building the git plugin (~20 MiB) using a Jenkins pipeline, there is a drop of a second across all platforms when the performance enhancement is enabled. Also, eliminating a redundant fetch reduces unnecessary load on git servers.

The reason for this change is that JGit performs better than command line git for small repositories (<50 MiB), as an already warmed-up JVM favors the native Java implementation.

Releases

The road ahead

  • Support from other branch source plugins

    • Plugins like the GitHub Branch Source Plugin or GitLab Branch Source Plugin need to extend an extension point provided by the git plugin to exchange information about the size of a remote repository hosted by the particular git provider (see the sketch after this list)

  • JENKINS-63519: GitToolChooser predicts the wrong implementation

  • Addition of this feature to GitSCMSource

  • Detection of lock related delays accessing the cache directories present on the controller

    • This issue was reported by the plugin maintainer Mark Waite; it needs to be reproduced first and then a possible solution found.
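To give a feel for what such an extension point could look like, here is a hypothetical sketch; the class name and method signatures are assumptions, not the git plugin's actual API.

import hudson.ExtensionPoint;

// Hypothetical shape of a repository-size extension point; the git plugin's
// real API may differ.
public abstract class RepositorySizeSource implements ExtensionPoint {

    // Whether this provider can report a size for the given remote URL
    public abstract boolean isApplicableTo(String remoteUrl);

    // Estimated size of the remote repository in KiB
    public abstract long getSizeOfRepository(String remoteUrl);
}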

Reaching out

Feel free to reach out to us for any questions or feedback on the project’s Gitter Channel or the Jenkins Developer Mailing list. Report an issue at Jenkins Jira.

Emerging Currency Markets to Shift from Series Models to Deep Learning

Industry analysts agree that artificial intelligence and big data have had a profound effect on the global financial industry. The financial analytics market is projected to reach $11.4 billion within the next three years. This statistic has been cited numerous times since it was first published by MarketsandMarkets. As compelling as this statistic is, it glosses over many of the nuances pertaining to data analytics and artificial intelligence in the financial sector. One discussion that warrants more attention is the growing relevance of deep learning in the currency markets of emerging economies.

Deep Learning Shows Promise for Emerging Economies

A 2017 study published by the Ternopil National Economic University in Ternopil, Ukraine focused on this emerging topic. The study, titled Deep Learning for Predictions in Emerging Currency Markets, talked about the role of artificial intelligence algorithms in currency markets in Africa, Eastern Asia, South America, the Middle East and remote parts of Europe. "Machine learning methods such as shallow neural networks have higher predictive accuracy than time series models when trained on input features carefully crafted by domain knowledge experts. The preponderance of research focuses on developed currency markets. The paucity of research in emerging currency markets, and the crucial role ...


Read More on Datafloq

Sunday, 30 August 2020

Huawei focusing on cloud business which still has access to U.S. chips: FT

(Reuters) - Chinese telecoms equipment maker Huawei Technologies Co Ltd is focusing on its budding cloud business, which still has access to U.S. chips despite sanctions against the company, to secure its survival, the Financial Times newspaper reported.
Huawei's cloud computing business sells computing power and storage to companies, including giving them access to artificial intelligence, and has been growing rapidly, the newspaper reported on Sunday, citing sources.
In January, Huawei put the unit on an equal footing with its smartphones and telecoms equipment businesses, the FT reported https://on.ft.com/3hHJC3Y.
The unit was stepping up its offerings and Beijing will increasingly support the company through public cloud contracts, according to the report.
The administration of U.S. President Donald Trump has restricted technology exports to Chinese companies in particular, notably Huawei, citing national security risks.
...


Read More on Datafloq

China's new tech export controls could give Beijing a say in TikTok sale

BEIJING/SHANGHAI (Reuters) - China's new rules around tech exports mean ByteDance's sale of TikTok's U.S. operations could need Beijing's approval, a Chinese trade expert told state media, a requirement that would complicate the forced and politically charged divestment.
ByteDance has been ordered by President Donald Trump to divest short video app TikTok - which is challenging the order - in the United States amid security concerns over the personal data it handles.
Microsoft Corp <MSFT.O> and Oracle Corp <ORCL.N> are among the suitors for the assets, which also includes TikTok's Canada, New Zealand and Australia operations.
However, China late on Friday revised a list of technologies that are banned or restricted for export for the first time in 12 years and Cui Fan, a professor of international trade at the University of International Business and Economics in Beijing, said the changes would apply to TikTok.
...


Read More on Datafloq

Saturday, 29 August 2020

Jenkins Windows Services: YAML Configuration Support - GSoC Project Results

Hello, world! GSoC 2020 Phase 3 has now ended, and it was a great period for the Jenkins Windows Services - YAML Configuration Support project. In this blog post, I will announce the updates from GSoC 2020 Phase 2 and Phase 3. If you are not already aware of this project, I would recommend reading the blog post published after GSoC 2020 Phase 1.

Project Scope

  • Windows Service Wrapper - YAML configuration support

  • YAML schema validation

  • New CLI

  • XML Schema validation

YAML Configuration Support

Under WinSW YAML configuration support, the following tasks were completed.

YAML to Object mapping

At the moment YAML object mapping is finished and merged. You can find all the implementations in this Pull Request.

Extend WinSW to support both XML and YAML

This task is already done and merged. Find the implementation in this Pull Request.

YAML Configuration support for Extensions

At the moment there are two internal plugins in WinSW: RunawayProcessKiller and SharedDirectoryMapper. We allow users to provide configurations for those plugins in the same XML or YAML configuration file that is used to configure WinSW. This task is merged as well; see the Pull Request.

YAML schema validation

Users can validate the YAML configuration file against a JSON schema file, for example with the YAML utility tool from the Visual Studio marketplace.

Key updates in Phase 2 and Phase 3

Sample YAML Configuration File

id: jenkins
name: Jenkins
description: This service runs Jenkins automation server.
env:
    - name: JENKINS_HOME
      value: '%LocalAppData%\Jenkins.jenkins'
    - name: LM_LICENSE_FILE
      value: host1;host2
executable: java
arguments: >-
    -Xrs -Xmx256m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle
    -jar "E:\Winsw Test\yml6\jenkins.war" --httpPort=8081
log:
    mode: rotate
onFailure:
    - action: restart
      delay: 10 sec
    - action: reboot
      delay: 1 hour
extensions:
    - id: killOnStartup
      enabled: yes
      classname: WinSW.Plugins.RunawayProcessKiller.RunawayProcessKillerExtension
      settings:
            pidfile: '%BASE%\pid.txt'
            stopTimeOut: 5000
            StoprootFirst: false
    - id: mapNetworDirs
      enabled: yes
      classname: WinSW.Plugins.SharedDirectoryMapper.SharedDirectoryMapper
      settings:
            mapping:
                - enabled: false
                  label: N
                  uncpath: \\UNC
                - enabled: false
                  label: M
                  uncpath: \\UNC2

New CLI

Let me briefly explain why we need a new CLI. In WinSW, we will keep both XML and YAML configuration support, but with the previous implementation the user couldn't specify the configuration file explicitly. We also want to let the user skip the schema validation. So we decided to move to a new CLI that is more structured, with commands and options. Please read my previous blog post to learn more about commands and options in the new CLI.

Key updates in phase 2

  • Removed the /redirect command

  • Removed the testwait command and added a wait option to the test command

  • Removed the stopwait command and added a wait option to the stop command

How to try

Users can configure the Windows Service Wrapper with either an XML or a YAML configuration file using the following steps.

  1. Create the configuration file (XML or YAML).

  2. Save it with the same name as the Windows Service Wrapper executable name.

  3. Place the configuration file inside the directory (or a parent directory) where the Windows Service Wrapper executable is located.

If both XML and YAML configuration files are present, the Windows Service Wrapper will be configured by the XML configuration file.

GSoC 2020 Phase 2 Demo

Future Works

  • XML Schema validation

    • The XML configuration file will be validated against the XSD file. I have started working on this feature and you can find the implementation in this Pull Request.

  • YAML configuration validation on startup

How to contribute

You can find the GitHub repository at this link. Issues and pull requests are always welcome. You can also communicate with us in the WinSW Gitter channel, which is a great way to get in touch; there are project sync-up meetings every Tuesday at 13:30 UTC on the channel.

Factbox: What is QAnon and how are social media sites handling it?

(Reuters) - Social media companies Facebook Inc and Twitter Inc have announced crackdowns on content linked with the unfounded and sprawling conspiracy theory QAnon.
WHAT IS QANON?
QAnon followers espouse an intertwined series of beliefs, based on anonymous web postings from "Q," who claims to have insider knowledge of the Trump administration.
A core tenet of the conspiracy theory is that U.S. President Donald Trump is secretly fighting a cabal of child-sex predators that includes prominent Democrats, Hollywood elites and "deep state" allies.
QAnon, which borrows some elements from the bogus "pizzagate" theory about a pedophile ring run out of a Washington restaurant, has become a "big tent" conspiracy theory encompassing misinformation about topics ranging from alien landings to vaccine safety.
...


Read More on Datafloq

'Three little pigs': Elon Musk's Neuralink puts computer chips in animal brains

Neuralink aims to implant wireless brain-computer interfaces in the most complex human organ to help cure neurological conditions like Alzheimer's, dementia and spinal cord injuries

Operationalize 100 Machine Learning Models in as Little as 12 Weeks with Azure Databricks

In rapidly changing environments, Azure Databricks enables organizations to spot new trends, respond to unexpected challenges and predict new opportunities. Organizations are leveraging machine learning and artificial intelligence (AI) to derive insight and value from their data and to improve the accuracy of forecasts and predictions. Data teams are using Delta Lake to accelerate ETL pipelines and MLflow to establish a consistent ML lifecycle.

Solving the complexity of ML frameworks, libraries and packages

Customers frequently struggle to manage all of the libraries and frameworks for machine learning on a single laptop or workstation. There are so many libraries and frameworks to keep in sync (H2O, PyTorch, scikit-learn, MLlib). In addition, you often need to bring in other Python packages, such as Pandas, Matplotlib, numpy and many others. Mixing and matching versions and dependencies between these libraries can be incredibly challenging.

Databricks Runtime for ML
Diagram: Databricks Runtime for ML enables ready-to-use clusters with built-in ML Frameworks

With Azure Databricks, these frameworks and libraries are packaged so that you can select the versions you need as a single dropdown. We call this the Databricks Runtime. Within this runtime, we also have a specialized runtime for machine learning which we call the Databricks Runtime for Machine Learning (ML Runtime). All these packages are pre-configured and installed so you don’t have to worry about how to combine them all together. Azure Databricks updates these every 6-8 weeks, so you can simply choose a version and get started right away.

Establishing a consistent ML lifecycle with MLflow

The goal of machine learning is to optimize a metric such as forecast accuracy. Machine learning algorithms are run on training data to produce models. These models can be used to make predictions as new data arrive. The quality of each model depends on the input data and tuning parameters. Creating an accurate model is an iterative process of experiments with various libraries, algorithms, data sets and models. The MLflow open source project started about two years ago to manage each phase of the model management lifecycle, from input through hyperparameter tuning. MLflow recently joined the Linux Foundation. Community support has been tremendous, with over 200 contributors, including large companies. In June, MLflow surpassed 2.5 million monthly downloads.

MLflow unifies data scientists and data engineers
Diagram: MLflow unifies data scientists and data engineers

Ease of infrastructure management

Data scientists want to focus on their models, not infrastructure. You don’t have to manage dependencies and versions. It scales to meet your needs. As your data science team begins to process bigger data sets, you don’t have to do capacity planning or requisition/acquire more hardware. With Databricks, it’s easy to onboard new team members and grant them access to the data, tools, frameworks, libraries and clusters they need.

Building your first machine learning model with Azure Databricks

To help you get a feel for Azure Databricks, let’s build a simple model using sample data in Azure Databricks. Often a data scientist will see a blog post about an algorithm, or have some data they want to use for exploratory ML. It can be very hard to take a code snippet found online, shape a dataset to fit the algorithm, then find the correct infrastructure and libraries to pull it all together. With Azure Databricks, all that hassle is removed. This blog post talks about time-series analysis with a library called Prophet. It would be interesting to take this idea, of scaling single-node machine learning to distributed training with Spark and Pandas UDFs, and apply it to a COVID-19 dataset available on Azure Databricks.

Installing the library is as simple as typing fbprophet in a PyPI prompt, then clicking Install. From there, once the data had been read into a pandas DataFrame and transformed into the format expected by Prophet, trying out the algorithm was quick and simple.


from fbprophet import Prophet

model = Prophet(
    interval_width=0.9,
    growth='linear'
)

model.fit(summed_case_pd)

#set periods to a large number to see window of uncertainty grow
future_pd = model.make_future_dataframe(
    periods=200,
    include_history=True
)

# predict over the dataset
forecast_pd = model.predict(future_pd) 

With a DataFrame containing the predictions, plotting the results within the same notebook just takes a call to display().


predict_fig = model.plot(forecast_pd, xlabel='date', ylabel='new_cases')
display(predict_fig)    

Creating a machine learning model in Azure Databricks
Diagram: Creating a machine learning model in Azure Databricks

The referenced blog then used a Pandas UDF to scale up this model to much larger amounts of data. We can do the same, and train models in parallel on several different World Health Organization (WHO) regions at once. To do this we wrap the single-node code in a Pandas UDF:


@pandas_udf(prophet_schema, PandasUDFType.GROUPED_MAP)
def forecast_per_region(keys, grouped_pd):

    region = keys[0]
    days_to_forecast = keys[1]

    model = Prophet(
        interval_width=0.9,
        growth='linear'
    )

    model.fit(grouped_pd[['ds', 'y']])

    future_pd = model.make_future_dataframe(
        periods=days_to_forecast,
        include_history=True
    )

    forecast_pd = model.predict(future_pd)
    forecast_pd['WHO_region'] = region

    return forecast_pd[[c.name for c in prophet_schema]]

We can then apply the function to each WHO region and view the results:


results = (covid_spark
    .groupBy("WHO_region", lit(100).alias('days_to_forecast'))
    .apply(forecast_per_region))

Finally, we can use the Azure Databricks notebooks’ SQL functionality to quickly visualize some of our predictions:


%sql
SELECT WHO_region, ds, yhat
FROM results
WHERE WHO_region = "Eastern Mediterranean Region" 
    or WHO_region = "South-East Asia Region" 
    or WHO_region = "European Region" 
    or WHO_region = "Region of the Americas" 
ORDER BY ds, WHO_region 

From our results we can see that this dataset is not ideal for time-series forecasting. However, we were able to quickly experiment and scale up our model, without having to set up any infrastructure or manage libraries. We could then share these results with other team members just by sending the link of the collaborative notebook, quickly making the code and results available to the organization.

Alignment Healthcare

Alignment Healthcare, a rapidly growing Medicare insurance provider, serves one of the most at-risk groups of the COVID-19 crisis—seniors. While many health plans rely on outdated information and siloed data systems, Alignment processes a wide variety and large volume of near real-time data into a unified architecture to build a revolutionary digital patient ID and comprehensive patient profile by leveraging Azure Databricks. This architecture powers more than 100 AI models designed to effectively manage the health of large populations, engage consumers, and identify vulnerable individuals needing personalized attention—with a goal of improving members’ well-being and saving lives.

Start building your machine learning models on Azure Databricks

Try out the notebook hosted here and learn more about building ML models on Azure Databricks by attending the webinar From Data to AI with Microsoft Azure Databricks, the Azure Databricks ML training module on MS Learn, and our next Azure Databricks Office Hours. If you are ready to grow your business with machine learning on Azure Databricks, schedule a demo.

--

Try Databricks for free. Get started today.

The post Operationalize 100 Machine Learning Models in as Little as 12 Weeks with Azure Databricks appeared first on Databricks.

Musk's Neuralink venture promises to reveal a 'working' brain-computer device

By Tina Bellon
(Reuters) - Billionaire entrepreneur Elon Musk's neuroscience startup Neuralink on Friday is expected to detail its latest innovations for implanting minuscule computer chips in human brains, fueling expectations among scientists who closely watch the company.
Co-founded by Musk in 2016, Neuralink aims to implant wireless brain-computer interfaces that include thousands of electrodes in the most complex human organ to help cure neurological conditions like Alzheimer's, dementia and spinal cord injuries and ultimately fuse humanity with artificial intelligence.
The company said it will provide an update on its work during a live webcast late on Friday afternoon, with Musk tweeting that the presentation will include a "working Neuralink device."
Musk, who frequently warns about the risks of artificial intelligence, is no stranger to revolutionizing industries as chief executive of electric vehicle company ...


Read More on Datafloq

Macron says France's 5G strategy founded on European sovereignty

PARIS (Reuters) - President Emmanuel Macron said on Friday France was not excluding any company including China's Huawei from its next-generation 5G mobile market but that his strategy was one based on European sovereignty.
Macron said Europe had two leading suppliers, Ericsson and Nokia, who offered Europe a "genuine industrial solution, fully secured".
Macron, speaking at a news conference, said he had told Chinese President Xi Jinping: "You would do the same as me back home".
Three sources told Reuters last month that French authorities have told telecoms operators planning to buy Huawei 5G equipment that they will not be able to renew licences for the gear once they expire, effectively phasing the Chinese firm out of mobile networks.

...


Read More on Datafloq

Walmart wants to go viral with TikTok, Wall Street thinks it can

By Uday Sampath Kumar
(Reuters) - Wall Street was swift to see the rationale behind Walmart Inc jumping into the fray to buy TikTok - access to millions of young, digitally savvy users who could help the 60-year-old company boost its online sales.
The retailer revealed plans to join Microsoft Corp in a bid for the social media firm's U.S. assets on Thursday, hours after the video company's chief executive said he would step down.
Analysts said Walmart picking up a stake in the short-form video app, which is owned by China's ByteDance and claims to have about 100 million active monthly users in the United States, could be a game changer for the world's largest retailer.
"Connecting with a younger audience is vital to Walmart's long-term outlook, especially as more digitally native generations ...


Read More on Datafloq

Elon Musk's net worth tops $100 billion: Forbes

(Reuters) - Silicon Valley entrepreneur Elon Musk's net worth topped $100 billion on Friday, according to the Forbes real-time billionaires list, as the shares of electric-car maker Tesla Inc <TSLA.O> see a more than five-fold surge in value this year.
A large chunk of his wealth comes from the 21% stake in Tesla, according to Forbes. Tesla shares, which went public at $17 a piece in 2010, rose as much as 3.5% in morning trade to a record high of $2,318.49.
The company became the world's most valuable carmaker by market capitalization on July 1 when it overtook front runner Toyota Motor Corp <7203.T> and has over the last 10 years made many of its retail investors millionaires.
Musk is now part of an elite club of just four others with twelve digit net worth. His nearly $100 billion, however, is just about half the net ...


Read More on Datafloq

Friday, 28 August 2020

Improving Public Health Surveillance During COVID-19 with Data Analytics and AI

As the leader of the State and Local Government business at Databricks, I get to see what governments all over the U.S. are doing to address the Novel Coronavirus and COVID-19 crisis. I am continually inspired by the work of public servants as they go about their business to save lives and address this crisis.

In the midst of all of the bad news, there are good news reports of the important work done by public health officials on COVID-19. The good work that public health departments beyond the Centers for Disease Control and Prevention (CDC) perform does not usually make dramatic headlines, but it is making an amazing impact.

Like many of us, local and state governments are figuring things out as they go along, one step at a time. By observing successful COVID-19 response programs in countries where infections happened early, public health agencies first recognized the need for contact tracing as an important data source, and have scrambled to implement contact tracing programs. Once contact tracing is in place, vast amounts of data become available.

Across the globe, it has been proven, in countries like South Korea, that the COVID-19 case data from contact tracing can inform the management of outbreaks in powerful ways. How does all that data get used to inform government policy makers, to guide public health practices and define public policy, sometimes in spite of a less-than-enthusiastic public? The epidemiological study of this data informs research not just on individuals, but on populations, geographies, and risk factors that contribute to outbreaks, hospitalizations, and fatalities.

What is the right shelter-in-place or reopening policy for Los Angeles County vs. Humboldt County, California? What are the right group size limitations? The right policies for high-risk environments like skilled nursing facilities? Data can inform all of these policy recommendations. It must.

Unfortunately, it’s not that easy. Local departments of health and other public health agencies at the forefront of this pandemic are struggling with fundamental data challenges that are impeding their ability to drive meaningful insights. Challenges like:

  • How do we bring together clinical and case investigation datasets that reside in siloed, legacy data warehouses, EHR and operational systems managed by thousands of healthcare providers and agencies?
  • How do we provide the necessary compute power to process these population-scale datasets?
  • How do we blend structured data (e.g. medical records) with unstructured data (e.g. patient chatbot logs, medical images) to power novel insights and predictive models?
  • How do we reliably ingest streaming data for real-time insights on the spread of COVID-19, hospital usage trends, and more?

For many health organizations, building this analytics muscle has been a slow burn. The good news: powerful cloud-based software solutions, like Databricks Unified Data Analytics Platform, are accelerating this transformation with the tooling and scale needed to analyze large volumes of health data in minutes. With these fundamental data problems solved, health organizations can refocus their efforts on building analytics and ML products instead of wrangling their data. One example is the COVID-19 surveillance solution developed on top of Databricks, which is being deployed in a number of state and local government health departments, as well as by a number of hospitals and care facilities across the U.S.

Included above is a brief demo of our public health surveillance solution. In the demo, we show how to take a data-driven approach to adaptive response, or in other words, apply predictive analytics to COVID-19 datasets to help drive more effective shelter-in-place policies.

With this solution on Databricks we’re able to yield important insights in a short amount of time and, as a cloud native offering, it can be deployed quickly and cost effectively at scale. We recently launched this program in one of the largest state government health departments in the country, and we had it running and delivering insights in less than two hours.

This solution includes COVID-19 data sets we have previously published, as well as workbooks used by public health departments to deliver data-driven insight to guide COVID-19 public policy. This is one of many solutions that can be built on Databricks using this dataset. Other use cases for COVID-19 data include Hotspot Analysis, Epidemiological Modeling, and Supply Chain Optimization. You can learn more on our COVID-19 hub.

Databricks is committed to fighting the COVID-19 epidemic and other infectious diseases by implementing powerful analytical tools for government agencies across the country. We invite you to inquire about how we might be able to help your agency.

Next Steps

--

Try Databricks for free. Get started today.

The post Improving Public Health Surveillance During COVID-19 with Data Analytics and AI appeared first on Databricks.

What can Elon Musk's brain startup Neuralink do? We will soon find out

Some in the scientific community have watched the company’s promises warily, fearing that they might prompt afflicted people to delay necessary procedures.

SoftBank to slash wireless carrier stake to 40%; could raise $14 billion

By Sam Nussey
TOKYO (Reuters) - SoftBank Group Corp <9984.T> said on Friday it planned to slash its exposure to wireless carrier SoftBank Corp <9434.T> in a share sale worth 1.47 trillion yen ($13.8 billion) at Friday's close, marking an expansion of the conglomerate's asset sales.
The sale will see SoftBank's stake fall to 40.4% from 62.1%. The offer price for the 1.03 billion shares, including an over-allotment, will be set Sept. 14-16.
SoftBank Group Chief Executive Masayoshi Son has been selling down the group's core assets to stabilise its balance sheet and fund a record share buyback amid the coronavirus outbreak.
The announcement marks the expansion of stake sales beyond the 4.5 trillion yen asset sale plan announced in March. One-off gains from the sales boosted the group's earnings in the April-June quarter.
...


Read More on Datafloq

Merkel calls for clarification on Wirecard collapse

BERLIN (Reuters) - German Chancellor Angela Merkel told reporters on Friday it had to be clarified what happened to payments company Wirecard <WDIG.DE> which collapsed amid an accounting scandal.

(Reporting by Berlin bureau; Writing by Madeline Chambers; Editing by Riham Alkousaa)
...


Read More on Datafloq

Amazon orders 1,800 Mercedes-Benz electric vans for European deliveries

By Nick Carey and Jeffrey Dastin
(Reuters) - Amazon.com Inc <AMZN.O> said on Friday it had ordered 1,800 electric vans from Mercedes-Benz for its European delivery fleet, as part of the online retailer's plans to run a carbon neutral business by 2040.
A majority of the electric vehicles from Daimler AG's <DAIGn.DE> car and vans division will go into service this year, the company said, adding that it had ordered 1,200 of Mercedes-Benz's larger eSprinter models and 600 of the midsize eVitos.
The order is the largest for Mercedes-Benz's electric vehicles to date and includes 800 vans for Germany and 500 for the United Kingdom.
It is dwarfed, however, by Amazon's recent order for 100,000 electric delivery vans from Rivian Automotive LLC, a startup it has invested in.
...


Read More on Datafloq

Apple and Tesla split their shares, but does it matter?

By John McCrank
NEW YORK (Reuters) - Shares of Apple Inc <AAPL.O> and Tesla Inc <TSLA.O> will be less costly on Monday as pre-announced stock splits take effect, in theory making them more accessible to retail investors, but as more brokers offer fractional shares, some in the market question the need.
Investors cheered the Apple and Tesla announcements, helping extend a rally in the companies' shares, which along with many other technology firms, have soared in value as the market emerged from its pandemic-induced depths in March.
That made owning a piece of these companies seem out of reach of many Main Street investors. Apple closed at over $500 a share on Thursday, while Tesla continued its meteoric rise on Thursday to above $2,200 a share.
Both Apple and Tesla said their actions, a ...


Read More on Datafloq

TikTok influencers say 'everybody is going to take a big hit'

By Rollo Ross
LOS ANGELES (Reuters) - In a luxury mansion in the Hollywood Hills, young TikTok influencers bounced around on a bright sunny morning this week trying out new ideas for zany short form videos that they hope will go viral.
They're also feeling concerned about their future.
Kids Next Door LA is one of a number of TikTok houses set up around the U.S. where teens live, sleep and brainstorm creative ideas for dance and music videos and seek deals with brands that can bring in millions of dollars for the top influencers.
But their livelihoods are now at risk from an executive order by U.S. President Donald Trump that will effectively ban the social media app if its Chinese parent ByteDance does not reach a deal to divest it by mid-September.
...


Read More on Datafloq

Japan Display to sell screen plant to Sharp for $390 million, repay debt to Apple

TOKYO (Reuters) - Japan Display <6740.T> said on Friday it has agreed to sell a smartphone screen plant to Sharp Corp <6753.T> for $390 million, raising funds to repay debt it owes Apple Inc <AAPL.O> for the plant construction costs.
The company said in a statement it will also sell screen plant equipment at the central Japan factory to "an overseas customer" for $285 million. Sources have said the customer is Apple.
As Japan Display owes Apple more than $702.5 million for the $1.5 billion cost of building the plant five years ago, Japan Display said the $675 million funds to be raised from the plant and equipment sale would be used for repayment.

(Reporting by Makiko Yamazaki; Editing by Muralikumar Anantharaman)
...


Read More on Datafloq

Workday promotes Chano Fernandez as co-CEO, raises 2021 subscription forecast

(Reuters) - Workday Inc on Thursday appointed Chano Fernandez as co-chief executive officer and raised its annual subscription forecast, sending the business software provider's shares up nearly 12% in extended trading.
The company, which has become the latest to opt for the co-CEO model after Netflix Inc, promoted co-president Fernandez to serve alongside CEO and co-founder Aneel Bhusri.
Workday said it expects fiscal 2021 subscription revenue between $3.73 billion and $3.74 billion, up from its previous forecast of $3.67 billion to $3.69 billion.
The Pleasanton, California-based company's total revenue jumped 19.6% to $1.06 billion for the second quarter ended July 31, edging past analysts' average estimate of $1.04 billion, according to IBES data from Refinitiv.


...


Read More on Datafloq

Dell's quarterly results beat estimates on remote work boost

(Reuters) - Dell Technologies Inc <DELL.N> on Thursday posted a smaller-than-expected drop in quarterly revenue and beat profit estimates on robust demand for its notebooks and software products for remote work and online learning.
Shares of the company were up 2% in trading after the bell.
The COVID-19 pandemic has led to a rapid shift to cloud, spurring demand for products that allow organizations to carry on, even as millions of people around the globe work from home to stay safe, and schools to hold virtual classes.
Orders for Dell from the education sector jumped 24% in the second quarter ended July 31, and government orders rose 16%.
The company also saw an uptick in demand for its gaming systems, including Alienware as more people turned to gaming during stay-at-home ...


Read More on Datafloq

U.S. seeks to seize 280 cryptocurrency accounts tied to North Korean hacks

(Reuters) - The U.S. government sought on Thursday to seize 280 cryptocurrency accounts it said were used by North Korean hackers who stole millions of dollars of cryptocurrency from two virtual exchanges, and used Chinese traders to launder their funds.
The U.S. Department of Justice filed a civil forfeiture complaint after having in March charged two Chinese nationals with laundering more than $100 million in cryptocurrency on behalf of North Korea.
Earlier court filings detailed what U.S. authorities have characterized as Pyongyang's use of hackers to circumvent sanctions.
"Today's action publicly exposes the ongoing connections between North Korea's cyber-hacking program and a Chinese cryptocurrency money laundering network," Acting Assistant Attorney General Brian Rabbitt of the Justice Department's criminal division said in a statement.
Cryptocurrencies, such as bitcoin and ether, are created through a computer ...


Read More on Datafloq