编程知识 cdmana.com

Java and scala concurrency Fundamentals

Java and Scala Concurrent basis

understand Java Concurrency in languages and Scala Additional options provided

author Dennis Sosnoski Published in 2014 year 3 month 25 Japan

The decades of continuous and rapid improvement in processor speed ended at the turn of the century . From then on , Processor manufacturers improve chip performance by adding cores rather than running faster . Multi core systems have now become the norm for everything from mobile phones to enterprise servers , And this trend may continue and accelerate . More and more developers have to deal with multiple cores in their application code to meet performance requirements .

In this series of articles , You will learn something about Java and Scala A new method of concurrent programming with language , Include Java How to combine in Scala And others based on JVM Ideas that have been explored in the language of . The first part introduces some currently applicable to Java 7 and Scala The most advanced technology , For you to understand JVM The broader picture of concurrent programming on provides the background . You will learn how to use JavaExecutorService and ForkJoinPool Class to simplify concurrent programming . You will also learn some basic Scala characteristic , These features extend concurrent programming options to pure programming Java In addition to the options available in . In the process , You'll see how different approaches affect concurrent programming performance . The following sections will introduce Java 8 Concurrency improvement in , And includes for extensibility Java and Scala programming Akka Extensions including the toolkit .

Java Concurrent support

Concurrency support has been... Since the early days of the platform Java A feature of , The implementation of threading and synchronization gives it an advantage over competitive languages .Scala be based on Java And in JVM Up operation , Direct access to all Java Runtime state ( Including all concurrency support ). therefore , Research on Scala Before feature , I'll start with a quick review of Java What the language already provides .

About this series

Now multi-core systems are everywhere , Concurrent programming must be used more widely than ever . But concurrency can be difficult to implement correctly , You need new tools to help you use it . Many are based on JVM All languages are developing this type of tool ,Scala Especially active in this area . This series lets you know something about Java and Scala A new method of concurrent programming with language .

basic Java Threads

The thread is in Java It's easy to create and use in programming . They are created by java.lang.Thread Class represents , The code to be executed by the thread java.lang.Runnable Examples exist in the form of . if necessary , You can create thousands of threads in your application . When multiple cores are available ,JVM Use them to execute multiple threads concurrently .

The behavior of the coordination processing thread becomes chaotic . This complexity occurs because everything is consistent from a procedural point of view , Java The compiler and JVM You are free to reorder the operations in your code . for example : If two addition operations use different variables , Compiler or JVM You can perform these operations in the reverse order you specify , The premise is that the program does not use any value until the two operations are completed . The flexibility of this reordering operation helps to improve Java performance , But the consistency guarantee only applies to a single thread . Hardware can also cause threading problems . Modern systems use multi-level caching , And all cores in the system usually don't see the same cache . When a kernel modifies a value in memory , The change in this value may not be visible to other kernels .

Because of these problems , When one thread processes data modified by another thread , You must clearly control how threads interact .Java Special operations are used to provide this control , Establish order in the data view seen by different threads . The basic operation is that the thread uses synchronized Keyword access object . When a thread synchronizes on an object , The thread obtains exclusive access to the object's unique lock . If another thread already holds the lock , Then the thread that wants to get it must wait or block , Until the lock is released . When the thread resumes execution in an internal synchronized Code block ,Java Thread guaranteed “ notice ” Everything written by other threads that previously held the same lock —— But only the data written by these threads , Until they leave their synchronized Block release lock . This guarantee applies to both the compiler and JVM Reordering of actions performed , It also applies to hardware memory caching . therefore ,synchronized The inside of a block is a stable area in the code , Threads can take turns executing 、 Interactively and securely share information .

Java 5: Concurrent watershed

Java Support threading and synchronization from the beginning . however , The initial specification of shared data between processes is not very reliable , This leads to Java 5 (JSR-133) Of Java Major changes have taken place in language updates .Java Language norms are important to Java 5 Correct and normalize the operation and . The specification also illustrates how immutable objects work with multithreading .( Basically , Immutable objects are always thread safe , The premise is that reference to... Is not allowed when executing the constructor “ escape ”.) before , Interactions between threads usually require blocking operations . These changes are achieved by using non blocking coordination between threads synchronized`` volatile`` synchronized`` volatile. therefore ,Java 5 Added a new concurrent collection class that supports non blocking operations —— This is a major improvement over the blocking only method of early thread safety .

Use volatile Keyword provides a slightly weakly related form of safe interaction between threads . The synchronized Keywords ensure that when you get a lock , You will see the storage of other threads , And after you get the lock, other threads will see your storage . The volatile Keywords break this guarantee into two separate parts . If a thread writes a volatile Variable , All its previous writes to that point will be refreshed first . If the thread reads variables , It will not only see the value written to the variable , You will also see all other values written by the write thread . So read volatile Variables provide the same kind of memory guarantee as Get into One synchronized block , Write a volatile Variable gives the same sort memory guarantee as Leave One synchronized block . But there's a big difference : Read or write volatile Variables never block .

abstract Java Concurrent

Synchronization is very useful , Many multithreaded applications are in Java Use only basic synchronized Block development . But coordinating threads can be cumbersome , Especially when you deal with many threads and many locks . Ensure that threads interact only in a safe way also Avoid potential deadlocks ( Two or more threads wait for each other to release the lock before they can continue execution ) It becomes difficult . The abstraction of supporting concurrency without directly dealing with threads and locks provides developers with a better way to deal with common use cases .

Described java.util.concurrent The hierarchy includes changes that support concurrent access to collections , Packaging class atomic operations , And synchronization primitives . Many of these classes are designed to support non blocking access , So as to avoid deadlock and achieve more efficient thread processing . These classes make it easier to define and regulate the interaction between threads , But they are still affected by some of the complexities of the basic threading model .

A pair of abstractions in the package java.util.concurrent Support more decoupled concurrent processing methods :Future<T> Interface Executor and ExecutorService Interface . These related interfaces are in turn Java Concurrency supports many Scala and Akka The basis of expansion , Therefore, it is worth studying these interfaces and their implementation in more detail .

Future<T> It's a type The holder of the value T, But this value is usually in Future Available sometime after creation . This value is the result of asynchronous operations that may be performed concurrently . receive Future Threads that can call methods :

  • See if the value is available
  • Wait until the value becomes available
  • Retrieve values when available
  • If you no longer need this value , Then cancel the operation

Specific implementations of Future are structured to support different ways of handling the asynchronous operation.

Specific implementation Future Constructed to support different ways of handling asynchronous operations .

Executor It's an abstraction of things around performing tasks . This “ thing ” Eventually it will be a thread , But the details of how the thread handles execution are hidden by the interface .Executor Its usefulness is limited ,ExecutorService Sub interfaces provide extension methods to manage termination and for Future All standard implementations of task result generation Executor It has also been realized. ExecutorService, So in practice , You can ignore the root interface .

Threads are relatively heavyweight resources , It makes more sense to reuse them than to allocate and discard them .ExecutorService Simplified work sharing between threads , It also supports the automatic reuse of threads , This simplifies programming and improves performance . Of ThreadPoolExecutor Realization ExecutorService Manage a thread pool to perform tasks .

application Java Concurrent

The practical application of concurrency usually involves external interaction independent of the main processing logic ( And users 、 Storage or other systems ) The task of . This type of application is hard to condense into a simple example , So for concurrent presentations , People usually use simple, computationally intensive tasks , For example, mathematical calculation or sorting . I'll use a similar example .

The task is to find the nearest known word to the unknown input , among lately Is defined as Levenshtein distance : Add... To the minimum number of characters required to convert the input to a known word 、 Number of times deleted or changed . I use based on Wikipedia On Levenshtein distance The code in the example in the article calculates the value of each known word Levenshtein Distance and return the best match ( Or uncertain results , If multiple known words have the same distance ).

detailed list 1 Shows the for calculating Levenshtein Distant Java Code . This calculation generates a matrix , Its rows and columns match the size of the two texts being compared , Add one... To each dimension . In order to improve efficiency , This implementation uses a pair of arrays with the size of the target text to represent the continuous rows of the matrix , Swap arrays in each pass , Because I only need the value of the previous line to calculate the next line .

detailed list 1. Java Medium Levenshtein Distance calculation
/*
  Calculate edit distance from targetText to known word.
 
  @param word known word
  @param v0 int array of length targetText.length() + 1
  @param v1 int array of length targetText.length() + 1
  @return distance
 /
private int editDistance(String word, int[] v0, int[] v1) {
    
    // initialize v0 (prior row of distances) as edit distance for empty 'word'
    for (int i = 0; i < v0.length; i++) {
        v0[i] = i;
    }
    
    // calculate updated v0 (current row distances) from the previous row v0
    for (int i = 0; i < word.length(); i++) {
        
        // first element of v1 = delete (i+1) chars from target to match empty 'word'
        v1[0] = i + 1;
        
        // use formula to fill in the rest of the row
        for (int j = 0; j < targetText.length(); j++) {
            int cost = (word.charAt(i) == targetText.charAt(j)) ? 0 : 1;
            v1[j + 1] = minimum(v1[j] + 1, v0[j + 1] + 1, v0[j] + cost);
        }
        
        // swap v1 (current row) and v0 (previous row) for next iteration
        int[] hold = v0;
        v0 = v1;
        v1 = hold;
    }
    
    // return final value representing best edit distance
    return v0[targetText.length()];
}
 Copy code 

If you have a large number of known words to compare with unknown input , And you run on a multi-core system , You can use concurrency to speed up processing : Decompose the known word set into multiple blocks and process them, each block as a separate task . By changing the number of words in each block , You can easily change the granularity of task decomposition to see the impact on overall performance . detailed list 2 Shows the of block calculation Java Code , Taken from the In the sample code Of ThreadPoolDistance class . detailed list 2 Use a standard , Set the thread count to the number of processors available .ExecutorService

detailed list 2. Multithreading Java Block distance calculation in

private final ExecutorService threadPool;
private final String[] knownWords;
private final int blockSize;
public ThreadPoolDistance(String[] words, int block) {
  threadPool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  knownWords = words;
  blockSize = block;
}
public DistancePair bestMatch(String target) {
  // build a list of tasks for matching to ranges of known words
  List(less-thanDistanceTask(greater-than tasks = new ArrayList(less-thanDistanceTask(greater-than();

  int size = 0;
  for (int base = 0; base (less-than knownWords.length; base += size) {
    size = Math.min(blockSize, knownWords.length ‑ base);
    tasks.add(new DistanceTask(target, base, size));
  }
  DistancePair best;
  try {
    // pass the list of tasks to the executor, getting back list of futures
    List(less-thanFuture(less-thanDistancePair(greater-than(greater-than results = threadPool.invokeAll(tasks);
    // find the best result, waiting for each future to complete
    best = DistancePair.WORST_CASE;
    for (Future(less-thanDistancePair(greater-than future: results) {
      DistancePair result = future.get();
      best = DistancePair.best(best, result);
    }
  } catch (InterruptedException e) {
    throw new RuntimeException(e);
  } catch (ExecutionException e) {
    throw new RuntimeException(e);
  }
  return best;
}

/**
 * Shortest distance task implementation using Callable.
 */
public class DistanceTask implements Callable(less-thanDistancePair(greater-than
{
private final String targetText;
private final int startOffset;

  public DistanceTask(String target, int offset, int count) {
    targetText = target;
    startOffset = offset;
    compareCount = count;
  }

  private int editDistance(String word, int[] v0, int[] v1) {
    ...
  }

/* (non-Javadoc)
 * @see java.util.concurrent.Callable#call()
 */
 @Override
public DistancePair call() throws Exception {

  // directly compare distances for comparison words in range
  int[] v0 = new int[targetText.length() + 1];
  int[] v1 = new int[targetText.length() + 1];
  int bestIndex = -1;
  int bestDistance = Integer.MAX_VALUE;
  boolean single = false;
  for (int i = 0; i < compareCount; i++) {
    int distance = editDistance(knownWords[i + startOffset], v0, v1);
    if (bestDistance > distance) {
      bestDistance = distance;
      bestIndex = i + startOffset;
      single = true;
    } else if (bestDistance == distance) {
      single = false;
    }
  }
  return single ? new DistancePair(bestDistance, knownWords[bestIndex]) :
    new DistancePair(bestDistance);
  }
}
 Copy code 

Show more

bestMatch() detailed list 2 Construct a DistanceTask The instance list , Then pass the list to ExecutorService. This pair Call form of ExecutorService use Collection<? extends Callable<T>> Type parameter indicating the task to be executed . The call returns a Future<T> Represents a list of execution results . stay ExecutorService Fill these results asynchronously with the value returned by the call call() In every task method . under these circumstances ,T The type is DistancePair- A simple value object of distance and matching words , Or when no unique match occurs, there is only one distance .

Method bestMatch() Wait for each thread in turn Future complete , Accumulate the best results and return... When finished . Because multiple threads process DistanceTasks Implementation , The original thread only waits for a small number of results . The remaining results are completed at the same time as the results the original thread is waiting for .

Concurrent performance

To take full advantage of the number of processors available on the system , You must configure... For your system ExecutorService At least as many threads as the number of processors . You must also pass at least as many tasks as the number of processors ExecutorService to perform . actually , For optimal performance , You may want to have a lot more tasks than the processor . thus , The processor will be busy with one task after another , Only at the end are you free . But because of the overhead involved —— In creating tasks and the future 、 Switch threads between tasks 、 Finally, return the result of the task —— You must keep the task big enough , So that the cost is proportionally small .

Figure 1 shows how measured performance varies with different numbers of tasks when the test code is run on my four-core AMD system using Oracle’s Java 7 for 64-bit Linux. Each input word is compared in turn with 12,564 known words, and each task finds the best match within a range of the known words. The full set of 933 misspelled input words is run repeatedly, with pauses between passes for the JVM to settle, and the best time after 10 passes is used in the graph. As you can see from Figure 1, the performance in input words per second looks stable over a reasonable range of block sizes (basically, from 256 to >1,024), dropping only near the extremes where the tasks become either very small or very large. The final value, for block size 16,384, creates only one task, so shows single-threaded performance.

chart 1. ThreadPoolDistance performance

Chart that shows performance of ThreadPoolDistance across a range of block sizes

Fork-Join

Java 7 Another implementation is introduced ExecutorServiceForkJoinPool class .ForkJoinPool Designed to effectively handle tasks that can be repeatedly decomposed into subtasks , Use RecursiveAction class ( When the task does not produce results ) or RecursiveTask<T> class ( When a task has type When the result is T) For tasks .RecursiveTask<T> Provides a convenient way to merge the results of subtasks , Like the list 3 Shown .

detailed list 3. RecursiveTask Example
private ForkJoinPool threadPool = new ForkJoinPool();

private final String[] knownWords;

private final int blockSize;

public ForkJoinDistance(String[] words, int block) {
    knownWords = words;
    blockSize = block;
}

public DistancePair bestMatch(String target) {
    return threadPool.invoke(new DistanceTask(target, 0, knownWords.length, knownWords));
}

/*
  Shortest distance task implementation using RecursiveTask.
 /
public class DistanceTask extends RecursiveTask<DistancePair>
{
    private final String compareText;
    private final int startOffset;
    private final int compareCount;
    private final String[] matchWords;
    
    public DistanceTask(String from, int offset, int count, String[] words) {
        compareText = from;
        startOffset = offset;
        compareCount = count;
        matchWords = words;
    }
    
    private int editDistance(int index, int[] v0, int[] v1) {
        ...
    }
    
    / (non‑Javadoc)
      @see java.util.concurrent.RecursiveTask#compute()
     /
    @Override
    protected DistancePair compute() {
        if (compareCount > blockSize) {
            
            // split range in half and find best result from bests in each half of range
            int half = compareCount / 2;
            DistanceTask t1 = new DistanceTask(compareText, startOffset, half, matchWords);
            t1.fork();
            DistanceTask t2 = new DistanceTask(compareText, startOffset + half,
                compareCount ‑ half, matchWords);
            DistancePair p2 = t2.compute();
            return DistancePair.best(p2, t1.join());
        }
        
        // directly compare distances for comparison words in range
        int[] v0 = new int[compareText.length() + 1];
        int[] v1 = new int[compareText.length() + 1];
        int bestIndex = ‑1;
        int bestDistance = Integer.MAX_VALUE;
        boolean single = false;
        for (int i = 0; i < compareCount; i++) {
            int distance = editDistance(i + startOffset, v0, v1);
            if (bestDistance > distance) {
                bestDistance = distance;
                bestIndex = i + startOffset;
                single = true;
            } else if (bestDistance == distance) {
                single = false;
            }
        }
        return single ? new DistancePair(bestDistance, knownWords[bestIndex]) :
            new DistancePair(bestDistance);
    }
}
 Copy code 

chart 2 Shows the performance of ForkJoin From the list 3 The code is compared with the described ThreadPool From code [ detailed list 2](developer.ibm.com/articles/j-… current#listing2). The ForkJoin The code is more stable over the entire range of block sizes , The drop is significant only when you go to a separate block ( This means that execution is single threaded ). standard ThreadPool The code is only in 256 and 1,024 Block size shows better performance .

chart 2. ThreadPoolDistance Performance and ForkJoinDistance Performance comparison
Chart that compares the performance of ThreadPoolDistance to that of ForkJoinDistance across range of block sizes

These results suggest that , If you can resize tasks in your application for best performance , Then the use of standards may ThreadPool Than using ForkJoin. But understand ,“ The best position ”ThreadPool Depending on the task 、 Number of available processors and other aspects of the system . generally speaking ,ForkJoin Provide you with excellent performance with minimal adjustments , So you'd better use it as much as possible .

Scala Concurrent basis

Scala It extends... In many ways Java Programming language and runtime , This includes adding more and simpler methods to handle concurrency . For beginners ,Scala edition Future<T> Than Java The version is much more flexible . You can create futures directly from the code block , And you can attach callbacks to futures to complete processing . detailed list 4 Shows some in use Scala Examples of Futures . The code first defines on-demand futureInt() supply Future<Int> Methods , Futures are then used in three different ways .

detailed list 4. Scala Future Sample code
import ExecutionContext.Implicits.global

val lastInteger = new AtomicInteger
def futureInt() = future {
  Thread sleep 2000
  lastInteger incrementAndGet
}

// use callbacks for completion of futures
val a1 = futureInt
val a2 = futureInt
a1.onSuccess {
    case i1 => {
      a2.onSuccess {
        case i2 => println("Sum of values is " + (i1 + i2))
      }
    }
}
Thread sleep 3000

// use for construct to extract values when futures complete
val b1 = futureInt
val b2 = futureInt
for (i1 <‑ b1; i2 <‑ b2) yield println("Sum of values is " + (i1 + i2))
Thread sleep 3000

// wait directly for completion of futures
val c1 = futureInt
val c2 = futureInt
println("Sum of values is " + (Await.result(c1, Duration.Inf) +
  Await.result(c2, Duration.Inf)))
 Copy code 

detailed list 4 The first example in attaches a callback closure to a pair of futures , So that when both are done , Print the sum of the two result values to the console . Callbacks are nested directly on futures in the order they are created , But if you change the order , They work the same way . If the future has been completed when the callback is attached , The callback will still run , But there is no guarantee that it will run immediately . The original execution thread is on this page Thread sleep 3000 The line pauses so that the futures are completed before proceeding to the next example .

The second example shows how to use Scala Of *for understand * Extract values from futures asynchronously , And use them directly in expressions . Of for Understanding is a Scala Structure , You can use it to ( The complex combination of express business map,filter,flatMap, and foreach) concise . It is usually used for various forms of collections , but Scala Futures implements the same method for accessing collection values monadic Method . therefore , You can use the future as a special collection —— A collection containing at most one value ( It may even not contain that value until some point in the future ). under these circumstances ,for Statement is to get the results of futures and use these result values in the expression . Behind the scenes , The code generated by this technique is almost the same as the first example , But writing it in linear code produces simpler expressions that are easier to understand . Just like the first example , The original execution thread paused , So that the futures can be completed before proceeding to the next example .

The third example uses blocking waiting to obtain the results of Futures . This is related to Java Futures work the same way , Although in Scala Under the circumstances ,Await.result() A special method call using the maximum wait time parameter makes the blocking wait explicit .

[ detailed list 4](developer.ibm.com/articles/j-… current#listing4) The code in obviously does not futures Pass to anExecutorService Or equivalent , So if you haven't used Scala, You may want to know future How the code behind it executes . The answer is on the top line [ detailed list 4](developer.ibm.com/articles/j-… current#listing4):import ExecutionContext.Implicits.global.Scala API Usually use implicit Parameter values to be used in the code block . The future { } Construction requirements anExecutionContext Can be used as an implicit parameter . this ExecutionContext yes Java Of Scala Wrappers ,ExecutorService Used to perform tasks in the same way using one or more managed threads .

In addition to these basic operations of Futures ,Scala A method of converting any set into a set using parallel programming is also provided . After converting the set to parallel form , Any criteria you perform on the collection Scala Set operations ( for example mapfilter、 or fold) Will be done automatically in parallel whenever possible .( You'll see an example later in this article , It's using Scala Find the best match for the word [ detailed list 7](developer.ibm.com/articles/j-… current#listing7) Part of the code .)

Error handling

Java and Scala Medium Futures All have to deal with error handling problems . stay Java Under the circumstances , from Java 7 Start , Futures can be sold anExecutionException As an alternative to returning results . Applications can ExecutionException Define your own subclasses for specific types of faults , Or they can link exceptions to pass details , But this is a limitation of flexibility .

Scala Futures provide more flexible error handling . You have two ways to accomplish Scala The future of : As a result of success ( Suppose one is expected ), Or as a failed Throwable. You can also handle future completion in a number of ways . stay [ detailed list 4 in ,](developer.ibm.com/articles/j-… current#listing4) The onSuccess Method is used to attach a callback to handle future successful completion . You can also use it onComplete To handle any form of completion ( Wrap the results or throwableTry To adapt to these two situations ), or onFailure Specialized in handling error results .Scala This flexibility of futures extends to all the operations you can perform with futures , So you can integrate error handling directly into your code .

ScalaFuture<T> There is also a closely related Promise<T> class . The future is the holder of a result , The result may be ( Or maybe not —— There is no internal guarantee that the future will be completed forever ) Become available at some point . When the future is complete , The result is fixed . Commitment is the other side of the same contract : One time distributable holder of the result , In the form of a result value or thrown . You can get the future from the promise , When the result is set on the promise , It is also set on that future .

application Scala Concurrent

Now you are familiar with some basic Scala The concept of concurrency , It's time to check Levenshtein The code of distance problem . detailed list 5 Shows Levenshtein Distance calculation is more or less conventional Scala Realization , Basically with [ detailed list 1 in ](developer.ibm.com/articles/j-… current#listing1) Of Java The code matches , But it uses a functional style .

detailed list 5. Scala Medium Levenshtein Distance calculation
val limit = targetText.length
/* Calculate edit distance from targetText to known word.
  
   @param word known word
   @param v0 int array of length targetText.length + 1
   @param v1 int array of length targetText.length + 1
   @return distance
  */
def editDistance(word: String, v0: Array[Int], v1: Array[Int]) = {

  val length = word.length

  @tailrec
  def distanceByRow(rnum: Int, r0: Array[Int], r1: Array[Int]): Int = {
    if (rnum >= length) r0(limit)
    else {

      // first element of r1 = delete (i+1) chars from target to match empty 'word'
      r1(0) = rnum + 1

      // use formula to fill in the rest of the row
      for (j <‑ 0 until limit) {
        val cost = if (word(rnum) == targetText(j)) 0 else 1
        r1(j + 1) = min(r1(j) + 1, r0(j + 1) + 1, r0(j) + cost);
      }

      // recurse with arrays swapped for next row
      distanceByRow(rnum + 1, r1, r0)
    }
  }

  // initialize v0 (prior row of distances) as edit distance for empty 'word'
  for (i <‑ 0 to limit) v0(i) = i

  // recursively process rows matching characters in word being compared to find best
  distanceByRow(0, v0, v1)
}
 Copy code 

Show more

The [ detailed list 5](developer.ibm.com/articles/j-… current#listing5) The code uses tail recursion distanceByRow() How to calculate the value for each row . This method first checks the calculated number of rows , If the number matches the number of characters in the word being checked , Returns the result distance . otherwise , It calculates the new row value , Then the next line is calculated by recursively calling itself ( In this process, two row arrays are exchanged , In order to correctly pass the new current line value ) To complete .Scala Convert tail recursive method to Javawhile The equivalent method of the loop , Therefore, the relationship with Java Code similarity .

however , This code and Java There is a major difference between the codes . take for In connotation [ detailed list 5](developer.ibm.com/articles/j-… current#listing5) The code uses closed . Current JVM Closures are not always handled effectively , Therefore, they add considerable overhead to the innermost loop of the calculation . In this way ,[ detailed list 5](developer.ibm.com/articles/j-… current#listing5) The code doesn't run as fast as Java Fast version . detailed list 6 Shows for Replace the rewriting of the derivation with the added tail recursion method . This version is more verbose , But performance and Java Version equivalent .

detailed list 6. Computing code reorganized to improve performance
val limit = targetText.length

/* Calculate edit distance from targetText to known word.
  
   @param word known word
   @param v0 int array of length targetText.length + 1
   @param v1 int array of length targetText.length + 1
   @return distance
  */
def editDistance(word: String, v0: Array[Int], v1: Array[Int]) = {

  val length = word.length
  
  @tailrec
  def distanceByRow(row: Int, r0: Array[Int], r1: Array[Int]): Int = {
    if (row >= length) r0(limit)
    else {

      // first element of v1 = delete (i+1) chars from target to match empty 'word'
      r1(0) = row + 1

      // use formula recursively to fill in the rest of the row
      @tailrec
      def distanceByColumn(col: Int): Unit = {
        if (col < limit) {
          val cost = if (word(row) == targetText(col)) 0 else 1
          r1(col + 1) = min(r1(col) + 1, r0(col + 1) + 1, r0(col) + cost)
          distanceByColumn(col + 1)
        }
      }
      distanceByColumn(0)

      // recurse with arrays swapped for next row
      distanceByRow(row + 1, r1, r0)
    }
  }

  // initialize v0 (prior row of distances) as edit distance for empty 'word'
  @tailrec
  def initArray(index: Int): Unit = {
    if (index <= limit) {
      v0(index) = index
      initArray(index + 1)
    }
  }
  initArray(0)

  // recursively process rows matching characters in word being compared to find best
  distanceByRow(0, v0, v1)
}
 Copy code 

Listing 7 shows Scala code to perform the same sort of blocked distance calculation as in the [Listing 2](developer.ibm.com/articles/j-… current#listing2) Java code. The bestMatch() method finds the best match for the target text within a particular block of words handled by the Matcher class instance, using the tail-recursive best() method to scan through the words. The *Distance classes create multiple Matcher instances, one for each block of words, then coordinate the execution and combination of the matcher results.

detailed list 7. Scala One time block distance calculation of multithreading in
class Matcher(words: Array[String]) {

  def bestMatch(targetText: String) = {

    val limit = targetText.length
    val v0 = new ArrayInt
    val v1 = new ArrayInt
    
    def editDistance(word: String, v0: Array[Int], v1: Array[Int]) = {
      ...
    }

    @tailrec
    / Scan all known words in range to find best match.
        
       @param index next word index
       @param bestDist minimum distance found so far
       @param bestMatch unique word at minimum distance, or None if not unique
       @return best match
      /
    def best(index: Int, bestDist: Int, bestMatch: Option[String]): DistancePair =
      if (index < words.length) {
        val newDist = editDistance(words(index), v0, v1)
        val next = index + 1
        if (newDist < bestDist) best(next, newDist, Some(words(index)))
        else if (newDist == bestDist) best(next, bestDist, None)
        else best(next, bestDist, bestMatch)
      } else DistancePair(bestDist, bestMatch)

    best(0, Int.MaxValue, None)
  }
}

class ParallelCollectionDistance(words: Array[String], size: Int) extends TimingTestBase {

  val matchers = words.grouped(size).map(l => new Matcher(l)).toList
  
  def shutdown = {}
  
  def blockSize = size

  / Find best result across all matchers, using parallel collection. /
  def bestMatch(target: String) = {
    matchers.par.map(m => m.bestMatch(target)).
      foldLeft(DistancePair.worstMatch)((a, m) => DistancePair.best(a, m))
  }
}

class DirectBlockingDistance(words: Array[String], size: Int) extends TimingTestBase {

  val matchers = words.grouped(size).map(l => new Matcher(l)).toList
  
  def shutdown = {}
  
  def blockSize = size

  /** Find best result across all matchers, using direct blocking waits. /
  def bestMatch(target: String) = {
    import ExecutionContext.Implicits.global
    val futures = matchers.map(m => future { m.bestMatch(target) })
    futures.foldLeft(DistancePair.worstMatch)((a, v) =>
      DistancePair.best(a, Await.result(v, Duration.Inf)))
  }
}
 Copy code 

*Distance detailed list 7 Two classes in show coordinated execution and Matcher Different ways of combining results .ParallelCollectionDistance Use the above Scala To hide the details of parallel computing , Just a simple foldLeft The combination result of .

DirectBlockingDistance More specifically , Create a futures list , then foldLeft Wait for each individual result with nested blocking on this list .

performance , Once again

Both of the [Listing 7](developer.ibm.com/articles/j-… current#listing7)*Distance implementations are reasonable approaches to handling the Matcher results. (And they’re far from the only reasonable approaches. The sample code includes a couple of other implementations I tried in my experimentations but don’t include in the article.) In this case, performance is a main concern, so Figure 3 shows how these two implementations perform relative to the Java ForkJoin code.

chart 3. ForkJoinDistance Performance and Scala Performance comparison of alternatives Chart that compares the performance ForkJoinDistance to that of Scala alternatives

chart 3 Show JavaForkJoin The overall performance of the code is better than any kind of Scala Realization , Even though DirectBlockingDistance stay 1,024 Block size provides better performance . In most block sizes , Two kinds of Scala All implementations provide better performance than [ detailed list 1](developer.ibm.com/articles/j-… current#listing1)ThreadPool Better performance of code .

These performance results are illustrative , Not decisive . If you run timing tests on your own system , You may see differences in relative performance , Especially if you use a different number of cores . If I want to get the best performance of distance tasks , I will implement optimization : I can sort known words by length , And start comparing with words with the same length as the input ( Because the editing distance is always at least the difference of word length ). perhaps , When it exceeds the best a priori value , I can use the early output of distance calculation . But as a relatively simple algorithm , This experiment shows how concurrent operations can improve performance and the impact of different methods of sharing work .

In addition to performance , take [ detailed list 7 in ](developer.ibm.com/articles/j-… current#listing7) Two versions of Scala Control code and Java Code [ detailed list 2](developer.ibm.com/articles/j-… current#listing2) and [ detailed list 3](developer.ibm.com/articles/j-… current#listing3) It's interesting to compare . And Java Code comparison ,Scala The code is significantly shorter and ( Suppose you know Scala!) Clearer .Scala and Java Can interoperate well , As you can in In the complete sample code As you can see, for this article :Scala Code runs Scala and Java Timing test of code , and Java The code is directly related to part Scala Code works together . Because of this simple interoperability , You can use Scala Introduce into existing Java In the code base , Without large-scale conversion . Initially Scala be used for Java High level control of code is usually meaningful , So you can make the most of Scala Powerful expression function , Without any significant impact on performance due to closures or transformations .

[ detailed list 7](developer.ibm.com/articles/j-… current#listing7)ParallelCollectionDistance Scala The simplicity of the code is particularly attractive . By using this method , You can completely abstract concurrency from your code , This allows you to write programs that look like single threaded applications , At the same time, you can still get the benefits of multiprocessors . Fortunately, , For those who like the simplicity of this method, but may not be willing or unable to jump into Scala For developers ,Java 8 For direct Java Programming brings similar features .

future

Now that you know Java and Scala Basic knowledge of concurrent operation , The next article in this series will look at Java 8 How to improve on Java( In the long run, it may also be right Scala) Concurrent support for .Java 8 Many of the changes in seem familiar ——Java 8 It contains many in Scala The same concept used in concurrency —— So you will soon be able to work in ordinary Java Some... Are used in the code Scala technology . Read the next section to learn how to .

版权声明
本文为[Jay Chou himself]所创,转载请带上原文链接,感谢
https://cdmana.com/2021/09/20210909124112697f.html

Scroll to Top